Commit 86fec7d

Add ACE Step notebook (#3051)
Co-authored-by: Aleksandr Mokrov
1 parent b20c018 commit 86fec7d

7 files changed (+2762 / -1 lines)

.ci/ignore_treon_docker.txt

Lines changed: 2 additions & 1 deletion
@@ -90,4 +90,5 @@ notebooks/kokoro/kokoro.ipynb
 notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
 notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
 notebooks/flex.2-image-generation/flex.2-image-generation.ipynb
-notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
+notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
+notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
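`.ci/ignore_treon_docker.txt` is a plain-text list with one notebook path per line; judging by its name, it excludes notebooks from a Treon-based Docker test run. The sketch below shows one plausible way a test runner could apply such a list. `parse_ignore_list`, `notebooks_to_test` and the `notebooks/example/example.ipynb` path are hypothetical and are not taken from the repository's CI scripts.

```python
# Hypothetical helpers: a sketch of applying a one-path-per-line ignore list,
# not the actual openvino_notebooks CI implementation.


def parse_ignore_list(text: str) -> set[str]:
    """Return the notebook paths listed one per line, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def notebooks_to_test(all_notebooks: list[str], ignored: set[str]) -> list[str]:
    """Keep only the notebooks that are not listed in the ignore set."""
    return [nb for nb in all_notebooks if nb not in ignored]


ignore_text = (
    "notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb\n"
    "notebooks/ace-step-music-generation/ace-step-music-generation.ipynb\n"
)
candidates = [
    "notebooks/example/example.ipynb",  # hypothetical notebook path
    "notebooks/ace-step-music-generation/ace-step-music-generation.ipynb",
]
print(notebooks_to_test(candidates, parse_ignore_list(ignore_text)))
# ['notebooks/example/example.ipynb']
```

A set is used for the ignore list so that each membership check is constant time regardless of how long the list grows.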

.ci/skipped_notebooks.yml

Lines changed: 6 additions & 0 deletions
@@ -574,3 +574,9 @@
   skips:
     - os:
         - macos-13
+- notebook: notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
+  skips:
+    - os:
+        - macos-13
+        - ubuntu-22.04
+        - windows-2022
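The new entry in `.ci/skipped_notebooks.yml` skips the ACE Step notebook on `macos-13`, `ubuntu-22.04` and `windows-2022`. The sketch below mirrors that entry as the Python structure a YAML loader would produce, assuming the nesting `notebook`, then `skips`, then `os`, and checks whether a notebook is skipped on a given runner OS. `SKIP_CONFIG` and `is_skipped` are hypothetical names, and only the `os` key is handled here.

```python
# Parsed form of the new entry, assuming the nesting notebook -> skips -> os.
# SKIP_CONFIG and is_skipped are hypothetical, not the repository's CI code.
SKIP_CONFIG = [
    {
        "notebook": "notebooks/ace-step-music-generation/ace-step-music-generation.ipynb",
        "skips": [{"os": ["macos-13", "ubuntu-22.04", "windows-2022"]}],
    },
]


def is_skipped(config: list[dict], notebook: str, os_name: str) -> bool:
    """Return True if any skip rule for `notebook` lists `os_name`."""
    for entry in config:
        if entry.get("notebook") != notebook:
            continue
        for rule in entry.get("skips", []):
            # Only OS-based rules are modelled in this sketch.
            if os_name in rule.get("os", []):
                return True
    return False


nb = "notebooks/ace-step-music-generation/ace-step-music-generation.ipynb"
print(is_skipped(SKIP_CONFIG, nb, "ubuntu-22.04"))  # True
```

Because all three listed runners are skipped, the notebook would not be executed by any of those CI jobs under this reading of the config.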

.ci/spellcheck/.pyspelling.wordlist.txt

Lines changed: 11 additions & 0 deletions
@@ -52,6 +52,8 @@ autogenerated
 AutoModelForXxx
 autoregressive
 autoregressively
+AutoEncoder
+AutoEncoders
 AutoTokenizer
 AWQ
 awq
@@ -201,6 +203,7 @@ denoises
 denoising
 denormalization
 denormalized
+demucs
 depainting
 deployable
 DepthAnything
@@ -231,6 +234,7 @@ DIT
 DiT
 DiT’s
 DiT’s
+DiTs
 DL
 DocLayNet
 docling
@@ -291,6 +295,8 @@ FastDraft
 FastSAM
 FC
 feedforward
+FeedForward
+FFN
 FFmpeg
 FIL
 FEIL
@@ -608,6 +614,7 @@ MRPC
 mRoPE
 msi
 MTVQA
+mT
 multiarchitecture
 Multiclass
 multiclass
@@ -705,6 +712,7 @@ opset
 optimizable
 Orca
 otsl
+OSNet
 OTSL
 OuteTTS
 outpainting
@@ -780,6 +788,7 @@ PowerShell
 PPYOLOv
 PR
 Prateek
+PLR
 pre
 Precisions
 precomputed
@@ -945,6 +954,7 @@ SmolVLM
 softmax
 softvc
 SoftVC
+SongGen
 SOTA
 SoTA
 soundfile
@@ -1125,6 +1135,7 @@ Vladlen
 VOC
 Vocoder
 vocoder
+vocoding
 VQ
 VQA
 VQGAN
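Several of the new wordlist entries are spelling variants of terms that sit next to them (`DiTs` beside the existing `DiT`, `FeedForward` beside `feedforward`, and both `AutoEncoder` and `AutoEncoders`), which suggests that entries have to match the spelling used in the notebook text closely. The sketch below models this with a deliberately strict, exact and case-sensitive lookup. That strictness is an assumption for illustration: the real behaviour depends on the pyspelling and aspell configuration, and `unknown_words` is a hypothetical helper.

```python
import re

# Tokens are runs of letters, optionally followed by an apostrophe suffix
# such as the "’s" in "DiT’s" (straight or curly apostrophe).
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")


def unknown_words(text: str, wordlist: list[str]) -> list[str]:
    """Return tokens not in `wordlist`, using exact, case-sensitive matching.

    Hypothetical helper: stricter than a real spellchecker, which also
    consults a base dictionary and may normalise capitalisation.
    """
    known = set(wordlist)
    missing: list[str] = []
    for token in TOKEN_RE.findall(text):
        if token not in known and token not in missing:
            missing.append(token)  # keep first-occurrence order, no duplicates
    return missing


print(unknown_words("DiT DiTs FFN FeedForward", ["DiT", "FFN", "feedforward"]))
# ['DiTs', 'FeedForward']
```

Under this strict model, a plural or a differently capitalised spelling is reported until it gets its own wordlist line, which is consistent with the variants this commit adds.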
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Music generation using ACE Step and OpenVINO
+
+[ACE-Step](https://ace-step.github.io/) is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches and achieves state-of-the-art performance through a holistic architectural design. Current methods face inherent trade-offs between generation speed, musical coherence, and controllability. ACE-Step bridges this gap by integrating diffusion-based generation with Sana’s Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer. The model achieves superior musical coherence and lyric alignment across melody, harmony, and rhythm metrics. Moreover, ACE-Step preserves fine-grained acoustic details, enabling advanced control mechanisms such as voice cloning, lyric editing, remixing, and track generation (e.g., lyric2vocal, singing2accompaniment).
+
+ACE-Step adapts a text-to-image diffusion framework for music generation. The core generative model is a diffusion model operating on a compressed mel spectrogram latent representation. This process is guided by conditioning information from three specialized encoders: a text prompt encoder, a lyric encoder, and a speaker encoder. Embeddings from these encoders are concatenated and integrated into the diffusion model via cross-attention mechanisms.
+
+ACE-Step can be used to generate original music from text descriptions, remix music, transfer musical style, and edit song lyrics. The model offers a set of controllable features that allow users to precisely control the generation process, make targeted modifications to existing audio material, and perform specialized generation tasks through fine-tuning.
+
+<img src="https://raw.githubusercontent.com/ACE-Step/ACE-Step/main/assets/ACE-Step_framework.png" width=90% style="display: block; margin: auto;" />
+
+More details about the model can be found using the following resources: [project page](https://ace-step.github.io/), [paper](https://arxiv.org/abs/2506.00045), [original repository](https://github.com/ace-step/ACE-Step).
+
+
+## Notebook Contents
+
+This notebook demonstrates how to convert the ACE Step model with OpenVINO and use it for music generation or editing.
+
+The tutorial consists of the following steps:
+
+- Install prerequisites
+- Download the ACE Step model and run inference
+- Convert the model to IR format and run inference with OpenVINO
+- Download and apply a LoRA, then generate audio with it
+- Interactive demo
+
+
+## Installation Instructions
+
+This is a self-contained example that relies solely on its own code.<br>
+We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
+For details, please refer to the [Installation Guide](../../README.md).
+
+<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/ace-step-music-generation/README.md" />
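The README states that embeddings from a text prompt encoder, a lyric encoder and a speaker encoder are concatenated and integrated into the diffusion model via cross-attention. The NumPy sketch below illustrates only that data flow with a single attention head and random toy weights. The dimensions, the function name `cross_attention`, the shared embedding width, and the choice to concatenate along the token (sequence) axis are all assumptions for illustration, not details taken from the ACE-Step code.

```python
import numpy as np


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative sketch).

    Queries come from the latent (diffusion) tokens; keys and values come
    from the concatenated conditioning tokens.
    """
    q = latent_tokens @ w_q  # (n_latent, d)
    k = cond_tokens @ w_k    # (n_cond, d)
    v = cond_tokens @ w_v    # (n_cond, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n_latent, n_cond)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over conditioning tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 16  # toy model width (assumption)

# Toy encoder outputs, assumed to be already projected to the same width d.
text_emb = rng.standard_normal((8, d))     # text prompt tokens
lyric_emb = rng.standard_normal((20, d))   # lyric tokens
speaker_emb = rng.standard_normal((1, d))  # a single speaker token

# Concatenate the three conditioning streams along the token axis.
cond = np.concatenate([text_emb, lyric_emb, speaker_emb], axis=0)  # (29, d)

# Toy latent tokens standing in for the compressed mel spectrogram latent.
latent = rng.standard_normal((32, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

out, attn = cross_attention(latent, cond, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (32, 16) (32, 29)
```

Concatenating along the token axis lets every latent token attend to text, lyric and speaker tokens in a single softmax, so the attention weights in each row sum to 1 across all three sources.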

notebooks/ace-step-music-generation/ace-step-music-generation.ipynb

Lines changed: 881 additions & 0 deletions
Large diffs are not rendered by default.
