openvinotoolkit · aleksandr-mokrov · Aug 26, 2025 · Aug 18, 2025 · Aug 25, 2025 · Aug 26, 2025
diff --git a/.ci/ignore_treon_docker.txt b/.ci/ignore_treon_docker.txt
@@ -90,4 +90,5 @@ notebooks/kokoro/kokoro.ipynb
 notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
 notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
 notebooks/flex.2-image-generation/flex.2-image-generation.ipynb
-notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
+notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
+notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
diff --git a/.ci/skipped_notebooks.yml b/.ci/skipped_notebooks.yml
@@ -574,3 +574,9 @@
   skips:
     - os:
         - macos-13
+- notebook: notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
+  skips:
+    - os:
+        - macos-13
+        - ubuntu-22.04
+        - windows-2022
diff --git a/.ci/spellcheck/.pyspelling.wordlist.txt b/.ci/spellcheck/.pyspelling.wordlist.txt
@@ -52,6 +52,8 @@ autogenerated
 AutoModelForXxx
 autoregressive
 autoregressively
+AutoEncoder
+AutoEncoders
 AutoTokenizer
 AWQ
 awq
@@ -201,6 +203,7 @@ denoises
 denoising
 denormalization
 denormalized
+demucs
 depainting
 deployable
 DepthAnything
@@ -231,6 +234,7 @@ DIT
 DiT
 DiT’s
 DiT’s
+DiTs
 DL
 DocLayNet
 docling
@@ -291,6 +295,8 @@ FastDraft
 FastSAM
 FC
 feedforward
+FeedForward
+FFN
 FFmpeg
 FIL
 FEIL
@@ -608,6 +614,7 @@ MRPC
 mRoPE
 msi
 MTVQA
+mT
 multiarchitecture
 Multiclass
 multiclass
@@ -705,6 +712,7 @@ opset
 optimizable
 Orca
 otsl
+OSNet
 OTSL
 OuteTTS
 outpainting
@@ -780,6 +788,7 @@ PowerShell
 PPYOLOv
 PR
 Prateek
+PLR
 pre
 Precisions
 precomputed
@@ -945,6 +954,7 @@ SmolVLM
 softmax
 softvc
 SoftVC
+SongGen
 SOTA
 SoTA
 soundfile
@@ -1125,6 +1135,7 @@ Vladlen
 VOC
 Vocoder
 vocoder
+vocoding
 VQ
 VQA
 VQGAN

diff --git a/notebooks/ace-step-music-generation/README.md b/notebooks/ace-step-music-generation/README.md
@@ -0,0 +1,33 @@
+# Music generation using ACE Step and OpenVINO
+
+[ACE-Step](https://ace-step.github.io/) is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches and achieves state-of-the-art performance through a holistic architectural design. Current methods face inherent trade-offs between generation speed, musical coherence, and controllability. ACE-Step bridges this gap by integrating diffusion-based generation with Sana’s Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer. The model achieves superior musical coherence and lyric alignment across melody, harmony, and rhythm metrics. Moreover, ACE-Step preserves fine-grained acoustic details, enabling advanced control mechanisms such as voice cloning, lyric editing, remixing, and track generation (e.g., lyric2vocal, singing2accompaniment).
+
+ACE-Step adapts a text-to-image diffusion framework for music generation. The core generative model is a diffusion model operating on a compressed mel spectrogram latent representation. This process is guided by conditioning information from three specialized encoders: a text prompt encoder, a lyric encoder, and a speaker encoder. Embeddings from these encoders are concatenated and integrated into the diffusion model via cross-attention mechanisms.
+
+ACE-Step can be used for generating original music from text descriptions, remixing music and transferring style, and editing song lyrics. The model offers a set of controllable features that let users precisely control the generation process, make targeted modifications to existing audio material, and perform specialized generation tasks through fine-tuning.
+
+<img src="https://raw.githubusercontent.com/ACE-Step/ACE-Step/main/assets/ACE-Step_framework.png" width=90% style="display: block; margin: auto;" />
+
+More details about the model can be found using the following resources: [project page](https://ace-step.github.io/), [paper](https://arxiv.org/abs/2506.00045), [original repository](https://github.com/ace-step/ACE-Step).
+
+
+## Notebook Contents
+
+This notebook demonstrates how to convert the ACE-Step model and run music generation or editing with OpenVINO.
+
+The tutorial consists of the following steps:
+
+- Install prerequisites
+- Download the ACE-Step model and run inference
+- Convert the model to IR format and run inference with OpenVINO
+- Download and apply a LoRA, then generate audio with it
+- Interactive demo
+
+
+## Installation Instructions
+
+This is a self-contained example that relies solely on its own code.<br/>
+We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
+For details, refer to the [Installation Guide](../../README.md).
+
+<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/ace-step-music-generation/README.md" />
diff --git a/notebooks/ace-step-music-generation/ace-step-music-generation.ipynb b/notebooks/ace-step-music-generation/ace-step-music-generation.ipynb