Merged
3 changes: 2 additions & 1 deletion .ci/ignore_treon_docker.txt
@@ -90,4 +90,5 @@ notebooks/kokoro/kokoro.ipynb
notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
notebooks/flex.2-image-generation/flex.2-image-generation.ipynb
notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
6 changes: 6 additions & 0 deletions .ci/skipped_notebooks.yml
@@ -574,3 +574,9 @@
skips:
- os:
- macos-13
- notebook: notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
skips:
- os:
- macos-13
- ubuntu-22.04
- windows-2022
11 changes: 11 additions & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -52,6 +52,8 @@ autogenerated
AutoModelForXxx
autoregressive
autoregressively
AutoEncoder
AutoEncoders
AutoTokenizer
AWQ
awq
@@ -201,6 +203,7 @@ denoises
denoising
denormalization
denormalized
demucs
depainting
deployable
DepthAnything
@@ -231,6 +234,7 @@ DIT
DiT
DiT’s
DiT’s
DiTs
DL
DocLayNet
docling
@@ -291,6 +295,8 @@ FastDraft
FastSAM
FC
feedforward
FeedForward
FFN
FFmpeg
FIL
FEIL
@@ -608,6 +614,7 @@ MRPC
mRoPE
msi
MTVQA
mT
multiarchitecture
Multiclass
multiclass
@@ -705,6 +712,7 @@ opset
optimizable
Orca
otsl
OSNet
OTSL
OuteTTS
outpainting
@@ -780,6 +788,7 @@ PowerShell
PPYOLOv
PR
Prateek
PLR
pre
Precisions
precomputed
@@ -945,6 +954,7 @@ SmolVLM
softmax
softvc
SoftVC
SongGen
SOTA
SoTA
soundfile
@@ -1125,6 +1135,7 @@ Vladlen
VOC
Vocoder
vocoder
vocoding
VQ
VQA
VQGAN
33 changes: 33 additions & 0 deletions notebooks/ace-step-music-generation/README.md
@@ -0,0 +1,33 @@
# Music generation using ACE-Step and OpenVINO

[ACE-Step](https://ace-step.github.io/) is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches and achieves state-of-the-art performance through a holistic architectural design. Current methods face inherent trade-offs between generation speed, musical coherence, and controllability. ACE-Step bridges this gap by integrating diffusion-based generation with Sana’s Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer. The model achieves superior musical coherence and lyric alignment across melody, harmony, and rhythm metrics. Moreover, ACE-Step preserves fine-grained acoustic details, enabling advanced control mechanisms such as voice cloning, lyric editing, remixing, and track generation (e.g., lyric2vocal, singing2accompaniment).

ACE-Step adapts a text-to-image diffusion framework for music generation. The core generative model is a diffusion model operating on a compressed mel spectrogram latent representation. This process is guided by conditioning information from three specialized encoders: a text prompt encoder, a lyric encoder, and a speaker encoder. Embeddings from these encoders are concatenated and integrated into the diffusion model via cross-attention mechanisms.
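
The conditioning path described above can be illustrated with a toy sketch. This is not ACE-Step's implementation: the embedding values, the two-dimensional feature size, the concatenation along the sequence axis, and the single-head attention with identity projections are all assumptions chosen to keep the example small. It only shows how latent tokens can attend to a concatenated set of encoder embeddings.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head attention with identity projections (an illustrative simplification).

    Each query row attends to every context row; keys and values are the
    context rows themselves.
    """
    dim = len(context[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return outputs


# Hypothetical toy embeddings standing in for the three encoders' outputs.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
lyric_emb = [[0.5, 0.5]]
speaker_emb = [[1.0, 1.0]]

# Assumption: embeddings are concatenated along the sequence axis.
context = text_emb + lyric_emb + speaker_emb

# Hypothetical latent tokens of the diffusion model.
latents = [[0.2, -0.1], [0.0, 0.3], [1.0, 1.0]]
conditioned = cross_attention(latents, context)
```

Each row of `conditioned` is a weighted average of the context rows, so the output keeps the latent sequence length while mixing in information from all three encoders.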

ACE-Step can be used to generate original music from text descriptions, remix music and transfer style, and edit song lyrics. The model offers a set of controllable features that allow users to precisely control the generation process and enable targeted modifications to existing audio material, as well as perform specialized generation tasks through fine-tuning.

<img src="https://raw.githubusercontent.com/ACE-Step/ACE-Step/main/assets/ACE-Step_framework.png" width=90% style="display: block; margin: auto;" />

More details about the model can be found using the following resources: [project page](https://ace-step.github.io/), [paper](https://arxiv.org/abs/2506.00045), [original repository](https://github.com/ace-step/ACE-Step).


## Notebook Contents

This notebook demonstrates how to convert the ACE-Step model and run music generation or editing with it using OpenVINO.

The tutorial consists of the following steps:

- Install prerequisites
- Download the ACE-Step model and run inference
- Convert the model to IR format and run inference with OpenVINO
- Download and apply a LoRA, then generate audio with it
- Interactive demo
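
The conversion step listed above can be sketched in general terms as follows. This is a minimal outline, not the notebook's code: the helper names `convert_to_ir` and `compile_ir`, their parameters, and the default `"CPU"` device are hypothetical, and the sketch assumes `openvino` is installed and that the component being converted is a PyTorch module. ACE-Step consists of several sub-models, and the notebook may convert and wrap them differently.

```python
from pathlib import Path


def convert_to_ir(torch_module, example_input, xml_path):
    """Convert a PyTorch module to OpenVINO IR and save it (hypothetical helper)."""
    import openvino as ov  # imported lazily so this file loads without OpenVINO

    # Trace the module with an example input and build an OpenVINO model.
    ov_model = ov.convert_model(torch_module, example_input=example_input)
    # Writes the .xml file plus a .bin weights file next to it.
    ov.save_model(ov_model, str(xml_path))
    return Path(xml_path)


def compile_ir(xml_path, device="CPU"):
    """Load a saved IR file and compile it for the given device (hypothetical helper)."""
    import openvino as ov

    core = ov.Core()
    return core.compile_model(str(xml_path), device)
```

A typical call sequence would be `compiled = compile_ir(convert_to_ir(module, example, "model.xml"))`, after which `compiled(inputs)` runs inference on the selected device.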


## Installation Instructions

This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).

<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/ace-step-music-generation/README.md" />
881 changes: 881 additions & 0 deletions notebooks/ace-step-music-generation/ace-step-music-generation.ipynb

Large diffs are not rendered by default.
