New here, I have questions: 1) What are the green and red things? 2) Do you have prompt and parameter examples? 3) Did you figure out how to use SFT to get good results? #46
-
The red areas simply indicate the region selected for the repainting function (which uses the base model).
I didn't keep the JSON files because the WebUI didn't exist yet; I made the music from the command line during development! The workflow is: Dice -> [then the "Inspire" button for the one with only Caption filled] -> Compose -> Synthesize -> Listen. "Compose" launches an LLM inference; "Synthesize" launches a DiT inference.
The base model is used for Repaint, and also when you open an MP3 in understanding mode (it generates metadata with more-or-less hallucinated lyrics that often match the syllabic rhythm; the model is not a speech-recognition system).
It's not comparable; Ace-Step's LLM is specialized in generating the audio codes that guide the DiT. It has been trained on tons of metadata paired with music; it's a composer.
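For intuition, here's a toy Python sketch of that two-stage flow (LLM "Compose" producing audio codes, then a DiT "Synthesize" conditioned on them). All function names and sizes here are made up for illustration; they are not the project's real API.

```python
import random

def compose(caption: str, seed: int) -> list[int]:
    """'Compose' step: in the real project an LLM turns a text caption into
    discrete audio codes (tokens) that guide the DiT. Faked here with a
    seeded RNG standing in for the LLM."""
    rng = random.Random(seed)
    # Hypothetical: 16 codes drawn from a 1024-entry codebook.
    return [rng.randrange(1024) for _ in range(16)]

def synthesize(codes: list[int]) -> list[float]:
    """'Synthesize' step: in the real project a DiT (diffusion transformer)
    conditioned on the codes produces audio. Faked here as a deterministic
    mapping of each code to a float, just to show the data flow."""
    return [c / 1024.0 for c in codes]

codes = compose("calm piano, 90 bpm", seed=42)
audio = synthesize(codes)
print(len(codes), len(audio))
```

The point is only the shape of the pipeline: text in, code sequence out of stage one, and stage two never sees the text directly, only the codes.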
You really need to refer to the original Ace-Step project documentation to understand the principles; I will also improve my documentation as I go along!
-
Hi.

So in this image: [screenshot not captured in this export]
Thanks