Files + Infinite content in Gradio UI #148

xdevfaheem · 2025-02-22T15:28:10Z

Added support for files (PDF, XLSX, DOCX, etc.) and for unlimited-length content (via chunked generation and accumulation).

Since the progress bar takes over the streamed audio box (preventing access to it), it feels better to show the generated audio all at once at the end.

FurkanGozukara · 2025-02-22T15:40:06Z

nice

darkacorn · 2025-02-22T16:16:28Z

chunking by word count is suboptimal .. it's gotta be by sentence separators (i.e. ! / , . ;) or by line, whichever is longer and still fits, to stabilize generation

also needs a way to regen a chunk
maybe even audio judgement and auto regen via https://github.com/facebookresearch/audiobox-aesthetics

if we do batching / chunking .. let's do that right from the get-go
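A minimal sketch of the "regen a chunk" idea with automatic quality gating. Both `generate` and `score` are hypothetical caller-supplied callables (placeholders for the TTS call and for whatever audio-quality scorer gets wrapped, such as the tool linked above); nothing here reflects that tool's real API.

```python
def generate_with_retries(generate, score, text, min_score, max_attempts=3):
    """Regenerate one chunk until its score clears min_score; keep the best try.

    `generate(text)` and `score(audio)` are hypothetical placeholders for the
    TTS call and an audio-quality scorer supplied by the caller.
    """
    best_audio, best_score = None, float("-inf")
    for _ in range(max_attempts):
        audio = generate(text)
        s = score(audio)
        # Remember the best attempt so far in case none clears the bar.
        if s > best_score:
            best_audio, best_score = audio, s
        if s >= min_score:
            break
    return best_audio, best_score
```

Returning the best attempt rather than the last one means a chunk still gets usable audio even if every try falls short of `min_score`.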

Ph0rk0z · 2025-02-22T16:23:20Z

This chunking was better: #101

xdevfaheem · 2025-02-22T17:49:18Z

> chunking by word count is suboptimal .. it's gotta be by sentence separators (i.e. ! / , . ;) or by line, whichever is longer and still fits, to stabilize generation

IMHO, the current approach already respects sentence boundaries. The parsed text is split into sentences using spaCy's language model, and the chunking logic keeps each chunk within max_word_limit (going above ~50 words is likely to produce inconsistent output) while maintaining sentence integrity.

I’d be happy to refine the approach further. Let me know what you think!
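A minimal sketch of sentence-preserving chunking under a word budget, for reference. It is not the PR's code: a naive regex splitter stands in for spaCy's sentence segmentation, and the function names and the `max_words` parameter are assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naive regex splitter; a stand-in for spaCy's sentence segmentation."""
    parts = re.split(r"(?<=[.!?;])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences, max_words=50):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk
    rather than being cut mid-sentence.
    """
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With `max_words=5`, the text "One two three. Four five! Six seven eight nine? Ten." is packed into two chunks, each ending on a sentence boundary.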

xdevfaheem · 2025-02-22T18:11:10Z

I believe that, rather than cutting sentences off in the middle (word-level chunking), chunks made of complete, sensibly sized sentences are a better fit for chunked generation and yield better, more consistent output (with smoother transitions between sentences via sub-second silence).
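A rough sketch of the accumulate-and-concatenate step with a sub-second silent gap between chunks. Plain Python lists of float samples are used so the example stays self-contained; the function name and the default `gap_seconds` value are assumptions, not taken from the PR.

```python
def concat_with_silence(chunks, sample_rate, gap_seconds=0.3):
    """Join audio chunks (sequences of float samples) with a short silent gap."""
    gap = [0.0] * round(sample_rate * gap_seconds)
    out = []
    for i, chunk in enumerate(chunks):
        # Insert silence between chunks, but not before the first one.
        if i > 0:
            out.extend(gap)
        out.extend(chunk)
    return out
```

In a real pipeline the same idea would more likely be applied to tensors or arrays, with zeros concatenated along the time axis.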

darkacorn · 2025-02-22T18:14:15Z

the best way to get a smooth transition with similar-ish vocalisation is to feed a part of the last gen into the next one as a prefix .. together with its transcription as a text prefix, and cut that part out afterwards

(source - zonos devs)

xdevfaheem · 2025-02-22T18:23:42Z

> the best way to get a smooth transition with similar-ish vocalisation is to feed a part of the last gen into the next one as a prefix .. together with its transcription as a text prefix, and cut that part out afterwards
>
> (source - zonos devs)

oh that makes perfect sense and sounds practical as well

xdevfaheem · 2025-02-22T20:09:02Z

I started trying to implement that. Boy, was I wrong! It's not easy. And I didn't quite understand what you mean by transcription prefix. Do you mean the samples to trim based on the text prefix length? Is this why you mentioned the following?

> also needs a way to regen a chunk

darkacorn · 2025-02-22T21:03:17Z

say the last 1-2 words of the first generation go into gen 2 as the start prefix audio -> and also their text .. you just prefix the input with that

after that the last 1-2 words of gen 2 go into gen 3 .. and so on and so forth

you will need an ASR with word timestamps to know what to cut out

we're hanging out in Discord if you want to brainstorm - the link is in the README
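A minimal sketch of the prefix hand-off described above, under stated assumptions: word timestamps arrive as `(word, start_sec, end_sec)` tuples (a hypothetical format, not tied to any particular ASR library), audio is any sliceable sequence of samples, and the next generation's output is assumed to begin with the re-spoken prefix words. The function names are made up for illustration.

```python
def tail_prefix(words, waveform, sample_rate, n_words=2):
    """Return (prefix_text, prefix_audio) covering the last n_words of a chunk.

    `words` is a list of (word, start_sec, end_sec) tuples, as an ASR model
    with word-level timestamps might produce (hypothetical format).
    """
    tail = words[-n_words:]
    start_sample = int(tail[0][1] * sample_rate)
    prefix_text = " ".join(w for w, _, _ in tail)
    return prefix_text, waveform[start_sample:]


def strip_prefix(generated, new_words, n_prefix_words, sample_rate):
    """Cut the re-spoken prefix words off the front of the next generation.

    Assumes the generated audio starts with the prefix words and that
    `new_words` holds ASR timestamps for that generated audio.
    """
    cut_sec = new_words[n_prefix_words - 1][2]
    return generated[int(cut_sec * sample_rate):]
```

The text fed to the next generation would then be `prefix_text + " " + next_chunk_text`, with `prefix_audio` supplied as the audio prefix, and `strip_prefix` applied to the result before concatenation.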

InconsolableCellist · 2025-02-22T21:51:34Z

I tried doing chunking using the last part of a sentence as the prefix for the next but got really weird results. I discarded that code but maybe it can be made to work

xdevfaheem · 2025-02-23T04:13:44Z

> say the last 1-2 words of the first generation go into gen 2 as the start prefix audio -> and also their text .. you just prefix the input with that
>
> after that the last 1-2 words of gen 2 go into gen 3 .. and so on and so forth
>
> you will need an ASR with word timestamps to know what to cut out
>
> we're hanging out in Discord if you want to brainstorm - the link is in the README

Got it, I'm in!

xdevfaheem · 2025-02-23T04:19:57Z

> I tried doing chunking using the last part of a sentence as the prefix for the next but got really weird results. I discarded that code but maybe it can be made to work

Same here. I tried slicing the last 3 seconds of gen 1 to feed into gen 2 as prefix audio and trimmed it from the generated codes (as well as from the post-processed wav output), but the output is not pleasant. The first sentence is fine, but the second audio chunk starts partway through the second sentence, and so on: after the first audio chunk, every chunk starts somewhere in the middle. There was murmuring too... Disaster...

darkacorn · 2025-02-23T11:14:42Z

did you prefix the text too, with the part you feed in as prefix audio? ping me in Discord when you're in .. I'm the mrdragonfox dude

xdevfaheem · 2025-02-23T12:39:39Z

> did you prefix the text too, with the part you feed in as prefix audio? ping me in Discord when you're in .. I'm the mrdragonfox dude

I'm there, Faheem⚡

Used ASR word timestamps to pick the last few words of the previous text chunk and the matching waveform segment, which are used to generate the prefix codes (and are cut out later). So, probably, smoother and more consistent transitions with similar-ish vocalisation.

xdevfaheem added 9 commits February 21, 2025 13:12

added 'spacy-layout' lib to dependencies

ecba91b

moved gradio_interface.py to its specific dir

4a1d624

Added file input and streaming support

ed3503a

added spacy language model to dependency

b2145fe

fix import

1a04675

bug fixes + tweaks

3bcfae5

update readme

dd7edbf

no streaming, accumulation + concat instead

94efa55

Since the progress bar takes over the streamed audio box (preventing access to it), it feels better to show the generated audio all at once at the end.

cute little bug

5ab6c21

improved transition between audio chunks

3b2b147

Used ASR word timestamps to pick the last few words of the previous text chunk and the matching waveform segment, which are used to generate the prefix codes (and are cut out later). So, probably, smoother and more consistent transitions with similar-ish vocalisation.
