Added option to manually choose model size #6
base: master
Conversation
Added option for CLI/GUI to manually define model size.
Though in the current version a user can manually force the model size by creating a file, it's not obvious, and the file is also populated automatically (if I'm understanding correctly). I found it very useful to be able to choose the model size: in my casual testing "medium" performs quite well on English while running several times as fast, and even "small" and "tiny" often do pretty well at very high speeds.
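As a rough sketch of the kind of option being described, and not necessarily how the PR actually implements it (the flag name, choices, and default are assumptions), a CLI flag that falls back to automatic selection when omitted might look like this:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Generate subtitles for video files")
    # Hypothetical flag: when omitted (None), the app keeps its existing
    # behavior of picking the largest model the machine can handle.
    parser.add_argument(
        "--model_size",
        choices=["tiny", "base", "small", "medium", "large-v3"],
        default=None,
        help="Force a specific Whisper model size; omit to auto-select "
             "the largest model that fits in memory.",
    )
    return parser

args = build_parser().parse_args()
```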
Prevents the following error on Windows: UnicodeEncodeError: 'charmap' codec can't encode character '\u0131' in position 19: character maps to <undefined>
synexo left a comment:
Avoids errors such as the one below:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python311\Scripts\subtitler_main.py", line 7, in <module>
  File "C:\Python311\Lib\site-packages\subtitler_util\subtitler.py", line 294, in main
    gui()
  File "C:\Python311\Lib\site-packages\gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
    return lambda *args, **kwargs: func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\subtitler_util\subtitler.py", line 287, in gui
    process_args(args)
  File "C:\Python311\Lib\site-packages\subtitler_util\subtitler.py", line 239, in process_args
    subtitle(vid_file_map,audio_files,args.video_language,args.translation_languages, translation_service=args.translation_service, translation_service_api_key=args.translation_service_api_key, model_size=args.model_size, mode=args.mode)
  File "C:\Python311\Lib\site-packages\subtitler_util\subtitler.py", line 215, in subtitle
    saved_file = save_result_as_srt(r,video_language,vid_file_map[audio_file],True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\subtitler_util\subtitler.py", line 168, in save_result_as_srt
    f.write(result_obj["text"].encode('cp850','replace').decode('cp850'))
  File "C:\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0131' in position 19: character maps to <undefined>
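As context, the UnicodeEncodeError comes from the subtitle text being written through Windows' default cp1252 codec, which can't represent characters like '\u0131'. The PR's actual change isn't reproduced in this thread, so purely as an illustration (the function name is made up for the sketch), one common way to avoid this class of error is to open the output file with an explicit encoding:

```python
# Illustrative only -- not the PR's actual change.
# On Windows, open() without an explicit encoding falls back to the ANSI
# code page (cp1252 here), which can't encode characters such as '\u0131'.
def save_text(path: str, text: str) -> None:
    with open(path, "w", encoding="utf-8") as f:  # force UTF-8 regardless of locale
        f.write(text)

# If whatever consumes the .srt file truly requires a legacy code page,
# replacing unencodable characters avoids the crash instead:
#   with open(path, "w", encoding="cp1252", errors="replace") as f:
#       f.write(text)
```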
Thanks for fixing this bug. I don't use Windows, so having someone actually use it and figure out encoding issues on Windows is SUPER DUPER HELPFUL!! THANKS AGAIN!
The original idea was to let the system choose the largest possible model it can use without overwhelming the system's main memory, i.e. VRAM if you're using GPU/CUDA or RAM if you're using CPU... Might be cool if we let the user decide at the start whether they want to choose their model themselves or let the app try to pick the largest one that works for them... I'll see if I can enhance and patch the code to do that.
anupamkumar left a comment:
Super! Thanks for finding and fixing this!
The original idea was to let the system choose the largest possible model it can use without overwhelming the system's main memory, i.e. VRAM if you're using GPU/CUDA or RAM if you're using CPU...
Might be cool if we let the user decide at the start whether they want to choose their model themselves or let the app try to pick the largest one that works for them... At the very least, if the user is given the choice of model, that would warrant a brief explanation of what each model type can do for them and what the trade-offs are...
What are your thoughts?
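A rough sketch of what that automatic choice could look like, assuming PyTorch for the CUDA check and psutil for system RAM; the thresholds are the approximate per-model memory figures from the openai/whisper README, and the function name is made up for the example, not the project's actual code:

```python
import psutil  # assumed available for this sketch
import torch

# Approximate memory needed per Whisper model size, per the openai/whisper
# README (values in GB, largest first).
MODEL_MEMORY_GB = [
    ("large-v3", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]

def pick_largest_model():
    """Return the largest model name expected to fit in VRAM (GPU) or RAM (CPU)."""
    if torch.cuda.is_available():
        # Total VRAM on device 0 -- a crude proxy for what is actually free.
        available = torch.cuda.get_device_properties(0).total_memory
    else:
        available = psutil.virtual_memory().available
    available_gb = available / (1024 ** 3)
    for name, needed_gb in MODEL_MEMORY_GB:
        if available_gb >= needed_gb * 1.2:  # keep ~20% headroom
            return name
    return "tiny"
```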
No problem, and thank you very much for this software; I've probably used it for over 1,000 TV episodes and movies at this point. I fixed this bug because if it was hit while processing a directory, the whole batch would stop. Appreciate all your work!
The way I added the "model_size" option is meant to fall back to your original behavior of loading the largest model the machine can handle if an option isn't chosen. Maybe naming it "force_model_size" instead would make that more obvious. More explanation certainly wouldn't hurt, even something along the lines of "larger model sizes will generally produce better results at the expense of longer processing time".
For my personal use I was simply running these on my laptop with a GTX 1660 Ti, using CUDA. By default the "large-v3" model was being used. I didn't test for hard numbers, but videos in the ~23 minute range were taking well over an hour to transcribe. After doing some research and coming across info such as the following, it seemed worthwhile to try a smaller model size. As you can see there, for English at least, even the "tiny" model outperforms what the "large" model achieves on some other languages like French.
After some ad-hoc testing of my own, I did find that "small" and "tiny" didn't seem to perform as well on lower-quality audio (I'm using this for a lot of old TV shows without great sources), but I didn't observe any meaningful benefit to using a model bigger than "medium". In terms of speed, "medium" ran a bit faster than real-time for me, "small" better than 2x real-time, and "tiny" processes a ~23 minute video in under 2 minutes. If the audio quality is good, "tiny" still performs pretty well, better than the auto-captioning I've seen on YouTube, for instance.
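To make the fall-back behavior described above concrete, here is a minimal sketch (the names are illustrative, `pick_largest_model` stands in for the auto-selection logic sketched earlier, and `whisper.load_model` is the standard openai-whisper loader):

```python
import whisper  # openai-whisper; load_model() is its standard entry point

def pick_largest_model():
    # Stand-in for the auto-selection helper sketched above; "medium" is only
    # a placeholder default so this example runs on its own.
    return "medium"

def load_model_for_run(forced_size=None):
    # Use the explicitly requested size if one was passed; otherwise fall back
    # to the original behavior of auto-selecting a model.
    size = forced_size or pick_largest_model()
    return whisper.load_model(size)

# `--model_size medium` -> load_model_for_run("medium")
# flag omitted          -> load_model_for_run(None) -> auto-selected size
```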