When an anime file contains only a non-English dub, it can still be transcribed with the English model.

Example: if a file has only a Japanese audio track, that track is transcribed as if it were English and the result is saved to the eSRT file.

The issue could be solved by probing the file for multiple audio streams and checking whether any of them is English.


This code block expects FFmpeg to throw an error:
https://github.com/Juxsta/whisper_grpc/blob/e5a1690a770a828127fc28217360ce2aee75f2a5/whisper_grpc/transcribe.py#L20-L28

The error should be thrown here:
https://github.com/Juxsta/whisper_grpc/blob/e5a1690a770a828127fc28217360ce2aee75f2a5/whisper_grpc/transcribe.py#L35-L43

Except, I'm pretty sure this [library](https://github.com/jonghwanhyeon/python-ffmpeg) doesn't support English-stream identification:
https://github.com/Juxsta/whisper_grpc/blob/e5a1690a770a828127fc28217360ce2aee75f2a5/Pipfile#L11

Suggested actions:
- [ ] migrate to a working(?) [library](https://github.com/kkroening/ffmpeg-python)
- [ ] properly probe the file for English audio streams using language metadata
- [ ] make sure the exception is properly thrown
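
The probing step could look roughly like the sketch below. It assumes `ffprobe` is on the `PATH` and that the container actually carries `language` tags on its audio streams. The helper names `probe_streams` and `english_audio_indexes` are hypothetical and not part of the repository.

```python
import json
import subprocess


def probe_streams(path):
    """Return ffprobe's stream list for `path` (assumes ffprobe is installed)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout).get("streams", [])


def english_audio_indexes(streams, languages=("eng", "en")):
    """Return the indexes of audio streams whose language tag is in `languages`."""
    matches = []
    for stream in streams:
        if stream.get("codec_type") != "audio":
            continue
        # Untagged streams have no "tags" or no "language" key; treat them as unknown.
        language = stream.get("tags", {}).get("language", "").lower()
        if language in languages:
            matches.append(stream["index"])
    return matches
```

If `english_audio_indexes(probe_streams(file))` comes back empty, the caller could raise before any transcription starts instead of relying on FFmpeg to fail.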

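	import logging

	import ffmpeg  # assumes ffmpeg-python (kkroening/ffmpeg-python)
	import whisper

	# Fragment: `tmpdir`, `file`, `model`, and `logger` come from the surrounding code.
	try:
	    output_file = f'{tmpdir}/{file}.wav'
	    transcribe_audio_file(file, output_file)
	    result = whisper.transcribe(
	        model, output_file, beam_size=5, best_of=5,
	        verbose=logger.getEffectiveLevel() <= logging.DEBUG,
	        language="en")
	except ffmpeg.Error as e:
	    logger.error(f'No English audio track found in {file}, error: {e.stderr}')
	    raise ValueError(f'No English audio track found in {file}') from e

	def transcribe_audio_file(input_file, output_file):
	    # Map only audio streams tagged language=eng. Assumption: the selector is
	    # passed through as `-map 0:a:m:language:eng`, so FFmpeg exits with an
	    # error (raised as ffmpeg.Error) when no stream matches.
	    (
	        ffmpeg
	        .input(input_file)['a:m:language:eng']
	        .output(output_file, format='wav')  # match the .wav output path
	        .overwrite_output()
	        .run(capture_stdout=True, capture_stderr=True)  # populate e.stderr
	    )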
