huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/whisperx/pyannote-offline/config.yaml'. Use repo_type argument if needed.
#1800
Comments
Haven't run things on my side, but your trace runs through this line: pyannote-audio/pyannote/audio/core/pipeline.py Lines 132 to 133 in 3e39edd
To get to that line, I think os.path.isfile(checkpoint_path) needs to be false (and checkpoint_path is your config file path). The most likely culprit to me is that your config.yaml path (/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml) is wrong or does not exist.
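For reference, the dispatch that check belongs to looks roughly like this (a simplified paraphrase of Pipeline.from_pretrained, not the exact source):

```python
import os
from pathlib import Path

from huggingface_hub import hf_hub_download


def resolve_config(checkpoint_path: str) -> Path:
    # Simplified paraphrase: a string pointing to an existing file is used as a
    # local config.yaml; anything else is treated as a Hugging Face repo id and
    # sent to hf_hub_download, which rejects absolute paths with
    # HFValidationError ("Repo id must be in the form ...").
    if os.path.isfile(checkpoint_path):
        return Path(checkpoint_path)  # offline / local case
    return Path(hf_hub_download(repo_id=checkpoint_path, filename="config.yaml"))
```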
Thanks @FrenchKrab, I double checked the path and it was probably wrong, but now I get a different error message:
My machine does not have access to the internet, but according to the tutorial, it was supposed to work offline, right?
I think it's still a problem with paths not existing. I would need an MRE to properly debug. I think you can set the environment variable
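The variable name is cut off above; one plausible candidate (an assumption on my part, not confirmed by the comment) is huggingface_hub's HF_HUB_OFFLINE switch:

```python
import os

# Assumption: HF_HUB_OFFLINE is huggingface_hub's switch that disables network
# calls and forces cached/local files only. It may not be the exact variable the
# truncated comment above refers to. Set it before huggingface_hub / pyannote
# are imported or used.
os.environ["HF_HUB_OFFLINE"] = "1"
```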
Tested versions
System information
Ubuntu 24.04
Issue description
Hi,
I'm trying to use Pyannote for offline diarization without requiring a Hugging Face token. I'm integrating it through the WhisperX repository, following the Pyannote tutorial for offline usage. Specifically, I created a config.yaml file and downloaded the pytorch_model.bin checkpoint as instructed. However, I am encountering the following error when initializing the DiarizationPipeline:
ERROR: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/workspace/transcription_service/main.py", line 33, in startup
    get_transcriptor()  # Makes sure the transcriptor was initialized
  File "/workspace/transcription_service/services/whisperx.py", line 95, in get_transcriptor
    return WhisperXTranscriptor()
  File "/workspace/transcription_service/services/whisperx.py", line 57, in __init__
    self.diarization_model = whisperx.DiarizationPipeline(device="cuda")
  File "/workspace/whisperx/whisperx/diarize.py", line 19, in __init__
    self.model = Pipeline.from_pretrained(model_name)
  File "/opt/conda/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml'. Use repo_type argument if needed.
The error seems to indicate that from_pretrained is expecting a repo_id in the form 'repo_name' or 'namespace/repo_name'. However, the config.yaml file's path is being passed instead. This issue may stem from how the WhisperX repository initializes the diarization pipeline, but I'm unsure if there's a workaround or adjustment needed on the Pyannote side for offline usage.
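For comparison, the offline tutorial loads the pipeline straight from the local config, roughly like this (sketch only; the path is from my setup):

```python
from pyannote.audio import Pipeline

# Offline usage per the pyannote tutorial (sketch): point from_pretrained at a
# local config.yaml on disk instead of a "namespace/repo_name" id; no Hugging
# Face token or network access should be needed if all referenced files exist.
config_path = "/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml"
pipeline = Pipeline.from_pretrained(config_path)
```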
Question
Is this a configuration issue with the config.yaml file?
Should WhisperX handle this differently, or is there a specific way Pyannote should interpret the local model path for offline usage?
I would appreciate any guidance or advice on resolving this issue. Thank you!
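One possible workaround on the WhisperX side, assuming its DiarizationPipeline accepts a model_name argument and forwards it to Pipeline.from_pretrained (which is what the traceback suggests), would be to pass the verified local config path explicitly:

```python
import os

import whisperx

# Hypothetical workaround sketch: pass the local config.yaml through model_name.
# The path must exist on disk, otherwise pyannote falls back to treating it as a
# Hugging Face repo id and raises the HFValidationError shown above.
config_path = "/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml"
assert os.path.isfile(config_path), f"config not found: {config_path}"

diarize_model = whisperx.DiarizationPipeline(model_name=config_path, device="cuda")
```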
Here are the contents of config.yaml:
```yaml
version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pyannote-offline/pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
```
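Following the suggestion in the comments that one of these paths may simply not exist, a quick sanity check over the local files the config references could look like this (sketch; assumes PyYAML is available):

```python
import os

import yaml

# Sketch: verify that the config and the local segmentation checkpoint it
# references exist. Note that a relative path such as
# "pyannote-offline/pytorch_model.bin" is presumably resolved against the
# current working directory, so an absolute path may be safer in config.yaml.
config_path = "/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml"
print(config_path, "->", os.path.isfile(config_path))

with open(config_path) as f:
    cfg = yaml.safe_load(f)

segmentation_path = cfg["pipeline"]["params"]["segmentation"]
print(segmentation_path, "->", os.path.isfile(segmentation_path))
```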
Minimal reproduction example (MRE)
whisperx