huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/whisperx/pyannote-offline/config.yaml'. Use `repo_type` argument if needed. #1800

lucasmirachi · 2024-11-28T23:04:07Z

Tested versions

Reproducible in the latest version of pyannote.

System information

Ubuntu 24.04

Issue description

Hi,

I'm trying to use Pyannote for offline diarization without requiring a Hugging Face token. I'm integrating it through the WhisperX repository, following the Pyannote tutorial for offline usage. Specifically, I created a config.yaml file and downloaded the pytorch_model.bin model as instructed. However, I am encountering the following error when initializing the DiarizationPipeline:

ERROR: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/workspace/transcription_service/main.py", line 33, in startup
    get_transcriptor()  # Makes sure the transcriptor was initialized
  File "/workspace/transcription_service/services/whisperx.py", line 95, in get_transcriptor
    return WhisperXTranscriptor()
  File "/workspace/transcription_service/services/whisperx.py", line 57, in __init__
    self.diarization_model = whisperx.DiarizationPipeline(device="cuda")
  File "/workspace/whisperx/whisperx/diarize.py", line 19, in __init__
    self.model = Pipeline.from_pretrained(model_name)
  File "/opt/conda/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml'. Use `repo_type` argument if needed.

The error seems to indicate that from_pretrained is expecting a repo_id in the form 'repo_name' or 'namespace/repo_name'. However, the config.yaml file's path is being passed instead. This issue may stem from how the WhisperX repository initializes the diarization pipeline, but I'm unsure if there's a workaround or adjustment needed on the Pyannote side for offline usage.

Question
Is this a configuration issue with the config.yaml file?
Should WhisperX handle this differently, or is there a specific way Pyannote should interpret the local model path for offline usage?
I would appreciate any guidance or advice on resolving this issue. Thank you!

Here are the contents of config.yaml:

`version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pyannote-offline/pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
`
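
One thing that may be worth checking in a config like this: as far as I understand, the `segmentation` and `embedding` values are only read as local checkpoints if they point to an existing file, and a relative path is resolved against the working directory of the running process. The sketch below works under that assumption. `non_local_entries` is a hypothetical helper, not part of pyannote, and it simply lists the model entries that do not exist on disk:

```python
from pathlib import Path


def non_local_entries(pipeline_params, base_dir=".", model_keys=("segmentation", "embedding")):
    """Return the model entries that do not point to an existing file.

    Hypothetical helper, not part of pyannote. Relative paths are resolved
    against base_dir (by default the current working directory).
    """
    missing = {}
    for key in model_keys:
        value = pipeline_params.get(key)
        if not isinstance(value, str):
            continue  # entry absent or not a plain string
        path = Path(value)
        if not path.is_absolute():
            path = Path(base_dir) / path
        if not path.is_file():
            missing[key] = value
    return missing


# The pipeline.params section of the config above, as a plain dict.
params = {
    "clustering": "AgglomerativeClustering",
    "embedding": "pyannote/wespeaker-voxceleb-resnet34-LM",
    "embedding_batch_size": 32,
    "embedding_exclude_overlap": True,
    "segmentation": "pyannote-offline/pytorch_model.bin",
    "segmentation_batch_size": 32,
}
print(non_local_entries(params))
```

Any entry it reports would presumably be looked up on the Hub (or in the local cache) instead of being loaded from disk. In this config, `embedding` is written as a Hub-style id rather than a file path, so it is likely to show up regardless of where the script is run.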

Minimal reproduction example (MRE)

whisperx


FrenchKrab · 2024-11-29T09:15:05Z

Haven't run things on my side, but your trace runs through this line:

pyannote-audio/pyannote/audio/core/pipeline.py

Lines 132 to 133 in 3e39edd

    try:
        config_yml = hf_hub_download(

And to get to this line I think you need os.path.isfile(checkpoint_path) to be false (and checkpoint_path is your config file path).
The most likely culprit to me is that your config.yaml path (/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml) might be wrong / does not exist.
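
The branch described above can be sketched roughly as follows. This is a simplified illustration based on the description in this comment, not pyannote's actual code, and `resolve_source` is a hypothetical name. It also reports the absolute path that was tested, which helps spot a relative path being resolved against an unexpected working directory:

```python
import os


def resolve_source(checkpoint_path):
    """Say whether a value would be read as a local file or as a Hub repo id.

    Simplified illustration (hypothetical helper): an existing file is used
    directly, anything else falls through to the Hub download branch.
    """
    absolute = os.path.abspath(checkpoint_path)
    if os.path.isfile(checkpoint_path):
        return f"local file: {absolute}"
    return (
        "not a file, would be treated as a Hub repo id: "
        f"{checkpoint_path} (looked for {absolute})"
    )


print(resolve_source("/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml"))
```

Running this from the same working directory as the service, with the exact string passed to `Pipeline.from_pretrained`, should show which branch is taken.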

lucasmirachi · 2024-11-29T20:17:17Z

Thanks @FrenchKrab, I double-checked the path and it was probably wrong, but now I get a different error message:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7ecd00a70e20>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/requests/adapters.py", line 589, in send
    resp = conn.urlopen(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /pyannote-offline/config.yaml/resolve/main/config.yaml (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ecd00a70e20>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
    r = _request_wrapper(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 277, in _request_wrapper
    response = _request_wrapper(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 300, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 93, in send
    return super().send(request, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/adapters.py", line 622, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError('HTTPSConnectionPool(host=\'huggingface.co\', port=443): Max retries exceeded with url: /pyannote-offline/config.yaml/resolve/main/config.yaml (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ecd00a70e20>: Failed to resolve \'huggingface.co\' ([Errno -3] Temporary failure in name resolution)"))'), '(Request ID: 2995acd9-4dda-4e3f-b327-25bedfb36471)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/workspace/transcription_service/main.py", line 33, in startup
    get_transcriptor()  # Makes sure the transcriptor was initialized
  File "/workspace/transcription_service/services/whisperx.py", line 95, in get_transcriptor
    return WhisperXTranscriptor()
  File "/workspace/transcription_service/services/whisperx.py", line 57, in __init__
    self.diarization_model = whisperx.DiarizationPipeline(device="cuda")
  File "/workspace/whisperx/whisperx/diarize.py", line 19, in __init__
    self.model = Pipeline.from_pretrained(model_name)
  File "/opt/conda/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 969, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1487, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

My machine does not have access to the internet, but according to the tutorial, it was supposed to work offline, right?

FrenchKrab · 2024-12-02T09:13:20Z

I think it's still a problem with paths not existing. Would need MRE to properly debug.

I think you can set the environment variable HF_HUB_OFFLINE=1 to prevent it from going online. Maybe it will raise clearer errors indicating which paths are incorrect? (Not sure, but it doesn't cost much to try.)
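
A minimal sketch of that suggestion, assuming the variable is set before `huggingface_hub` is imported (as far as I know it is read at import time) and that `Pipeline.from_pretrained` is given the path to the local config.yaml. `load_offline_pipeline` is a hypothetical wrapper, not part of pyannote or WhisperX. It fails early with the absolute path if the file is missing:

```python
import os
from pathlib import Path

# Set this before huggingface_hub (and therefore pyannote.audio) is imported.
os.environ["HF_HUB_OFFLINE"] = "1"


def load_offline_pipeline(config_path):
    """Load a pyannote pipeline from a local config.yaml (hypothetical wrapper)."""
    config = Path(config_path).resolve()
    if not config.is_file():
        # Fail with the absolute path instead of falling through to a Hub lookup.
        raise FileNotFoundError(f"config not found: {config}")
    # Imported lazily so the environment variable above is already in place.
    from pyannote.audio import Pipeline

    return Pipeline.from_pretrained(str(config))
```

It would be called as `load_offline_pipeline("/path/to/config.yaml")`, where the path is a placeholder for the real location of the config file.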
