huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/whisperx/pyannote-offline/config.yaml'. Use repo_type argument if needed. #1800

Open
lucasmirachi opened this issue Nov 28, 2024 · 3 comments

Comments

@lucasmirachi

Tested versions

  • Reproducible in the latest version of pyannote.

System information

Ubuntu 24.04

Issue description

Hi,

I'm trying to use Pyannote for offline diarization without requiring a Hugging Face token. I'm integrating it through the WhisperX repository, following the Pyannote tutorial for offline usage. Specifically, I created a config.yaml file and downloaded the pytorch_model.bin model as instructed. However, I get the following error when initializing the DiarizationPipeline:

ERROR: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/workspace/transcription_service/main.py", line 33, in startup
    get_transcriptor()  # Makes sure the transcriptor was initialized
  File "/workspace/transcription_service/services/whisperx.py", line 95, in get_transcriptor
    return WhisperXTranscriptor()
  File "/workspace/transcription_service/services/whisperx.py", line 57, in __init__
    self.diarization_model = whisperx.DiarizationPipeline(device="cuda")
  File "/workspace/whisperx/whisperx/diarize.py", line 19, in __init__
    self.model = Pipeline.from_pretrained(model_name)
  File "/opt/conda/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml'. Use repo_type argument if needed.

The error seems to indicate that from_pretrained is expecting a repo_id in the form 'repo_name' or 'namespace/repo_name'. However, the config.yaml file's path is being passed instead. This issue may stem from how the WhisperX repository initializes the diarization pipeline, but I'm unsure if there's a workaround or adjustment needed on the Pyannote side for offline usage.

Question
Is this a configuration issue with the config.yaml file?
Should WhisperX handle this differently, or is there a specific way Pyannote should interpret the local model path for offline usage?
I would appreciate any guidance or advice on resolving this issue. Thank you!

Here are the contents of config.yaml:

```yaml
version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pyannote-offline/pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
```
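In this config, `segmentation` points at a relative local file, while `embedding` still names a Hub repository (`pyannote/wespeaker-voxceleb-resnet34-LM`). Unless that model is already in the local Hugging Face cache, loading it would presumably need network access as well. A minimal sketch for listing which model entries do not exist as files, assuming relative entries are resolved against a chosen base directory (typically the process's working directory). `missing_local_models` is a hypothetical helper, not part of pyannote, and the params are copied into a dict so the sketch does not need a YAML parser:

```python
from pathlib import Path

# The pipeline.params section of the config.yaml above, as a plain dict.
PIPELINE_PARAMS = {
    "clustering": "AgglomerativeClustering",
    "embedding": "pyannote/wespeaker-voxceleb-resnet34-LM",
    "embedding_batch_size": 32,
    "embedding_exclude_overlap": True,
    "segmentation": "pyannote-offline/pytorch_model.bin",
    "segmentation_batch_size": 32,
}

# Entries that refer to model checkpoints (assumption: only these two).
MODEL_KEYS = ("segmentation", "embedding")


def missing_local_models(params: dict, base_dir: Path) -> dict:
    """Return model entries that are not existing files under base_dir.

    Hypothetical helper: anything returned here would not be found as a
    local file and would likely be looked up as a Hub repo id instead.
    """
    missing = {}
    for key in MODEL_KEYS:
        ref = params.get(key)
        if isinstance(ref, str) and not (base_dir / ref).is_file():
            missing[key] = ref
    return missing


print(missing_local_models(PIPELINE_PARAMS, Path.cwd()))
```

Running it from the directory the service is started in shows which entries would not resolve to local files from there.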

Minimal reproduction example (MRE)

whisperx

@FrenchKrab
Contributor

I haven't run things on my side, but your traceback goes through this line:

```python
try:
    config_yml = hf_hub_download(
```

And to get to this line I think you need os.path.isfile(checkpoint_path) to be false (and checkpoint_path is your config file path).
The most likely culprit to me is that your config.yaml path (/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml) might be wrong / does not exist.
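To check which branch a given path would take on the target machine, here is a simplified, hypothetical re-creation of the `os.path.isfile` check described above. It is not pyannote's actual code and loads nothing; `checkpoint_branch` is an illustrative name:

```python
import os


def checkpoint_branch(checkpoint_path: str) -> str:
    """Report which branch the check described above would take.

    Hypothetical debugging helper, not part of pyannote.
    """
    if os.path.isfile(checkpoint_path):
        # An existing file is used directly as the pipeline config.
        return "local file"
    # Anything else (missing file, directory, typo) goes on to
    # hf_hub_download as if it were a Hub repo id.
    return "hub repo id"


path = "/home/ubuntu/whisperx/whisperx/pyannote-offline/config.yaml"
print(checkpoint_branch(path))
```

If this prints `hub repo id` on the machine running the service, the path does not point at an existing file from that process's point of view.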

@lucasmirachi
Author

Thanks @FrenchKrab, I double-checked the path and it was probably wrong, but now I get a different error message:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7ecd00a70e20>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/requests/adapters.py", line 589, in send
    resp = conn.urlopen(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /pyannote-offline/config.yaml/resolve/main/config.yaml (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ecd00a70e20>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
    r = _request_wrapper(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 277, in _request_wrapper
    response = _request_wrapper(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 300, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 93, in send
    return super().send(request, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/adapters.py", line 622, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError('HTTPSConnectionPool(host=\'huggingface.co\', port=443): Max retries exceeded with url: /pyannote-offline/config.yaml/resolve/main/config.yaml (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ecd00a70e20>: Failed to resolve \'huggingface.co\' ([Errno -3] Temporary failure in name resolution)"))'), '(Request ID: 2995acd9-4dda-4e3f-b327-25bedfb36471)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/workspace/transcription_service/main.py", line 33, in startup
    get_transcriptor()  # Makes sure the transcriptor was initialized
  File "/workspace/transcription_service/services/whisperx.py", line 95, in get_transcriptor
    return WhisperXTranscriptor()
  File "/workspace/transcription_service/services/whisperx.py", line 57, in __init__
    self.diarization_model = whisperx.DiarizationPipeline(device="cuda")
  File "/workspace/whisperx/whisperx/diarize.py", line 19, in __init__
    self.model = Pipeline.from_pretrained(model_name)
  File "/opt/conda/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 969, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1487, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

My machine does not have access to the internet, but according to the tutorial, it was supposed to work offline, right?

@FrenchKrab
Contributor

I think it's still a problem with paths not existing. I would need an MRE to debug it properly.
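The URL in the second traceback (`/pyannote-offline/config.yaml/resolve/main/config.yaml`) suggests that the relative string `pyannote-offline/config.yaml` was sent to the Hub as a repo id, i.e. it did not exist relative to the service's working directory. A minimal sketch that turns the config path into an absolute one and fails fast with the full path if it is missing. `resolve_local_config` and the base-directory choice are assumptions for illustration:

```python
from pathlib import Path


def resolve_local_config(config_path: str, base_dir: Path) -> Path:
    """Return an absolute path to config.yaml, failing fast if it is missing.

    Hypothetical helper. A relative config_path is resolved against
    base_dir instead of the process's current working directory.
    """
    resolved = (base_dir / config_path).resolve()
    if not resolved.is_file():
        # Fail here with the full path rather than letting the string be
        # treated as a Hugging Face Hub repo id further down.
        raise FileNotFoundError(f"pipeline config not found: {resolved}")
    return resolved


# Possible usage (assumption: pyannote-offline/ sits next to this file, and
# DiarizationPipeline forwards a model_name argument to
# Pipeline.from_pretrained, as diarize.py line 19 in the traceback suggests):
# config = resolve_local_config("pyannote-offline/config.yaml", Path(__file__).parent)
# diarization_model = whisperx.DiarizationPipeline(model_name=str(config), device="cuda")
```

Anchoring on a known directory rather than the working directory keeps the result the same no matter where the FastAPI service is launched from.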

I think you can set the environment variable HF_HUB_OFFLINE=1 to prevent it from going online. Maybe it will then raise clearer errors indicating which paths are incorrect? (Not sure, but it doesn't cost much to try.)
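A minimal sketch of setting it from Python, assuming it is done at the very top of the entry point (e.g. `main.py`). As far as I know, `huggingface_hub` reads this variable when it is first imported, so setting it after `pyannote.audio` or `whisperx` has been imported may have no effect in that process. Exporting it in the shell before starting the service avoids that ordering concern:

```python
import os

# Set before importing huggingface_hub, pyannote.audio or whisperx:
# huggingface_hub reads HF_HUB_OFFLINE at import time, so setting it
# later in the same process may be ignored.
os.environ["HF_HUB_OFFLINE"] = "1"

# import whisperx  # import only after the line above
```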
