
[InferenceClient] Provide a way to deal with content-type header when sending raw bytes #2706

Closed
Wauplin opened this issue Dec 11, 2024 · 4 comments · Fixed by #2826


@Wauplin
Contributor

Wauplin commented Dec 11, 2024

First reported by @freddyaboulton on Slack (private).

For some tasks (e.g. automatic-speech-recognition), we send raw bytes in the HTTP request. On the Inference API, there is "content-type guessing" logic that implicitly interprets the data it receives. However, this logic can be wrong, and Inference Endpoints don't have it at all.
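To illustrate why implicit guessing is fragile, here is a minimal sketch of signature-based ("magic bytes") sniffing for raw audio. It is an illustration only, not the Inference API's actual logic, and `sniff_audio_content_type` is a hypothetical name. Note that the MPEG frame-sync check also matches AAC streams in ADTS framing, which start with the same sync bits, so a guesser like this would mislabel them as `audio/mpeg`.

```python
from typing import Optional


def sniff_audio_content_type(data: bytes) -> Optional[str]:
    """Guess an audio MIME type from the first bytes of a payload (hypothetical helper)."""
    if data.startswith(b"fLaC"):
        return "audio/flac"
    # WAV files are RIFF containers with "WAVE" at byte offset 8
    if data.startswith(b"RIFF") and data[8:12] == b"WAVE":
        return "audio/wav"
    if data.startswith(b"OggS"):
        return "audio/ogg"
    # MP3: either an ID3v2 tag or an MPEG frame header starting with 11 set sync bits.
    # Caveat: AAC in ADTS framing starts with the same bits and is misdetected here.
    if data.startswith(b"ID3") or (len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "audio/mpeg"
    # Unknown signature: the caller has to set Content-Type explicitly
    return None
```

When the function returns `None`, there is nothing safe to send, which is exactly the case where an explicit Content-Type is needed.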

It is currently possible to set the Content-Type header by passing it when initializing the InferenceClient:

client = InferenceClient(url, headers={"Content-Type": "audio/mpeg"})
response = client.automatic_speech_recognition("audio.mp3")

However, this is not well documented, and it doesn't feel right to initialize a client with a specific content type: it would mean that every request made with that client has to send the same content type.


I'm opening the issue here, but it's not yet clear to me what we'd like to do. A few possible solutions:

  • (low-hanging fruit) keep what we have, but document how to set the Content-Type header in each method docstring
  • infer the Content-Type header from the file extension (but this doesn't work if raw bytes are passed to the method)
  • add a parameter? (but how would it work with auto-generation?)
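A minimal sketch of how options 2 and 3 could combine, using only the standard library. `guess_content_type` and `resolve_headers` are hypothetical names, not part of `huggingface_hub`, and the precedence (per-call value, then a header already set on the client, then the extension guess) is an assumed design choice:

```python
import mimetypes
from pathlib import Path
from typing import BinaryIO, Dict, Optional, Union

# Mirrors the accepted input types: raw bytes, file object, local path or URL.
ContentT = Union[bytes, BinaryIO, str, Path]


def guess_content_type(content: ContentT) -> Optional[str]:
    """Guess a Content-Type from a path or URL extension; None when there is no extension."""
    if isinstance(content, (str, Path)):
        mime_type, _encoding = mimetypes.guess_type(str(content))
        return mime_type
    # Raw bytes or a file object: nothing to infer from, the caller must be explicit
    return None


def resolve_headers(
    content: ContentT,
    client_headers: Dict[str, str],
    content_type: Optional[str] = None,
) -> Dict[str, str]:
    """Per-call value wins, then a Content-Type already set on the client, then the guess.

    Assumes the header is spelled exactly "Content-Type" (no case-insensitive matching).
    """
    headers = dict(client_headers)  # copy so the client's defaults are never mutated
    if content_type is not None:
        headers["Content-Type"] = content_type
    elif "Content-Type" not in headers:
        guessed = guess_content_type(content)
        if guessed is not None:
            headers["Content-Type"] = guessed
    return headers
```

For example, `resolve_headers("audio.mp3", {})` would set `audio/mpeg`, while raw bytes without an explicit `content_type` leave the header unset, which is the limitation noted in option 2.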

Open to suggestions on this

@kiansierra

kiansierra commented Jan 31, 2025

Hi there.
I received the following warning when calling the `post` method directly for ASR:

/home/kian/coding/rag-app/rag-app-backend/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'post' (from 'huggingface_hub.inference._client') is deprecated and will be removed from version '0.31.0'. Making direct POST requests to the inference server is not supported anymore. Please use task methods instead (e.g. InferenceClient.chat_completion). If your use case is not supported, please open an issue in https://github.com/huggingface/huggingface_hub.

Currently, my call to the endpoint looks like this:

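import base64
import json

from huggingface_hub import InferenceClient

file_path = "audio.mp3"  # path to a local audio file

client = InferenceClient()
model = "openai/whisper-large-v3-turbo"
# Send the audio as a base64 string so extra parameters fit in the same JSON payload
with open(file_path, "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()
response = client.post(  # deprecated, see the warning above
    json={
        "inputs": audio_b64,
        "parameters": {"return_timestamps": True},
    },
    model=model,
    task="automatic-speech-recognition",
)
response_data = json.loads(response)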

I do this because the `automatic_speech_recognition` method currently doesn't accept parameters: https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_client.py#L461-L502
I believe a simple change letting the method accept parameters as a dict should solve this. The same change could also address the header problem above, by ensuring that headers passed at call time override the ones set on the client instance.
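A minimal sketch of the call-time override described above. `merge_headers` is a hypothetical helper, not an existing `huggingface_hub` function; since HTTP header names are case-insensitive, it drops any instance header that differs from a call-time header only by case before applying the override:

```python
from typing import Dict, Optional


def merge_headers(
    instance_headers: Dict[str, str],
    call_headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return client-level headers overridden by per-call headers (hypothetical helper)."""
    merged = dict(instance_headers)  # never mutate the client's own headers
    for name, value in (call_headers or {}).items():
        # Remove any existing key equal to `name` ignoring case, e.g. "content-type"
        for existing in [key for key in merged if key.lower() == name.lower()]:
            del merged[existing]
        merged[name] = value
    return merged
```

Returning a fresh dict keeps the client reusable: one request can send `audio/flac` while the next falls back to the client's default.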

@kiansierra

I created a quick PR here: kiansierra#1
and an example of how it would work: https://github.com/kiansierra/huggingface_hub/blob/dae9625fed13668020504c3fbace2d9b4f18791c/test.ipynb.
This would also tie in with #2800 @Wauplin @hanouticelina

@julien-c
Member

julien-c commented Feb 3, 2025

maybe solved by #2812 + #2821, @kiansierra?

@kiansierra

kiansierra commented Feb 3, 2025

Almost, @julien-c, @hanouticelina.
The signature is still:

    def automatic_speech_recognition(
        self,
        audio: ContentT,
        *,
        model: Optional[str] = None,
    ) -> AutomaticSpeechRecognitionOutput:

I believe applying the same modification would get us almost there:

    def automatic_speech_recognition(
        self,
        audio: ContentT,
        *,
        model: Optional[str] = None,
        extra_body: Optional[Dict[str, Any]] = None,
    ) -> AutomaticSpeechRecognitionOutput:
        """
        Perform automatic speech recognition (ASR or audio-to-text) on the given audio content.

        Args:
            audio (Union[str, Path, bytes, BinaryIO]):
                The content to transcribe. It can be raw audio bytes, local audio file, or a URL to an audio file.
            model (`str`, *optional*):
                The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
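                Inference Endpoint. If not provided, the default recommended model for ASR will be used.
            extra_body (`Dict[str, Any]`, *optional*):
                Additional parameters sent to the model, e.g. `{"return_timestamps": True}`.

        Returns: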
            [`AutomaticSpeechRecognitionOutput`]: An item containing the transcribed text and optionally the timestamp chunks.

        Raises:
            [`InferenceTimeoutError`]:
                If the model is unavailable or the request times out.
            `HTTPError`:
                If the request fails with an HTTP error status code other than HTTP 503.

        Example:
        ```py
        >>> from huggingface_hub import InferenceClient
        >>> client = InferenceClient()
        >>> client.automatic_speech_recognition("hello_world.flac").text
        "hello world"
        ```
        """
        provider_helper = get_provider_helper(self.provider, task="automatic-speech-recognition")
        request_parameters = provider_helper.prepare_request(
            inputs=audio,
            parameters=extra_body or {},
            headers=self.headers,
            model=model or self.model,
            api_key=self.token,
        )
        response = self._inner_post(request_parameters)
        return AutomaticSpeechRecognitionOutput.parse_obj_as_instance(response)

The remaining issue is that although the raw response does contain the timestamps, the parsed object has `None` for all of them:

import requests 
from huggingface_hub import InferenceClient
client = InferenceClient()
response = requests.get("https://rss.art19.com/episodes/d826f550-8547-4552-9c11-396c4537b349.mp3")
response.raise_for_status()
with open('test.mp3', "wb") as f:
    f.write(response.content)
result = client.automatic_speech_recognition('test.mp3',
                                              extra_body={'return_timestamps':True},
                                              model="openai/whisper-large-v3-turbo")
result.chunks

### Output 

[AutomaticSpeechRecognitionOutputChunk(text=" It's the Word of the Day podcast for January 31st.", timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=" Today's word is encroach, spelled E-N-C-R-O-A-C-H.", timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' Encroach is a verb.', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' To encroach is to gradually move or go into an area that is beyond the usual or desired', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' limits, or to gradually take or begin to use or affect something that belongs to or is being used', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=" by someone else. Encroach is often followed by the words on or upon. Here's the word used in a", timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' sentence from the Hollywood Reporter. In their young adult years, Mufasa and Taka find their', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' courage and loyalty tested when a group of white lions encroach upon the pride. The history behind', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' the word encroach is likely to hook you in. The word comes from the Middle English verb encrochen,', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' which means to get or seize. The Anglo-French predecessor of encrochen is encroché, which was', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' formed by combining the prefix en-i-n, meaning in, with the noun coche, meaning hook. Coche is also', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' an ancestor of our word crochet. That word first referred to a crochet hook or to the needlework', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' done with it. Encroach carries the meaning of intrude, both in terms of privilege and property.', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' The word can also hop over legal barriers to describe a general advancement beyond desirable', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' or normal limits, such as a hurricane that encroaches on the mainland.', timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=" With your Word of the Day, I'm Peter Sokolowski.", timestamps=None),
 AutomaticSpeechRecognitionOutputChunk(text=' visit merriamwebster.com today for definitions wordplay and trending word lookups', timestamps=None)]

By contrast, calling the provider helper directly and decoding the raw response shows the timestamps:

import json

from huggingface_hub.inference._providers import get_provider_helper

# `client` is the InferenceClient created in the previous snippet
provider_helper = get_provider_helper(client.provider, task="automatic-speech-recognition")
request_parameters = provider_helper.prepare_request(
    inputs='test.mp3',
    parameters={'return_timestamps':True},
    headers=client.headers,
    model="openai/whisper-large-v3-turbo",
    api_key=client.token,
)
response = client._inner_post(request_parameters)
json.loads(response)
### Output
{'text': " It's the Word of the Day podcast for January 31st. Today's word is encroach, spelled E-N-C-R-O-A-C-H. Encroach is a verb. To encroach is to gradually move or go into an area that is beyond the usual or desired limits, or to gradually take or begin to use or affect something that belongs to or is being used by someone else. Encroach is often followed by the words on or upon. Here's the word used in a sentence from the Hollywood Reporter. In their young adult years, Mufasa and Taka find their courage and loyalty tested when a group of white lions encroach upon the pride. The history behind the word encroach is likely to hook you in. The word comes from the Middle English verb encrochen, which means to get or seize. The Anglo-French predecessor of encrochen is encroché, which was formed by combining the prefix en-i-n, meaning in, with the noun coche, meaning hook. Coche is also an ancestor of our word crochet. That word first referred to a crochet hook or to the needlework done with it. Encroach carries the meaning of intrude, both in terms of privilege and property. The word can also hop over legal barriers to describe a general advancement beyond desirable or normal limits, such as a hurricane that encroaches on the mainland. With your Word of the Day, I'm Peter Sokolowski. visit merriamwebster.com today for definitions wordplay and trending word lookups",
 'chunks': [{'timestamp': [0.0, 9.16],
   'text': " It's the Word of the Day podcast for January 31st."},
  {'timestamp': [11.48, 17.04],
   'text': " Today's word is encroach, spelled E-N-C-R-O-A-C-H."},
  {'timestamp': [17.28, 18.34], 'text': ' Encroach is a verb.'},
  {'timestamp': [18.8, 24.56],
   'text': ' To encroach is to gradually move or go into an area that is beyond the usual or desired'},
  {'timestamp': [0.0, 6.5],
   'text': ' limits, or to gradually take or begin to use or affect something that belongs to or is being used'},
  {'timestamp': [6.5, 13.4],
   'text': " by someone else. Encroach is often followed by the words on or upon. Here's the word used in a"},
  {'timestamp': [13.4, 20.18],
   'text': ' sentence from the Hollywood Reporter. In their young adult years, Mufasa and Taka find their'},
  {'timestamp': [20.18, 27.12],
   'text': ' courage and loyalty tested when a group of white lions encroach upon the pride. The history behind'},
  {'timestamp': [0.0, 6.4],
   'text': ' the word encroach is likely to hook you in. The word comes from the Middle English verb encrochen,'},
  {'timestamp': [6.94, 14.18],
   'text': ' which means to get or seize. The Anglo-French predecessor of encrochen is encroché, which was'},
  {'timestamp': [14.18, 21.9],
   'text': ' formed by combining the prefix en-i-n, meaning in, with the noun coche, meaning hook. Coche is also'},
  {'timestamp': [21.9, 28.26],
   'text': ' an ancestor of our word crochet. That word first referred to a crochet hook or to the needlework'},
  {'timestamp': [0.0, 6.38],
...
   'text': ' or normal limits, such as a hurricane that encroaches on the mainland.'},
  {'timestamp': [18.98, 20.88],
   'text': " With your Word of the Day, I'm Peter Sokolowski."},
  {'timestamp': [0.0, 9.96],
   'text': ' visit merriamwebster.com today for definitions wordplay and trending word lookups'}]}
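Comparing the two outputs, the raw chunks carry their times under the key `timestamp` (singular), while `AutomaticSpeechRecognitionOutputChunk` exposes a `timestamps` field, which would plausibly explain the `None` values. A minimal sketch of a parser that tolerates both spellings, using a hypothetical stand-in dataclass `Chunk` rather than the library's generated type, and the first two chunks from the response above:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for AutomaticSpeechRecognitionOutputChunk."""

    text: str
    timestamps: Optional[List[float]] = None


def parse_chunks(raw_chunks: List[Dict[str, Any]]) -> List[Chunk]:
    chunks = []
    for raw in raw_chunks:
        # Prefer the plural field name, fall back to the singular key seen in the raw response
        ts = raw.get("timestamps", raw.get("timestamp"))
        chunks.append(Chunk(text=raw["text"], timestamps=ts))
    return chunks


raw = [
    {"timestamp": [0.0, 9.16], "text": " It's the Word of the Day podcast for January 31st."},
    {"timestamp": [11.48, 17.04], "text": " Today's word is encroach, spelled E-N-C-R-O-A-C-H."},
]
parsed = parse_chunks(raw)
print(parsed[0].timestamps)  # [0.0, 9.16]
```

A strict parser that only looks up `timestamps` would return `None` for every chunk here, matching the behavior reported above.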

3 participants