[InferenceClient] Provide a way to deal with content-type header when sending raw bytes #2706
Comments
Hi there.
Currently my call to the endpoint follows this approach:

import base64
import json

from huggingface_hub import InferenceClient

client = InferenceClient()
model = "openai/whisper-large-v3-turbo"
# file_path is the path to a local audio file
response = client.post(
    json={
        "inputs": base64.b64encode(open(file_path, "rb").read()).decode(),
        "parameters": {"return_timestamps": True},
    },
    model=model,
    task="automatic-speech-recognition",
)
response_data = json.loads(response)

This is because the endpoint currently doesn't support parameters: https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_client.py#L461-L502
I created a quick PR here: kiansierra#1

Maybe solved by #2812 + #2821, @kiansierra?
Almost, @julien-c, @hanouticelina.

def automatic_speech_recognition(
    self,
    audio: ContentT,
    *,
    model: Optional[str] = None,
) -> AutomaticSpeechRecognitionOutput:

I believe if the same modification was applied it would almost be there:

def automatic_speech_recognition(
    self,
    audio: ContentT,
    *,
    model: Optional[str] = None,
    extra_body: Optional[Dict[str, Any]] = None,
) -> AutomaticSpeechRecognitionOutput:
    """
    Perform automatic speech recognition (ASR or audio-to-text) on the given audio content.

    Args:
        audio (Union[str, Path, bytes, BinaryIO]):
            The content to transcribe. It can be raw audio bytes, local audio file, or a URL to an audio file.
        model (`str`, *optional*):
            The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
            Inference Endpoint. If not provided, the default recommended model for ASR will be used.

    Returns:
        [`AutomaticSpeechRecognitionOutput`]: An item containing the transcribed text and optionally the timestamp chunks.

    Raises:
        [`InferenceTimeoutError`]:
            If the model is unavailable or the request times out.
        `HTTPError`:
            If the request fails with an HTTP error status code other than HTTP 503.

    Example:
    ```py
    >>> from huggingface_hub import InferenceClient
    >>> client = InferenceClient()
    >>> client.automatic_speech_recognition("hello_world.flac").text
    "hello world"
    ```
    """
    provider_helper = get_provider_helper(self.provider, task="automatic-speech-recognition")
    request_parameters = provider_helper.prepare_request(
        inputs=audio,
        parameters=extra_body or {},
        headers=self.headers,
        model=model or self.model,
        api_key=self.token,
    )
    response = self._inner_post(request_parameters)
    return AutomaticSpeechRecognitionOutput.parse_obj_as_instance(response)

The remaining issue is that while the response does contain the timestamps, once it is parsed the returned object has nulls for them.

import requests
from huggingface_hub import InferenceClient
client = InferenceClient()
response = requests.get("https://rss.art19.com/episodes/d826f550-8547-4552-9c11-396c4537b349.mp3")
response.raise_for_status()
with open('test.mp3', "wb") as f:
    f.write(response.content)
result = client.automatic_speech_recognition(
    'test.mp3',
    extra_body={'return_timestamps': True},
    model="openai/whisper-large-v3-turbo",
)
result.chunks
### Output
[AutomaticSpeechRecognitionOutputChunk(text=" It's the Word of the Day podcast for January 31st.", timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=" Today's word is encroach, spelled E-N-C-R-O-A-C-H.", timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' Encroach is a verb.', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' To encroach is to gradually move or go into an area that is beyond the usual or desired', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' limits, or to gradually take or begin to use or affect something that belongs to or is being used', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=" by someone else. Encroach is often followed by the words on or upon. Here's the word used in a", timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' sentence from the Hollywood Reporter. In their young adult years, Mufasa and Taka find their', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' courage and loyalty tested when a group of white lions encroach upon the pride. The history behind', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' the word encroach is likely to hook you in. The word comes from the Middle English verb encrochen,', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' which means to get or seize. The Anglo-French predecessor of encrochen is encroché, which was', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' formed by combining the prefix en-i-n, meaning in, with the noun coche, meaning hook. Coche is also', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' an ancestor of our word crochet. That word first referred to a crochet hook or to the needlework', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' done with it. Encroach carries the meaning of intrude, both in terms of privilege and property.', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' The word can also hop over legal barriers to describe a general advancement beyond desirable', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' or normal limits, such as a hurricane that encroaches on the mainland.', timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=" With your Word of the Day, I'm Peter Sokolowski.", timestamps=None),
AutomaticSpeechRecognitionOutputChunk(text=' visit merriamwebster.com today for definitions wordplay and trending word lookups', timestamps=None)]

Whereas calling the provider helper directly and decoding the raw response does return the timestamps:

from huggingface_hub.inference._providers import PROVIDER_T, HFInferenceTask, get_provider_helper
import json
provider_helper = get_provider_helper(client.provider, task="automatic-speech-recognition")
request_parameters = provider_helper.prepare_request(
    inputs='test.mp3',
    parameters={'return_timestamps': True},
    headers=client.headers,
    model="openai/whisper-large-v3-turbo",
    api_key=client.token,
)
response = client._inner_post(request_parameters)
json.loads(response)
### Output
{'text': " It's the Word of the Day podcast for January 31st. Today's word is encroach, spelled E-N-C-R-O-A-C-H. Encroach is a verb. To encroach is to gradually move or go into an area that is beyond the usual or desired limits, or to gradually take or begin to use or affect something that belongs to or is being used by someone else. Encroach is often followed by the words on or upon. Here's the word used in a sentence from the Hollywood Reporter. In their young adult years, Mufasa and Taka find their courage and loyalty tested when a group of white lions encroach upon the pride. The history behind the word encroach is likely to hook you in. The word comes from the Middle English verb encrochen, which means to get or seize. The Anglo-French predecessor of encrochen is encroché, which was formed by combining the prefix en-i-n, meaning in, with the noun coche, meaning hook. Coche is also an ancestor of our word crochet. That word first referred to a crochet hook or to the needlework done with it. Encroach carries the meaning of intrude, both in terms of privilege and property. The word can also hop over legal barriers to describe a general advancement beyond desirable or normal limits, such as a hurricane that encroaches on the mainland. With your Word of the Day, I'm Peter Sokolowski. visit merriamwebster.com today for definitions wordplay and trending word lookups",
'chunks': [{'timestamp': [0.0, 9.16],
'text': " It's the Word of the Day podcast for January 31st."},
{'timestamp': [11.48, 17.04],
'text': " Today's word is encroach, spelled E-N-C-R-O-A-C-H."},
{'timestamp': [17.28, 18.34], 'text': ' Encroach is a verb.'},
{'timestamp': [18.8, 24.56],
'text': ' To encroach is to gradually move or go into an area that is beyond the usual or desired'},
{'timestamp': [0.0, 6.5],
'text': ' limits, or to gradually take or begin to use or affect something that belongs to or is being used'},
{'timestamp': [6.5, 13.4],
'text': " by someone else. Encroach is often followed by the words on or upon. Here's the word used in a"},
{'timestamp': [13.4, 20.18],
'text': ' sentence from the Hollywood Reporter. In their young adult years, Mufasa and Taka find their'},
{'timestamp': [20.18, 27.12],
'text': ' courage and loyalty tested when a group of white lions encroach upon the pride. The history behind'},
{'timestamp': [0.0, 6.4],
'text': ' the word encroach is likely to hook you in. The word comes from the Middle English verb encrochen,'},
{'timestamp': [6.94, 14.18],
'text': ' which means to get or seize. The Anglo-French predecessor of encrochen is encroché, which was'},
{'timestamp': [14.18, 21.9],
'text': ' formed by combining the prefix en-i-n, meaning in, with the noun coche, meaning hook. Coche is also'},
{'timestamp': [21.9, 28.26],
'text': ' an ancestor of our word crochet. That word first referred to a crochet hook or to the needlework'},
{'timestamp': [0.0, 6.38],
...
'text': ' or normal limits, such as a hurricane that encroaches on the mainland.'},
{'timestamp': [18.98, 20.88],
'text': " With your Word of the Day, I'm Peter Sokolowski."},
{'timestamp': [0.0, 9.96],
'text': ' visit merriamwebster.com today for definitions wordplay and trending word lookups'}]}
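
Comparing the two outputs above, the raw chunks carry a `timestamp` key (singular) while the parsed objects expose a `timestamps` field (plural). One plausible explanation for the nulls, offered here as an assumption rather than a confirmed diagnosis, is that the parser keeps only keys matching the declared field names. The sketch below reproduces that behaviour with a hypothetical `Chunk` dataclass and two hypothetical parsing helpers; none of these names are part of `huggingface_hub`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for AutomaticSpeechRecognitionOutputChunk."""

    text: str
    timestamps: Optional[List[float]] = None


def parse_chunk_strict(raw: Dict[str, Any]) -> Chunk:
    """Keep only keys that match declared field names (assumed parser behaviour)."""
    known = {f.name for f in fields(Chunk)}
    return Chunk(**{k: v for k, v in raw.items() if k in known})


def parse_chunk_tolerant(raw: Dict[str, Any]) -> Chunk:
    """Accept either 'timestamps' or 'timestamp' from the raw payload."""
    ts = raw.get("timestamps", raw.get("timestamp"))
    return Chunk(text=raw["text"], timestamps=ts)


raw_chunk = {
    "timestamp": [0.0, 9.16],
    "text": " It's the Word of the Day podcast for January 31st.",
}

print(parse_chunk_strict(raw_chunk).timestamps)    # None: the singular key is dropped
print(parse_chunk_tolerant(raw_chunk).timestamps)  # [0.0, 9.16]
```

If this is indeed the cause, the fix would belong either in the output type's field name or in a key-mapping step before parsing.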
First reported by @freddyaboulton on Slack (private).
For some tasks (e.g. automatic-speech-recognition) we send raw bytes in the HTTP request. On the Inference API, there is a "content-type-guess" logic that implicitly interprets the data that is sent. However, this logic can be flawed, and Inference Endpoints don't have it at all.
It is currently possible to provide the `Content-Type` header by passing a value when initializing the `InferenceClient`. However, this is not well documented, and it doesn't feel correct to initialize a client with a specific content type (it would mean that all requests made with the client have to send the same content type).
Opening the issue here but not clear to me what we'd like to do. A few solutions could be:
Open to suggestions on this
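
To make the per-request idea concrete, here is a minimal sketch of client-side content-type guessing based on the file name, using only the standard library. The helpers `guess_content_type` and `build_headers` are hypothetical names for illustration and are not part of `huggingface_hub`; how such headers would be threaded into each request is left open, as in the issue.

```python
import mimetypes
from typing import Dict, Optional


def guess_content_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension, falling back to a generic default."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default


def build_headers(filename: str, base_headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of base_headers with Content-Type filled in only if not already set."""
    headers = dict(base_headers or {})
    headers.setdefault("Content-Type", guess_content_type(filename))
    return headers


# Per-request headers for an audio file; an explicit Content-Type always wins.
print(build_headers("test.mp3", {"Authorization": "Bearer <token>"}))
print(build_headers("test.mp3", {"Content-Type": "audio/x-custom"}))
```

Guessing from the extension only works when the input is a path or URL; for raw bytes the caller would still need a way to state the content type explicitly.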