# Response Headers

When you make a request to the proxy, the proxy will return the following headers:

## Rate Limit Headers
The proxy returns the following [OpenAI-compatible headers](https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers):

| Header | Type | Description |
|--------|------|-------------|
| `x-ratelimit-remaining-requests` | Optional[int] | The remaining number of requests permitted before the rate limit is exhausted |
| `x-ratelimit-remaining-tokens` | Optional[int] | The remaining number of tokens permitted before the rate limit is exhausted |
| `x-ratelimit-limit-requests` | Optional[int] | The maximum number of requests permitted before the rate limit is exhausted |
| `x-ratelimit-limit-tokens` | Optional[int] | The maximum number of tokens permitted before the rate limit is exhausted |
| `x-ratelimit-reset-requests` | Optional[int] | The time at which the request rate limit will reset |
| `x-ratelimit-reset-tokens` | Optional[int] | The time at which the token rate limit will reset |
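
For example, a client can read these headers off any proxy response to see how much quota is left. A minimal sketch (the proxy URL, virtual key, and model name below are placeholder assumptions; substitute your own deployment's values):

```python
import requests

# Placeholder proxy URL, virtual key, and model name.
resp = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={"Authorization": "Bearer sk-1234"},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]},
)

# Each header is optional and arrives as a string; a missing header means the
# limit is unknown (e.g. the backend provider did not report it).
for name in (
    "x-ratelimit-remaining-requests",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-limit-requests",
    "x-ratelimit-limit-tokens",
):
    print(name, resp.headers.get(name))
```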

### How Rate Limit Headers work

**If key has rate limits set**

The proxy will return the [remaining rate limits for that key](https://github.com/BerriAI/litellm/blob/bfa95538190575f7f317db2d9598fc9a82275492/litellm/proxy/hooks/parallel_request_limiter.py#L778).

**If key does not have rate limits set**

The proxy returns the remaining requests/tokens reported by the backend provider. (LiteLLM standardizes the backend provider's response headers to match the OpenAI format.)

If the backend provider does not return these headers, the value will be `None`.

These headers are useful for clients to understand the current rate limit status and adjust their request rate accordingly.
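
For instance, a client could pause before sending the next request once the remaining-request count reaches zero. A rough sketch (the one-second backoff is an arbitrary placeholder; a real client would honor `x-ratelimit-reset-requests` when it is present):

```python
import time

def wait_if_rate_limited(headers: dict) -> None:
    """Back off briefly when the proxy reports no remaining requests."""
    remaining = headers.get("x-ratelimit-remaining-requests")
    if remaining is not None and int(remaining) <= 0:
        time.sleep(1)  # placeholder backoff; consult x-ratelimit-reset-requests if available
```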

## Latency Headers
| Header | Type | Description |
|--------|------|-------------|
| `x-litellm-response-duration-ms` | float | Total duration of the API response in milliseconds |
| `x-litellm-overhead-duration-ms` | float | LiteLLM processing overhead in milliseconds |
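
These two headers can be combined to see how much of the total latency the proxy itself added versus the upstream provider. A small sketch (the breakdown dictionary is just an illustration):

```python
def latency_breakdown(headers: dict) -> dict:
    """Split total response time into LiteLLM overhead and provider time."""
    total_ms = float(headers["x-litellm-response-duration-ms"])
    overhead_ms = float(headers["x-litellm-overhead-duration-ms"])
    return {
        "total_ms": total_ms,
        "litellm_overhead_ms": overhead_ms,
        "provider_ms": total_ms - overhead_ms,  # time spent waiting on the backend LLM
    }
```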

## Retry, Fallback Headers
| Header | Type | Description |
|--------|------|-------------|
| `x-litellm-attempted-retries` | int | Number of retry attempts made |
| `x-litellm-attempted-fallbacks` | int | Number of fallback attempts made |
| `x-litellm-max-fallbacks` | int | Maximum number of fallback attempts allowed |
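
If you want to surface requests that only succeeded after retries or fallbacks, these headers can feed a simple alert. A sketch (the logging behavior is illustrative, not part of LiteLLM):

```python
import logging

def log_degraded_routing(headers: dict) -> None:
    """Warn when the proxy had to retry or fall back to complete the request."""
    retries = int(headers.get("x-litellm-attempted-retries", 0))
    fallbacks = int(headers.get("x-litellm-attempted-fallbacks", 0))
    if retries or fallbacks:
        logging.warning("request needed %d retries and %d fallbacks", retries, fallbacks)
```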

## Cost Tracking Headers
| Header | Type | Description |
|--------|------|-------------|
| `x-litellm-response-cost` | float | Cost of the API call |
| `x-litellm-key-spend` | float | Total spend for the API key |
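
These headers let a client keep a running view of spend without extra calls to the proxy. A minimal sketch (the per-model tally is a hypothetical client-side structure):

```python
# Hypothetical in-memory tally of cost per model, fed from response headers.
spend_by_model: dict[str, float] = {}

def record_cost(headers: dict, model: str) -> None:
    """Accumulate the per-request cost reported by the proxy."""
    cost = headers.get("x-litellm-response-cost")
    if cost is not None:
        spend_by_model[model] = spend_by_model.get(model, 0.0) + float(cost)
```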

## LiteLLM Specific Headers
| Header | Type | Description |
|--------|------|-------------|
| `x-litellm-call-id` | string | Unique identifier for the API call |
| `x-litellm-model-id` | string | Unique identifier for the model used |
| `x-litellm-model-api-base` | string | Base URL of the API endpoint |
| `x-litellm-version` | string | Version of LiteLLM being used |
| `x-litellm-model-group` | string | Model group identifier |
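
`x-litellm-call-id` is handy for correlating a client-side request with the proxy's own logs. A sketch of attaching it to application logging (the log format is an assumption):

```python
import logging

def log_call_metadata(headers: dict) -> None:
    """Record which deployment served the request and the proxy's call id."""
    logging.info(
        "litellm call_id=%s model_id=%s api_base=%s",
        headers.get("x-litellm-call-id"),
        headers.get("x-litellm-model-id"),
        headers.get("x-litellm-model-api-base"),
    )
```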

## Response headers from LLM providers

LiteLLM also returns the original response headers from the LLM provider. These headers are prefixed with `llm_provider-` to distinguish them from LiteLLM's headers.

Example response headers:
```
llm_provider-openai-processing-ms: 256
llm_provider-openai-version: 2020-10-01
llm_provider-x-ratelimit-limit-requests: 30000
llm_provider-x-ratelimit-limit-tokens: 150000000
```
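
To inspect only the provider's original headers, a client can filter on that prefix. A short sketch:

```python
def provider_headers(headers: dict) -> dict:
    """Extract the backend provider's original headers, stripping the llm_provider- prefix."""
    prefix = "llm_provider-"
    return {k[len(prefix):]: v for k, v in headers.items() if k.lower().startswith(prefix)}
```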