Add Cerebras rate limit handler #2044

kevint-cerebras · 2025-08-18T17:39:57Z

Use exponential backoff to improve UX with Cerebras models.

Since the hourly and daily token rate limits are the same, the tokens-per-minute (TPM) limit is the effective limiter. The maximum retry wait time is therefore 60 seconds, so users can take full advantage of Cerebras.
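For readers unfamiliar with the approach, here is a minimal sketch of exponential backoff with a 60-second cap. It is not the code from this PR: `RateLimitError`, `backoff_delay`, `call_with_backoff`, the 1-second base delay, the retry count, and the jitter are all assumptions made for illustration. Only the 60-second maximum wait comes from the description above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Cap taken from the PR description: the per-minute window resets within 60 s.
MAX_WAIT_SECONDS = 60.0


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate limit (HTTP 429) response."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = MAX_WAIT_SECONDS) -> float:
    """Delay before retry `attempt` (0-based): base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 8,  # assumed value, not from the PR
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> T:
    """Call `request`, retrying on RateLimitError with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            delay = backoff_delay(attempt)
            if jitter:
                # Randomize within [delay/2, delay] so clients do not retry in lockstep.
                delay = random.uniform(delay / 2, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

With these assumed defaults the waits grow as 1, 2, 4, 8, 16, 32 seconds and then stay at 60, so a throttled request waits at most one full minute per retry instead of failing right away.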

eloquence · 2025-08-20T07:23:18Z

I've tested this PR with Cerebras and can confirm that the user experience is much improved - requests do get throttled, but you no longer get a timeout pretty much immediately.

JC1738 · 2025-08-22T03:47:29Z

Same here. I tested this on my fork, and it made using Cerebras doable.

Add Cerebras rate limit handler

a7d12c2
