kevint-cerebras

Use exponential backoff to improve UX with Cerebras models.

Since the hourly and daily token rate limits are the same, the tokens-per-minute (TPM) limit is the effective limiter. The maximum retry wait time is therefore capped at 60 seconds, so users can take full advantage of Cerebras.
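
For illustration, here is a minimal sketch of exponential backoff with a 60-second cap. It is not the code in this PR: `RateLimitError`, `backoff_delay`, `call_with_backoff`, the 1-second base delay, the retry count, and the jitter range are all assumptions made for the example. Only the 60-second maximum wait comes from the description above.

```python
import random
import time

MAX_WAIT_SECONDS = 60.0  # cap from the PR description (one TPM window)


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def backoff_delay(attempt, base=1.0, cap=MAX_WAIT_SECONDS):
    """Delay before retry `attempt` (0-based): base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(request, max_retries=8, base=1.0, cap=MAX_WAIT_SECONDS,
                      sleep=time.sleep, jitter=random.random):
    """Call `request()` and retry on RateLimitError with capped exponential backoff.

    `sleep` and `jitter` are injectable so the behavior can be tested
    without actually waiting. The base and retry count are assumed values.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = backoff_delay(attempt, base, cap)
            # Scale the delay by a factor drawn from 0.5 + 0.5 * jitter()
            # so that concurrent clients do not all retry at the same moment.
            sleep(delay * (0.5 + 0.5 * jitter()))
```

With these assumed defaults the unjittered delays grow as 1, 2, 4, 8, 16, 32 seconds and then stay at 60 seconds, so a throttled request waits at most one full per-minute window before the next attempt.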

@eloquence

I've tested this PR with Cerebras and can confirm that the user experience is much improved: requests still get throttled, but they no longer time out almost immediately.

@JC1738

JC1738 commented Aug 22, 2025

Same, I tested it on my fork and it made using Cerebras doable.

3 participants