-
Notifications
You must be signed in to change notification settings - Fork 52
fix hanging evals #1223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix hanging evals #1223
Conversation
this tightly couples our retry logic to braintrust state, which is weird.
ab4019c to
30c1fa0
Compare
manugoyal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good. Mainly one question about the use of .close
| # Reset connection pool on timeout errors to clear stale connections | ||
| # (e.g., NAT gateway dropped idle connections) | ||
| if isinstance(e, requests.exceptions.ReadTimeout): | ||
| self.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that closing the HTTPAdapter will work for subsequent requests? It
seems to be calling methods like
this,
and it's unclear to me whether that's okay? I wonder if another weird thing to
try is to recreate the adapter like roughly self = RetryRequestExceptionsAdapter(...) (even though reassigning self won't strictly
work, but we could reassign a member variable).
| def ping(self) -> bool: | ||
| try: | ||
| resp = self.get("ping") | ||
| _state.set_user_info_if_null(resp.json()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why get rid of this? If we don't need it, should we get rid of the whole user_info family of functions?
| return self.session.delete(_urljoin(self.base_url, path), *args, **kwargs) | ||
|
|
||
| def get_json(self, object_type: str, args: Mapping[str, Any] | None = None, retries: int = 0) -> Mapping[str, Any]: | ||
| # FIXME[matt]: the retry logic seems to be unused and could be n*2 because of the the retry logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We were really just thinking "add more retries" when introducing the
retry handler. But we shouldn't have compounding retry handlers.
This bug adds timeouts / retries to prevent network errors from hanging evals.
report from customer: