Description
Recently, we've been seeing a lot more timeout warnings in our nightly script runs that have resulted in hundreds of repos being skipped (as noted in #695).
This is not the same as previous "timeout" issues, which were the result of waiting for GitHub to finish collating data on the server side; in those cases, we get a response from the API telling us it isn't ready yet, and we decide if/when we want to ask again. These newer timeouts occur while waiting for any response from the GitHub API at all:
`HTTPSConnectionPool(host='api.github.com', port=443): Read timed out. (read timeout=10)`
This 10-second limit appears to be set by the LLNL/scraper library we're using, at a level above the GitHub-specific module. Evidently, 10 seconds is no longer reliably sufficient for GitHub's API to respond.
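For context, this error comes from the requests/urllib3 stack underneath: the connection succeeds, but no response arrives within the configured read timeout, so a `ReadTimeout` is raised. A minimal sketch of that behavior (the URL and call here are illustrative, not the scraper's actual code):

```python
import requests

# Illustrative only -- not the scraper's actual call site.
# requests raises ReadTimeout when the server accepts the connection
# but takes longer than `timeout` seconds to start responding.
try:
    resp = requests.get(
        "https://api.github.com/orgs/llnl/repos",  # placeholder endpoint
        timeout=10,  # the same 10-second read timeout we're hitting
    )
    resp.raise_for_status()
except requests.exceptions.ReadTimeout:
    # Surfaces as: "Read timed out. (read timeout=10)"
    print("GitHub API did not respond within 10 seconds")
```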
We could do any or all of the following to try to fix these cases:
- Adjust the hardcoded limit in the base scraper.
- Choose another level at which to override this limit.
- Add this to the set of cases we detect and retry, since fresh queries evidently proceed unhindered (see the sketch after this list).
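If we go the retry (and/or longer timeout) route, something along these lines could work. This is only a sketch assuming the scraper's calls go through requests; the helper name, timeout, retry count, and backoff values are placeholders, not anything in the scraper today:

```python
import time

import requests


def get_with_retries(url, timeout=30, retries=3, backoff=5):
    """Hypothetical helper: retry a GET that times out waiting for a response.

    The timeout/retry/backoff numbers are placeholders, not scraper settings.
    """
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, timeout=timeout)
            resp.raise_for_status()
            return resp
        except requests.exceptions.ReadTimeout:
            if attempt == retries:
                raise
            # Fresh queries seem to succeed, so back off briefly and retry.
            time.sleep(backoff * attempt)
```

Raising the timeout and retrying aren't mutually exclusive, so we could also combine the two.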
How do we want to handle this?