Description
Recently, we've been seeing a lot more timeout warnings in our nightly script runs that have resulted in hundreds of repos being skipped (as noted in #695).
This is not the same as previous "timeout" issues, which were the result of waiting for GitHub to finish collating data on the server side; in those cases, we get a response from the API telling us it isn't ready yet, and we decide if/when we want to ask again. These newer timeouts occur while waiting for any response from the GitHub API at all:
`HTTPSConnectionPool(host='api.github.com', port=443): Read timed out. (read timeout=10)`
This 10-second limit appears to be set by the LLNL/scraper library we're using, at a level above the GitHub-specific module. Evidently, 10 seconds is no longer reliably sufficient for GitHub's API to respond.
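For context, this error comes from the requests/urllib3 stack underneath: the connection succeeds, but no response arrives within the configured read timeout, so a `ReadTimeout` is raised. A minimal sketch of that behavior (the URL and call here are illustrative, not the scraper's actual code):

```python
import requests

# Illustrative only -- not the scraper's actual call site.
# requests raises ReadTimeout when the server accepts the connection
# but takes longer than `timeout` seconds to start responding.
try:
    resp = requests.get(
        "https://api.github.com/orgs/llnl/repos",  # placeholder endpoint
        timeout=10,  # the same 10-second read timeout we're hitting
    )
    resp.raise_for_status()
except requests.exceptions.ReadTimeout:
    # Surfaces as: "Read timed out. (read timeout=10)"
    print("GitHub API did not respond within 10 seconds")
```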
We could do any or all of the following to try to fix these cases:
- Adjust the hardcoded limit in the base scraper.
- Choose another level at which to override this limit.
- Add this to the set of cases we detect and retry, since fresh queries evidently proceed unhindered (see the sketch after this list).
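If we go the retry (and/or longer timeout) route, something along these lines could work. This is only a sketch assuming the scraper's calls go through requests; the helper name, timeout, retry count, and backoff values are placeholders, not anything in the scraper today:

```python
import time

import requests


def get_with_retries(url, timeout=30, retries=3, backoff=5):
    """Hypothetical helper: retry a GET that times out waiting for a response.

    The timeout/retry/backoff numbers are placeholders, not scraper settings.
    """
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, timeout=timeout)
            resp.raise_for_status()
            return resp
        except requests.exceptions.ReadTimeout:
            if attempt == retries:
                raise
            # Fresh queries seem to succeed, so back off briefly and retry.
            time.sleep(backoff * attempt)
```

Raising the timeout and retrying aren't mutually exclusive, so we could also combine the two.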
How do we want to handle this?