Skip to content

Commit cdccd5f

Browse files
committed
libstore/filetransfer,http-binary-cache-store: disable on substituter 500s instead of throwing
Right now, if a substituter returns a 5xx other than 501, 505 or 511, nix will keep trying to use it even though it is broken/misconfigured at least four more times before trying to use the next one (given the necessary patch is applied, otherwise it throws), resulting in unnecessary log spam. Also, if you are receiving 504s (such as if an nginx proxy is trying to connect to a substituter but timing out) this can lead to rather nasty waits - I have personally seen it take 20-30s, with it timing out at each individual request, before it timing out completely and giving up. Awful. This doesn't really make sense because 500s wont go away unless the server is fixed? This change would make nix disable an http substituter on the first 500 it gets and immediately switch to the next substituter (intuitive default behaviour?) This also preserves `settings.tryFallback` as it will still throw if false and all the substituters have been disabled because they have had (terminal) errors. (that logic is unrelated to the http binary cache store and is in substition-goal/store-api, see NixOS#13301 ) linked to NixOS#13301 ![nix error log](https://github.com/user-attachments/assets/cc2a7911-4298-43fa-a3b6-b68eaf88132a)
1 parent 49e9c14 commit cdccd5f

File tree

2 files changed

+15
-7
lines changed

2 files changed

+15
-7
lines changed

src/libstore/filetransfer.cc

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -496,11 +496,9 @@ struct curlFileTransfer : public FileTransfer
496496
// Most 4xx errors are client errors and are probably not worth retrying:
497497
// * 408 means the server timed out waiting for us, so we try again
498498
err = Misc;
499-
} else if (httpStatus == 501 || httpStatus == 505 || httpStatus == 511) {
500-
// Let's treat most 5xx (server) errors as transient, except for a handful:
501-
// * 501 not implemented
502-
// * 505 http version not supported
503-
// * 511 we're behind a captive portal
499+
} else if (httpStatus >= 500 && httpStatus <= 599) {
500+
// a server is unlikely to 5xx unless there is a critical configuration/system error
501+
// we can't recover from an error that isn't our fault so give up
504502
err = Misc;
505503
} else {
506504
// Don't bother retrying on certain cURL errors either

src/libstore/http-binary-cache-store.cc

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,7 @@ class HttpBinaryCacheStore : public virtual BinaryCacheStore
9696
void maybeDisable()
9797
{
9898
auto state(_state.lock());
99-
if (state->enabled && settings.tryFallback) {
99+
if (state->enabled) {
100100
int t = 60;
101101
printError("disabling binary cache '%s' for %s seconds", config->getHumanReadableURI(), t);
102102
state->enabled = false;
@@ -196,8 +196,14 @@ class HttpBinaryCacheStore : public virtual BinaryCacheStore
196196
try {
197197
(*callbackPtr)(std::move(result.get().data));
198198
} catch (FileTransferError & e) {
199-
if (e.error == FileTransfer::NotFound || e.error == FileTransfer::Forbidden)
199+
if (e.error == FileTransfer::NotFound || e.error == FileTransfer::Forbidden) {
200200
return (*callbackPtr)({});
201+
}
202+
// if the server is having errors then give up on it
203+
if (e.error == FileTransfer::Misc) {
204+
maybeDisable();
205+
return (*callbackPtr)({});
206+
}
201207
maybeDisable();
202208
callbackPtr->rethrow();
203209
} catch (...) {
@@ -219,6 +225,10 @@ class HttpBinaryCacheStore : public virtual BinaryCacheStore
219225
} catch (FileTransferError & e) {
220226
if (e.error == FileTransfer::NotFound)
221227
return std::nullopt;
228+
if (e.error == FileTransfer::Misc) {
229+
maybeDisable();
230+
return std::nullopt;
231+
}
222232
maybeDisable();
223233
throw;
224234
}

0 commit comments

Comments
 (0)