Skip to content

[release/8.0.4xx] Containers - Retry on download blob #48987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: release/8.0.4xx
Choose a base branch
from

Conversation

surayya-MS
Copy link
Member

@surayya-MS surayya-MS commented May 15, 2025

Fixes #48940

Description/Customer Impact

When building images we often have to pull base image layers from remote sources. These layers can be quite large, and so network interruption can cause the entire build to report failure. Typically we try to retry these operations (e.g. the Copy Task in MSBuild itself), so we should do the same here.

We're forced to use pretty naive retries (operation-level, not chunk-level) because the OCI HTTP API and many popular registries do not support chunked requests well.

Changes made

  1. Copied logic for improved logging in DownloadBlobAsync in case of failure from upgraded logging for #595 log url when connection is closed #44953 (too many commits, something goes wrong when try git cherry-pick)
  2. Modified Registry.DownloadBlobAsync by adding a retry mechanism

Testing

Added unit test for retry mechanism.

Risk

Low - tests have been added and the mechanism is very simple.

@surayya-MS surayya-MS marked this pull request as ready for review May 15, 2025 19:20
@surayya-MS surayya-MS requested a review from a team as a code owner May 15, 2025 19:20
@surayya-MS surayya-MS changed the title Retry on download blob [release/8.0.4xx] Containers - Retry on download blob May 15, 2025
@@ -539,6 +539,37 @@ public void IsRegistryInsecure(string registryName, string? insecureRegistriesEn
Assert.Equal(expectedInsecure, registrySettings.IsInsecure);
}

[Fact]
public async Task DownloadBlobAsync_RetriesOnFailure()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there also be a test that the download gives up if the number of retries is exceeded?

mockRegistryAPI.Verify(api => api.Blob.GetStreamAsync(repoName, descriptor.Digest, cancellationToken), Times.Exactly(3)); // Verify retries

//Cleanup
File.Delete(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should this be in a finally block so it cleans up even when the test fails?

@@ -237,6 +237,11 @@ private static async Task<int> PushToRemoteRegistryAsync(ILogger logger, BuiltIm
cancellationToken)).ConfigureAwait(false);
logger.LogInformation(Strings.ContainerBuilder_ImageUploadedToRegistry, destinationImageReference, destinationImageReference.RemoteRegistry.RegistryName);
}
catch (UnableToDownloadFromRepositoryException)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect to see this updated in PushToLocalRegistryAsync as well, correct? That's at the root of the stack in the linked issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants