Skip to content

chunked: use temporary file for tar-split data #2312

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented Apr 9, 2025

Replace the in-memory buffer with a O_TMPFILE file. This reduces the memory requirements for a partial pull since the tar-split data can be written to disk.

c/image needs the following patch:

diff --git a/storage/storage_dest.go b/storage/storage_dest.go
index d1c5acef..e66e0ccc 100644
--- a/storage/storage_dest.go
+++ b/storage/storage_dest.go
@@ -425,6 +425,7 @@ func (s *storageImageDestination) PutBlobPartial(ctx context.Context, chunkAcces
 	if err != nil {
 		return private.UploadedBlob{}, err
 	}
+	defer differ.Close()
 
 	out, err := s.imageRef.transport.store.PrepareStagedLayer(nil, differ)
 	if err != nil {

Copy link
Contributor

openshift-ci bot commented Apr 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Apr 9, 2025
@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch 2 times, most recently from cebc9ff to 13cbaef Compare April 9, 2025 12:07
@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch 2 times, most recently from 94a8a23 to b0fd638 Compare April 15, 2025 12:44
@flouthoc flouthoc requested a review from Copilot April 15, 2025 13:57
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch 2 times, most recently from 4f0de4e to 6342175 Compare April 15, 2025 21:38
@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch 3 times, most recently from f04c803 to 187c56b Compare April 16, 2025 10:31
Copy link

@liamstask liamstask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found this PR while I happened to be following a few threads related to this, and noticed that using buffered i/o would likely be worthwhile - feel free to disregard, of course!

@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch from 187c56b to d70254a Compare April 23, 2025 07:30
@giuseppe
Copy link
Member Author

@Luap99 @mtrmac the PR is ready, PTAL

extract the blob checksum validation logic from decodeAndValidateBlob
into a separate function.

Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch from d70254a to 0084bc1 Compare April 28, 2025 20:40
@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch 2 times, most recently from 6472d74 to dfd505c Compare April 29, 2025 10:13
Replace the in-memory buffer with a O_TMPFILE file.  This reduces the
memory requirements for a partial pull since the tar-split data can be
written to disk.

Signed-off-by: Giuseppe Scrivano <[email protected]>
Replace the direct call to unix.Open with the O_TMPFILE flag
with the dedicated openTmpFile helper function.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the chunked-read-tarsplit-sequentially branch from dfd505c to 729821c Compare April 29, 2025 10:27
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhatdan
Copy link
Member

rhatdan commented Apr 29, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 29, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit ff9e524 into containers:main Apr 29, 2025
20 checks passed
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m afraid I’m way behind, just a quick note.

I’m also generally confused about the lifetime of DriverWithDifferOutput.TarSplit.

@@ -333,13 +339,16 @@ func makeConvertFromRawDiffer(store storage.Store, blobDigest digest.Digest, blo
// makeZstdChunkedDiffer sets up a chunkedDiffer for a zstd:chunked layer.
// It may return an error matching ErrFallbackToOrdinaryLayerDownload / errFallbackCanConvert.
func makeZstdChunkedDiffer(store storage.Store, blobSize int64, tocDigest digest.Digest, annotations map[string]string, iss ImageSourceSeekable, pullOptions pullOptions) (*chunkedDiffer, error) {
manifest, toc, tarSplit, tocOffset, err := readZstdChunkedManifest(iss, tocDigest, annotations)
manifest, toc, tarSplit, tocOffset, err := readZstdChunkedManifest(store.RunRoot(), iss, tocDigest, annotations)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several error code paths; tarSplit doesn’t get closed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up in #2323

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants