Skip to content

Disable s3 checksumming #2337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Disable s3 checksumming #2337

wants to merge 9 commits into from

Conversation

phoebusm
Copy link
Collaborator

@phoebusm phoebusm commented Apr 22, 2025

Reference Issues/PRs

#2251

What does this implement or fix?

Disable s3 checksumming by setting environment variable in the wheel.

Any other comments?

This will also unblock the upgrade of aws-sdk-cpp on vcpkg.
The upgrade will not be made in this PR

One of the newly added test is needed to be skipped as conda CI has aws-sdk-cpp pinned at non-s3-checksumming version due the libarrow pin.
environment-dev.yml doesn't align with the counterpart in the feedstock. Therefore the new version of aws-sdk-cpp is only used in the feedstock thus release wheel but not in local and CI build here.
This will be addressed in separate ticket.

Commit to remove libarrow pin so more updated aws-sdk-cpp, which support s3 checksumming is in used in conda
It's for verifying the change with the newly added the test. The test is successful.

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@phoebusm phoebusm added the minor Feature change, should increase minor version label Apr 23, 2025
@phoebusm phoebusm marked this pull request as ready for review April 23, 2025 13:52
import os as _os

# Since aws-sdk-cpp >= 1.11.486, it has turned on checksum integrity check `x-amz-checksum-mode: enabled`
# This feature is not supported in many s3 implementation and causes error
Copy link
Collaborator

@poodlewars poodlewars Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/implementation/implementations
and causes error doesn't make sense, and we should describe the kinds of problems that can be caused


# Since aws-sdk-cpp >= 1.11.486, it has turned on checksum integrity check `x-amz-checksum-mode: enabled`
# This feature is not supported in many s3 implementation and causes error
# Update environment variable before import arcticdb_ext to disable checksum validation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to add these hooks when arcticdb_ext is imported? May be more robust - nothing stops a user from importing arcticdb_ext first themselves!

# Update environment variable before import arcticdb_ext to disable checksum validation
if _os.getenv("AWS_RESPONSE_CHECKSUM_VALIDATION") == "when_supported" or _os.getenv("AWS_REQUEST_CHECKSUM_CALCULATION") == "when_supported":
_logging.getLogger(__name__).warning(
"Checksum validation has been specfically enabled by user, which the endpoint may not support and causes error."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the comment above, this doesn't make sense

pytest.param("forkserver", marks=FORK_SUPPORTED),
]
)
def test_s3_checksum_off_by_env_var(s3_storage, lib_name, multiprocess):
Copy link
Collaborator

@poodlewars poodlewars Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, this test doesn't have anything to do with the checksum?

Edit: Oh I see. You've changed the fixture so it breaks if anyone sends it a checksum. You should explain this in a comment.

pytest.param("forkserver", marks=FORK_SUPPORTED),
]
)
def test_s3_checksum_off_by_env_var(s3_storage, lib_name, multiprocess):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check that the fixture actually fails in the way you expect when the checksum is enabled

# Since aws-sdk-cpp >= 1.11.486, it has turned on checksum integrity check `x-amz-checksum-mode: enabled`
# This feature is not supported in many s3 implementation and causes error
# Update environment variable before import arcticdb_ext to disable checksum validation
if _os.getenv("AWS_RESPONSE_CHECKSUM_VALIDATION") == "when_supported" or _os.getenv("AWS_REQUEST_CHECKSUM_CALCULATION") == "when_supported":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have any tests for this case

Copy link
Collaborator

@poodlewars poodlewars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phoebusm phoebusm force-pushed the disable_s3_checksumming branch 2 times, most recently from e6641d1 to 565d49e Compare April 25, 2025 13:19
@phoebusm phoebusm force-pushed the disable_s3_checksumming branch 2 times, most recently from 245a02c to 657e259 Compare April 29, 2025 15:40
(request_checksum && std::string(request_checksum) == "when_supported")) {
log::storage().warn("S3 Checksum validation has been specifically enabled by user"
"If endpoint doesn't support it, its response will not contain checksum"
"which will be rejected by SDK and lead to storage exception in arcticdb");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing a space between checksum and which. We've actually seen worse failures than this where PutObject calls succeed but write garbage to the storage, so the description of the impact isn't quite correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run into the garbage issue. Will update


template<typename ConfigType>
auto get_s3_config(const ConfigType& conf) {
configure_s3_checksum_validation();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to me to introduce this side-effect in get_s3_config which I would expect to be a pure function, shouldn't we push it up a layer?

Copy link
Collaborator Author

@phoebusm phoebusm Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not pushing it by a level as the env var is mandatory for all s3 clients here. I rather rename get_s3_config to clarify what it does now

Copy link
Collaborator

@poodlewars poodlewars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phoebusm phoebusm force-pushed the disable_s3_checksumming branch from 657e259 to 3441ee4 Compare April 30, 2025 14:27
"If endpoint doesn't support it, its response will not contain checksum"
"which will be rejected by SDK and lead to storage exception in arcticdb");
log::storage().warn("S3 Checksum validation has been specifically enabled by user. "
"If endpoint doesn't support it, 1. garbage could be siliently written "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"garbage" isn't a great user-facing message, how about "incorrect objects could be..."

There is also a typo "siliently"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor Feature change, should increase minor version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants