-
Notifications
You must be signed in to change notification settings - Fork 126
Disable s3 checksumming #2337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Disable s3 checksumming #2337
Conversation
python/arcticdb/__init__.py
Outdated
import os as _os | ||
|
||
# Since aws-sdk-cpp >= 1.11.486, it has turned on checksum integrity check `x-amz-checksum-mode: enabled` | ||
# This feature is not supported in many s3 implementation and causes error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/implementation/implementations
and causes error
doesn't make sense, and we should describe the kinds of problems that can be caused
python/arcticdb/__init__.py
Outdated
|
||
# Since aws-sdk-cpp >= 1.11.486, it has turned on checksum integrity check `x-amz-checksum-mode: enabled` | ||
# This feature is not supported in many s3 implementation and causes error | ||
# Update environment variable before import arcticdb_ext to disable checksum validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to add these hooks when arcticdb_ext
is imported? May be more robust - nothing stops a user from importing arcticdb_ext
first themselves!
python/arcticdb/__init__.py
Outdated
# Update environment variable before import arcticdb_ext to disable checksum validation | ||
if _os.getenv("AWS_RESPONSE_CHECKSUM_VALIDATION") == "when_supported" or _os.getenv("AWS_REQUEST_CHECKSUM_CALCULATION") == "when_supported": | ||
_logging.getLogger(__name__).warning( | ||
"Checksum validation has been specfically enabled by user, which the endpoint may not support and causes error." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the comment above, this doesn't make sense
pytest.param("forkserver", marks=FORK_SUPPORTED), | ||
] | ||
) | ||
def test_s3_checksum_off_by_env_var(s3_storage, lib_name, multiprocess): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, this test doesn't have anything to do with the checksum?
Edit: Oh I see. You've changed the fixture so it breaks if anyone sends it a checksum. You should explain this in a comment.
pytest.param("forkserver", marks=FORK_SUPPORTED), | ||
] | ||
) | ||
def test_s3_checksum_off_by_env_var(s3_storage, lib_name, multiprocess): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should check that the fixture actually fails in the way you expect when the checksum is enabled
python/arcticdb/__init__.py
Outdated
# Since aws-sdk-cpp >= 1.11.486, it has turned on checksum integrity check `x-amz-checksum-mode: enabled` | ||
# This feature is not supported in many s3 implementation and causes error | ||
# Update environment variable before import arcticdb_ext to disable checksum validation | ||
if _os.getenv("AWS_RESPONSE_CHECKSUM_VALIDATION") == "when_supported" or _os.getenv("AWS_REQUEST_CHECKSUM_CALCULATION") == "when_supported": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have any tests for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e6641d1
to
565d49e
Compare
245a02c
to
657e259
Compare
(request_checksum && std::string(request_checksum) == "when_supported")) { | ||
log::storage().warn("S3 Checksum validation has been specifically enabled by user" | ||
"If endpoint doesn't support it, its response will not contain checksum" | ||
"which will be rejected by SDK and lead to storage exception in arcticdb"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing a space between checksum
and which
. We've actually seen worse failures than this where PutObject
calls succeed but write garbage to the storage, so the description of the impact isn't quite correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't run into the garbage issue. Will update
|
||
template<typename ConfigType> | ||
auto get_s3_config(const ConfigType& conf) { | ||
configure_s3_checksum_validation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to me to introduce this side-effect in get_s3_config
which I would expect to be a pure function, shouldn't we push it up a layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not pushing it by a level as the env var is mandatory for all s3 clients here. I rather rename get_s3_config
to clarify what it does now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
657e259
to
3441ee4
Compare
"If endpoint doesn't support it, its response will not contain checksum" | ||
"which will be rejected by SDK and lead to storage exception in arcticdb"); | ||
log::storage().warn("S3 Checksum validation has been specifically enabled by user. " | ||
"If endpoint doesn't support it, 1. garbage could be siliently written " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"garbage" isn't a great user-facing message, how about "incorrect objects could be..."
There is also a typo "siliently"
Reference Issues/PRs
#2251
What does this implement or fix?
Disable s3 checksumming by setting environment variable in the wheel.
Any other comments?
This will also unblock the upgrade of
aws-sdk-cpp
on vcpkg.The upgrade will not be made in this PR
One of the newly added test is needed to be skipped as
conda
CI hasaws-sdk-cpp
pinned at non-s3-checksumming version due thelibarrow
pin.environment-dev.yml
doesn't align with the counterpart in the feedstock. Therefore the new version ofaws-sdk-cpp
is only used in the feedstock thus release wheel but not in local and CI build here.This will be addressed in separate ticket.
Commit to remove
libarrow
pin so more updatedaws-sdk-cpp
, which support s3 checksumming is in used in condaIt's for verifying the change with the newly added the test. The test is successful.
Checklist
Checklist for code changes...