-
Notifications
You must be signed in to change notification settings - Fork 7
Role-based S3 access (AssumeRole) #875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: antalya-25.6
Are you sure you want to change the base?
Conversation
retryable_errors.push_back("IDPCommunicationError"); | ||
retryable_errors.push_back("InvalidIdentityToken"); | ||
aws_client_configuration.retryStrategy = std::make_shared<Aws::Client::SpecifiedRetryableErrorsRetryStrategy>( | ||
retryable_errors, /*maxRetries=*/ 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, explain how does hardcoded maxRetries correspond with configuration.s3_retry_attempts ?
It existed before your change (e.g. in AwsAuthSTSAssumeRoleWebIdentityCredentialsProvider ctor) , but the question remains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a complete stranger in s3 secure world, but is my understanding correct that there is no audit facilities?
The credential machine is not trivial after AssumeRole is introduced, so may be we should start from making log
LOG_TRACE(logger, "Successfully assumed role using metadata credentials");
a bit more verbose and increase its severity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit more verbose and increase its severity?
Well, all other similar logs are no higher than trace. There can be thousands of those lines per second, and they are basically meaningless, aren't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, explain how does hardcoded maxRetries correspond with configuration.s3_retry_attempts ?
As far as I understand, this "3 retries" policy is specific for those retryable errors listed above (and those are errors in communicating with STS, which is in general a standalone thing not related to S3), this s3_retry_attempts
-- it is a general setting for S3.
I took this from existing code (AwsAuthSTSAssumeRoleWebIdentityCredentialsProvider
), but now I think about it and I do not know why they decided to set this to 3....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be thousands of those lines per second,
Sounds a bit worrying.
Does it mean that we are requesting thousands of tokens per a second?
and they are basically meaningless, aren't they?
They are meaningless if there is a way (e.g. using AWS logs) to clarify how exactly access was obtained.
Anyway, audit should be out of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that we are requesting thousands of tokens per a second?
No, my bad -- of course, they are reused after obtaining them.
We need some doc because the change is visible for users. |
I have some: https://gist.github.com/zvonand/bb958ddc11bdcc788e0c13b363b56254 I did not include it in the PR -- anyway, we are not using those in-repo docs, are we? Or better to also include it in "standard" docs? @ilejn |
It is clear that
So, just adding the link to the description of the PR should be fine. If you can find a proper place in docs, e.g. in engines/table-engines/integrations/s3.md, may be it worth writing something very brief as well. |
39cb292
to
52e305f
Compare
@@ -721,6 +805,26 @@ S3CredentialsProviderChain::S3CredentialsProviderChain( | |||
) | |||
); | |||
} | |||
else | |||
{ | |||
DB::S3::PocoHTTPClientConfiguration aws_client_configuration = DB::S3::ClientFactory::instance().createClientConfiguration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is not DB::S3 a current namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it is. But I guess it does not actually matter here -- it is done similarly in other places. Maybe some ambiguity can arise without explicitly specifying this...
52e305f
to
c6680b8
Compare
Role-based S3 access
0daece9
to
1783b9b
Compare
fix sts_client remove commented code add docs fix build
40775fb
to
2161c79
Compare
Also includes #688
https://gist.github.com/zvonand/bb958ddc11bdcc788e0c13b363b56254
If
extra_credentials(role_arn=...)
is provided, temporary credentials are requested from AWS STS and used to access S3.It is now possible to use EC2 metadata to perform AssumeRole. So, no need to explicitly specify AWS credentials
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added AWS IAM role assumption in
s3
table function using AWS EC2 credentials.Documentation entry for user-facing changes
...
Exclude tests: