Skip to content

Role-based S3 access (AssumeRole) #875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: antalya-25.6
Choose a base branch
from
Open

Conversation

zvonand
Copy link
Collaborator

@zvonand zvonand commented Jul 2, 2025

Also includes #688

https://gist.github.com/zvonand/bb958ddc11bdcc788e0c13b363b56254

If extra_credentials(role_arn=...) is provided, temporary credentials are requested from AWS STS and used to access S3.

It is now possible to use EC2 metadata to perform AssumeRole. So, no need to explicitly specify AWS credentials

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added AWS IAM role assumption in s3 table function using AWS EC2 credentials.

Documentation entry for user-facing changes

...

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

@zvonand zvonand marked this pull request as ready for review July 3, 2025 17:38
retryable_errors.push_back("IDPCommunicationError");
retryable_errors.push_back("InvalidIdentityToken");
aws_client_configuration.retryStrategy = std::make_shared<Aws::Client::SpecifiedRetryableErrorsRetryStrategy>(
retryable_errors, /*maxRetries=*/ 3);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, explain how does hardcoded maxRetries correspond with configuration.s3_retry_attempts ?

It existed before your change (e.g. in AwsAuthSTSAssumeRoleWebIdentityCredentialsProvider ctor) , but the question remains.

Copy link
Collaborator

@ilejn ilejn Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a complete stranger in s3 secure world, but is my understanding correct that there is no audit facilities?
The credential machine is not trivial after AssumeRole is introduced, so may be we should start from making log

LOG_TRACE(logger, "Successfully assumed role using metadata credentials");

a bit more verbose and increase its severity?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit more verbose and increase its severity?

Well, all other similar logs are no higher than trace. There can be thousands of those lines per second, and they are basically meaningless, aren't they?

Copy link
Collaborator Author

@zvonand zvonand Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, explain how does hardcoded maxRetries correspond with configuration.s3_retry_attempts ?

As far as I understand, this "3 retries" policy is specific for those retryable errors listed above (and those are errors in communicating with STS, which is in general a standalone thing not related to S3), this s3_retry_attempts -- it is a general setting for S3.

I took this from existing code (AwsAuthSTSAssumeRoleWebIdentityCredentialsProvider), but now I think about it and I do not know why they decided to set this to 3....

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be thousands of those lines per second,

Sounds a bit worrying.
Does it mean that we are requesting thousands of tokens per a second?

and they are basically meaningless, aren't they?

They are meaningless if there is a way (e.g. using AWS logs) to clarify how exactly access was obtained.

Anyway, audit should be out of the scope of this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that we are requesting thousands of tokens per a second?

No, my bad -- of course, they are reused after obtaining them.

ilejn
ilejn previously approved these changes Jul 9, 2025
@ilejn
Copy link
Collaborator

ilejn commented Jul 11, 2025

We need some doc because the change is visible for users.

@zvonand
Copy link
Collaborator Author

zvonand commented Jul 11, 2025

I have some: https://gist.github.com/zvonand/bb958ddc11bdcc788e0c13b363b56254

I did not include it in the PR -- anyway, we are not using those in-repo docs, are we? Or better to also include it in "standard" docs? @ilejn

@ilejn
Copy link
Collaborator

ilejn commented Jul 11, 2025

I have some: https://gist.github.com/zvonand/bb958ddc11bdcc788e0c13b363b56254
I did not include it in the PR -- anyway, we are not using those in-repo docs, are we? Or better to also include it in "standard" docs? @ilejn

It is clear that

  1. the text is great
  2. stylistically it does not match CH docs, it is more like a blog post.

So, just adding the link to the description of the PR should be fine.
BTW, if you had done it before review it were a bit easier.

If you can find a proper place in docs, e.g. in engines/table-engines/integrations/s3.md, may be it worth writing something very brief as well.

@@ -721,6 +805,26 @@ S3CredentialsProviderChain::S3CredentialsProviderChain(
)
);
}
else
{
DB::S3::PocoHTTPClientConfiguration aws_client_configuration = DB::S3::ClientFactory::instance().createClientConfiguration(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is not DB::S3 a current namespace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it is. But I guess it does not actually matter here -- it is done similarly in other places. Maybe some ambiguity can arise without explicitly specifying this...

ilejn
ilejn previously approved these changes Jul 11, 2025
@zvonand zvonand changed the base branch from antalya to antalya-25.6 July 11, 2025 19:37
@zvonand zvonand dismissed ilejn’s stale review July 11, 2025 19:37

The base branch was changed.

@zvonand zvonand changed the base branch from antalya-25.6 to antalya July 11, 2025 19:38
@zvonand zvonand force-pushed the s3_assume_role_2 branch from 52e305f to c6680b8 Compare July 14, 2025 12:23
@zvonand zvonand changed the base branch from antalya to antalya-25.6 July 14, 2025 12:25
@zvonand zvonand changed the title AssumeRole with AWS EC2 credentials Role-based S3 access Jul 14, 2025
@zvonand zvonand changed the title Role-based S3 access Role-based S3 access (AssumeRole) Jul 14, 2025
@zvonand zvonand requested a review from ilejn July 14, 2025 12:38
@svb-alt svb-alt added the stable label Jul 14, 2025
Copy link

github-actions bot commented Jul 14, 2025

Workflow [PR], commit [40775fb]

@svb-alt svb-alt added port-antalya PRs to be ported to all new Antalya releases and removed port-antalya PRs to be ported to all new Antalya releases labels Jul 14, 2025
@zvonand zvonand force-pushed the s3_assume_role_2 branch from 0daece9 to 1783b9b Compare July 16, 2025 18:47
fix sts_client

remove commented code

add docs

fix build
@zvonand zvonand force-pushed the s3_assume_role_2 branch from 40775fb to 2161c79 Compare July 17, 2025 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants