Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3 support #426
S3 support #426
Changes from 64 commits
32130c2
4a5884c
804fd99
bbbe637
a752eee
ce4218a
5ba191d
5126b44
185e687
502f7cf
740a15d
7fbabdf
5b41e74
555903d
69f7fd9
39ca408
cce5b1b
1c601ad
551fe20
b0ff3d1
afb2dae
06c0311
504f15d
2de3486
5f07f2e
f391ad1
247ab4d
9adffc1
8b4436d
2287c9b
e830f62
bff52a4
f6e7ebc
3f8291a
2607ca1
6f3c25d
149d6be
98991b3
daac52f
ba6925c
c635519
b8bfef2
4c4b36e
e489bbe
73d9104
b4ac1d7
d9ba372
146efc0
b19a146
4cb54a9
379f521
8849d11
8c54a6c
bb0ad24
33b9227
2e5ff0b
378e9b6
496fe8d
ea2d93c
3bb3ce1
b9b2c44
fa4f298
c483b68
ce0c59a
bbb4042
f363f15
8129d17
37b294c
fb95490
2729dbf
ce08365
e800233
a60011a
0d68e49
72812cc
c4ccd9e
4f765d0
355d192
500681a
d2a49c9
bd83655
2687cc6
8fe82b9
fdb9a59
4e4389f
cebc0a8
61a5d50
d55030a
2123666
0c2f09b
bb445a3
4701b52
2bd411e
b40a0ad
7bcfe5b
2a9f934
b978323
00304e7
f9b49f2
0403686
0001265
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be defined as a part of
--cmake-args
, with no special--no-s3
flag? It is possible for users to specify the option via--cmake-args
. If the user gives a value there, we probably want to respect that with higher priority than defaulting to "ON" based on the absence of a--no-s3
flag.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws-sdk-cpp
requires a nearly-exact match at runtime (version x.x.x) and we need compatibility with conda-forge. Note that we will need to maintain this pinning very actively and align with conda-forge for every RAPIDS release, or we risk breaking environment compatibility with conda-forge. This is a large part of why I am hesitant on adding this dependency -- it constrains us tightly, and we do not benefit from the automated migrations that are used for conda-forge recipes to "rebuild the world" when pinnings like aws-sdk-cpp are updated.We always need to be matching this line: https://github.com/conda-forge/conda-forge-pinning-feedstock/blob/2e592a612d925988747cd6daa9e271328ceb3bfc/recipe/conda_build_config.yaml#L268
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my ignorance, but why do we need an exact match? Naively, I would have thought that a lower-limit would be more flexible? Do all packages in conda-forge needs to maintain this pinning if they use
aws-sdk-cpp
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bdice, would it help if we remove the version requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I'll explain the migration logic. Then I'll answer the questions you posed about other options.
Migrations and why they're needed
The
aws-sdk-cpp
release cadence is very frequent. Some of those releases (there's a huge number of them) are picked up by conda-forge, but taking the new version requires "rebuilding the world" becauseaws-sdk-cpp
is a core C++ dependency of many packages (seemamba repoquery whoneeds -c conda-forge aws-sdk-cpp
). Updates happen via a conda-forge migration (tracked here) which rebuild everything that depends onaws-sdk-cpp
.See the number of recent AWS-related rebuilds of libarrow, for instance:
https://github.com/conda-forge/arrow-cpp-feedstock/commits/main/recipe/meta.yaml
I'll ignore the
aws-crt-cpp
and other rebuilds, and only focus onaws-sdk-cpp
because that's the dependency we are introducing here.In the past 12 months, there have been 6 rebuilds of libarrow for new versions of
aws-sdk-cpp
(1.11.379, 1.11.329, 1.11.267, 1.11.242, 1.11.210, 1.11.182).Every time that conda-forge migrates to a new version, we must use the same version so that RAPIDS (KvikIO specifically) can be installed alongside the latest conda-forge packages.
Try this:
Note that
aws-sdk-cpp 1.11.379
is included in the environment because there has been a recent rebuild oflibarrow
.Can we use a lower limit (or no version pinning)?
If we were to use a lower limit (or no version pinning), kvikio would use the latest
aws-sdk-cpp
when it is built. That would impose a runtime pinning on the latestaws-sdk-cpp
. However, if a conda-forge migration has not happened yet (or is incomplete), thosekvikio
packages would be incompatible with the latest conda-forge packages of things like libarrow. We need to match the migrators so we don't get "latestaws-sdk-cpp
" and instead align with "current version used by conda-forge's ecosystem-wide pinnings". Typically conda-forge maintainers will start a migration as soon as a newaws-sdk-cpp
is released, but that's not guaranteed to happen immediately. Sometimes migrations are delayed or skipped, which would leave our packages unusable if we had picked up anaws-sdk-cpp
version that conda-forge did not adopt with a migration.So what do we do?
RAPIDS is not built with conda-forge infrastructure, but aims to be compatible with the conda-forge ecosystem. We do not benefit from conda-forge's automation for version migrations but we must track them in order to be compatible. (The conda-forge bots that "rebuild the world" do not rebuild RAPIDS recipes, but maybe we could create such a tool.)
To remain compatible with conda-forge, we need to match the exact pinnings for core packages like
fmt
,spdlog
,aws-sdk-cpp
, and others. We have a lot of pain from matching conda-forge migrations forfmt
andspdlog
(see Keith's warning and this problem in cuspatial that comes from us being behind on spdlog/fmt migrations)Our best course of action for now is to:
aws-sdk-cpp
to the exact version in this YAML file: https://github.com/conda-forge/conda-forge-pinning-feedstock/blob/f291cfdaf8c328337cb8cd1f63c63caceeda8991/recipe/conda_build_config.yaml#L267-L268 (that's a permalink but we should track the latest as it changes)aws-sdk-cpp
imposes run-exports that pin very tightly, which typically indicates that there is no guarantee of stable ABI across versionsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened rapidsai/build-planning#100 to think more about conda-forge migration support in RAPIDS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thanks for the explanation @bdice! I now see why you were hesitant from the get-go 😊
What if we move to use
libcurl
instead? They take API and ABI stability very serious: https://curl.se/libcurl/features.html#stableapiThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking of using libcurl from the start since we only need the very basic S3 operations. I don’t think it will be too hard to implement and it will make Azure Blob Storage support straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be a better option than
aws-sdk-cpp
if we can uselibcurl
. It seems likelibcurl
is already a dependency of some packages we depend on in RAPIDS, so that gives me more confidence in it. See$ mamba repoquery whoneeds -c conda-forge libcurl 2>&1 | awk '{print $1;}' | sort | uniq
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I will give libcurl a try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not list a run dependency, this will be handled by run-exports: https://github.com/conda-forge/aws-sdk-cpp-feedstock/blob/f82d968670bdb9939ed7604a5fb7bb4885e2e6ba/recipe/meta.yaml#L14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match conda-forge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this and add a
host
dependency onaws-sdk-cpp==1.11.379
here so that we can have compatible runtime pinnings inserted automatically by the run-exports.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this ignoring run exports? I'm confused overall by the changes in
libkvikio-tests
. Doeslibkvikio-tests
not depend onlibkvikio
, even as a host dependency to build the standalone tests?Perhaps this needs to use
aws-sdk-cpp==1.11.379
inhost
, with nothing in run, and don't ignore the run-exports ofaws-sdk-cpp
.