Add stream parameters in pylibcudf IO APIs #17620

Open
wants to merge 30 commits into
base: branch-25.04

Matt711
Contributor

@Matt711 Matt711 commented Dec 18, 2024

Description

Part of #15163. Now that #13744 is closed, we can expose the stream parameter in pylibcudf. This PR focuses on the IO APIs.
Example: reading CSV files on different streams (image)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Matt711 Matt711 added the feature request (New feature or request) and non-breaking (Non-breaking change) labels Dec 18, 2024

copy-pr-bot bot commented Dec 18, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Python (Affects Python cuDF API) and pylibcudf (Issues specific to the pylibcudf package) labels Dec 18, 2024
@Matt711
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

python/pylibcudf/pylibcudf/io/csv.pyx Outdated Show resolved Hide resolved
python/pylibcudf/pylibcudf/io/csv.pyx Outdated Show resolved Hide resolved
python/pylibcudf/pylibcudf/io/json.pyx Outdated Show resolved Hide resolved
@Matt711
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Contributor Author

Matt711 commented Dec 19, 2024

/ok to test

@Matt711
Contributor Author

Matt711 commented Dec 19, 2024

/ok to test

@vyasr
Contributor

vyasr commented Dec 20, 2024

We should address rapidsai/rmm#1770 before we merge this PR or anything like it in cudf.

@Matt711
Contributor Author

Matt711 commented Jan 18, 2025

TODO: We can do the same for Avro after #17766.

@Matt711 Matt711 changed the base branch from branch-25.02 to branch-25.04 February 4, 2025 13:48

copy-pr-bot bot commented Feb 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Matt711
Contributor Author

Matt711 commented Feb 4, 2025

/ok to test

Contributor Author

@Matt711 Matt711 left a comment

Note: We should be able to do something like this, but it currently produces a segfault that I will debug. I don't think this is a blocker for this PR though.

@@ -126,7 +128,8 @@ cdef class AvroReaderOptionsBuilder:


 cpdef TableWithMetadata read_avro(
-    AvroReaderOptions options
+    AvroReaderOptions options,
+    Stream stream = None,

Suggested change
-    Stream stream = None,
+    Stream stream = DEFAULT_STREAM,

Comment on lines +148 to +151
+    if stream is not None:
+        c_result = move(cpp_read_avro(options.c_obj, stream.view()))
+    else:
+        c_result = move(cpp_read_avro(options.c_obj))

Suggested change
-    if stream is not None:
-        c_result = move(cpp_read_avro(options.c_obj, stream.view()))
-    else:
-        c_result = move(cpp_read_avro(options.c_obj))
+    c_result = move(cpp_read_avro(options.c_obj, stream.view()))
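
For illustration, here is a minimal pure-Python sketch of the two dispatch patterns discussed in this review thread. It is not pylibcudf code. `FakeStream`, `DEFAULT_STREAM`, `fake_cpp_read`, and both `read_*` functions are hypothetical stand-ins for the real stream class, the default-stream object, and the libcudf reader. It only shows why a default-stream sentinel removes the `None` branch, and it does not model the segfault mentioned in the note above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FakeStream:
    """Hypothetical stand-in for a CUDA stream wrapper."""

    name: str

    def view(self) -> str:
        # A real stream would return a stream view; this sketch returns a label.
        return self.name


# Hypothetical module-level default, mirroring the suggested DEFAULT_STREAM.
DEFAULT_STREAM = FakeStream("default")


def fake_cpp_read(options: str, stream_view: str = "default") -> Tuple[str, str]:
    """Stand-in for the C++ reader: reports which stream it was given."""
    return (options, stream_view)


def read_with_none_check(
    options: str, stream: Optional[FakeStream] = None
) -> Tuple[str, str]:
    # Pattern in the diff: branch on None and call one of two overloads.
    if stream is not None:
        return fake_cpp_read(options, stream.view())
    return fake_cpp_read(options)


def read_with_default(
    options: str, stream: FakeStream = DEFAULT_STREAM
) -> Tuple[str, str]:
    # Suggested pattern: a single call, because stream is always a valid object.
    return fake_cpp_read(options, stream.view())
```

Both functions behave the same for callers. The second one has a single code path, at the cost of callers no longer being able to pass `None`.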

@Matt711 Matt711 marked this pull request as ready for review February 5, 2025 02:02
@Matt711 Matt711 requested a review from a team as a code owner February 5, 2025 02:02
@Matt711 Matt711 requested review from wence- and mroeschke February 5, 2025 02:02
@Matt711
Contributor Author

Matt711 commented Feb 5, 2025

/ok to test

@Matt711
Contributor Author

Matt711 commented Feb 5, 2025

/ok to test

@Matt711
Contributor Author

Matt711 commented Feb 5, 2025

Note: I think we should also expose cudf::is_ptds_enabled() and set stream = PER_THREAD_DEFAULT_STREAM if true. I'll do this in a follow-up PR.

Tracking Issue: #17919
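
As a rough sketch of the follow-up idea, the fallback stream could be chosen from a PTDS flag as below. This is plain Python with hypothetical names (`FakeStream`, `choose_default_stream`, `resolve_stream`). The `ptds_enabled` argument stands in for whatever a Python binding of `cudf::is_ptds_enabled()` would return, which is not exposed yet according to the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeStream:
    """Hypothetical stand-in for a CUDA stream wrapper."""

    name: str


# Hypothetical stand-ins for the two default-stream objects.
DEFAULT_STREAM = FakeStream("default")
PER_THREAD_DEFAULT_STREAM = FakeStream("per-thread-default")


def choose_default_stream(ptds_enabled: bool) -> FakeStream:
    """Pick the fallback stream based on whether PTDS is enabled."""
    return PER_THREAD_DEFAULT_STREAM if ptds_enabled else DEFAULT_STREAM


def resolve_stream(stream: Optional[FakeStream], ptds_enabled: bool) -> FakeStream:
    """Use the caller's stream if one was given, else the PTDS-aware default."""
    if stream is not None:
        return stream
    return choose_default_stream(ptds_enabled)
```

An explicitly passed stream always takes priority here, and the PTDS flag only affects the fallback.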

Labels
feature request (New feature or request), non-breaking (Non-breaking change), pylibcudf (Issues specific to the pylibcudf package), Python (Affects Python cuDF API)
Projects
Status: In Progress