feat: Improve to_pyarrow_batches for PostgreSQL backend #10938

Closed
1 task done
ronif opened this issue Mar 5, 2025 · 1 comment · Fixed by #10954
Labels
feature Features or general enhancements

Comments

@ronif (Contributor)

ronif commented Mar 5, 2025

Is your feature request related to a problem?

Hi,

It seems that `to_pyarrow_batches` is implemented somewhat naively in many backends. In many cases (including the SQL backends) all the data is first materialized in the client-side cursor (or as a pandas DataFrame) and then partitioned into batches. This means that something like

`remote_con.table('huge_table').to_pyarrow_batches(...)`

tries to allocate the whole table in memory.

What is the motivation behind your request?

No response

Describe the solution you'd like

PostgreSQL (and maybe other backends) has a mechanism to batch results server-side, via server-side cursors.

I can make a PR for this.

What version of ibis are you running?

10.2.0

What backend(s) are you using, if any?

PostgreSQL

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ronif ronif added the feature Features or general enhancements label Mar 5, 2025
@cpcloud (Member)

cpcloud commented Mar 6, 2025

@ronif Thanks for the issue!

Would definitely review a PR to improve to_pyarrow_batches() for Postgres.

cpcloud pushed a commit that referenced this issue Mar 9, 2025
…sors (#10954)

This adds a specific `to_pyarrow_batches` implementation to the
PostgreSQL backend, which uses server-side cursors. This allows ibis to
allocate only the memory needed for `chunk_size` results of the query
instead of the whole set.

Resolves #10938
@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Mar 9, 2025
Projects
Status: done

2 participants