Implement COPY … FROM STDIN queries #566

Open
wants to merge 5 commits into
base: main

@ahoppen (Contributor) commented Jun 24, 2025

This implements support for COPY operations using COPY … FROM STDIN queries for fast data transfer from the client to the backend.

Performance

A quick note on local performance measurements: inserting the numbers from 0 to 1,000,000 into a table that has two columns (INT and VARCHAR) takes ~150ms. Depending on the exact implementation, the majority of the active CPU cycles are spent converting the numbers to strings, inside string interpolation, or inside ByteBuffer._setBytes. If I remove the code that sends the CopyData messages to the backend (but keep all other logic that might incur thread hops), the test described above takes ~50ms and utilizes the CPU at ~200%, so the real bottleneck here is the Postgres backend handling the data. For comparison, psycopg2 takes 210ms with the data to be written already prepared in a StringIO object. So, performance-wise, this PR should be good to go.
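
For reference, the psycopg2 side of that comparison can be approximated as follows. This is only a sketch in Python (to match psycopg2, not the Swift code in this PR), and it assumes the VARCHAR column simply holds the number as a string and that the default COPY text format is used (tab-separated columns, one row per line). The helper name `build_copy_payload` and the table name in the comment are hypothetical.

```python
import io


def build_copy_payload(count: int) -> io.StringIO:
    """Return `count` rows in COPY text format, prepared in a StringIO object."""
    buf = io.StringIO()
    for i in range(count):
        # Two columns per row: the INT value and (assumed) the same number as text.
        buf.write(f"{i}\t{i}\n")
    buf.seek(0)  # rewind so a consumer reads from the first row
    return buf


payload = build_copy_payload(1_000_000)

# With a live connection, the prepared payload could then be handed to psycopg2,
# e.g. (table name is a placeholder):
#   cur.copy_expert("COPY my_table FROM STDIN", payload)
```

Preparing the payload up front keeps row formatting out of the timed section, which is the setup the 210ms figure above refers to.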

Ideas for follow-up PRs

  • Check if we should buffer data sent through the PostgresCopyFromWriter to reduce the number of CopyData messages we need to send (and thus the protocol overhead). Alternatively, we can leave that kind of optimization to the client.
  • Add an API that allows binary transfer of data.
  • Implement remaining options that can be passed to COPY FROM.
  • Allow concurrently generating the data to be written and flushing a buffer to the backend.
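
The buffering idea in the first bullet could look roughly like the sketch below. It is written in Python purely for illustration and is not the PostgresCopyFromWriter API; the class name `BufferingCopyWriter`, the `send` callback, and the threshold value are all assumptions. Each call to `send` stands in for one CopyData message.

```python
from typing import Callable


class BufferingCopyWriter:
    """Collects row data and forwards it in larger chunks (hypothetical helper)."""

    def __init__(self, send: Callable[[bytes], None], threshold: int = 64 * 1024) -> None:
        self._send = send            # stands in for "send one CopyData message"
        self._threshold = threshold  # flush once this many bytes are pending
        self._pending = bytearray()

    def write(self, data: bytes) -> None:
        self._pending += data
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self) -> None:
        # Send whatever is pending as a single message, if anything.
        if self._pending:
            self._send(bytes(self._pending))
            self._pending.clear()


messages = []
writer = BufferingCopyWriter(messages.append, threshold=16)
for i in range(10):
    writer.write(f"{i}\t{i}\n".encode())
writer.flush()  # the tail below the threshold still has to be sent
```

In this toy run, ten 4-byte rows become three messages instead of ten. Whether that trade-off belongs in the library or in client code is the open question the bullet raises.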

Fixes #290

@fabianfett (Collaborator) left a comment

Another review round. Thanks so much for pushing through this!

@ahoppen changed the title from "WIP: Implement COPY … FROM STDIN" to "Implement COPY … FROM STDIN queries" on Jul 7, 2025
@ahoppen marked this pull request as ready for review July 7, 2025 22:46
@ahoppen requested a review from gwynne as a code owner July 7, 2025 22:46
ahoppen added 2 commits July 8, 2025 10:36
This implements support for COPY operations using `COPY … FROM STDIN` queries for fast data transfer from the client to the backend.
Development

Successfully merging this pull request may close these issues.

Support Copy In Mode
2 participants