@typik89 typik89 commented Mar 3, 2025

Problem

I'm using the JDBC sink task to save data to PostgreSQL. I wanted to set the JDBC driver's configuration property reWriteBatchedInserts=true to optimize writes.
This property makes the driver convert my separate upsert queries into one multi-value upsert. But there is a problem when such a multi-value query contains updates for the same primary key.
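
To make the failure mode concrete, here is a minimal sketch of roughly what a rewritten batch looks like. The table `demo`, its columns, and the class and method names are hypothetical, and literal values stand in for the bind parameters a real driver would use. PostgreSQL rejects an `INSERT ... ON CONFLICT DO UPDATE` statement that would touch the same row twice (the error reads along the lines of "ON CONFLICT DO UPDATE command cannot affect row a second time"), so a batch with a repeated key is the case to watch for.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class MultiValueUpsertSketch {

    // Builds the approximate shape of a multi-value upsert for a batch of
    // (primaryKey, value) pairs. Table and column names are assumptions.
    public static String multiValueUpsert(List<Map.Entry<Integer, Integer>> batch) {
        String values = batch.stream()
            .map(e -> "(" + e.getKey() + ", " + e.getValue() + ")")
            .collect(Collectors.joining(", "));
        return "INSERT INTO demo (id, value) VALUES " + values
            + " ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value";
    }

    // Returns true if any primary key appears more than once in the batch.
    public static boolean hasDuplicateKeys(List<Map.Entry<Integer, Integer>> batch) {
        Set<Integer> seen = new HashSet<>();
        for (Map.Entry<Integer, Integer> e : batch) {
            if (!seen.add(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Integer>> batch =
            List.of(Map.entry(1, 1), Map.entry(2, 2), Map.entry(1, 3));
        System.out.println(multiValueUpsert(batch));
        System.out.println("duplicate keys: " + hasDuplicateKeys(batch));
    }
}
```

Sent as three separate single-row upserts, the same records would apply in order without conflict; it is only the combined statement that carries key 1 twice.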

Solution

I think it would be useful to have a feature that, before forming batches of database queries, reduces each batch of records by keeping only the most recent record for each primary key.

record(primaryKey,value): (1,1),(2,2),(1,3) -> (2,2),(1,3)

This could be useful beyond my case, since it could also reduce the number of queries when there are several updates to the same primary key.
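
The reduction described above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the class name, method name, and key-extractor parameter are all hypothetical. Each record is placed at the position of its last occurrence, which reproduces the ordering in the example, (1,1),(2,2),(1,3) -> (2,2),(1,3).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class BatchDedupSketch {

    // Keeps only the most recent record per key, preserving the relative
    // order of the surviving records. keyOf is a hypothetical extractor
    // that returns the primary key of a record.
    public static <R, K> List<R> keepLatestPerKey(
            List<R> records, Function<? super R, ? extends K> keyOf) {
        Map<K, R> latest = new LinkedHashMap<>();
        for (R record : records) {
            K key = keyOf.apply(record);
            // Remove first so the re-inserted entry moves to the end;
            // a plain put() would keep the position of the older entry.
            latest.remove(key);
            latest.put(key, record);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Integer>> batch =
            List.of(Map.entry(1, 1), Map.entry(2, 2), Map.entry(1, 3));
        System.out.println(keepLatestPerKey(batch, Map.Entry::getKey));
    }
}
```

The remove-then-put step matters only for ordering. If the order of distinct keys within a batch is irrelevant to the sink, a plain `put` would yield the same set of surviving records.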

Testing done:
  • Unit tests
  • Integration tests
  • System tests
  • Manual tests

@typik89 typik89 requested a review from a team as a code owner March 3, 2025 13:47
@confluent-cla-assistant

❌ Error getting contributor login(s).
Please ensure the email address associated with this commit is added to your Github account.

typik89 commented Mar 3, 2025

❌ Error getting contributor login(s). Please ensure the email address associated with this commit is added to your Github account.

done
