
Conversation

@lostmygithubaccount

resolves #645

"AI" use disclosure: this code was entirely written using Claude Code on Opus 4.5, with a lot of direction. I've run the tests, including on the repro.sql in the linked discussion. if this is more work than it's worth, feel free to discard and apologies! I'm not comfortable writing C++ myself but would love to see this fixed

> /cost
  ⎿  Total cost:            $12.72
     Total duration (API):  34m 37s
     Total duration (wall): 1h 22m 54s
     Total code changes:    344 lines added, 167 lines removed
     Usage by model:
             claude-haiku:  180.4k input, 26.7k output, 2.0m cache read, 218.9k cache write ($0.78)
          claude-opus-4-5:  11.9k input, 67.4k output, 13.5m cache read, 546.2k cache write, 3 web search ($11.93)

keeps the first row if there are duplicates on merge/update
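As an illustration only, the keep-first semantics described above (when the merge/update source contains duplicate keys, the first matching row wins) can be sketched with a generic keep-first de-duplication in awk. This is an analogy, not DuckLake's actual C++ code path, and the input rows and column names are made up:

```shell
# Keep only the first row seen for each key (column 1); later duplicates
# are dropped. Hypothetical CSV input; analogy for keep-first semantics,
# not the DuckLake merge/update implementation.
printf 'id,val\n1,a\n1,b\n2,c\n' | awk -F, '!seen[$1]++'
```

With the duplicated key `1`, only the first row (`1,a`) survives; `1,b` is silently dropped, which mirrors the behavior this PR gives merge/update.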

Collaborator

@pdet pdet left a comment


Hey Cody, long time no see! Thanks for the PR, and for being upfront w.r.t. AI usage. I confess I'm extremely bearish about LLMs writing C++, but since this change is fairly localized I gave it a go reviewing it :-)

Could we also add inlined tests? I think these should probably be fine since they are executed in the catalog. I'm wondering, though, what should happen if the catalog does not support multiple updates to the same row; maybe that's not a problem, since the catalog would just fail? I guess PG and SQLite can also resolve this.

@lostmygithubaccount
Author

> Hey Cody, long time no see!

I've been seeing you :) just lurking around the DuckLake repo a lot. I'm a huge fan, and it's really cool to have something so relatively simple driving real use cases for Ascend's customers too

> Thanks for the PR, and for being upfront w.r.t. AI usage. I confess I'm extremely bearish about LLMs writing C++, but since this change is fairly localized I gave it a go reviewing it :-)

np! Yeah, I figured this was well-scoped enough and would be a cool experiment for me. This error is really annoying for us, though not annoying enough that we've worked around it. Plus, given how extensively we use DuckLake, it's always good to contribute back. I will of course not be spamming this repo with AI slop, and will try to be a good OSS citizen on any future contributions as well. I probably should have caught the hash and vector library code 🤷

> Could we also add inlined tests? I think these should probably be fine since they are executed in the catalog. I'm wondering, though, what should happen if the catalog does not support multiple updates to the same row; maybe that's not a problem, since the catalog would just fail? I guess PG and SQLite can also resolve this.

added an inlining test -- my understanding is that the data is de-duplicated before it hits the catalog for inlined data tables, but I could be wrong about that

thanks for the review! again really love all the work you and the DuckDB folks are doing for our industry

@pdet
Collaborator

pdet commented Dec 24, 2025

> > Hey Cody, long time no see!

> I've been seeing you :) just lurking around the DuckLake repo a lot. I'm a huge fan, and it's really cool to have something so relatively simple driving real use cases for Ascend's customers too

> > Thanks for the PR, and for being upfront w.r.t. AI usage. I confess I'm extremely bearish about LLMs writing C++, but since this change is fairly localized I gave it a go reviewing it :-)

> np! Yeah, I figured this was well-scoped enough and would be a cool experiment for me. This error is really annoying for us, though not annoying enough that we've worked around it. Plus, given how extensively we use DuckLake, it's always good to contribute back. I will of course not be spamming this repo with AI slop, and will try to be a good OSS citizen on any future contributions as well. I probably should have caught the hash and vector library code 🤷

Absolutely, I think for this case, it is perfectly fine! Thanks for the initiative!

> > Could we also add inlined tests? I think these should probably be fine since they are executed in the catalog. I'm wondering, though, what should happen if the catalog does not support multiple updates to the same row; maybe that's not a problem, since the catalog would just fail? I guess PG and SQLite can also resolve this.

> added an inlining test -- my understanding is that the data is de-duplicated before it hits the catalog for inlined data tables, but I could be wrong about that

Oh yeah, I guess you might be right here!

> thanks for the review! again really love all the work you and the DuckDB folks are doing for our industry

Thanks for doing the changes so quickly! I think it looks good! One last request, could you retarget it to v1.4?

@lostmygithubaccount lostmygithubaccount changed the base branch from main to v1.4-andium December 24, 2025 19:06
@lostmygithubaccount
Author

done! That did explode the diff and resulted in a conflict; let me know if I should do anything else

@pdet
Collaborator

pdet commented Dec 29, 2025

Hi @lostmygithubaccount, retargeting to v1.4 isn't simply a matter of changing the base branch, since that would also pull all of the changes from main into v1.4, not only the ones related to your PR.

You might have to cherry pick your commits.
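The cherry-pick approach suggested here can be sketched in a throwaway repo. Everything below is a hypothetical demo: the branch name `release` stands in for a v1.4-style release branch, and the file and commit messages are made up.

```shell
set -e
# Throwaway repo demonstrating how to retarget work by cherry-picking a
# commit onto a release branch, instead of changing a PR's base branch.
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo base > file.txt
git add file.txt && git commit -qm base
git branch release                        # release branch forked before the fix
echo fix >> file.txt
git add file.txt && git commit -qm fix    # the "PR" commit lands on main
sha=$(git rev-parse HEAD)
git checkout -q release
git cherry-pick "$sha"                    # carry only the fix commit onto release
```

After the cherry-pick, `release` contains the base commit plus a copy of the fix commit, but none of the other history from `main` -- which is the point of cherry-picking over a base-branch change.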

@lostmygithubaccount
Author

makes sense -- done, and I squashed the 3 commits down into 1

@pdet
Collaborator

pdet commented Dec 29, 2025

Thanks, LGTM! :-)

@pdet pdet merged commit 4447a56 into duckdb:v1.4-andium Dec 29, 2025
48 checks passed