-
Notifications
You must be signed in to change notification settings - Fork 10
New arrow C-API implementation #50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Merge after: duckdb/duckdb-go-bindings#39 |
|
@taniabogatsch FYI |
|
Thanks! Will have a look once we've released the bindings with the latest changes. 👍 |
|
@taniabogatsch Could you take a look this PR? What the sequence to merge it, should I the arrowmapping version increase in the duckdb package before merge or split pr to 2 - arrowmapping changes and the changes in the duckdb package? |
|
I think your duckdb-go-bindings changes should already be on |
|
We pushed a release today. |
|
Yes, I sow. I mean arrowmapping package in this repo. |
|
ah, I see now! We likely need to bump it in a separate PR - i.e., first one with the new functions in the arrow mapping package, then I can bump that version. 👍 |
|
@taniabogatsch Very strange with tests when the go version 1.24 is specified in the go.mod, the tests are failed. I have created an issue to update the go version to 1.25 in the CI and packages. #63 |
|
@taniabogatsch I revert transitive dependency versions. The failed tests was fixed |
|
@taniabogatsch Will you merge it? |
|
Yes, but I'll need to review it first. 👍 Didn't find the time to take a look yet, will so next week. |
taniabogatsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR / implementation - I saw that its not a huge PR, so I squeezed in a review already ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a breaking change? I.e., can people still use rdr.Record() and rdr.(*arrowStreamReader) in their implementations? Or would we break compatibility with this change as it currently is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, no, there are no breaking changes, the public methods are still the same.
| if err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call rows.Close() here before returning the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still using in the record reader and will be closed by last recordReader.Release. So it should be in non-closed state here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're removing a bunch of exported functionality here - let's instead tag them as deprecated? And add the new functionality on top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I just moved the public method to the top of the file. There are no braking changes
arrow.go
Outdated
| } | ||
| if arrowmapping.ArrowScan(a.conn.conn, name, arrowStream) == mapping.StateError { | ||
| release() | ||
| return nil, errors.New("duckdb_arrow_scan") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we turn this into a bit more elaborate error? I.e., that something went wrong in the DuckDB-side? Also, can we trigger this in a test? If not, should we tell people to file a bug report if they run into it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but this part of the code is not changed. There is no method to get the real error in the ArrowScan (I think in the future it should be deprecated).
I want to add the new type table function - ArrowTableUDF, that will use the new C-API, and deprecated this method. With this new UDF and replacement scan we can provide the same functionality to have arrow view.
The main difference will be in the options to delete the view definition after using. Do you know, the C-API will contain the method to deregister replacement scans and UDFs?
| defer mapping.DestroyErrorData(&ed) | ||
| if mapping.ErrorDataHasError(ed) { | ||
| return nil, fmt.Errorf("failed to create arrow schema: %w", | ||
| getDuckDBError(mapping.ErrorDataMessage(ed))) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use func errorDataError(errorData mapping.ErrorData) error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| schema, ed := arrowmapping.NewArrowSchema(arrowOptions, types, names) | ||
| defer mapping.DestroyErrorData(&ed) | ||
| if mapping.ErrorDataHasError(ed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we return with an error here, then we also need to destroy the arrow options, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
arrow.go
Outdated
| type duckdbArrowReader struct { | ||
| ctx context.Context | ||
| res mapping.Result | ||
| opts *arrowmapping.ArrowOptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a pointer (heap allocation) here - the arrow mapping.ArrowOptions are already wrapping a pointer, which we can just copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| r.currentRec = nil | ||
| defer mapping.DestroyDataChunk(&chunk) | ||
| rec, ed := arrowmapping.DataChunkToArrowArray(*r.opts, r.schema, chunk) | ||
| defer mapping.DestroyErrorData(&ed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same: func errorDataError(errorData mapping.ErrorData) error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
The PR just replaces the internal implementation of the Arrow.QueryContext method and record reader implementation. I also rename the reader to make it simple. |
Rewrite the Arrow.QueryContext method to use new DuckDB Arrow C-API
Closes #5