Change DB dependencies to allow async #110
Bar a couple of uncontroversial changes (the log name, for example), I would have approved 99% of this -- but I don't like passing two arguments to
Also worth noting: on my Intel machine the tests now take 20-30 seconds longer, presumably from the connection pooling change. I think this is pretty significant, and it might be noticeable server-side. I'm raising a ticket to look into it.
Added connection pooling. It didn't help with test speed, but we have connection pooling!
Added a few comments as I'm thinking about these issues as part of my L&D today.
@leo-mazzone I propose separating the
Happy
Context
Changes proposed in this pull request
Guidance to review
I agree that passing both a SQLAlchemy engine and an adbc_connection to sql_to_df is ugly, but it also... works? The thing is that we use SQLAlchemy to prepare statements even when we don't use it to run them.
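One possible way to soften the two-argument signature, sketched purely as an illustration: bundle both handles into a single object. `QueryBackend` and `sql_to_rows` are hypothetical names, not code from this PR. The stdlib `sqlite3` module stands in for the ADBC connection, and the `engine` slot is left empty where a SQLAlchemy engine would go.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class QueryBackend:
    """Hypothetical bundle: one argument instead of two separate handles."""

    engine: Any                      # assumed: a SQLAlchemy engine, used only to prepare statements
    connect_adbc: Callable[[], Any]  # assumed: returns a fresh DBAPI-style connection for execution


def sql_to_rows(sql: str, backend: QueryBackend) -> list:
    """Run already-prepared SQL on the execution connection and return all rows."""
    conn = backend.connect_adbc()
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Stand-in usage: sqlite3 plays the role of the ADBC connection.
backend = QueryBackend(engine=None, connect_adbc=lambda: sqlite3.connect(":memory:"))
rows = sql_to_rows("SELECT 1 + 1", backend)
```

The caller still decides which component prepares the statement and which one runs it, but the function signature only carries one database argument.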
There is an alternative that I've been toying with: continuing to use polars.read_database_uri, but with the ADBC engine instead of the connectorx one. The ADBC docs seem to suggest it as well. However, it negates batching, and I couldn't find evidence that either approach would be faster than the other. Maybe we should run an empirical test (later?).
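The empirical test could start as small as the harness below. This is a sketch under assumptions: `median_seconds` and `compare` are hypothetical helpers, and the two lambdas are placeholders for the real loaders (one calling polars.read_database_uri with the ADBC engine, one using the batched ADBC path), which would need a live database.

```python
import statistics
import time
from typing import Callable


def median_seconds(load: Callable[[], object], repeats: int = 5) -> float:
    """Call `load` several times and return the median wall-clock duration."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(loaders: dict[str, Callable[[], object]], repeats: int = 5) -> dict[str, float]:
    """Time every named loader with the same number of repeats."""
    return {name: median_seconds(load, repeats) for name, load in loaders.items()}


# Placeholder workloads; swap in the two real query functions to benchmark them.
results = compare(
    {
        "read_database_uri_adbc": lambda: sum(range(10_000)),
        "batched_adbc": lambda: sum(range(20_000)),
    },
    repeats=3,
)
```

The median is used rather than the mean so that a single slow run, for example the first connection, does not dominate the comparison.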
Another thing I'm not doing is connection pooling - see here. I create a new connection every time I need one, which will probably take about half a second. That feels much shorter than the query itself when calling sql_to_df.
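For comparison, a minimal sketch of what pooling means here: connections are opened once and handed out again instead of being recreated per call. `ConnectionPool` is a hypothetical class, not the pooling later added in this PR, and `sqlite3` again stands in for the real driver.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Any, Callable


class ConnectionPool:
    """Hypothetical fixed-size pool that reuses open connections."""

    def __init__(self, connect: Callable[[], Any], size: int = 2) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the connection cost once, up front

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is checked out
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it

    def close(self) -> None:
        """Close every idle connection."""
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=1)
with pool.connection() as first:
    first_id = id(first)
    value = first.execute("SELECT 41 + 1").fetchone()[0]
with pool.connection() as second:
    reused = id(second) == first_id  # same connection object handed out again
pool.close()
```

Whether this pays off depends on how the connection cost compares with the query time, which is the trade-off described above.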
Relevant links
Checklist: