Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typed entity registries #2236

Merged
merged 31 commits into from
Jan 29, 2025
Merged

typed entity registries #2236

merged 31 commits into from
Jan 29, 2025

Conversation

rudolfix
Copy link
Collaborator

Description

replaces #2183

also

  • cleansup dict arrow types for delta and iceberg
  • bumps duckdb and iceberg scanner to nightly
  • optimizes creating views in sql_client for filesystem
  • allows passing string source references to pipeline.run on top of passing source instances

Copy link

netlify bot commented Jan 26, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 35b1ac9
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6799336673375c000840d8d0

@rudolfix rudolfix added the ci full run the full load tests on pr label Jan 27, 2025
@rudolfix rudolfix marked this pull request as ready for review January 28, 2025 11:24
@rudolfix rudolfix requested a review from sh-rp January 28, 2025 11:24
@@ -942,6 +942,98 @@ async def r_async():
assert async_list in allowed_results


def test_limit_yield_cleanup() -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good test! i had briefly thought about wether the limit influences the incremental state but I think I forgot to add a test. There are a couple of print statements remaining further down here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this test because I realized that we cannot limit pg_replication this way which is doing yield from iterator but full iterators MUST be consumed. So we'd need to add some kind of groupings.

AnySourceFactory = SourceFactory[Any, DltSource]


class SourceReference:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these classes are to different, but isn't there a common mechanism between the source and the destination reference that would warrant a mutual baseclass?

@sh-rp
Copy link
Collaborator

sh-rp commented Jan 28, 2025

Very cool PR! I read all of it but as always with these huge PRs can't promise to understand every part of it, but some semi related or even not related observations are:

  • The filesystem destination is getting a bit crazy (not due to these changes) and I am wondering if we should split it up into the regular filesystem, delta and iceberg. Also from a user / sdk perspective I think we are abusing the schema with the table_format. Intuitively a destination should be delta and then all the stuff getting loaded should be delta tables without any additional hints. Then you should be able to add "delta" as destination instead of filesystem when defining the pipeline.

  • Based on my thinking about having better insight into configuration errors, we could consider having a "DLT__" prefix for all environment variables that are consumed by dlt, this way it would be easier to give the user a report about which settings were found and which ones were actually used and which ones might be typos or outdated configs. Also collissions with settings for other things running in the same environment would be less likely.

Copy link
Collaborator

@sh-rp sh-rp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like some tests are failing. I think there might be even a problem with the duckdb upgrade, i am not sure. num_rows is giving a strange result.

@rudolfix
Copy link
Collaborator Author

@sh-rp I've got all tests fixed I think

@rudolfix rudolfix merged commit 8d8b4c3 into devel Jan 29, 2025
57 of 60 checks passed
@rudolfix rudolfix deleted the feat/typed-entity-registries branch January 29, 2025 10:08
@rudolfix rudolfix restored the feat/typed-entity-registries branch January 29, 2025 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci full run the full load tests on pr
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants