exp: pygwalker data explorer in streamlit #2216

zilto · 2025-01-14T21:11:18Z

Description

Add a page to the Streamlit dashboard (available via `dlt pipeline NAME show`). It uses the recent `Pipeline.dataset` interface to pass a dataframe to pygwalker.

[Demo video: 2025-01-14.16-15-45.mp4]

Goals

When first loading a pipeline, I want to know the properties of the data (nulls, distributions, outliers). This page reduces the friction of creating temporary notebooks and installing ipykernel/jupyter dependencies.

Implementation

pygwalker is an open-source Python library, built on Vega, that produces an interactive widget for exploring dataframes (it works in Jupyter, VS Code, Streamlit, and Marimo).
The integration surface area is small: it only assumes that the data can be pulled locally via the `Pipeline.dataset` feature.
- A deeper integration could use `Pipeline.dataset` to query the data, or pass a SQLAlchemy connection to pygwalker so that pygwalker handles data loading.
- pygwalker can use DuckDB to process data efficiently.

TODO

Add a useful message for optional/missing dependencies.
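For this TODO, one possible approach is a small import helper that turns a bare `ModuleNotFoundError` into an actionable hint when the explorer page is opened. This is a minimal, standard-library-only sketch: the helper name `import_optional` and the wording of the hint are assumptions, not part of dlt. Inside the dlt codebase, the existing `MissingDependencyException` in `dlt.common.exceptions` may be the better fit.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    `import_optional` is a hypothetical helper; `install_hint` is the
    instruction shown to the user and should match dlt's packaging.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise with a message that tells the user what to install.
        raise ModuleNotFoundError(
            f"The data explorer page requires the optional dependency "
            f"'{module_name}', which is not installed. {install_hint}"
        ) from exc


# Hypothetical usage: import pygwalker lazily, only when the page is rendered.
# pyg = import_optional("pygwalker", "Install it with: pip install pygwalker")
```

Importing lazily keeps the rest of the dashboard usable when pygwalker is absent; only the explorer page reports the missing dependency.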

netlify · 2025-01-14T21:11:36Z

✅ Deploy Preview for dlt-hub-docs ready!

Name	Link
🔨 Latest commit	`d748340`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/6786d2f89dfa550008c73f7f
😎 Deploy Preview	https://deploy-preview-2216--dlt-hub-docs.netlify.app


sh-rp · 2025-01-21T11:54:12Z

dlt/helpers/streamlit_app/blocks/explorer.py

+def pygwalker_renderer(pipeline_name: str, table_name: str) -> StreamlitRenderer:
+    pipeline = dlt.attach(pipeline_name)
+    dataset = pipeline.dataset()
+    df = dataset[table_name].df()


This will load the full table into memory. Is this desired? On a large table this will take a long time and at some point kill the host. Is there some other way we could do this?
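One way to address the memory concern is to read the table in chunks and stop once a row budget is reached, so the full table is never materialized. A minimal sketch, assuming the relation exposes a lazy chunk iterator (dlt relations offer `iter_df(chunk_size=...)`, but verify that signature against the installed dlt version); `take_rows` and `max_rows` are hypothetical names.

```python
from typing import Any, Iterable, List


def take_rows(chunks: Iterable[Any], max_rows: int = 2000) -> List[Any]:
    """Collect chunks until `max_rows` rows have been gathered, then stop.

    Each chunk must support len() and slicing (e.g. a list or a pandas
    DataFrame). `chunks` is consumed lazily, so once the budget is reached
    no further chunk is requested from the source.
    """
    collected: List[Any] = []
    if max_rows <= 0:
        return collected
    remaining = max_rows
    for chunk in chunks:
        piece = chunk[:remaining]  # trim the final chunk to the remaining budget
        collected.append(piece)
        remaining -= len(piece)
        if remaining <= 0:
            break  # stop before pulling the next chunk
    return collected


# Hypothetical usage inside the explorer (verify `iter_df(chunk_size=...)`
# against the installed dlt version before relying on it):
#   relation = dlt.attach(pipeline_name).dataset()[table_name]
#   df = pandas.concat(take_rows(relation.iter_df(chunk_size=500), max_rows=2000))
```

This bounds memory to roughly `max_rows` plus one chunk, at the cost of showing only a prefix of the table rather than a representative sample.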

sh-rp · 2025-01-21T11:54:41Z

dlt/helpers/streamlit_app/pages/explorer.py

+from dlt.helpers.streamlit_app.utils import render_with_pipeline
+
+
+def show(pipeline: dlt.Pipeline) -> None:


We should have at least simple tests for this page to make sure it renders.

sh-rp · 2025-01-27T16:23:14Z

As discussed in the meeting today, we will keep this as a reference. Alternatively, if @zilto has the time, we can add a test and also derive the explorable view from an SQL query with a default limit of around 2000 items.
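The alternative suggested above (build the explorable view from an SQL query with a default row limit) could look roughly like the sketch below. It uses the standard-library `sqlite3` module only so the example runs on its own; in dlt, the query string would instead be executed against the pipeline's dataset (check the current `Pipeline.dataset` docs for how raw SQL is passed). `build_preview_query`, `quote_identifier`, and `DEFAULT_PREVIEW_LIMIT` are hypothetical names.

```python
import sqlite3

DEFAULT_PREVIEW_LIMIT = 2000  # default suggested in the review; hypothetical constant


def quote_identifier(name: str) -> str:
    """Quote a table name as an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_preview_query(table_name: str, limit: int = DEFAULT_PREVIEW_LIMIT) -> str:
    """Return a SELECT that fetches at most `limit` rows of `table_name`."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {quote_identifier(table_name)} LIMIT {int(limit)}"


# Self-contained demo against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "events" (id INTEGER)')
conn.executemany('INSERT INTO "events" VALUES (?)', [(i,) for i in range(5000)])
rows = conn.execute(build_preview_query("events")).fetchall()
print(len(rows))  # 2000
```

`LIMIT` bounds only how many rows come back; without an `ORDER BY`, which rows are returned is up to the engine, so a representative sample would need a different query (for example a sampling clause, where the destination supports one).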

added pygwalker explorer

d748340

rudolfix assigned zilto Jan 20, 2025

rudolfix requested a review from sh-rp January 20, 2025 11:19

sh-rp reviewed Jan 21, 2025
