exp: pygwalker data explorer in streamlit #2216
base: devel
✅ Deploy Preview for dlt-hub-docs ready!
def pygwalker_renderer(pipeline_name: str, table_name: str) -> StreamlitRenderer:
    pipeline = dlt.attach(pipeline_name)
    dataset = pipeline.dataset()
    df = dataset[table_name].df()
This will load the full table into memory. Is this desired? On a large table this will take a long time and at some point kill the host. Is there some other way we could do this?
from dlt.helpers.streamlit_app.utils import render_with_pipeline


def show(pipeline: dlt.Pipeline) -> None:
We should have at least simple tests for this page to make sure it renders.
As discussed in the meeting today, we will keep this as a reference. Alternatively, if @zilto has the time, we can add a test and also derive the explorable view from an SQL query that is limited to 2000 items by default, or something like that.
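For reference, a minimal sketch of the row-limited query idea from the comment above. It uses the standard-library sqlite3 module as a stand-in for the pipeline's destination so that it runs on its own; `preview_query`, `fetch_preview`, and `DEFAULT_PREVIEW_LIMIT` are hypothetical names for illustration and are not part of dlt or pygwalker.

```python
import re
import sqlite3

# Hypothetical default, taken from the "2000 items" suggestion in the review thread.
DEFAULT_PREVIEW_LIMIT = 2000

# Conservative pattern for plain table names (letters, digits, underscores).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def preview_query(table_name: str, limit: int = DEFAULT_PREVIEW_LIMIT) -> str:
    """Build a SELECT that returns at most `limit` rows from `table_name`."""
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"invalid table name: {table_name!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f'SELECT * FROM "{table_name}" LIMIT {int(limit)}'


def fetch_preview(
    conn: sqlite3.Connection, table_name: str, limit: int = DEFAULT_PREVIEW_LIMIT
) -> list:
    """Fetch only the preview rows instead of the full table."""
    return conn.execute(preview_query(table_name, limit)).fetchall()


# Demo: an in-memory SQLite table stands in for a loaded pipeline table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(5000)])

rows = fetch_preview(conn, "items")
print(len(rows))  # prints 2000: only the capped preview is pulled into memory
```

In the page itself, the capped result (or an equivalent row-limited relation from the dataset interface, assuming it supports a limit) would be what gets converted to a dataframe and handed to pygwalker, so memory use stays bounded regardless of table size.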
Description
Add a page to the streamlit dashboard available via dlt pipeline NAME show. It uses the recent Pipeline.dataset interface to pass a dataframe to pygwalker.
2025-01-14.16-15-45.mp4
Goals
When first loading a pipeline, I want to know the properties of the data (nulls, distributions, outliers). This reduces the friction of having to create temporary notebooks and install ipykernel/jupyter dependencies.
Implementation
pygwalker is an open-source Python library that produces an interactive widget to explore dataframes (works in Jupyter, VSCode, Streamlit, Marimo) based on Vega.
Integration surface area is small; it only assumes that you can pull the data locally via the Pipeline.dataset feature.
Pipeline.dataset to query data or pass an sqlalchemy connection to pygwalker to handle data loading.
duckdb for efficient processing of data
TODO