Loading larger .csv files broken #87

Open
nesdeq opened this issue Feb 13, 2025 · 3 comments

@nesdeq

nesdeq commented Feb 13, 2025

Loading large .csv files with 0.1.4 results in a blank page. There are no error messages, no progress bar, and no progress output in the CLI log either. (Python 3.12.2, clean env, macOS.)

It seems the only way to get data-formulator working currently is to use the example data or copy-paste small amounts of data.

@Chenglong-MS
Collaborator

What size of CSV are you testing with? I would like to take a look!

@nesdeq
Author

nesdeq commented Feb 13, 2025

Roughly 2M rows and 200 columns, which comes out at about 500 MB. I also noticed that Firefox will load it but becomes very unresponsive. Edge (and I guess Chrome) shows a blank page.

If working with larger datasets is a goal of this project, I suggest replacing the pandas DataFrame approach with DuckDB: load the CSVs into an in-memory DuckDB database and have the LLM agent write SQL queries instead. This would solve the performance issue and also open a path to integrating *SQL databases in the future.

@Chenglong-MS
Collaborator

You are right, Data Formulator is not able to handle data of that size. Part of the reason is the use of pandas; the other part is that the dataset lives in the frontend most of the time, which makes UI rendering unscalable.

I have tested datasets of about 10 MB, which work (though already not very smoothly). Integrating with a database on the server side is a promising way to address this data size issue.
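
As a rough illustration of keeping the full dataset on the server, the sketch below streams a CSV and returns only a bounded preview plus the total row count, so the frontend never has to hold or render every row. `preview_csv` and its return shape are hypothetical and not Data Formulator's API; it uses only the Python standard library:

```python
import csv
from itertools import islice


def preview_csv(path, max_rows=1000):
    """Return the header, the first max_rows rows, and the total row count.

    Hypothetical server-side helper, not part of Data Formulator.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        # Materialize only the rows that will be sent to the browser.
        rows = list(islice(reader, max_rows))
        # Count the remaining rows by streaming, without storing them.
        remaining = sum(1 for _ in reader)
    return {
        "columns": header,
        "rows": rows,
        "total_rows": len(rows) + remaining,
    }
```

The size of the payload sent to the UI is then capped by `max_rows` regardless of how large the file is, although the file is still read once in full to obtain the count.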
