Loading larger .csv files broken #87

nesdeq · 2025-02-13T08:33:30Z

Loading large .csv files with 0.1.4 results in a blank page. There are no error messages, no progress bar, and no progress shown in the CLI log either (Python 3.12.2, clean env, macOS).

It seems the only way to get data-formulator working currently is to use the example data or to copy-paste small amounts of data.

Chenglong-MS · 2025-02-13T16:58:46Z

What size of CSV are you testing with? I would like to take a look!

nesdeq · 2025-02-13T22:30:03Z

Roughly 2M rows and 200 columns, which comes out at about 500 MB. I also noticed that Firefox will load it but becomes very unresponsive, while Edge (and I guess Chrome) shows a blank page.

If working with larger datasets is a target of this project, I suggest replacing the pandas DataFrame approach with DuckDB, loading the CSVs in memory and having the LLM agent write SQL queries instead. This would solve the performance issue and also open a way forward to integrating *SQL databases in the future.

Chenglong-MS · 2025-02-13T23:01:14Z

You are right, Data Formulator is not able to handle data of that size. This is partly due to using pandas, and partly because the dataset lives in the frontend most of the time, which makes UI rendering unscalable.

I have tested datasets of ~10 MB that work (though already not super smooth). Integrating with a database on the server side is a good potential approach to address this data size issue.
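
One possible reading of the server-side idea, as a sketch only: the full table stays in a database on the server, and the frontend receives a bounded preview plus the total row count. Python's built-in `sqlite3` is used here purely as a stand-in for whichever database might be chosen; the table name, the `preview` helper, and the row limit are hypothetical.

```python
import sqlite3


def preview(con, table, limit=100):
    """Return the total row count plus at most `limit` rows as dicts."""
    total = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # No ORDER BY, so which rows come back first is unspecified.
    cur = con.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    return {"total_rows": total, "rows": rows}


# Stand-in data; the real table would hold the uploaded dataset.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id INTEGER, value REAL)")
con.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1000)],
)

payload = preview(con, "data", limit=5)
print(payload["total_rows"], len(payload["rows"]))
```

The payload size is capped by `limit` regardless of how many rows the table holds, which is the property that keeps UI rendering independent of dataset size.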
