Loading larger .csv files broken #87
Loading large .csv files with 0.1.4 results in a blank page. There are no error messages, no progress bar, and no progress in the CLI log either (Python 3.12.2, clean env, macOS). It seems the only way to get data-formulator working currently is by using the example data or by copy-pasting small amounts of data.

Comments
Wondering what size of large .csv you are testing? I'd like to take a look!
Roughly 2M rows and 200 columns, which comes out at about 500MB. I also noticed that Firefox will load it but become very unresponsive; Edge (and I guess Chrome) will show a blank page. If working with larger datasets is a target of this project, I suggest replacing the pandas DataFrame approach with DuckDB loading of CSVs in memory and having the LLM agent write SQL queries instead (see the sketch below). This would solve the performance issue and also open a way forward to integrating *sql databases in the future.
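A minimal sketch of that suggestion, assuming a Python backend. This is not Data Formulator's actual code; the file name, view name, and query below are hypothetical placeholders:

```python
# Hypothetical sketch: answer SQL queries over a large CSV with DuckDB
# instead of loading the whole file into a pandas DataFrame.
import duckdb

con = duckdb.connect()  # in-memory database

# Expose the CSV as a view; DuckDB scans the file itself, so the
# full 500MB is never materialized in Python memory at once.
con.execute(
    "CREATE VIEW data AS SELECT * FROM read_csv_auto('large_dataset.csv')"
)

# An LLM-generated query would be executed here; only the small
# aggregated result set becomes a DataFrame for the frontend.
result = con.execute(
    "SELECT col_a, AVG(col_b) AS avg_b FROM data GROUP BY col_a"
).fetchdf()
print(result)
```

The point of this design is that only query results, not the full dataset, would ever reach the UI for rendering.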
You are right, Data Formulator is not able to handle data of that size --- partly due to using pandas, and partly because the dataset lives in the frontend most of the time, which makes UI rendering unscalable. I have tested ~10MB datasets that work (though already not super smoothly). Some way to integrate with a DB from the server side is a good potential approach to addressing this data size issue.