Description
I updated the meta database the other day and the clics4 dataset triggered an error. It seems that Python's built-in csv parser rejects the file because one cell is too large:
import csv
import io
import zipfile

# Read the zipped CSV straight from the archive.
with zipfile.ZipFile('cldf/colexifications.csv.zip') as zf:
    with zf.open('colexifications.csv') as raw_f:
        unicode_f = io.TextIOWrapper(raw_f, encoding='utf-8')
        rdr = csv.reader(unicode_f)
        rows = list(rdr)
…throws:
_csv.Error: field larger than field limit (131072)
Consuming programs can avoid the problem by raising Python's field size limit, e.g.:
csv.field_size_limit(256 * 1024)
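If you'd rather not hard-code a limit, a common defensive pattern (not specific to this dataset) is to raise it as far as the platform allows; sys.maxsize can overflow the underlying C long on some builds, so back off until a value is accepted:

import csv
import sys

# Raise the csv field size limit as high as the platform allows.
# On some builds sys.maxsize overflows the underlying C long,
# so reduce it until csv.field_size_limit accepts the value.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit = limit // 10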
I did that on my end, but to be completely honest, the fact that an individual data point doesn't fit into 128 KiB is a bit of a 'data smell'. Maybe the Forms, Varieties, and Languages columns should go into separate tables rather than being inlined as arrays within a single table cell.
(As a side note: this also means that you currently can't run cldf validate on the data, at least not from the command line – the dataset can still be validated from within Python after raising the field size limit.)
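For reference, this is roughly what validating from Python looks like once the limit has been raised, assuming the usual pycldf Dataset.from_metadata / validate API; the metadata path below is an assumption on my part, adjust it to whatever the repository actually ships:

import csv
from pycldf import Dataset

# Raise the stdlib csv limit before any tables are read.
csv.field_size_limit(256 * 1024)

# 'cldf/cldf-metadata.json' is an assumed path; point this at the
# dataset's actual metadata file.
ds = Dataset.from_metadata('cldf/cldf-metadata.json')
ds.validate()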