Python can't read `colexifications.csv` (with default settings)

I updated the [meta database](https://github.com/cldf-datasets/cldf_meta) the other day and the clics4 dataset triggered an error. It seems that Python's built-in csv parser rejects the file because a cell exceeds the default field size limit:

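    import csv
    import io
    import zipfile

    with zipfile.ZipFile('cldf/colexifications.csv.zip') as zf:
        with zf.open('colexifications.csv') as raw_f:
            # newline='' lets the csv module handle line endings itself
            unicode_f = io.TextIOWrapper(raw_f, encoding='utf-8', newline='')
            rdr = csv.reader(unicode_f)
            # fails as soon as the reader hits the oversized cell
            rows = list(rdr)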

…throws:

    _csv.Error: field larger than field limit (131072)

Consuming programs can avoid the problem by raising Python's field size limit, e.g.:

    csv.field_size_limit(256 * 1024)
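Since `csv.field_size_limit` changes a process-wide setting, a consumer might prefer to raise it only while reading this one file. Here is a minimal sketch of that idea, not something the dataset or any CLDF tooling provides: the `field_size_limit` context manager and the in-memory sample data are made up for illustration, and the only thing it relies on is that `csv.field_size_limit(new_limit)` returns the previous limit.

```python
import contextlib
import csv
import io

# Remember whatever limit is in effect before we touch anything.
default_limit = csv.field_size_limit()


@contextlib.contextmanager
def field_size_limit(new_limit):
    """Temporarily raise the csv field size limit (hypothetical helper)."""
    # csv.field_size_limit(new_limit) sets the limit and returns the old one.
    old_limit = csv.field_size_limit(new_limit)
    try:
        yield
    finally:
        csv.field_size_limit(old_limit)


# Made-up stand-in for the real file: one cell of 200 * 1024 characters.
big_cell = 'x' * (200 * 1024)
sample = 'ID,Forms\n1,' + big_cell + '\n'

with field_size_limit(256 * 1024):
    rows = list(csv.reader(io.StringIO(sample)))
```

Outside the `with` block the limit is back to what it was before, so other code that parses CSV in the same process is unaffected.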

I did that on my end, but to be completely honest, the fact that an individual data point doesn't fit into 128 KiB is kind of a ‘data smell’. Maybe those *Forms*, *Varieties*, and *Languages* columns should be put into separate tables rather than being in-lined arrays within a table cell.
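To illustrate what I mean by separate tables, here is a rough sketch that turns an in-lined array column into one row per item. Everything in it is an assumption for the sake of the example: the miniature table, the `ID` column name, the space separator, and the `split_array_column` helper are made up and do not reflect the actual layout of `colexifications.csv`.

```python
import csv
import io

# Hypothetical miniature of a table with an in-lined array column.
wide = io.StringIO(
    'ID,Forms\n'
    'c1,f1 f2 f3\n'
    'c2,f4\n'
)


def split_array_column(rows, key, column, separator=' '):
    """Yield one (key, item) pair per item in an in-lined array cell."""
    for row in rows:
        for item in row[column].split(separator):
            if item:  # skip empty items from empty cells or doubled separators
                yield row[key], item


# Each pair would become one row of a separate association table.
pairs = list(split_array_column(csv.DictReader(wide), 'ID', 'Forms'))
```

With a layout like this, no single cell grows with the number of associated forms, so the default field size limit would no longer be an issue.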

(As a side note: this also means that you currently can't run *cldf validate* on the data. At least not from the command line – the dataset can still be validated from within Python after raising the field size limit.)

