Problems with unmerged concepts in some datasets #11

@LinguList

I just found out, by coincidence, that the data in WOLD at least, and maybe in some other datasets, has a problem with the OR-concepts in Concepticon. WOLD has "he/she/it" as a single concept and lists ALL forms, including distinct entries, under this label. This leads to grotesque problems, as in Dutch:

| Language | Concept | Word |
| --- | --- | --- |
| Dutch | he/she/it | hij |
| Dutch | he/she/it | zij |
| Dutch | he/she/it | het |

In CLICS4, where we unmerge these three pronouns, we then get:

1-he-1,1,he,het,ə t,,wold,het,wold-Dutch-2-93-1,,,,,HE OR SHE OR IT
1-she-1,1,she,het,ə t,,wold,het,wold-Dutch-2-93-1,,,,,HE OR SHE OR IT
1-it-1,1,it,het,ə t,,wold,het,wold-Dutch-2-93-1,,,,,HE OR SHE OR IT
1-he-2,1,he,hij,h ɛi,,wold,hij,wold-Dutch-2-93-2,,,,,HE OR SHE OR IT
1-she-2,1,she,hij,h ɛi,,wold,hij,wold-Dutch-2-93-2,,,,,HE OR SHE OR IT
1-it-2,1,it,hij,h ɛi,,wold,hij,wold-Dutch-2-93-2,,,,,HE OR SHE OR IT
1-he-3,1,he,zij,z ɛi,,wold,zij (1),wold-Dutch-2-93-3,,,,,HE OR SHE OR IT
1-she-3,1,she,zij,z ɛi,,wold,zij (1),wold-Dutch-2-93-3,,,,,HE OR SHE OR IT
1-it-3,1,it,zij,z ɛi,,wold,zij (1),wold-Dutch-2-93-3,,,,,HE OR SHE OR IT

This explains why we find the colexification of "HE OR SHE" in 226 families. It is an artifact of our unmerging approach and also of badly resolved source data.
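To make the mechanism concrete, here is a minimal sketch of how naive unmerging turns one merged label into spurious colexifications. It is not the CLICS4 code: the functions `unmerge` and `colexifications` and the tuple layout are assumptions made for illustration, and only the three Dutch forms come from the example above.

```python
from collections import defaultdict
from itertools import combinations

# Toy rows modeled on the Dutch WOLD example: (language, merged concept, form).
rows = [
    ("Dutch", "HE OR SHE OR IT", "hij"),
    ("Dutch", "HE OR SHE OR IT", "zij"),
    ("Dutch", "HE OR SHE OR IT", "het"),
]


def unmerge(rows):
    """Hypothetical naive unmerging: copy every form to every part of an OR-concept."""
    out = []
    for language, concept, form in rows:
        for part in concept.split(" OR "):
            out.append((language, part, form))
    return out


def colexifications(rows):
    """Map each concept pair to the (language, form) items that express both concepts."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in rows:
        concepts_by_form[(language, form)].add(concept)
    pairs = defaultdict(set)
    for item, concepts in concepts_by_form.items():
        for a, b in combinations(sorted(concepts), 2):
            pairs[(a, b)].add(item)
    return pairs


unmerged = unmerge(rows)          # 3 forms x 3 concepts = 9 rows
pairs = colexifications(unmerged)
for pair, items in sorted(pairs.items()):
    print(pair, sorted(form for _, form in items))
```

Under these assumptions, each of the three forms ends up attested for HE, SHE and IT alike, so every pair among the three concepts is reported as colexified in Dutch, although none of this is supported by the source data.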

This should be addressed actively. I see the following solutions:

  1. identify the problems in the original data and re-code them (e.g., he/she/it must be resolved in all of WOLD)
  2. exclude unclear concepts from the data (e.g., he/she/it should be removed)
  3. exclude some concepts from unmerging, such as he/she/it in this case
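Solutions 2 and 3 could be sketched as a small policy applied during unmerging. This is only an illustration: `unmerge_with_policy`, its `drop` and `keep_merged` parameters, and the `LangX` row are hypothetical and do not correspond to existing CLICS4 code or data. Solution 1 requires manual re-coding and is not covered here.

```python
# Toy rows: the Dutch rows follow the example above, the LangX row is invented.
rows = [
    ("Dutch", "HE OR SHE OR IT", "hij"),
    ("Dutch", "HE OR SHE OR IT", "zij"),
    ("Dutch", "HE OR SHE OR IT", "het"),
    ("LangX", "ARM OR HAND", "foo"),
]


def unmerge_with_policy(rows, drop=frozenset(), keep_merged=frozenset()):
    """Unmerge OR-concepts, except those that are dropped or kept merged.

    drop:        merged concepts removed from the data entirely (solution 2)
    keep_merged: merged concepts left as they are, not unmerged (solution 3)
    """
    out = []
    for language, concept, form in rows:
        if concept in drop:
            continue
        if concept in keep_merged or " OR " not in concept:
            out.append((language, concept, form))
        else:
            for part in concept.split(" OR "):
                out.append((language, part, form))
    return out


problematic = {"HE OR SHE OR IT"}
option2 = unmerge_with_policy(rows, drop=problematic)
option3 = unmerge_with_policy(rows, keep_merged=problematic)
```

With `drop`, the three Dutch rows disappear from the output; with `keep_merged`, they stay under the merged label and no longer produce pairs such as HE and SHE. Other OR-concepts are still unmerged in both cases.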

In any case, @AnnikaTjuka, @xrotwang and @chrzyki, I guess we must check the data more thoroughly.
A good starting point is to look at those colexifications that occur most frequently.
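A rough sketch of such a check: count, for each concept pair, the number of families in which at least one form expresses both concepts, and inspect the top of the list. The records, family names and the `families_per_pair` helper below are all invented for illustration and are not taken from CLICS4.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy records: (family, language, concept, form).
records = [
    ("FamilyA", "LangA1", "HE", "x"),
    ("FamilyA", "LangA1", "SHE", "x"),
    ("FamilyA", "LangA1", "ARM", "y"),
    ("FamilyA", "LangA1", "HAND", "z"),
    ("FamilyB", "LangB1", "HE", "a"),
    ("FamilyB", "LangB1", "SHE", "a"),
    ("FamilyB", "LangB1", "ARM", "b"),
    ("FamilyB", "LangB1", "HAND", "b"),
]


def families_per_pair(records):
    """Map each colexified concept pair to the set of families attesting it."""
    concepts_by_form = defaultdict(set)
    for family, language, concept, form in records:
        concepts_by_form[(family, language, form)].add(concept)
    families = defaultdict(set)
    for (family, _, _), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            families[pair].add(family)
    return families


# Rank pairs by number of families, most frequent first (ties broken alphabetically).
ranked = sorted(
    ((pair, len(fams)) for pair, fams in families_per_pair(records).items()),
    key=lambda item: (-item[1], item[0]),
)
for pair, count in ranked:
    print(pair, count)
```

Pairs that rank suspiciously high, and whose parts all derive from the same OR-concept, would be the first candidates for manual inspection.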
