Description
I just found out, by coincidence, that the data in WOLD, and possibly in other datasets as well, have a problem with the OR-concepts in Concepticon. WOLD has "he/she/it" as a concept and lists all matching forms under this one label, even though they are different words. This leads to grotesque problems, as in Dutch:
| Language | Concept | Word |
|---|---|---|
| Dutch | he/she/it | hij |
| Dutch | he/she/it | zij |
| Dutch | he/she/it | het |
In CLICS4, where we unmerge these three pronouns, we then get:
1-he-1,1,he,het,ə t,,wold,het,wold-Dutch-2-93-1,,,,,HE OR SHE OR IT
1-she-1,1,she,het,ə t,,wold,het,wold-Dutch-2-93-1,,,,,HE OR SHE OR IT
1-it-1,1,it,het,ə t,,wold,het,wold-Dutch-2-93-1,,,,,HE OR SHE OR IT
1-he-2,1,he,hij,h ɛi,,wold,hij,wold-Dutch-2-93-2,,,,,HE OR SHE OR IT
1-she-2,1,she,hij,h ɛi,,wold,hij,wold-Dutch-2-93-2,,,,,HE OR SHE OR IT
1-it-2,1,it,hij,h ɛi,,wold,hij,wold-Dutch-2-93-2,,,,,HE OR SHE OR IT
1-he-3,1,he,zij,z ɛi,,wold,zij (1),wold-Dutch-2-93-3,,,,,HE OR SHE OR IT
1-she-3,1,she,zij,z ɛi,,wold,zij (1),wold-Dutch-2-93-3,,,,,HE OR SHE OR IT
1-it-3,1,it,zij,z ɛi,,wold,zij (1),wold-Dutch-2-93-3,,,,,HE OR SHE OR IT
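The effect can be reproduced with a minimal sketch. It assumes, hypothetically, that unmerging simply splits a Concepticon gloss on " OR " and copies each form to every resulting concept; the functions `unmerge` and `colexifications` are illustrative names, not the actual CLICS4 code. Applied to the three Dutch WOLD forms, every form ends up colexifying HE, SHE, and IT.

```python
from itertools import combinations

# Hypothetical input rows: (language, form, Concepticon gloss), as in the WOLD data above.
rows = [
    ("Dutch", "het", "HE OR SHE OR IT"),
    ("Dutch", "hij", "HE OR SHE OR IT"),
    ("Dutch", "zij", "HE OR SHE OR IT"),
]


def unmerge(rows):
    """Naively split OR-concepts, copying each form to every sub-concept."""
    out = []
    for language, form, gloss in rows:
        for part in gloss.split(" OR "):
            out.append((language, form, part))
    return out


def colexifications(rows):
    """Return (language, form, concept pair) for every form that covers two concepts."""
    by_form = {}
    for language, form, concept in rows:
        by_form.setdefault((language, form), set()).add(concept)
    result = set()
    for (language, form), concepts in by_form.items():
        for a, b in combinations(sorted(concepts), 2):
            result.add((language, form, (a, b)))
    return result


unmerged = unmerge(rows)
for entry in sorted(colexifications(unmerged)):
    print(entry)
```

Each of the three forms yields the pairs HE and IT, HE and SHE, and IT and SHE, so a single language contributes nine spurious colexifications, although hij, zij, and het are distinct words.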
This explains why we find the colexification of "HE OR SHE" in 226 families. It is an artifact both of our unmerging approach and of poorly resolved source data.
This should be addressed actively. I see the following solutions:
- identify the problems in the original data and re-code them (e.g., he/she/it must be resolved throughout WOLD)
- exclude unclear concepts from the data (e.g., remove he/she/it)
- exclude some concepts from unmerging, such as he/she/it in this case
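Options 2 and 3 could be sketched roughly as follows. The functions, the name `UNMERGE_BLOCKLIST`, and the row layout (language, form, gloss) are hypothetical and not part of the CLICS4 code; the last row uses an invented placeholder concept rather than real data.

```python
# Hypothetical blocklist: OR-concepts that should not be unmerged because the
# source data lump distinct forms together (he/she/it in WOLD).
UNMERGE_BLOCKLIST = {"HE OR SHE OR IT"}


def unmerge(rows, blocklist=UNMERGE_BLOCKLIST):
    """Option 3: split OR-concepts into their parts, except blocklisted glosses."""
    out = []
    for language, form, gloss in rows:
        if gloss in blocklist:
            # Keep the merged concept untouched, so no spurious colexification arises.
            out.append((language, form, gloss))
            continue
        for part in gloss.split(" OR "):
            out.append((language, form, part))
    return out


def drop_concepts(rows, excluded):
    """Option 2: remove every row whose gloss is one of the excluded concepts."""
    return [row for row in rows if row[2] not in excluded]


# Placeholder rows: the last one is an invented OR-concept, not real data.
rows = [
    ("Dutch", "het", "HE OR SHE OR IT"),
    ("Dutch", "hij", "HE OR SHE OR IT"),
    ("Lang1", "form1", "CONCEPT A OR CONCEPT B"),
]

print(unmerge(rows))
print(drop_concepts(rows, UNMERGE_BLOCKLIST))
```

A blocklist keeps the decision explicit and easy to review. Option 1 (re-coding WOLD) remains the cleaner fix, since it would make both the blocklist entry and the exclusion unnecessary.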
@AnnikaTjuka, @xrotwang, @chrzyki: I think we must in any case check the data more thoroughly.
It helps to look at the colexifications that occur most frequently.
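Such a ranking could be sketched as below, assuming, hypothetically, rows of (family, language, form, concept) after unmerging; CLICS4's own data model and tooling may look different, and the example rows are placeholders. For each concept pair, the sketch counts the distinct families in which one form covers both concepts, so suspicious pairs such as HE and SHE rise to the top.

```python
from collections import defaultdict
from itertools import combinations


def family_counts(rows):
    """Map each concept pair to the set of families in which some form colexifies it."""
    by_form = defaultdict(set)
    for family, language, form, concept in rows:
        by_form[(family, language, form)].add(concept)
    families = defaultdict(set)
    for (family, _language, _form), concepts in by_form.items():
        for pair in combinations(sorted(concepts), 2):
            families[pair].add(family)
    return families


def most_frequent(rows, n=10):
    """Return the n concept pairs attested in the most families, most frequent first."""
    counts = {pair: len(fams) for pair, fams in family_counts(rows).items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# Placeholder rows (family, language, form, concept), not real data.
rows = [
    ("Fam1", "Lang1", "a", "HE"), ("Fam1", "Lang1", "a", "SHE"),
    ("Fam2", "Lang2", "b", "HE"), ("Fam2", "Lang2", "b", "SHE"),
    ("Fam2", "Lang2", "c", "ARM"), ("Fam2", "Lang2", "c", "HAND"),
]

for pair, count in most_frequent(rows):
    print(count, " / ".join(pair))
```

Counting families rather than languages keeps a single large family from dominating the list, which is also how the figure of 226 families above is phrased.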