Merged
3 changes: 2 additions & 1 deletion FORMS.md
@@ -8,6 +8,7 @@ The value-to-form processing is divided into two steps, implemented as methods:
- `FormSpec.clean`: Normalizes a form chunk.

These methods use the attributes of a `FormSpec` instance to configure their behaviour.

- `brackets`: `{'(': ')', '[': ']'}`
Pairs of strings that should be recognized as brackets, specified as `dict` mapping opening string to closing string
- `separators`: `/,`
@@ -16,7 +17,7 @@ These methods use the attributes of a `FormSpec` instance to configure their beh
Iterable of strings that are used to mark missing data
- `strip_inside_brackets`: `True`
Flag signaling whether to strip content in brackets (**and** strip leading and trailing whitespace)
- `replacements`: `[("u'", 'u'), ("a'", 'a'), ("ɔ'", 'ɔ'), ('7', 'ʔ'), ('?', 'ʔ'), ('9', 'ʕ'), ('q', "k'"), ("'", 'ˤ'), ('ɤ', 'ɣ'), ('ġ', 'ɣ'), ('έ', 'ɛ'), ('έ', 'ɛ'), ('á', 'a'), ('é', 'e'), ('ú', 'u'), ('ĩ', 'i'), ('ź', 'ɮ'), ('š', 'ʃ'), ('x', 'χ'), ('j', 'ʒ'), ('y', 'j'), ('ň', 'ɲ'), ('', 'e'), ('', 'ɾ'), ('\uf08d', ''), ('\uf0f0', ''), ('ˡ', ':')]`
- `replacements`: `[("u'", 'u'), ("a'", 'a'), ("ɔ'", 'ɔ'), ('7', 'ʔ'), ('?', 'ʔ'), ('9', 'ʕ'), ('q', "k'"), ("'", 'ˤ'), ('ɤ', 'ɣ'), ('ġ', 'ɣ'), ('έ', 'ɛ'), ('έ', 'ɛ'), ('á', 'a'), ('é', 'e'), ('ó', 'o'), ('ú', 'u'), ('í', 'i'), ('ĩ', 'i'), ('ź', 'ɮ'), ('', 'ʒ'), ('ž', 'ʒ'), ('', 'ðˤ'), ('š', 'ʃ'), ('x', 'χ'), ('j', 'ʒ'), ('y', 'j'), ('ň', 'ɲ')]`
List of pairs (`source`, `target`) used to replace occurrences of `source` in forms with `target` (before stripping content in brackets)
- `first_form_only`: `False`
Flag signaling whether at most one form should be returned from `split` - effectively ignoring any spelling variants, etc.
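
The attributes listed in this file can be illustrated with a minimal, self-contained sketch. It is not the `FormSpec` implementation: `split_value` and `clean_form` are hypothetical helper names, the replacement list is abbreviated, and the sketch assumes that replacements are applied one after another in list order and that splitting ignores brackets.

```python
import re

# Hypothetical stand-ins for the documented attributes (values abbreviated).
BRACKETS = {"(": ")", "[": "]"}
SEPARATORS = "/,"
REPLACEMENTS = [("u'", "u"), ("7", "ʔ"), ("?", "ʔ"), ("'", "ˤ"), ("š", "ʃ")]


def split_value(value, separators=SEPARATORS, first_form_only=False):
    """Split a raw value into form chunks on any separator character.

    Assumption: separators inside brackets are not treated specially here.
    """
    pattern = "[" + re.escape(separators) + "]"
    chunks = [chunk.strip() for chunk in re.split(pattern, value) if chunk.strip()]
    return chunks[:1] if first_form_only else chunks


def clean_form(form, replacements=REPLACEMENTS, brackets=BRACKETS,
               strip_inside_brackets=True):
    """Apply replacements in list order, then optionally drop bracketed content."""
    for source, target in replacements:
        form = form.replace(source, target)
    if strip_inside_brackets:
        for opening, closing in brackets.items():
            # Match an opening bracket, anything up to the closing one, and the close.
            pattern = (re.escape(opening) + "[^" + re.escape(closing) + "]*"
                       + re.escape(closing))
            form = re.sub(pattern, "", form)
        form = form.strip()
    return form


chunks = split_value("šams (sun), ba7al/ba?l")
forms = [clean_form(chunk) for chunk in chunks]
```

Under these assumptions the order of the pairs matters: because `("u'", 'u')` precedes `("'", 'ˤ')`, the sequence `u'` is rewritten before the bare apostrophe is touched.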
21 changes: 12 additions & 9 deletions README.md
@@ -1,5 +1,7 @@
# CLDF dataset derived from Kitchen et al.'s "Bayesian phylogenetic analysis of Semitic languages" from 2009

[![CLDF validation](https://github.com/lexibank/kitchensemitic/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/kitchensemitic/actions?query=workflow%3ACLDF-validation)

## How to cite

If you use these data please cite
@@ -41,23 +43,24 @@ I can't see these three in Leslau, Gelb, Sobelman & Harrel or Rabin or Bender.
## Statistics


[![CLDF validation](https://github.com/lexibank/kitchensemitic/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/kitchensemitic/actions?query=workflow%3ACLDF-validation)
![Glottolog: 100%](https://img.shields.io/badge/Glottolog-100%25-brightgreen.svg "Glottolog: 100%")
![Concepticon: 100%](https://img.shields.io/badge/Concepticon-100%25-brightgreen.svg "Concepticon: 100%")
![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%")
![BIPA: 100%](https://img.shields.io/badge/BIPA-100%25-brightgreen.svg "BIPA: 100%")
![CLTS SoundClass: 100%](https://img.shields.io/badge/CLTS%20SoundClass-100%25-brightgreen.svg "CLTS SoundClass: 100%")

- **Varieties:** 25 (linked to 25 different Glottocodes)
-- **Concepts:** 95 (linked to 95 different Concepticon concept sets)
-- **Lexemes:** 2,468
-- **Sources:** 5
-- **Synonymy:** 1.07
-- **Cognacy:** 2,125 cognates in 671 cognate sets (339 singletons)
-- **Cognate Diversity:** 0.24
+- **Concepts:** 97 (linked to 97 different Concepticon concept sets)
+- **Lexemes:** 2,396
+- **Sources:** 8
+- **Synonymy:** 1.04
+- **Cognacy:** 2,150 cognates in 665 cognate sets (329 singletons)
+- **Cognate Diversity:** 0.25
- **Invalid lexemes:** 0
-- **Tokens:** 11,173
-- **Segments:** 130 (0 BIPA errors, 0 CLTS sound class errors, 130 CLTS modified)
-- **Inventory size (avg):** 39.04
+- **Tokens:** 10,911
+- **Segments:** 110 (0 BIPA errors, 0 CLTS sound class errors, 110 CLTS modified)
+- **Inventory size (avg):** 37.20

# Contributors

148 changes: 64 additions & 84 deletions TRANSCRIPTION.md
@@ -5,113 +5,99 @@

| Segment | Occurrence | BIPA | CLTS SoundClass |
|:----------|-------------:|:-------|:------------------|
| a | 1140 | ✓ | ✓ |
| ɛ | 829 | ✓ | ✓ |
| r | 648 | ✓ | ✓ |
| n | 646 | ✓ | ✓ |
| m | 565 | ✓ | ✓ |
| e | 467 | ✓ | ✓ |
| ɨ | 437 | ✓ | ✓ |
| b | 398 | ✓ | ✓ |
| d | 348 | ✓ | ✓ |
| l | 340 | ✓ | ✓ |
| a | 1104 | ✓ | ✓ |
| ɛ | 831 | ✓ | ✓ |
| n | 645 | ✓ | ✓ |
| r | 637 | ✓ | ✓ |
| m | 551 | ✓ | ✓ |
| ɨ | 454 | ✓ | ✓ |
| e | 429 | ✓ | ✓ |
| b | 395 | ✓ | ✓ |
| t | 339 | ✓ | ✓ |
| i | 328 | ✓ | ✓ |
| s | 306 | ✓ | ✓ |
| u | 289 | ✓ | ✓ |
| aː | 282 | ✓ | ✓ |
| kˤ | 256 | ✓ | ✓ |
| ʃ | 207 | ✓ | ✓ |
| ʔ | 206 | ✓ | ✓ |
| k | 203 | ✓ | ✓ |
| j | 199 | ✓ | ✓ |
| ħ | 193 | ✓ | ✓ |
| o | 188 | ✓ | ✓ |
| g | 180 | ✓ | ✓ |
| h | 177 | ✓ | ✓ |
| tˤ | 170 | ✓ | ✓ |
| w | 155 | ✓ | ✓ |
| f | 133 | ✓ | ✓ |
| ʕ | 122 | ✓ | ✓ |
| z | 119 | ✓ | ✓ |
| oː | 93 | ✓ | ✓ |
| sˤ | 90 | ✓ | ✓ |
| d | 332 | ✓ | ✓ |
| l | 326 | ✓ | ✓ |
| i | 321 | ✓ | ✓ |
| s | 302 | ✓ | ✓ |
| u | 290 | ✓ | ✓ |
| aː | 278 | ✓ | ✓ |
| k | 202 | ✓ | ✓ |
| ʔ | 201 | ✓ | ✓ |
| j | 196 | ✓ | ✓ |
| ʃ | 194 | ✓ | ✓ |
| o | 180 | ✓ | ✓ |
| g | 175 | ✓ | ✓ |
| ħ | 170 | ✓ | ✓ |
| h | 168 | ✓ | ✓ |
| w | 156 | ✓ | ✓ |
| f | 147 | ✓ | ✓ |
| kˤ | 140 | ✓ | ✓ |
| z | 122 | ✓ | ✓ |
| ʕ | 115 | ✓ | ✓ |
| kʼ | 106 | ✓ | ✓ |
| tˤ | 94 | ✓ | ✓ |
| eː | 88 | ✓ | ✓ |
| χ | 88 | ✓ | ✓ |
| ɔ | 71 | ✓ | ✓ |
| oː | 88 | ✓ | ✓ |
| χ | 82 | ✓ | ✓ |
| tʼ | 71 | ✓ | ✓ |
| sˤ | 70 | ✓ | ✓ |
| ɔ | 70 | ✓ | ✓ |
| ə | 67 | ✓ | ✓ |
| ʒ | 63 | ✓ | ✓ |
| cˤ | 62 | ✓ | ✓ |
| iː | 62 | ✓ | ✓ |
| ʒ | 67 | ✓ | ✓ |
| iː | 63 | ✓ | ✓ |
| tʃʼ | 63 | ✓ | ✓ |
| ɬ | 51 | ✓ | ✓ |
| uː | 48 | ✓ | ✓ |
| θ | 47 | ✓ | ✓ |
| + | 41 | ✓ | ✓ |
| ð | 34 | ✓ | ✓ |
| θ | 41 | ✓ | ✓ |
| p | 38 | ✓ | ✓ |
| ɲ | 33 | ✓ | ✓ |
| lˠ | 32 | ✓ | ✓ |
| p | 31 | ✓ | ✓ |
| ɣ | 30 | ✓ | ✓ |
| c | 27 | ✓ | ✓ |
| β | 27 | ✓ | ✓ |
| ɬ | 26 | ✓ | ✓ |
| ʊ | 20 | ✓ | ✓ |
| ð | 32 | ✓ | ✓ |
| tʃ | 27 | ✓ | ✓ |
| β | 26 | ✓ | ✓ |
| ɣ | 25 | ✓ | ✓ |
| ʊ | 24 | ✓ | ✓ |
| ɔː | 18 | ✓ | ✓ |
| ɛː | 16 | ✓ | ✓ |
| ɮ | 11 | ✓ | ✓ |
| ɛː | 17 | ✓ | ✓ |
| tsʼ | 14 | ✓ | ✓ |
| sʼ | 9 | ✓ | ✓ |
| ɮ | 9 | ✓ | ✓ |
| dˤ | 7 | ✓ | ✓ |
| kʷˤ | 7 | ✓ | ✓ |
| ɟˤ | 7 | ✓ | ✓ |
| ɸ | 7 | ✓ | ✓ |
| ʊː | 7 | ✓ | ✓ |
| dʒ | 6 | ✓ | ✓ |
| mʷ | 6 | ✓ | ✓ |
| ɟˤ | 6 | ✓ | ✓ |
| ɨː | 6 | ✓ | ✓ |
| ɸ | 6 | ✓ | ✓ |
| bː | 5 | ✓ | ✓ |
| nː | 5 | ✓ | ✓ |
| tː | 5 | ✓ | ✓ |
| kʷʼ | 5 | ✓ | ✓ |
| æ | 5 | ✓ | ✓ |
| ç | 5 | ✓ | ✓ |
| gʷ | 4 | ✓ | ✓ |
| nˤ | 4 | ✓ | ✓ |
| nʼ | 4 | ✓ | ✓ |
| ç | 4 | ✓ | ✓ |
| fʷ | 3 | ✓ | ✓ |
| kʲ | 3 | ✓ | ✓ |
| ðˤ | 3 | ✓ | ✓ |
| ɟ | 3 | ✓ | ✓ |
| ɪː | 3 | ✓ | ✓ |
| χʲ | 3 | ✓ | ✓ |
| χˤ | 3 | ✓ | ✓ |
| χʼ | 3 | ✓ | ✓ |
| au̯ | 2 | ✓ | ✓ |
| aˤː | 2 | ✓ | ✓ |
| dʲ | 2 | ✓ | ✓ |
| ei̯ | 2 | ✓ | ✓ |
| | 2 | ✓ | ✓ |
| | 2 | ✓ | ✓ |
| | 2 | ✓ | ✓ |
| kʷˤ | 2 | ✓ | ✓ |
| lʲ | 2 | ✓ | ✓ |
| lː | 2 | ✓ | ✓ |
| oˤ | 2 | ✓ | ✓ |
| pː | 2 | ✓ | ✓ |
| rː | 2 | ✓ | ✓ |
| zː | 2 | ✓ | ✓ |
| ó/o | 2 | ✓ | ✓ |
| ɛˤ | 2 | ✓ | ✓ |
| ɛ̃ | 2 | ✓ | ✓ |
| ɪ | 2 | ✓ | ✓ |
| ɮˤ | 2 | ✓ | ✓ |
| ʊˤ | 2 | ✓ | ✓ |
| χʷˤ | 2 | ✓ | ✓ |
| ẽ | 2 | ✓ | ✓ |
| aːi̯ | 1 | ✓ | ✓ |
| dðˤ | 1 | ✓ | ✓ |
| | 1 | ✓ | ✓ |
| e̤ | 1 | ✓ | ✓ |
| | 1 | ✓ | ✓ |
| | 1 | ✓ | ✓ |
| hʷ | 1 | ✓ | ✓ |
| iːi̯ | 1 | ✓ | ✓ |
| iˤː | 1 | ✓ | ✓ |
| jː | 1 | ✓ | ✓ |
| iːi | 1 | ✓ | ✓ |
| kʷ | 1 | ✓ | ✓ |
| mː | 1 | ✓ | ✓ |
| nʲ | 1 | ✓ | ✓ |
| ou̯ | 1 | ✓ | ✓ |
| oˤː | 1 | ✓ | ✓ |
| rˤ | 1 | ✓ | ✓ |
| sʲ | 1 | ✓ | ✓ |
| sʷ | 1 | ✓ | ✓ |
@@ -120,23 +106,17 @@
| uu̯ | 1 | ✓ | ✓ |
| v | 1 | ✓ | ✓ |
| wʲ | 1 | ✓ | ✓ |
| wː | 1 | ✓ | ✓ |
| ã | 1 | ✓ | ✓ |
| çˀ | 1 | ✓ | ✓ |
| í/i | 1 | ✓ | ✓ |
| ðˤ | 1 | ✓ | ✓ |
| ɔu̯ | 1 | ✓ | ✓ |
| əˤ | 1 | ✓ | ✓ |
| ɛi̯ | 1 | ✓ | ✓ |
| ɨˤ | 1 | ✓ | ✓ |
| ɵ | 1 | ✓ | ✓ |
| ʃː | 1 | ✓ | ✓ |
| ʃˤ | 1 | ✓ | ✓ |
| ʊu̯ | 1 | ✓ | ✓ |
| ʔˤ | 1 | ✓ | ✓ |
| χʷ | 1 | ✓ | ✓ |
| χʷʼ | 1 | ✓ | ✓ |
| χʷˤ | 1 | ✓ | ✓ |

-(130 rows)
+(110 rows)


