26 changes: 24 additions & 2 deletions .zenodo.json
@@ -7,10 +7,32 @@
],
"creators": [
{
"name": "Simon J. Greenhill"
"name": "Andrew Kitchen"
},
{
"name": "Christopher Ehret"
},
{
"name": "Shiferaw Assefa"
},
{
"name": "Connie J. Mulligan"
}
],
"contributors": [
{
"name": "Ben Sapirstein",
"type": "Editor"
},
{
"name": "Johann-Mattis List",
"type": "Editor"
},
{
"name": "Simon Greenhill",
"type": "Editor"
}
],
"contributors": [],
"communities": [
{
"identifier": "lexibank"
13 changes: 10 additions & 3 deletions CONTRIBUTORS.md
@@ -1,5 +1,12 @@
# Contributors

Name | GitHub user | Description | Role
--- | --- | --- | ---
Simon J. Greenhill | @SimonGreenhill | patron | Author
Name | GitHub user | Description | Role
--- | --- | --- | ---
Ben Sapirstein | | orthography profile, integration of original sources | Editor
Johann-Mattis List | @LinguList | maintainer | Editor
Simon Greenhill | | maintainer | Editor
Andrew Kitchen | | data collection | Author
Christopher Ehret | | data collection | Author
Shiferaw Assefa | | data collection | Author
Connie J. Mulligan | | data collection | Author

7 changes: 3 additions & 4 deletions FORMS.md
@@ -8,16 +8,15 @@ The value-to-form processing is divided into two steps, implemented as methods:
- `FormSpec.clean`: Normalizes a form chunk.

These methods use the attributes of a `FormSpec` instance to configure their behaviour.

- `brackets`: `{}`
- `brackets`: `{'(': ')', '[': ']'}`
Pairs of strings that should be recognized as brackets, specified as `dict` mapping opening string to closing string
- `separators`: `/,`
Iterable of single-character tokens that should be recognized as word separators
- `missing_data`: `('---',)`
Iterable of strings that are used to mark missing data
- `strip_inside_brackets`: `False`
- `strip_inside_brackets`: `True`
Flag signaling whether to strip content in brackets (**and** strip leading and trailing whitespace)
- `replacements`: `[]`
- `replacements`: `[("u'", 'u'), ("a'", 'a'), ("ɔ'", 'ɔ'), ('7', 'ʔ'), ('?', 'ʔ'), ('9', 'ʕ'), ('q', "k'"), ("'", 'ˤ'), ('ɤ', 'ɣ'), ('ġ', 'ɣ'), ('έ', 'ɛ'), ('έ', 'ɛ'), ('á', 'a'), ('é', 'e'), ('ú', 'u'), ('ĩ', 'i'), ('ź', 'ɮ'), ('š', 'ʃ'), ('x', 'χ'), ('j', 'ʒ'), ('y', 'j'), ('ň', 'ɲ'), ('ẹ', 'e'), ('ṟ', 'ɾ'), ('\uf08d', ''), ('\uf0f0', 'ḥ'), ('ˡ', ':')]`
List of pairs (`source`, `target`) used to replace occurrences of `source` in forms with `target` (before stripping content in brackets)
- `first_form_only`: `False`
Flag signaling whether at most one form should be returned from `split` - effectively ignoring any spelling variants, etc.
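
To make the effect of the new settings concrete, here is a minimal sketch of the two-step split/clean processing described above. It is not pylexibank's implementation: the function names `split_value` and `clean_chunk` are hypothetical, only a small subset of the replacement pairs is used, and, as a simplification, separators inside brackets are not protected from splitting.

```python
import re

# Settings mirroring the new values in this diff; REPLACEMENTS is a small subset.
BRACKETS = {"(": ")", "[": "]"}
SEPARATORS = "/,"
MISSING_DATA = ("---",)
REPLACEMENTS = [("7", "ʔ"), ("?", "ʔ"), ("9", "ʕ"), ("š", "ʃ")]


def clean_chunk(chunk):
    """Normalize one form chunk (hypothetical helper, not FormSpec.clean)."""
    # Replacements are applied before bracketed content is stripped.
    for source, target in REPLACEMENTS:
        chunk = chunk.replace(source, target)
    # Drop each bracket pair together with its content.
    for opening, closing in BRACKETS.items():
        pattern = re.escape(opening) + "[^" + re.escape(closing) + "]*" + re.escape(closing)
        chunk = re.sub(pattern, "", chunk)
    # Strip leading and trailing whitespace.
    return chunk.strip()


def split_value(value, first_form_only=False):
    """Split a raw value into cleaned forms (hypothetical helper, not FormSpec.split)."""
    if value.strip() in MISSING_DATA:
        return []
    # Simplification: this splits on every separator, even inside brackets.
    chunks = re.split("[" + re.escape(SEPARATORS) + "]", value)
    forms = [form for form in (clean_chunk(c) for c in chunks)
             if form and form not in MISSING_DATA]
    return forms[:1] if first_form_only else forms


print(split_value("7ab (note), ša9"))
print(split_value("7ab (note), ša9", first_form_only=True))
print(split_value("---"))
```

With `strip_inside_brackets` switched on and `brackets` populated, annotations such as `(note)` no longer end up in the form, which is consistent with the jump from 0 to 130 analysed segments reported in TRANSCRIPTION.md.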
6 changes: 5 additions & 1 deletion NOTES.md
@@ -1,3 +1,5 @@
## Notes on the Comparison with Original Sources (B. Sapirstein)

Unable to identify original sources for Mɛhri, Jibbali, and Harsusi.

Kitchen et al say:
@@ -9,4 +11,6 @@ Kitchen et al say:
> Biblical Aramaic, ancient Hebrew and Ugaritic) were constructed from previously published
> lexicons (Leslau 1938; Gelb et al. 1956; Sobelman & Harrel 1963; Rabin 1975).

I can't see these three in Leslau, Gelb, Sobelman & Harrel or Rabin or Bender.
I can't see these three in Leslau, Gelb, Sobelman & Harrel or Rabin or Bender.


35 changes: 25 additions & 10 deletions README.md
@@ -1,7 +1,5 @@
# CLDF dataset derived from Kitchen et al.'s "Bayesian phylogenetic analysis of Semitic languages" from 2009

[![CLDF validation](https://github.com/lexibank/kitchensemitic/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/kitchensemitic/actions?query=workflow%3ACLDF-validation)

## How to cite

If you use these data please cite
@@ -21,6 +19,8 @@ Conceptlists in Concepticon:
- [Kitchen-2009-95](https://concepticon.clld.org/contributions/Kitchen-2009-95)
## Notes

## Notes on the Comparison with Original Sources (B. Sapirstein)

Unable to identify original sources for Mɛhri, Jibbali, and Harsusi.

Kitchen et al say:
@@ -35,27 +35,42 @@ Kitchen et al say:
I can't see these three in Leslau, Gelb, Sobelman & Harrel or Rabin or Bender.





## Statistics


[![CLDF validation](https://github.com/lexibank/kitchensemitic/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/kitchensemitic/actions?query=workflow%3ACLDF-validation)
![Glottolog: 100%](https://img.shields.io/badge/Glottolog-100%25-brightgreen.svg "Glottolog: 100%")
![Concepticon: 100%](https://img.shields.io/badge/Concepticon-100%25-brightgreen.svg "Concepticon: 100%")
![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%")
![BIPA: 100%](https://img.shields.io/badge/BIPA-100%25-brightgreen.svg "BIPA: 100%")
![CLTS SoundClass: 100%](https://img.shields.io/badge/CLTS%20SoundClass-100%25-brightgreen.svg "CLTS SoundClass: 100%")

- **Varieties:** 25 (linked to 25 different Glottocodes)
- **Concepts:** 95 (linked to 95 different Concepticon concept sets)
- **Lexemes:** 2,288
- **Lexemes:** 2,468
- **Sources:** 5
- **Synonymy:** 1.02
- **Cognacy:** 2,074 cognates in 663 cognate sets (340 singletons)
- **Cognate Diversity:** 0.26
- **Synonymy:** 1.07
- **Cognacy:** 2,125 cognates in 671 cognate sets (339 singletons)
- **Cognate Diversity:** 0.24
- **Invalid lexemes:** 0
- **Tokens:** 11,173
- **Segments:** 130 (0 BIPA errors, 0 CLTS sound class errors, 130 CLTS modified)
- **Inventory size (avg):** 39.04

# Contributors

Name | GitHub user | Description | Role
--- | --- | --- | ---
Simon J. Greenhill | @SimonGreenhill | patron | Author
Name | GitHub user | Description | Role
--- | --- | --- | ---
Ben Sapirstein | | orthography profile, integration of original sources | Editor
Johann-Mattis List | @LinguList | maintainer | Editor
Simon Greenhill | | maintainer | Editor
Andrew Kitchen | | data collection | Author
Christopher Ehret | | data collection | Author
Shiferaw Assefa | | data collection | Author
Connie J. Mulligan | | data collection | Author




136 changes: 133 additions & 3 deletions TRANSCRIPTION.md
@@ -4,9 +4,139 @@
## Segments

| Segment | Occurrence | BIPA | CLTS SoundClass |
|-----------|--------------|--------|-------------------|

(0 rows)
|:----------|-------------:|:-------|:------------------|
| a | 1140 | ✓ | ✓ |
| ɛ | 829 | ✓ | ✓ |
| r | 648 | ✓ | ✓ |
| n | 646 | ✓ | ✓ |
| m | 565 | ✓ | ✓ |
| e | 467 | ✓ | ✓ |
| ɨ | 437 | ✓ | ✓ |
| b | 398 | ✓ | ✓ |
| d | 348 | ✓ | ✓ |
| l | 340 | ✓ | ✓ |
| t | 339 | ✓ | ✓ |
| i | 328 | ✓ | ✓ |
| s | 306 | ✓ | ✓ |
| u | 289 | ✓ | ✓ |
| aː | 282 | ✓ | ✓ |
| kˤ | 256 | ✓ | ✓ |
| ʃ | 207 | ✓ | ✓ |
| ʔ | 206 | ✓ | ✓ |
| k | 203 | ✓ | ✓ |
| j | 199 | ✓ | ✓ |
| ħ | 193 | ✓ | ✓ |
| o | 188 | ✓ | ✓ |
| g | 180 | ✓ | ✓ |
| h | 177 | ✓ | ✓ |
| tˤ | 170 | ✓ | ✓ |
| w | 155 | ✓ | ✓ |
| f | 133 | ✓ | ✓ |
| ʕ | 122 | ✓ | ✓ |
| z | 119 | ✓ | ✓ |
| oː | 93 | ✓ | ✓ |
| sˤ | 90 | ✓ | ✓ |
| eː | 88 | ✓ | ✓ |
| χ | 88 | ✓ | ✓ |
| ɔ | 71 | ✓ | ✓ |
| ə | 67 | ✓ | ✓ |
| ʒ | 63 | ✓ | ✓ |
| cˤ | 62 | ✓ | ✓ |
| iː | 62 | ✓ | ✓ |
| uː | 48 | ✓ | ✓ |
| θ | 47 | ✓ | ✓ |
| + | 41 | ✓ | ✓ |
| ð | 34 | ✓ | ✓ |
| ɲ | 33 | ✓ | ✓ |
| lˠ | 32 | ✓ | ✓ |
| p | 31 | ✓ | ✓ |
| ɣ | 30 | ✓ | ✓ |
| c | 27 | ✓ | ✓ |
| β | 27 | ✓ | ✓ |
| ɬ | 26 | ✓ | ✓ |
| ʊ | 20 | ✓ | ✓ |
| ɔː | 18 | ✓ | ✓ |
| ɛː | 16 | ✓ | ✓ |
| ɮ | 11 | ✓ | ✓ |
| dˤ | 7 | ✓ | ✓ |
| kʷˤ | 7 | ✓ | ✓ |
| ʊː | 7 | ✓ | ✓ |
| dʒ | 6 | ✓ | ✓ |
| mʷ | 6 | ✓ | ✓ |
| ɟˤ | 6 | ✓ | ✓ |
| ɨː | 6 | ✓ | ✓ |
| ɸ | 6 | ✓ | ✓ |
| bː | 5 | ✓ | ✓ |
| nː | 5 | ✓ | ✓ |
| tː | 5 | ✓ | ✓ |
| æ | 5 | ✓ | ✓ |
| ç | 5 | ✓ | ✓ |
| gʷ | 4 | ✓ | ✓ |
| nˤ | 4 | ✓ | ✓ |
| fʷ | 3 | ✓ | ✓ |
| kʲ | 3 | ✓ | ✓ |
| ɟ | 3 | ✓ | ✓ |
| ɪː | 3 | ✓ | ✓ |
| χʲ | 3 | ✓ | ✓ |
| χˤ | 3 | ✓ | ✓ |
| au̯ | 2 | ✓ | ✓ |
| aˤː | 2 | ✓ | ✓ |
| dʲ | 2 | ✓ | ✓ |
| ei̯ | 2 | ✓ | ✓ |
| jˤ | 2 | ✓ | ✓ |
| kː | 2 | ✓ | ✓ |
| lʲ | 2 | ✓ | ✓ |
| lː | 2 | ✓ | ✓ |
| oˤ | 2 | ✓ | ✓ |
| pː | 2 | ✓ | ✓ |
| rː | 2 | ✓ | ✓ |
| zː | 2 | ✓ | ✓ |
| ó/o | 2 | ✓ | ✓ |
| ɛˤ | 2 | ✓ | ✓ |
| ɛ̃ | 2 | ✓ | ✓ |
| ɪ | 2 | ✓ | ✓ |
| ɮˤ | 2 | ✓ | ✓ |
| ʊˤ | 2 | ✓ | ✓ |
| χʷˤ | 2 | ✓ | ✓ |
| ẽ | 2 | ✓ | ✓ |
| aːi̯ | 1 | ✓ | ✓ |
| dðˤ | 1 | ✓ | ✓ |
| e̤ | 1 | ✓ | ✓ |
| gˤ | 1 | ✓ | ✓ |
| hʷ | 1 | ✓ | ✓ |
| iːi̯ | 1 | ✓ | ✓ |
| iˤː | 1 | ✓ | ✓ |
| jː | 1 | ✓ | ✓ |
| kʷ | 1 | ✓ | ✓ |
| mː | 1 | ✓ | ✓ |
| nʲ | 1 | ✓ | ✓ |
| ou̯ | 1 | ✓ | ✓ |
| oˤː | 1 | ✓ | ✓ |
| rˤ | 1 | ✓ | ✓ |
| sʲ | 1 | ✓ | ✓ |
| sʷ | 1 | ✓ | ✓ |
| tʰ | 1 | ✓ | ✓ |
| tʲ | 1 | ✓ | ✓ |
| uu̯ | 1 | ✓ | ✓ |
| v | 1 | ✓ | ✓ |
| wʲ | 1 | ✓ | ✓ |
| wː | 1 | ✓ | ✓ |
| ã | 1 | ✓ | ✓ |
| çˀ | 1 | ✓ | ✓ |
| í/i | 1 | ✓ | ✓ |
| ðˤ | 1 | ✓ | ✓ |
| ɔu̯ | 1 | ✓ | ✓ |
| əˤ | 1 | ✓ | ✓ |
| ɛi̯ | 1 | ✓ | ✓ |
| ɨˤ | 1 | ✓ | ✓ |
| ɵ | 1 | ✓ | ✓ |
| ʃː | 1 | ✓ | ✓ |
| ʃˤ | 1 | ✓ | ✓ |
| ʊu̯ | 1 | ✓ | ✓ |
| ʔˤ | 1 | ✓ | ✓ |
| χʷ | 1 | ✓ | ✓ |

(130 rows)


