Merged
14 changes: 11 additions & 3 deletions NOTES.md
Original file line number Diff line number Diff line change
@@ -32,6 +32,14 @@ pip install -r requirements.txt
make wordlist
```

-In addition to yielding the word list file (`npl_data.tsv`), the script also
-performs an automatic recognition of sound correspondence patterns,
-the result of which is stored in the file `npl_patterns.tsv`.
+#### Reproduce analysis
+In addition to yielding the word list file (`npl_data.tsv`), the Makefile
+also runs a script that performs the multiple sequence alignment and an
+automatic recognition of sound correspondence patterns. To do so, please
+type the following:
+
+```shell
+make analysis
+```
+The results of both processes are stored in the files `npl_msaligned.tsv`
+and `npl_patterns.tsv`.
14 changes: 11 additions & 3 deletions README.md
@@ -53,9 +53,17 @@ pip install -r requirements.txt
make wordlist
```

-In addition to yielding the word list file (`npl_data.tsv`), the script also
-performs an automatic recognition of sound correspondence patterns,
-the result of which is stored in the file `npl_patterns.tsv`.
+#### Reproduce analysis
+In addition to yielding the word list file (`npl_data.tsv`), the Makefile
+also runs a script that performs the multiple sequence alignment and an
+automatic recognition of sound correspondence patterns. To do so, please
+type the following:
+
+```shell
+make analysis
+```
+The results of both processes are stored in the files `npl_msaligned.tsv`
+and `npl_patterns.tsv`.


## Statistics
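As a quick sanity check after `make wordlist` or `make analysis`, the generated TSV files can be inspected with the standard library alone. The sketch below assumes the files are tab-separated with one header row and that lines starting with `#` carry no data. The helper name `read_tsv_rows` and the sample column names are hypothetical; the actual columns of `npl_data.tsv` are not shown in this diff.

```python
import csv
import io


def read_tsv_rows(handle):
    """Yield one dict per data row from a tab-separated file object.

    Assumption: the first non-comment line is the header, and empty
    lines or lines starting with '#' carry no data.
    """
    lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    # QUOTE_NONE keeps quote characters in transcriptions untouched.
    yield from csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)


# Hypothetical sample standing in for a file such as npl_data.tsv.
sample = io.StringIO(
    "# comment line\n"
    "ID\tDOCULECT\tCONCEPT\tTOKENS\n"
    "1\tLangA\thand\tm a n\n"
    "2\tLangB\thand\tm a k i\n"
)
rows = list(read_tsv_rows(sample))
doculects = sorted({row["DOCULECT"] for row in rows})
```

For a real file, the same function could be called on `open(path, encoding="utf-8", newline="")` in place of the in-memory sample.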
5 changes: 4 additions & 1 deletion analysis/Makefile
@@ -1,2 +1,5 @@
wordlist:
-	edictor wordlist --name=npl_data --data=../cldf/cldf-metadata.json --preprocessing=realign.py --addon="language_subgroup:subgroup","alignment:alignment","cognacy:cogid","partial_cognacy:cogids","source:source"
+	edictor wordlist --name=npl_data --data=../cldf/cldf-metadata.json --preprocessing=preprocessing.py --addon="language_subgroup:subgroup","alignment:alignment","cognacy:cogid","partial_cognacy:cogids","source:source"
+
+analysis:
+	edictor wordlist --name=npl_msaligned --data=../cldf/cldf-metadata.json --preprocessing=realign.py --addon="language_subgroup:subgroup","alignment:alignment","cognacy:cogid","partial_cognacy:cogids","source:source"
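The `--addon` value passed in both targets is a comma-separated list of `name:name` pairs. Reading each pair as a mapping from a CLDF-side name to a wordlist column name is an assumption here, since this diff does not show how `edictor` parses the option. The helper `parse_addon` below is hypothetical and not part of `edictor`; it only illustrates the shape of the value.

```python
def parse_addon(value):
    """Split 'a:b,c:d' into {'a': 'b', 'c': 'd'}.

    Hypothetical helper: it mirrors the apparent 'source:target'
    layout of the --addon value, not edictor's own parser.
    """
    mapping = {}
    for pair in value.split(","):
        # Tolerate the per-item quotes used in the Makefile recipe.
        pair = pair.strip().strip('"')
        if not pair:
            continue
        source, sep, target = pair.partition(":")
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {pair!r}")
        mapping[source] = target
    return mapping


addon = parse_addon(
    '"language_subgroup:subgroup","alignment:alignment",'
    '"cognacy:cogid","partial_cognacy:cogids","source:source"'
)
```

Under this reading, the pairs `cognacy:cogid` and `partial_cognacy:cogids` would provide the `cogids` column that `preprocessing.py` uses as its alignment reference.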
6,275 changes: 4,987 additions & 1,288 deletions analysis/npl_data.tsv

Large diffs are not rendered by default.

1,288 changes: 1,288 additions & 0 deletions analysis/npl_msaligned.tsv

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions analysis/preprocessing.py
@@ -0,0 +1,18 @@
+from lingpy import Alignments
+from lingpy.compare.partial import Partial
+
+def run(wordlist):
+    D = {0: wordlist.columns}
+    for idx in wordlist:
+        D[idx] = [wordlist[idx, h] for h in D[0]]
+
+    lex = Partial(D, segments='tokens', check=False)
+
+    lex = Alignments(lex, ref="cogids")
+    lex.align(ref='cogids')
+
+    D = {0: wordlist.columns+["alignment"]}
+    for idx in lex:
+        D[idx] = [lex[idx, h] for h in D[0]]
+
+    return lex
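The first step of `run` copies the incoming wordlist into a plain dictionary whose key `0` holds the header and whose remaining keys hold one row each. A minimal sketch of that layout, using only the standard library, might look as follows. `TinyWordlist` and `to_rows` are hypothetical stand-ins written for this illustration; they imitate only the three things `run` reads from the wordlist (`columns`, iteration over row ids, and `wordlist[idx, column]` lookup) and are not LingPy classes.

```python
class TinyWordlist:
    """Hypothetical stand-in exposing only what `run` reads from a wordlist."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self._rows = dict(rows)

    def __iter__(self):
        # Iterating yields the integer row ids.
        return iter(self._rows)

    def __getitem__(self, key):
        # Supports the wordlist[idx, column] lookup used in `run`.
        idx, column = key
        return self._rows[idx][self.columns.index(column)]


def to_rows(wordlist):
    """Copy a wordlist into {0: header, idx: row} form."""
    table = {0: list(wordlist.columns)}
    for idx in wordlist:
        table[idx] = [wordlist[idx, h] for h in table[0]]
    return table


wl = TinyWordlist(
    ["doculect", "concept", "tokens", "cogids"],
    {
        1: ["LangA", "hand", "m a n", "1"],
        2: ["LangB", "hand", "m a k i", "1 2"],
    },
)
table = to_rows(wl)
```

Keeping the header under key `0` lets the same dictionary carry both the column order and the data, which is the form `run` passes on to `Partial`.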
2 changes: 1 addition & 1 deletion analysis/realign.py
@@ -4,7 +4,7 @@
the sound correspondence pattern identification with LingRex (List 2018).
"""
import re
-from lingpy import Alignments, LexStat, Wordlist
+from lingpy import Alignments, Wordlist
from lingpy.compare.partial import Partial
from lingrex.copar import CoPaR
from lingpy.read.qlc import reduce_alignment
2 changes: 1 addition & 1 deletion cldf/README.md
@@ -15,7 +15,7 @@ property | value
[dc:format](http://purl.org/dc/terms/format) | <ol><li>http://concepticon.clld.org/contributions/Swadesh-1952-200</li></ol>
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | git@github.com:lexibank/northperulex
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="git@github.com:lexibank/northperulex/tree/1e3f230a">git@github.com:lexibank/northperulex 1e3f230a</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v5.2.1">Glottolog v5.2.1</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/v3.4.0">Concepticon v3.4.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="git@github.com:lexibank/northperulex/tree/00241a1d">git@github.com:lexibank/northperulex 00241a1d</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v5.2.1">Glottolog v5.2.1</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/v3.4.0">Concepticon v3.4.0</a></li><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">CLTS v2.3.0</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.9.6</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | northperulex
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
2 changes: 1 addition & 1 deletion cldf/cldf-metadata.json
@@ -17,7 +17,7 @@
{
"rdf:about": "git@github.com:lexibank/northperulex",
"rdf:type": "prov:Entity",
-"dc:created": "1e3f230a",
+"dc:created": "00241a1d",
"dc:title": "Repository"
},
{
4 changes: 2 additions & 2 deletions cldf/lingpy-rcParams.json
@@ -64,7 +64,7 @@
10,
10
],
-"filename": "lingpy-2025-12-24",
+"filename": "lingpy-2025-12-27",
"gap_symbol": "-",
"gap_weight": 0.5,
"gop": -2,
@@ -123,7 +123,7 @@
"scorer": {},
"sonar": true,
"stress": "\u02c8\u02cc'",
-"timestamp": "2025-12-24 00:00",
+"timestamp": "2025-12-27 21:11",
"tones": "\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u2070\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u20800123456789\u02e5\u02e6\u02e7\u02e8\u02e9\u02ea\u02eb-\ua708-\ua709-\ua70a-\ua70b-\ua70c-\ua70d-\ua70e-\ua70f-\ua710-\ua711-\ua712-\ua713-\ua714-\ua715-\ua716-\ua717-\ua718-\ua719-\ua71a-\ua700-\ua701-\ua702-\ua703-\ua704-\ua705-\ua706-\ua707",
"tree_calc": "neighbor",
"unique_sequences": true,