-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
98cdf80
commit 2d3fd4b
Showing
1 changed file
with
118 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
# CLS | ||
|
||
## Blackboard CLS | ||
### V1.0 | ||
**Model Details** | ||
|
||
- Developer: Wannaphong Phatthiyaphaibun | ||
- This report author: Wannaphong Phatthiyaphaibun | ||
- Model date: 2022-10-14 | ||
- Model version: 1.0 | ||
- Used in PyThaiNLP version: 3.2 + | ||
- Filename: `pythainlp/corpus/blackboard-cls_v1.0.crfsuite` | ||
- GitHub: [https://github.com/PyThaiNLP/pythainlp/issues/729](https://github.com/PyThaiNLP/pythainlp/issues/729) | ||
- CRF Model | ||
- License: CC0 | ||
|
||
**Intended Use** | ||
|
||
- Segmenting Thai text into clauses (smaller than a sentence but bigger than a word) | ||
- Not suitable for other language or non-news domains. | ||
|
||
**Factors** | ||
|
||
- Based on known problems with thai natural Language processing. | ||
|
||
**Metrics** | ||
|
||
- Evaluation metrics include precision, recall and f1-score. | ||
|
||
**Training Data** | ||
|
||
Blackboard treebank | ||
|
||
**Evaluation Data** | ||
|
||
Blackboard treebank | ||
|
||
**Quantitative Analyses** | ||
|
||
``` | ||
precision recall f1-score support | ||
B_CLS 1.00 1.00 1.00 91698 | ||
E_CLS 1.00 1.00 1.00 91700 | ||
I_CLS 1.00 1.00 1.00 707795 | ||
micro avg 1.00 1.00 1.00 891193 | ||
macro avg 1.00 1.00 1.00 891193 | ||
weighted avg 1.00 1.00 1.00 891193 | ||
samples avg 1.00 1.00 1.00 891193 | ||
``` | ||
**Ethical Considerations** | ||
|
||
- It trains from Blackboard treebank. It is possible to have a bias from Blackboard treebank. | ||
|
||
**Caveats and Recommendations** | ||
|
||
- The user must perform word segmentation first before using this model. | ||
- Thai text only | ||
|
||
|
||
## LST20 CLS | ||
### v0.2 | ||
**Model Details** | ||
|
||
- Developer: Wannaphong Phatthiyaphaibun | ||
- This report author: Wannaphong Phatthiyaphaibun | ||
- Model date: 2020-10-03 | ||
- Model version: 0.2 | ||
- Used in PyThaiNLP version: 2.2.4 + | ||
- Filename: `~/pythainlp-data/cls-v0.2.crfsuite` | ||
- GitHub: [https://github.com/PyThaiNLP/pythainlp/pull/479](https://github.com/PyThaiNLP/pythainlp/pull/479) | ||
- CRF Model | ||
- License: CC0 | ||
|
||
**Intended Use** | ||
|
||
- Segmenting Thai text into clauses (smaller than a sentence but bigger than a word) | ||
- Not suitable for other language or non-news domains. | ||
|
||
**Factors** | ||
|
||
- Based on known problems with thai natural Language processing. | ||
|
||
**Metrics** | ||
|
||
- Evaluation metrics include precision, recall and f1-score. | ||
|
||
**Training Data** | ||
|
||
LST20 Corpus Train set (news domain) | ||
|
||
**Evaluation Data** | ||
|
||
LST20 Corpus Test set (news domain) | ||
|
||
**Quantitative Analyses** | ||
|
||
``` | ||
precision recall f1-score support | ||
B_CLS 0.90 0.94 0.92 16111 | ||
E_CLS 0.90 0.94 0.92 15947 | ||
I_CLS 0.99 0.97 0.98 169565 | ||
micro avg 0.97 0.97 0.97 201623 | ||
macro avg 0.93 0.95 0.94 201623 | ||
weighted avg 0.97 0.97 0.97 201623 | ||
samples avg 0.94 0.94 0.94 201623 | ||
``` | ||
**Ethical Considerations** | ||
|
||
- It trains from LST20 Corpus. It is possible to have a bias from LST20 Corpus. | ||
|
||
**Caveats and Recommendations** | ||
|
||
- The user must perform word segmentation first before using this model. | ||
- Thai text only |