Open
Description
MyPy reports several typing issues in pythainlp/benchmarks/word_tokenization.py.
Expected results
- All functions have explicit type hinting information
- No incompatible-type errors
Current results
For example, MyPy sees `ref_sample` in these two lines as a `str`, which has no `shape` attribute.
pythainlp/pythainlp/benchmarks/word_tokenization.py
Lines 164 to 165 in 9a1274b
```python
c_pos_pred = c_pos_pred[c_pos_pred < ref_sample.shape[0]]
c_neg_pred = c_neg_pred[c_neg_pred < ref_sample.shape[0]]
```
But judging from the `_binary_representation` function, it may actually be a NumPy ndarray.
However, the type hint for `txt` and the docstring's `:rtype:` in `_binary_representation` both say `str` (the function has no return annotation):
pythainlp/pythainlp/benchmarks/word_tokenization.py
Lines 208 to 221 in 9a1274b
```python
def _binary_representation(txt: str, verbose: bool = False):
    """
    Transform text into {0, 1} sequence.
    where (1) indicates that the corresponding character is the beginning of
    a word. For example, ผม|ไม่|ชอบ|กิน|ผัก -> 10100...
    :param str txt: input text that we want to transform
    :param bool verbose: for debugging purposes
    :return: {0, 1} sequence
    :rtype: str
    """
    chars = np.array(list(txt))
```
So the declared types and the actual types disagree here, and this needs to be fixed.
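One possible direction, sketched under assumptions: if `_binary_representation` really produces a NumPy array, its signature could say so with a return annotation, and `:rtype:` would then need to match. The function below, `binary_representation_sketch`, is a hypothetical stand-in written from the docstring alone (it assumes `|` separates words, as in the docstring example) and is not the actual PyThaiNLP code.

```python
import numpy as np
from numpy.typing import NDArray


def binary_representation_sketch(txt: str, separator: str = "|") -> NDArray[np.int_]:
    """Map segmented text to a {0, 1} array; 1 marks the first char of a word.

    Hypothetical re-implementation for illustration only; it is not the
    PyThaiNLP function and assumes words are delimited by ``separator``.

    :param str txt: word-segmented input text
    :param str separator: word delimiter (assumed to be "|")
    :return: {0, 1} sequence, one entry per non-separator character
    :rtype: numpy.ndarray
    """
    bits: list[int] = []
    at_word_start = True
    for ch in txt:
        if ch == separator:
            at_word_start = True  # the next character begins a new word
            continue
        bits.append(1 if at_word_start else 0)
        at_word_start = False
    return np.array(bits, dtype=np.int_)
```

For example, `binary_representation_sketch("ผม|ไม่|ชอบ")` gives `[1 0 1 0 0 1 0 0]`, matching the `10100...` pattern in the docstring. With the return type declared as an array, callers such as the lines using `ref_sample.shape[0]` would type-check without changes.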
Steps to reproduce
Use MyPy to check the code
PyThaiNLP version
5
Python version
any
Operating system and version
any
More info
No response
Possible solution
No response
Files
No response