Commit a3939d7

sgarda committed: fix dic paths
1 parent 38925d7 commit a3939d7

2 files changed: +4 -1 lines changed

data/dataset/README.md

+3
@@ -14,6 +14,9 @@ The intermediate step is done in order to get rid of missing tweets ( and tweets
 ## SPELL CHECKED
 
 Once you created the the file with the dependency it is possible to apply spell checking. For complete reproducibility the files using for spell check are provided.
+
+$ python3 generate_data_file.py -p tweet_file_parsed -o output_path
+
 This step is performed now for the following reason:
 - need for tokenization
 - need for pos tags (provided by parsing) for avoiding parsing urls and emoticons
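The README's second reason (using POS tags to keep URLs and emoticons away from the spell checker) can be illustrated with a small sketch. It is not the code from `generate_data_file.py`: the function name `spell_check_tokens` and its `is_correct` / `suggest` parameters are hypothetical, and the reading of `NO_SPELL` as a list of POS tags to skip (it looks like the ARK Twitter tagset, where `U` is a URL and `E` an emoticon) is an assumption. Only the `NO_SPELL` list itself is taken from this commit.

```python
from typing import Callable, Iterable, List, Tuple

# Copied from generate_data_file.py as shown in this commit.
NO_SPELL = ['^', 'Z', 'L', 'M', '!', 'Y', '#', '@', '~', 'U', 'E', ',', 'G', 'S']


def spell_check_tokens(
    tagged: Iterable[Tuple[str, str]],
    is_correct: Callable[[str], bool],
    suggest: Callable[[str], List[str]],
) -> List[str]:
    """Hypothetical helper: correct only tokens whose POS tag is not in NO_SPELL.

    `tagged` holds (token, pos_tag) pairs produced by the parsing step.
    `is_correct` and `suggest` stand in for a spell checker's lookup and
    suggestion calls.
    """
    checked = []
    for token, tag in tagged:
        # Skipped tags (assumed: URLs, emoticons, hashtags, mentions, ...)
        # and correctly spelled words pass through unchanged.
        if tag in NO_SPELL or is_correct(token):
            checked.append(token)
            continue
        candidates = suggest(token)
        # Take the first suggestion if there is one, else keep the token.
        checked.append(candidates[0] if candidates else token)
    return checked
```

With pyhunspell, the two callables could plausibly be `SPELLER.spell` and `SPELLER.suggest`; whether the script wires them up this way is not visible in the diff.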

data/dataset/generate_data_file.py

+1 -1
@@ -12,7 +12,7 @@
 
 
 DISCARD = ['\n', 'Twitter / Account gesperr\n']
-DICS = ['/usr/share/hunspell/en_US.dic', '/usr/share/hunspell/en_US.aff']
+DICS = ['./en_US.dic', './en_US.aff']
 SPELLER = HunSpell(*DICS)
 NO_SPELL = ['^','Z','L','M','!','Y','#','@','~','U','E',',','G','S']
 
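One thing to watch with the new `./en_US.dic` and `./en_US.aff` paths: they are resolved against the current working directory, so the script presumably has to be run from `data/dataset/`. A possible alternative, sketched below, anchors the paths to a known directory and fails early if a file is missing. The helper `resolve_dics` is hypothetical and not part of this commit.

```python
from pathlib import Path
from typing import List, Sequence, Union


def resolve_dics(
    base_dir: Union[str, Path],
    names: Sequence[str] = ("en_US.dic", "en_US.aff"),
) -> List[str]:
    """Hypothetical helper: absolute paths of the Hunspell files in base_dir.

    Raises FileNotFoundError if any of the files does not exist, so a wrong
    location is reported before the spell checker is constructed.
    """
    base = Path(base_dir).resolve()
    paths = [base / name for name in names]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing Hunspell files: " + ", ".join(missing))
    return [str(p) for p in paths]


# Possible use inside generate_data_file.py (an assumption, not the commit's code):
# DICS = resolve_dics(Path(__file__).parent)
# SPELLER = HunSpell(*DICS)
```

Using the script's own directory rather than `.` would make the invocation independent of where the command is launched from, while still keeping the dictionaries inside the repository for reproducibility.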
0 commit comments