
Commit

Fixed formatting in tchunk program -- The python strip program was removing too many types of characters (e.g., underscores). This caused POS labels to be incorrectly read in. I changed the call to strip, limiting the removal to spaces, tabs and line separators.
AdamMeyers authored and Adam Meyers committed Jan 31, 2017
1 parent d85c4ba commit 36cf2e9
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion inline_terms.py
@@ -1470,7 +1470,7 @@ def find_inline_terms(lines,fact_file,pos_file,terms_file,marked_paragraphs=Fals

 def get_pos_structure (line):
     start_end = re.compile('S:([0-9]+) E:([0-9]+)')
-    line = line.strip()
+    line = line.strip(' '+os.linesep+'\t')
     if line[0:3]=='|||':
         fields = ['|||']
         fields2 = line[3:].split('|||')
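
The change in this commit swaps a no-argument strip for one with an explicit character set. The sketch below is a standalone illustration of how the two calls can differ, not code from inline_terms.py. The helper name strip_line_edges, the constant EDGE_CHARS, and the sample line are assumptions made up for the example. It relies on the documented behavior of str.strip: with no argument it removes all leading and trailing whitespace, and with an argument it removes only the listed characters, treated as a set rather than a prefix or suffix.

```python
import os

# Same character set as the new call in the diff: space, tab, and the
# platform line separator ('\n' on POSIX, '\r\n' on Windows).
EDGE_CHARS = ' ' + os.linesep + '\t'


def strip_line_edges(line):
    """Hypothetical helper: strip only EDGE_CHARS from both ends of line."""
    return line.strip(EDGE_CHARS)


# Illustrative input only, not taken from a real POS file. The trailing
# form feed ('\x0c') counts as whitespace but is not in EDGE_CHARS.
sample = '\t ||| NN_PL ||| S:0 E:5 \x0c' + os.linesep

default_result = sample.strip()            # removes every whitespace character at the edges
limited_result = strip_line_edges(sample)  # stops at the first character outside EDGE_CHARS

print(repr(default_result))
print(repr(limited_result))
```

Because os.linesep differs by platform, a file written with '\r\n' line endings and read on a POSIX system would keep a trailing '\r' under the explicit set; whether that matters depends on how the POS files are opened, which the commit does not show.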
