The following changes were made to this version:
1) Removed stemming procedures from the distributional component and replaced them
   with lemmatizations from the .terms file.
2) For distributional purposes, count alternative forms of the same lemma as instances
   of the same term.
3) Removed substring procedures from the distributional process and instead
   made them part of the chunking procedures (preprocessors to the
   distributional system). Treat cases where prepositional phrases are
   converted to prenominals specially (e.g., full term = Recognition
   of Speech, lemma = Speech Recognition). In particular, limit
   substring relations for these cases.
4) Fixed a bug in abbreviate that occurred when whitespace was made part of an abbreviation sequence.
5) Added an option for webscore to use the www.webcorp.org.uk search
   engine. This is experimental. It is slower than Yahoo. However, it
   may be possible to download webcorp and use it internally. In that
   case, it would have some advantages over Yahoo: it would not necessarily
   be slower; and it would be more resilient.
6) Removed an unnecessary print statement in filter_term_output.py
7) Started using make_final_output_file.py to produce the .out_term_list file
   instead of a shell command. There is a change in output format. Each
   line consists of the lemma of the term, followed by variants,
   separated by tabs.
8) Added possibly_create_abbreviate_dicts.py to create missing abbreviate
   dictionaries.
9) Added a way of running the program that is customized to using
   single files as foreground with the same background. For example,
   we are currently generating one set of terms using each supreme court
   decision as foreground and the full set of supreme court decisions as
   background. In future versions, we will provide an example from this
   run (using the shell script run_termolator_with_1_file_foreground.sh).
10) The changes described above significantly reduce the system's dependency
    on NLTK, and it is possible that a future version will make
    installation of NLTK unnecessary.
11) We are now using argument 12 in the run_termolator.sh script as the prefix
    for multiple shared caches of information to reuse. Previously, it was just
    used as a prefix for the webscore file, but now it is also used as a prefix
    for the lemma dictionary file. The current assumption is that one such file
    should be used for each broadly defined topic.
12) Updated the Readme to reflect these changes
13) Updated the test file directories (gutenberg-test, OANC-test and patent-test)
14) This release will be given a new release number.
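
A minimal sketch of how items 2 and 7 fit together, not the actual code in
this commit: variant forms are counted under their lemma, and each output
line is the lemma followed by its variants, separated by tabs. The function
names, the lemma_of dictionary, and the sample terms are all hypothetical;
the real .terms and .out_term_list handling may differ.

```python
from collections import Counter, defaultdict


def count_by_lemma(forms, lemma_of):
    """Count term occurrences, folding each variant form into its lemma.

    forms: iterable of term strings as they appear in the text.
    lemma_of: dict mapping a variant form to its lemma (hypothetical stand-in
    for the lemma information that the commit says comes from the .terms file).
    """
    counts = Counter()
    variants = defaultdict(set)
    for form in forms:
        lemma = lemma_of.get(form, form)  # unknown forms act as their own lemma
        counts[lemma] += 1
        if form != lemma:
            variants[lemma].add(form)
    return counts, variants


def format_term_lines(counts, variants):
    """Build one line per lemma: the lemma, then its variants, tab-separated."""
    lines = []
    for lemma, _count in counts.most_common():
        lines.append("\t".join([lemma] + sorted(variants.get(lemma, ()))))
    return lines


# Hypothetical sample data, loosely based on the example in item 3.
lemma_of = {
    "recognition of speech": "speech recognition",
    "speech recognitions": "speech recognition",
    "parsers": "parser",
}
forms = [
    "speech recognition",
    "recognition of speech",
    "parsers",
    "speech recognitions",
    "parser",
]

counts, variants = count_by_lemma(forms, lemma_of)
term_lines = format_term_lines(counts, variants)
for line in term_lines:
    print(line)
```

In this sketch the three forms of "speech recognition" contribute to a single
count, so the distributional statistics see one term rather than three.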
Adam Meyers committed Sep 20, 2017
1 parent 2ac3be8 commit a01ee98
