
Commit 2191919

sebschu (Stanford NLP) authored; Stanford NLP committed

Merge branch 'master' into ud-gapping

1 parent 516d10a

318 files changed, +268567 −112565 lines


README.md

+3 −3

@@ -38,13 +38,13 @@
 #### Build with Maven
 
 1. Make sure you have Maven installed, details here: [https://maven.apache.org/](https://maven.apache.org/)
-2. If you run this command in the CoreNLP directory: `mvn package` , it should run the tests and build this jar file: `CoreNLP/target/stanford-corenlp-3.7.0.jar`
+2. If you run this command in the CoreNLP directory: `mvn package` , it should run the tests and build this jar file: `CoreNLP/target/stanford-corenlp-3.9.2.jar`
 3. When using the latest version of the code make sure to download the latest versions of the [corenlp-models](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar), [english-models](http://nlp.stanford.edu/software/stanford-english-corenlp-models-current.jar), and [english-models-kbp](http://nlp.stanford.edu/software/stanford-english-kbp-corenlp-models-current.jar) and include them in your CLASSPATH. If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
-4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`. Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=3.9.1 -Dclassifier=models-spanish -Dpackaging=jar`
+4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`. Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=3.9.2 -Dclassifier=models-spanish -Dpackaging=jar`
 
 ### Useful resources
 
-You can find releases of Stanford CoreNLP on [Maven Central](https://search.maven.org/#artifactdetails%7Cedu.stanford.nlp%7Cstanford-corenlp%7C3.7.0%7Cjar).
+You can find releases of Stanford CoreNLP on [Maven Central](https://search.maven.org/artifact/edu.stanford.nlp/stanford-corenlp/3.9.2/jar).
 
 You can find more explanation and documentation on [the Stanford CoreNLP homepage](http://stanfordnlp.github.io/CoreNLP/).
 
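The per-language install command in step 4 of the README diff above differs only in the jar path and classifier. A minimal shell sketch that assembles it (the jar location is the README's placeholder, not a real path; change `LANG_NAME` for other languages, or use `-Dclassifier=models` for the base models jar):

```shell
#!/bin/sh
# Sketch: build the mvn install:install-file command for one language's models jar.
LANG_NAME="spanish"
JAR="/location/of/stanford-${LANG_NAME}-corenlp-models-current.jar"

CMD="mvn install:install-file -Dfile=$JAR -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=3.9.2 -Dclassifier=models-${LANG_NAME} -Dpackaging=jar"

# Inspect the command before running it; execute with: eval "$CMD"
echo "$CMD"
```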

build.gradle

+1 −1

@@ -11,7 +11,7 @@ sourceCompatibility = 1.8
 targetCompatibility = 1.8
 compileJava.options.encoding = 'UTF-8'
 
-version = '3.9.1'
+version = '3.9.2'
 
 // Gradle application plugin
 mainClassName = "edu.stanford.nlp.pipeline.StanfordCoreNLP"

build.xml

+107 −0

@@ -175,6 +175,113 @@
     </junit>
   </target>
 
+  <target name="itest-many-docs" depends="classpath"
+          description="Run StanfordCoreNLP on a large volume of documents.">
+    <echo message="${ant.project.name}" />
+    <junit fork="yes" maxmemory="14g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+      <classpath refid="classpath"/>
+      <classpath path="${build.path}"/>
+      <classpath path="${data.path}"/>
+      <classpath path="${source.path}"/>
+      <formatter type="brief" usefile="false"/>
+      <batchtest fork="yes">
+        <fileset dir="${itests.path}">
+          <include name="**/*StanfordCoreNLPSlowITest.java"/>
+        </fileset>
+      </batchtest>
+    </junit>
+  </target>
+
+  <target name="itest-coreference" depends="classpath"
+          description="Coreference related slow itests.">
+    <echo message="${ant.project.name}" />
+    <junit fork="yes" maxmemory="7g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+      <classpath refid="classpath"/>
+      <classpath path="${build.path}"/>
+      <classpath path="${data.path}"/>
+      <classpath path="${source.path}"/>
+      <formatter type="brief" usefile="false"/>
+      <batchtest fork="yes">
+        <fileset dir="${itests.path}">
+          <include name="**/*Coref*SlowITest.java"/>
+          <include name="**/DcorefBenchmarkSlowITest.java"/>
+          <include name="**/DcorefSlowITest.java"/>
+          <include name="**/ChineseCorefBenchmarkSlowITest.java"/>
+        </fileset>
+      </batchtest>
+    </junit>
+  </target>
+
+  <target name="itest-protobuf" depends="classpath"
+          description="Protocol buffer related slow itests.">
+    <echo message="${ant.project.name}" />
+    <junit fork="yes" maxmemory="14g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+      <classpath refid="classpath"/>
+      <classpath path="${build.path}"/>
+      <classpath path="${data.path}"/>
+      <classpath path="${source.path}"/>
+      <formatter type="brief" usefile="false"/>
+      <batchtest fork="yes">
+        <fileset dir="${itests.path}">
+          <include name="**/*Protobuf*SlowITest.java"/>
+        </fileset>
+      </batchtest>
+    </junit>
+  </target>
+
+  <target name="itest-kbp" depends="classpath"
+          description="KBP related slow itests.">
+    <echo message="${ant.project.name}" />
+    <junit fork="yes" maxmemory="14g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+      <classpath refid="classpath"/>
+      <classpath path="${build.path}"/>
+      <classpath path="${data.path}"/>
+      <classpath path="${source.path}"/>
+      <formatter type="brief" usefile="false"/>
+      <batchtest fork="yes">
+        <fileset dir="${itests.path}">
+          <include name="**/*KBP*SlowITest.java"/>
+        </fileset>
+      </batchtest>
+    </junit>
+  </target>
+
+  <target name="itest-ner" depends="classpath"
+          description="NER related slow itests.">
+    <echo message="${ant.project.name}" />
+    <junit fork="yes" maxmemory="14g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+      <classpath refid="classpath"/>
+      <classpath path="${build.path}"/>
+      <classpath path="${data.path}"/>
+      <classpath path="${source.path}"/>
+      <formatter type="brief" usefile="false"/>
+      <batchtest fork="yes">
+        <fileset dir="${itests.path}">
+          <include name="**/NERBenchmarkSlowITest.java"/>
+          <include name="**/TrainCRFClassifierSlowITest.java"/>
+        </fileset>
+      </batchtest>
+    </junit>
+  </target>
+
+  <target name="itest-misc" depends="classpath"
+          description="Miscellaneous slow itests.">
+    <echo message="${ant.project.name}" />
+    <junit fork="yes" maxmemory="14g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+      <classpath refid="classpath"/>
+      <classpath path="${build.path}"/>
+      <classpath path="${data.path}"/>
+      <classpath path="${source.path}"/>
+      <formatter type="brief" usefile="false"/>
+      <batchtest fork="yes">
+        <fileset dir="${itests.path}">
+          <include name="**/RequirementsCorrectSlowITest.java"/>
+          <include name="**/ThreadedParserSlowITest.java"/>
+        </fileset>
+      </batchtest>
+    </junit>
+  </target>
+
   <target name="slowitest" depends="classpath,compile"
           description="Run really slow integration tests">
     <echo message="${ant.project.name}" />
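Each new target above selects its integration tests with an Ant `<include>` glob over file names. The selection logic of, e.g., the `**/*KBP*SlowITest.java` pattern from `itest-kbp` can be mimicked in plain shell; the file names below are hypothetical, chosen only to illustrate which names the glob accepts:

```shell
#!/bin/sh
# Mimic the <include name="**/*KBP*SlowITest.java"/> fileset pattern:
# a file is selected iff it contains "KBP" and ends in "SlowITest.java".
for f in KBPBenchmarkSlowITest.java CorefKBPAnnotatorSlowITest.java ThreadedParserSlowITest.java; do
  case "$f" in
    *KBP*SlowITest.java) echo "selected: $f" ;;
    *)                   echo "skipped:  $f" ;;
  esac
done
```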

data/edu/stanford/nlp/dcoref/STILLALONEWOLF_20050102.1100.eng.LDC2005E83.expectedcoref

+6 −6

@@ -1,5 +1,5 @@
 1
-1 STILLALONEWOLF_20050102.1100 .
+1 20050102.1100
 
 2
 2 Munir
@@ -161,18 +161,18 @@
 10 Alexandria
 
 68
+10 :-RRB-
+
+69
 10 this
 10 the decisive factor
 11 this
 
-72
+73
 10 residence in Alexandria
 
-75
-10 the most beautiful concerts :-RRB- , In general
-
 76
-10 the most beautiful concerts :-RRB-
+10 the most beautiful concerts :-RRB- , In general , thank you Mohammed Munir for giving me unexpected pleasure on New Year 's Eve
 
 77
 10 general

data/edu/stanford/nlp/dcoref/coref.properties

+3 −3

@@ -10,8 +10,8 @@ dcoref.postprocessing = true
 dcoref.maxdist = -1
 dcoref.use.big.gender.number = false
 dcoref.replicate.conll = true
-dcoref.conll.scorer = /scr/nlp/data/conll-2011/scorer/v4/scorer.pl
+dcoref.conll.scorer = /u/scr/nlp/data/conll-2011/scorer/v4/scorer.pl
 
-dcoref.logFile = /scr/nlp/coref/error_log/temp/result_conlldev.txt
-dcoref.conll2011 = /scr/nlp/data/conll-2011/v2/data/dev/data/english/annotations
+dcoref.logFile = /u/scr/nlp/coref/error_log/temp/result_conlldev.txt
+dcoref.conll2011 = /u/scr/nlp/data/conll-2011/v2/data/dev/data/english/annotations
 

doc/README

+2 −2

@@ -1,7 +1,7 @@
 This directory contains various forms of documentation.
 
-software: the web pages on our apache installation, eg
-http://nlp.stanford.edu/software/index.shtml
+software: the web pages on our apache httpd installation, i.e.,
+https://nlp.stanford.edu/software/
 
 releasenotes: the output of running the release scripts to create the
 zips we release

doc/classify/README.txt

+20 −6

@@ -1,4 +1,4 @@
-Stanford Classifier v3.7.0 - 2016-10-31
+Stanford Classifier v3.9.2 - 2018-10-16
 -------------------------------------------------
 
 Copyright (c) 2003-2012 The Board of Trustees of
@@ -28,7 +28,18 @@ java -cp "*:." edu.stanford.nlp.classify.ColumnDataClassifier -prop examples/che
 
 This will classify the included test data, cheeseDisease.test, based on the probability that each example is a cheese or a disease, as calculated by a linear classifier trained on cheeseDisease.train.
 
-The cheese2007.prop file demonstrates how features are specified. The first feature in the file, useClassFeature, indicates that a feature should be used based on class frequency in the training set. Most other features are calculated on specific columns of data in your tab-delimited text file. For example, "1.useNGrams=true" indicates that n-gram features should be created for the values in column 1 (numbering begins at 0!). Note that you must specify, for example, "true" in "1.useNGrams=true"; "1.useNGrams" alone will not cause n-gram features to be created. N-gram features are character subsequences of the string in the column, for example, "t", "h", "e", "th", "he", "the" from the word "the". You can also specify various other kinds of features such as just using the string value as a categorical feature (1.useString=true) or splitting up a longer string into bag-of-words features (1.splitWordsRegexp=[ ] 1.useSplitWords=true). The prop file also allows a choice of printing and optimization options, and allows you to specify training and test files (e.g., in cheese2007.prop under the "Training input" comment). See the javadoc for ColumnDataClassifier within the edu.stanford.nlp.classify package for more information on these and other options.
+The cheese2007.prop file demonstrates how features are specified. The first feature in the file, useClassFeature,
+indicates that a feature should be used based on class frequency in the training set. Most other features are
+calculated on specific columns of data in your tab-delimited text file. For example, "1.useNGrams=true" indicates
+that n-gram features should be created for the values in column 1 (numbering begins at 0!). Note that you must
+specify, for example, "true" in "1.useNGrams=true"; "1.useNGrams" alone will not cause n-gram features to be created.
+N-gram features are character subsequences of the string in the column, for example, "t", "h", "e", "th", "he",
+"the" from the word "the". You can also specify various other kinds of features such as just using the string value
+as a categorical feature (1.useString=true) or splitting up a longer string into bag-of-words features
+(1.splitWordsRegexp=[ ] 1.useSplitWords=true). The prop file also allows a choice of printing and optimization
+options, and allows you to specify training and test files (e.g., in cheese2007.prop under the "Training input"
+comment). See the javadoc for ColumnDataClassifier within the edu.stanford.nlp.classify package for more information
+on these and other options.
 
 Another included dataset is the iris dataset which uses numerical features to separate types of irises. To specify the use of a real-valued rather than categorical feature, you can use one or more of "realValued", "logTransform", or "logitTransform" for a given column. "realValued" adds the number in the given column as a feature value, while the transform options perform either a log or a logit transform on the value first. The format of these feature options is the same as for categorical features; for instance, iris2007.prop shows the use of real valued features such as "2.realValued=true".
@@ -60,13 +71,12 @@ LICENSE
 // GNU General Public License for more details.
 //
 // You should have received a copy of the GNU General Public License
-// along with this program; if not, write to the Free Software
-// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+// along with this program. If not, see http://www.gnu.org/licenses/ .
 //
 // For more information, bug reports, fixes, contact:
 // Christopher Manning
-// Dept of Computer Science, Gates 1A
-// Stanford CA 94305-9010
+// Dept of Computer Science, Gates 2A
+// Stanford CA 94305-9020
 // USA
 
 // https://nlp.stanford.edu/software/classifier.html
@@ -76,6 +86,10 @@ LICENSE
 CHANGES
 -------------------------
 
+2018-10-16 3.9.2 Update for compatibility
+
+2018-02-27 3.9.1 Updated for compatibility
+
 2016-10-31 3.7.0 Update for compatibility
 
 2015-12-09 3.6.0 Update for compatibility
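The feature options described in the README excerpt above can be collected into a small prop file. A minimal sketch, assuming the cheeseDisease data layout (column 0 = class label, column 1 = text); this is illustrative, not the shipped cheese2007.prop:

```properties
# Hypothetical ColumnDataClassifier prop file (not the shipped cheese2007.prop).
useClassFeature=true
# Column 1 holds the text; remember numbering begins at 0.
1.useNGrams=true
1.useString=true
trainFile=cheeseDisease.train
testFile=cheeseDisease.test
```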

doc/classify/build.xml

+31 −10

@@ -1,5 +1,7 @@
 <!-- build.xml file for ant for JavaNLP -->
 
+<!-- Before using this, unjar the sources' jar file into the src/ directory! -->
+
 <!-- A "project" describes a set of targets that may be requested
      when Ant is executed. The "default" attribute defines the
      target which is executed if no specific target is requested,
@@ -41,6 +43,7 @@
   <property name="compile.optimize" value="true"/>
   <property name="compile.source" value="1.8" />
   <property name="compile.target" value="1.8" />
+  <property name="compile.encoding" value="utf-8" />
 
 
 
@@ -93,15 +96,18 @@
     <javac srcdir="${src.home}"
            destdir="${build.home}"
            debug="${compile.debug}"
+           encoding="${compile.encoding}"
            deprecation="${compile.deprecation}"
            optimize="${compile.optimize}"
            source="${compile.source}"
-           target="${compile.target}">
+           target="${compile.target}"
+           includeantruntime="false">
       <compilerarg value="-Xmaxerrs"/>
       <compilerarg value="20"/>
       <classpath>
        <fileset dir="${basedir}">
          <include name="*.jar"/>
+         <exclude name="javanlp*"/>
        </fileset>
      </classpath>
      <!-- <compilerarg value="-Xlint"/> -->
@@ -134,15 +140,30 @@
     <mkdir dir="${javadoc.home}"/>
     <javadoc sourcepath="${src.home}"
              destdir="${javadoc.home}"
-             maxmemory="768m"
-             author="true"
-             source="1.6"
-             Overview="${src.home}/edu/stanford/nlp/overview.html"
-             Doctitle="Stanford JavaNLP API Documentation"
-             Windowtitle="Stanford JavaNLP API"
-             packagenames="*">
-      <bottom><![CDATA[<FONT SIZE=2><A HREF=\"http://nlp.stanford.edu\">Stanford NLP Group</A></FONT>]]></bottom>
-      <link href="http://java.sun.com/j2se/1.6.0/docs/api/"/>
+             maxmemory="1g"
+             author="true"
+             source="${compile.source}"
+             overview="${src.home}/edu/stanford/nlp/overview.html"
+             doctitle="Stanford JavaNLP API Documentation"
+             windowtitle="Stanford JavaNLP API"
+             encoding="${compile.encoding}"
+             docencoding="${compile.encoding}"
+             charset="${compile.encoding}"
+             packagenames="*">
+      <!-- Allow @generated, @modifiable and @ordered tags -->
+      <tag name="generated" scope="all" description="Generated" />
+      <tag name="modifiable" scope="all" description="Modifiable" />
+      <tag name="ordered" scope="all" description="Ordered" />
+      <!-- Depends on lib and classes folders -->
+      <classpath>
+        <fileset dir="${basedir}">
+          <include name="*.jar"/>
+          <exclude name="javanlp*"/>
+        </fileset>
+        <pathelement path="${build.home}" />
+      </classpath>
+      <bottom><![CDATA[<font size="2"><a href="https://nlp.stanford.edu" target="_top">Stanford NLP Group</a></font>]]></bottom>
+      <link href="https://docs.oracle.com/javase/8/docs/api/"/>
     </javadoc>
 
   </target>

doc/corenlp/README.txt

+12 −4

@@ -28,20 +28,28 @@ LICENSE
 // GNU General Public License for more details.
 //
 // You should have received a copy of the GNU General Public License
-// along with this program; if not, write to the Free Software Foundation,
-// Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+// along with this program. If not, see http://www.gnu.org/licenses/ .
 //
 // For more information, bug reports, fixes, contact:
 // Christopher Manning
-// Dept of Computer Science, Gates 1A
-// Stanford CA 94305-9010
+// Dept of Computer Science, Gates 2A
+// Stanford CA 94305-9020
 // USA
 //
 
 ---------------------------------
 CHANGES
 ---------------------------------
 
+2018-10-05 3.9.2 improved NER pipeline and entity mention
+                 confidences; support for Java 11; new POS
+                 models for English; 4 methods for setting
+                 document dates; tokenizer improvements;
+                 CoreNLP runs as filter from stdin to stdout;
+                 bug fixes
+
+2018-02-27 3.9.1 Bug fixes, minor enhancements
+
 2018-01-31 3.9.0 Spanish KBP and new dependency parse model,
                  wrapper API for data, quote attribution
                  improvements, easier use of coref info, bug
