PCFG Parsing
accezz edited this page Oct 15, 2012 · 6 revisions
This page explains how probabilistic features are extracted with PCFG parsing.
- Berkeley parser jar (v1.7, downloaded from the Berkeley repository and included in lib/)
- Grammars for the source and target languages: English, and Spanish trained over AnCora. Please observe the license. More at the Berkeley code page.
# Features
- log likelihood
- number of alternative parses generated
- confidence score averaged over all parses
- confidence score of best parse

The parse is also available.
- A line has been added in `FeatureExtractor.runBB()` which instantiates a `BParserProcessor` and then calls the `BParserProcessor.initialize` function, giving as parameters the `PropertiesManager` (i.e. configuration reader) and the language.
- `BParserProcessor` extends `ResourceProcessor` and upon `initialize` reads the desired configuration parameters from the passed `PropertiesManager`. Given these parameters, it instantiates a `BParser` which is a `Resource`.
- `BParser` imports the official Berkeley jar and extends the original `CoarseToFineMaxRuleParser` which needs to be initialized with a compiled grammar file.
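The initialization chain described above can be illustrated with a minimal, self-contained sketch. Every class here (`SketchPropertiesManager`, `SketchResource`, `SketchResourceProcessor`, `SketchBParserProcessor`, `SketchBParser`) is a hypothetical stand-in, not the project's actual code, and the configuration key `<language>.bparser.grammar` and the grammar path are assumptions made only for illustration. The stand-in parser merely records the grammar path instead of extending the Berkeley `CoarseToFineMaxRuleParser`.

```java
import java.util.Properties;

// Demo entry point: mirrors what the added line in runBB() is described as doing.
class PcfgInitSketch {
    public static void main(String[] args) {
        SketchPropertiesManager config = new SketchPropertiesManager();
        // Assumed key and placeholder path; the real configuration may differ.
        config.set("english.bparser.grammar", "path/to/english.gr");

        SketchBParserProcessor processor = new SketchBParserProcessor();
        processor.initialize(config, "english");
        System.out.println(processor.getParser().getGrammarPath());
    }
}

// Hypothetical stand-in for the configuration reader (PropertiesManager).
class SketchPropertiesManager {
    private final Properties props = new Properties();
    void set(String key, String value) { props.setProperty(key, value); }
    String get(String key) { return props.getProperty(key); }
}

// Hypothetical stand-in for the Resource type.
interface SketchResource {
    boolean isAvailable();
}

// Stand-in for BParser: only stores the compiled grammar path.
class SketchBParser implements SketchResource {
    private final String grammarPath;
    SketchBParser(String grammarPath) { this.grammarPath = grammarPath; }
    String getGrammarPath() { return grammarPath; }
    @Override
    public boolean isAvailable() { return grammarPath != null; }
}

// Hypothetical stand-in for ResourceProcessor.
abstract class SketchResourceProcessor {
    abstract void initialize(SketchPropertiesManager config, String language);
}

// Stand-in for BParserProcessor: reads its settings, then builds the parser resource.
class SketchBParserProcessor extends SketchResourceProcessor {
    private SketchBParser parser;

    @Override
    void initialize(SketchPropertiesManager config, String language) {
        String grammar = config.get(language + ".bparser.grammar");
        parser = new SketchBParser(grammar);
    }

    SketchBParser getParser() { return parser; }
}
```

The point of the layering is that the processor owns configuration lookup, while the parser only needs the path of a compiled grammar.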
- Every sentence is parsed, and the generated statistics as well as the tree are added to it with `Sentence.setValue`.
- When features are created, they retrieve the statistic values from the sentence and set them as Feature values.
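As a rough sketch of this hand-off, the example below stores a parse statistic on a sentence and lets a feature read it back. `SketchSentence` and `SketchLogLikelihoodFeature` are hypothetical stand-ins, the key names `bparser.loglikelihood` and `bparser.nbestSize` are assumptions, and the numbers are hard-coded placeholders rather than real parser output.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point: a statistic is attached to the sentence, then read by a feature.
class ParseValuesSketch {
    public static void main(String[] args) {
        SketchSentence sentence = new SketchSentence("a small test sentence");
        // Placeholder values; in the pipeline these would come from the parser.
        sentence.setValue("bparser.loglikelihood", -42.5);
        sentence.setValue("bparser.nbestSize", 3);

        SketchLogLikelihoodFeature feature = new SketchLogLikelihoodFeature();
        feature.run(sentence);
        System.out.println(feature.getValue());
    }
}

// Hypothetical stand-in for Sentence: a text plus a key/value store.
class SketchSentence {
    private final String text;
    private final Map<String, Object> values = new HashMap<>();

    SketchSentence(String text) { this.text = text; }
    String getText() { return text; }
    void setValue(String key, Object value) { values.put(key, value); }
    Object getValue(String key) { return values.get(key); }
}

// Hypothetical feature: copies one stored statistic into its own value.
class SketchLogLikelihoodFeature {
    private double value;

    void run(SketchSentence sentence) {
        Object stored = sentence.getValue("bparser.loglikelihood");
        value = ((Number) stored).doubleValue();
    }

    double getValue() { return value; }
}
```

Keeping the statistics on the sentence means the parser runs once per sentence, and each feature only performs a lookup.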
- Features are specified with Java classes `Feature9300.java` to `Feature9307.java`: