PCFG Parsing

This page explains how probabilistic features are extracted with PCFG parsing.

Requirements

Berkeley parser jar (v1.7, downloaded from the Berkeley repository and included in lib/)
Grammars for the source and target languages (English; Spanish trained over Ancora). Please observe the license. More at Berkeley code

#Features

A line has been added in FeatureExtractor.runBB() which instantiates a BParserProcessor and then calls the BParserProcessor.initialize function, giving as parameters the PropertiesManager (i.e. configuration reader) and the language.
BParserProcessor extends ResourceProcessor and, on initialization, reads the desired configuration parameters from the passed PropertiesManager. Given these parameters, it instantiates a BParser, which is a Resource.
BParser imports the official Berkeley jar and extends the original CoarseToFineMaxRuleParser, which needs to be initialized with a compiled grammar file.
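
The initialization chain described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the project's actual code: every class here is a simplified stand-in that only borrows the names used on this page, and the property key layout (`<language>.bparser.grammar`) and the grammar path are assumptions made for the example. The real BParser extends CoarseToFineMaxRuleParser from the Berkeley jar, which is deliberately left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the classes named on this page; not the real implementation.
class PcfgInitSketch {

    // Stand-in for the configuration reader.
    static class PropertiesManager {
        private final Map<String, String> props = new HashMap<>();
        void setProperty(String key, String value) { props.put(key, value); }
        String getString(String key) { return props.get(key); }
    }

    // Marker type for anything a processor creates and holds on to.
    interface Resource {}

    static abstract class ResourceProcessor {
        abstract void initialize(PropertiesManager pm, String language);
    }

    // Stand-in for the parser wrapper; it only remembers which grammar it was given.
    static class BParser implements Resource {
        final String grammarPath;
        BParser(String grammarPath) { this.grammarPath = grammarPath; }
    }

    static class BParserProcessor extends ResourceProcessor {
        private BParser parser;

        @Override
        void initialize(PropertiesManager pm, String language) {
            // Hypothetical key layout, assumed for this sketch only.
            String grammar = pm.getString(language + ".bparser.grammar");
            if (grammar == null) {
                throw new IllegalStateException("No grammar configured for " + language);
            }
            parser = new BParser(grammar);
        }

        BParser getParser() { return parser; }
    }

    public static void main(String[] args) {
        PropertiesManager pm = new PropertiesManager();
        pm.setProperty("english.bparser.grammar", "lib/english.gr"); // illustrative path
        BParserProcessor processor = new BParserProcessor();
        processor.initialize(pm, "english");
        System.out.println("Grammar in use: " + processor.getParser().grammarPath);
    }
}
```

Keeping the grammar lookup inside the processor means the caller only has to pass the configuration reader and a language name, which matches the single added line in FeatureExtractor.runBB() described above.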

Every sentence is parsed, and the generated statistics as well as the tree are added to it with Sentence.setValue.
When features are created, they retrieve the statistic values from the sentence and set them as Feature values.
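
The hand-off between parsing and feature creation might look like the sketch below. It is an assumption-laden illustration rather than the project's code: `Sentence` and `Feature` are reduced to the minimum needed, the key name `bparser.loglikelihood`, the class `SourceParseScoreFeature`, and the numeric value are all made up for the example, and no real parser is invoked.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins; the key name and feature class are hypothetical.
class PcfgFeatureSketch {

    // A sentence carries arbitrary named values attached by resource processors.
    static class Sentence {
        private final String text;
        private final Map<String, Object> values = new HashMap<>();
        Sentence(String text) { this.text = text; }
        String getText() { return text; }
        void setValue(String key, Object value) { values.put(key, value); }
        Object getValue(String key) { return values.get(key); }
    }

    static abstract class Feature {
        private double value;
        void setValue(double v) { value = v; }
        double getValue() { return value; }
        abstract void run(Sentence source, Sentence target);
    }

    // Hypothetical feature: copies a parse statistic stored on the source sentence.
    static class SourceParseScoreFeature extends Feature {
        @Override
        void run(Sentence source, Sentence target) {
            Object stored = source.getValue("bparser.loglikelihood");
            if (stored == null) {
                throw new IllegalStateException("Sentence has not been parsed yet");
            }
            setValue((Double) stored);
        }
    }

    public static void main(String[] args) {
        Sentence source = new Sentence("The cat sleeps .");
        Sentence target = new Sentence("El gato duerme .");

        // In the real pipeline the parser would produce these; here they are placeholders.
        source.setValue("bparser.loglikelihood", -42.5);
        source.setValue("bparser.tree", "(S (NP The cat) (VP sleeps) (. .))");

        Feature feature = new SourceParseScoreFeature();
        feature.run(source, target);
        System.out.println("Feature value: " + feature.getValue());
    }
}
```

Storing the statistics on the sentence decouples the features from the parser: a feature only needs to know the key under which a value was stored, not how it was computed.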
Features are specified in the Java classes Feature9300.java to Feature9307.java: