accezz edited this page Oct 15, 2012 · 6 revisions

This page explains how probabilistic features are extracted using PCFG parsing.

Requirements

Features

  • log likelihood
  • number of alternative parses generated
  • confidence score averaged over all parses
  • confidence score of the best parse

The parse tree itself is also available.
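To make the relationship between the four values concrete, here is a minimal Java sketch. It assumes the parser reports one log likelihood for the sentence plus one confidence score per alternative parse, listed best parse first. The ParseStatistics class and its from method are hypothetical and are not part of the project or of the Berkeley Parser API; how the real confidence scores are defined is not covered here.

```java
import java.util.List;

class ParseStatisticsDemo {
    public static void main(String[] args) {
        // Illustrative values only: three alternative parses, best first.
        ParseStatistics s = ParseStatistics.from(-42.5, List.of(0.9, 0.6, 0.3));
        System.out.println("log likelihood:   " + s.logLikelihood);
        System.out.println("number of parses: " + s.parseCount);
        System.out.println("avg confidence:   " + s.avgConfidence);
        System.out.println("best confidence:  " + s.bestConfidence);
    }
}

// Hypothetical holder for the four sentence-level feature values.
final class ParseStatistics {
    final double logLikelihood;  // log likelihood reported by the parser
    final int parseCount;        // number of alternative parses generated
    final double avgConfidence;  // confidence averaged over all parses
    final double bestConfidence; // confidence of the best (first) parse

    private ParseStatistics(double logLikelihood, int parseCount,
                            double avgConfidence, double bestConfidence) {
        this.logLikelihood = logLikelihood;
        this.parseCount = parseCount;
        this.avgConfidence = avgConfidence;
        this.bestConfidence = bestConfidence;
    }

    // Assumption: confidences are ordered with the best parse first.
    static ParseStatistics from(double logLikelihood, List<Double> confidences) {
        if (confidences.isEmpty()) {
            throw new IllegalArgumentException("at least one parse is required");
        }
        double sum = 0.0;
        for (double c : confidences) {
            sum += c;
        }
        return new ParseStatistics(logLikelihood, confidences.size(),
                sum / confidences.size(), confidences.get(0));
    }
}
```

Keeping the four values in one object mirrors the idea that a single parse of the sentence yields all of them at once; each feature then only has to read its own value.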

Implementation

Initialization of the necessary objects

  1. FeatureExtractor.runBB() contains an added line that instantiates a BParserProcessor and calls its initialize() method, passing the PropertiesManager (i.e. the configuration reader) and the language as parameters.
  2. BParserProcessor extends ResourceProcessor. Its initialize() method reads the required configuration parameters from the supplied PropertiesManager and uses them to instantiate a BParser, which is a Resource.
  3. BParser uses the official Berkeley Parser jar and extends its CoarseToFineMaxRuleParser class, which must be initialized with a compiled grammar file.
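The initialization chain above can be sketched as follows. This is a self-contained approximation, not the project's code: PropertiesManager, ResourceProcessor and Resource are reduced to minimal hypothetical stand-ins, the configuration key `<language>.bparser.grammar` is invented for illustration, and BParser only records the grammar path instead of extending the Berkeley CoarseToFineMaxRuleParser, so the example runs without the Berkeley jar.

```java
import java.util.Properties;

class FeatureExtractorSketch {
    // Mirrors the line added to FeatureExtractor.runBB():
    // create the processor, then initialize it with the config and language.
    static BParserProcessor runBB(PropertiesManager config, String language) {
        BParserProcessor bParserProcessor = new BParserProcessor();
        bParserProcessor.initialize(config, language);
        return bParserProcessor;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("en.bparser.grammar", "/path/to/english.gr"); // placeholder path
        BParserProcessor proc = runBB(new PropertiesManager(p), "en");
        System.out.println(proc.getParser().grammarPath);
    }
}

// Hypothetical stand-in for the project's PropertiesManager (configuration reader).
class PropertiesManager {
    private final Properties props;
    PropertiesManager(Properties props) { this.props = props; }
    String getString(String key) { return props.getProperty(key); }
}

// Hypothetical marker interface for loaded resources.
interface Resource {}

// Hypothetical base class for processors that own a Resource.
abstract class ResourceProcessor {
    abstract void initialize(PropertiesManager config, String language);
}

// Stand-in for BParser: the real class extends CoarseToFineMaxRuleParser and is
// built from a compiled grammar file; here it only stores the grammar path.
class BParser implements Resource {
    final String grammarPath;
    BParser(String grammarPath) { this.grammarPath = grammarPath; }
}

class BParserProcessor extends ResourceProcessor {
    private BParser parser;

    @Override
    void initialize(PropertiesManager config, String language) {
        // Hypothetical key scheme: one compiled grammar file per language.
        String key = language + ".bparser.grammar";
        String grammar = config.getString(key);
        if (grammar == null) {
            throw new IllegalStateException("Missing configuration key: " + key);
        }
        parser = new BParser(grammar);
    }

    BParser getParser() { return parser; }
}
```

In the real setup the BParser constructor would load the compiled grammar and hand it to the CoarseToFineMaxRuleParser superclass; keeping that step behind BParserProcessor.initialize() means runBB() only has to supply the configuration and the language.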

Sentence-level annotation

  • Every sentence is parsed, and the generated statistics as well as the parse tree are stored on the sentence with Sentence.setValue.
  • When the features are computed, they retrieve these statistic values from the sentence and set them as Feature values.
  • The features are implemented in the Java classes Feature9300.java to Feature9307.java: