Dimitris Kontokostas edited this page Mar 1, 2016 · 2 revisions

According to https://github.com/dbpedia/extraction-framework/pull/35/#issuecomment-16187074, the current design is the following:

In order to create a new Extractor you need to extend the Extractor[T] trait, and in particular one of:

  • WikiPageExtractor when you want to use only the page metadata, e.g. RedirectExtractor, LabelExtractor, ...
  • PageNodeExtractor when you want to work with the Wikitext AST (most common case)
  • JsonNodeExtractor when you want to work with Wikidata pages
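
To make the choice above more concrete, here is a minimal, self-contained sketch of a WikiPageExtractor-style extractor. It does not use the framework's real classes: Quad, WikiPage, and the simplified extract signature are stand-ins defined inside the example so that it compiles on its own, and MyNewExtractor is a hypothetical name. The real Extractor[T] trait in the core module has more members, so treat this only as an outline of the shape.

```scala
// Stand-in types so the sketch compiles on its own.
// They are NOT the framework's classes; the real ones live in the core module.
final case class Quad(subject: String, predicate: String, value: String)
final case class WikiPage(title: String, source: String)

// Simplified stand-in for the framework's Extractor[T] trait (assumed shape).
trait Extractor[T] {
  def extract(input: T, subjectUri: String): Seq[Quad]
}

// Page-metadata flavour: the input is the raw page, not a parsed Wikitext AST.
trait WikiPageExtractor extends Extractor[WikiPage]

// Hypothetical extractor that emits the page title as an rdfs:label statement.
class MyNewExtractor extends WikiPageExtractor {
  private val labelProperty = "http://www.w3.org/2000/01/rdf-schema#label"

  override def extract(page: WikiPage, subjectUri: String): Seq[Quad] =
    if (page.title.isEmpty) Seq.empty // nothing to say about untitled pages
    else Seq(Quad(subjectUri, labelProperty, page.title))
}

object MyNewExtractorDemo {
  def main(args: Array[String]): Unit = {
    val page = WikiPage("Berlin", "'''Berlin''' is the capital of Germany.")
    val quads = new MyNewExtractor().extract(page, "http://dbpedia.org/resource/Berlin")
    quads.foreach(println)
  }
}
```

An extractor working on the Wikitext AST or on Wikidata pages would follow the same pattern, with PageNodeExtractor or JsonNodeExtractor as the parent and the corresponding input type in place of the page.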

Examples of Extractors can be found in the org/dbpedia/extraction/mappings package in the core module.

If you want to test your new extractor you can do it in two ways:

  • for a full dump extraction, add .MyNewExtractor to the extraction property files in the dump module
  • add your extractor to server.default.properties in the server module and start the mapping server with ../run server. Then open http://localhost:{PORT}/server/extraction/{LANG}/ and try your extractor on a specific page. (You can also run the server from your IDE and debug it.)
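
As a rough illustration of the first option, the fragment below sketches what such a property-file entry might look like. The key name extractors and the comma-separated list format are assumptions rather than something stated on this page, so check the existing property files in the dump module for the actual keys before copying it. MyNewExtractor is the placeholder name used above.

```properties
# Hypothetical excerpt from an extraction property file in the dump module.
# Assumption: extractors are listed under an "extractors" key, and a leading
# dot refers to a class in the org.dbpedia.extraction.mappings package.
extractors=.LabelExtractor,.MyNewExtractor
```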
