strip header, extra attributes from published treebank files #19

@balmas

From alpheios-project/arethusa#748:

When I try to import a treebank file from the Perseus Latin repository into Arethusa (e.g. phi1221.phi007.perseus-lat1.tb.xml from https://github.com/PerseusDL/treebank_data/blob/master/v2.1/Latin/texts/phi1221.phi007.perseus-lat1.tb.xml) using the "Upload Base XML Treebank / from file" button, I get the message: ERROR!! CHANGES NOT SAVED! errorunexpected attribute "oldId". If I edit the file, removing the header element (with all its children) and body, and putting in the annotator element from one of my exported treebank annotations, the file is read and displayed correctly.
Users often want to review or change already annotated trees from the "gold standard", and Arethusa is the logical environment in which to do so. Perhaps this could be achieved with an XSL or XQuery stylesheet layer that, on import, strips everything above the sentence element out of the base XML treebank file and adds a treebank element that is acceptable to Arethusa.
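
The stripping step suggested above could look roughly like the following. This is a minimal sketch in Python rather than XSL or XQuery, and it rests on assumptions not confirmed in this issue: that the root element is named `treebank`, that its own attributes are safe to keep, and that `oldId` is the only attribute Arethusa rejects. The function name `strip_treebank` and the `drop_attrs` parameter are hypothetical, and the sketch does not add an annotator element.

```python
import xml.etree.ElementTree as ET


def strip_treebank(xml_text, drop_attrs=("oldId",)):
    """Return a treebank document that contains only sentence elements.

    Assumption: which attributes Arethusa accepts is not stated in the
    issue; "oldId" is the only one named in the error message.
    """
    source_root = ET.fromstring(xml_text)

    # Build a fresh root, keeping the original root attributes
    # except the ones we were asked to drop.
    kept = {k: v for k, v in source_root.attrib.items() if k not in drop_attrs}
    new_root = ET.Element("treebank", kept)

    # Collect sentences wherever they sit (directly under the root or
    # under a body element); header and other wrappers are left behind.
    for sentence in list(source_root.iter("sentence")):
        for element in sentence.iter():
            for name in drop_attrs:
                element.attrib.pop(name, None)
        new_root.append(sentence)

    return ET.tostring(new_root, encoding="unicode")
```

Calling `strip_treebank(open(path, encoding="utf-8").read())` on a published file would return the reduced XML as a string. Whether the result actually imports into Arethusa would still need to be checked against a real file.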
