Consider supporting ISO-TimeML standard

The ISO-TimeML version of the TimeML Standard offers (at least) the following benefits:
- Standoff Annotations (Chapter 3.3) (see compromise below)
- It preserves Tokenization

Read about it here:
https://lexitron.nectec.or.th/public/LREC-2010_Malta/pdf/55_Paper.pdf

If supporting the complete standard is too much work, it would still be nice to have standoff annotations. We currently calculate them manually and fuzzy-match them to the token and sentence boundaries detected by our own preprocessing pipeline.
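To illustrate the matching step: once character offsets are known, mapping a tagged span onto our own tokens reduces to an overlap test. This is only a minimal sketch, assuming half-open `(start, end)` character spans; `tokens_overlapping` and the token offsets are hypothetical names and values from our side, not part of HeidelTime.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_char inclusive, end_char exclusive)


def tokens_overlapping(span: Span, token_spans: List[Span]) -> List[int]:
    """Return the indices of all tokens that overlap the character span."""
    start, end = span
    return [
        i
        for i, (tok_start, tok_end) in enumerate(token_spans)
        if tok_start < end and start < tok_end  # half-open interval overlap
    ]


text = "Today I feel great."
# Hypothetical tokenization produced by our own pipeline.
token_spans = [(0, 5), (6, 7), (8, 12), (13, 18), (18, 19)]

# Character span of the TIMEX3 covering "Today".
matched = tokens_overlapping((0, 5), token_spans)
print([text[s:e] for s, e in (token_spans[i] for i in matched)])
```

With exact offsets from the tagger, no fuzzy matching would be needed for this step.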


### Compromise to add standoff information to actual inline TimeML annotations
A simple fix for this specific problem would be to (optionally) add character positions to the tagged spans, like so:

```python
# input text:
"Today I feel great."

# currently generated TimeML output:
'<?xml version="1.0"?><!DOCTYPE TimeML SYSTEM "TimeML.dtd"><TimeML>
<TIMEX3 tid="t1" type="DATE" value="2021-11-16">Today</TIMEX3> I feel great.
</TimeML>'

# Proposed additional tag-attributes (orig_start_char, orig_end_char):
<TIMEX3 tid="t1" type="DATE" value="2021-11-16" orig_start_char="0" orig_end_char="5">Today</TIMEX3>
```

This would record that the span tagged by the TIMEX3 with tid `t1` covers the original text from character 0 (inclusive) to character 5 (exclusive).
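On the consumer side, the half-open convention means the attributes can be used directly as a slice into the input text. A small sketch of how we would read them; `orig_start_char` and `orig_end_char` are the attribute names proposed in this issue and do not exist in HeidelTime's current output.

```python
import xml.etree.ElementTree as ET

text = "Today I feel great."

# A single tag carrying the proposed (not yet existing) attributes.
tag = (
    '<TIMEX3 tid="t1" type="DATE" value="2021-11-16" '
    'orig_start_char="0" orig_end_char="5">Today</TIMEX3>'
)

elem = ET.fromstring(tag)
start = int(elem.attrib["orig_start_char"])  # inclusive
end = int(elem.attrib["orig_end_char"])      # exclusive

# The slice of the original input must equal the tagged text.
assert text[start:end] == elem.text
```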

Again, this information is necessary to synchronize the tokenization that HeidelTime uses internally (and then discards) with your own tokenization.

The information needed for these additional attributes should be easily accessible at runtime.


_We've already implemented a first draft of a parsing algorithm that incrementally generates those char-based span indices after the fact, but it feels like a lot of duplicate work to reconstruct information that was already available at HeidelTime's runtime._
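For reference, here is a simplified sketch of that kind of after-the-fact reconstruction (not our actual implementation). It walks the inline TimeML with Python's `xml.etree.ElementTree` and keeps a running character count; `timex_char_spans` is a hypothetical helper name. It assumes the output is well-formed XML and that the text between the tags is the unmodified input.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def timex_char_spans(timeml: str) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Return the plain text and (tid, start, end) for every TIMEX3 element.

    Offsets are half-open and relative to the returned plain text.
    """
    root = ET.fromstring(timeml)
    parts: List[str] = []
    spans: List[Tuple[str, int, int]] = []
    pos = 0  # number of characters emitted so far

    def emit(chunk):
        nonlocal pos
        if chunk:
            parts.append(chunk)
            pos += len(chunk)

    def walk(elem):
        start = pos
        emit(elem.text)
        for child in elem:
            walk(child)
            emit(child.tail)  # text following the child's closing tag
        if elem.tag == "TIMEX3":
            spans.append((elem.get("tid", ""), start, pos))

    walk(root)
    return "".join(parts), spans


sample = (
    '<?xml version="1.0"?><!DOCTYPE TimeML SYSTEM "TimeML.dtd"><TimeML>\n'
    '<TIMEX3 tid="t1" type="DATE" value="2021-11-16">Today</TIMEX3>'
    ' I feel great.\n</TimeML>'
)

plain, spans = timex_char_spans(sample)
for tid, start, end in spans:
    print(tid, start, end, repr(plain[start:end]))
```

Note that the offsets are relative to the reconstructed text, which here includes the newline after `<TimeML>`, so they are shifted by one compared to the original input. Corrections like this are exactly the bookkeeping that tagger-provided offsets would make unnecessary.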
