`split_parser`, `split` and `parser` currently output tokens and their predictions. It may be worth considering an alternative output format that gives the span (start and end position in the input text) of each reference/token, rather than the tokens themselves.
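
To make the idea concrete, here is a rough sketch of how token/prediction pairs could be turned into spans. `tokens_to_spans` is a hypothetical helper, not something in the existing code, and the example text and labels are made up for illustration. It assumes the tokens appear verbatim and in order in the original text.

```python
from typing import List, Tuple


def tokens_to_spans(
    text: str, tokens: List[str], predictions: List[str]
) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: map each token to a (start, end, label) span in `text`.

    Assumes tokens occur verbatim and in order in `text`.
    """
    spans = []
    cursor = 0  # search from the end of the previous token
    for token, label in zip(tokens, predictions):
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = start + len(token)
        spans.append((start, end, label))
        cursor = end
    return spans


# Made-up example input; the label names are placeholders.
text = "Smith J. 2010. Nature."
tokens = ["Smith", "J.", "2010.", "Nature."]
predictions = ["author", "author", "year", "journal"]

spans = tokens_to_spans(text, tokens, predictions)
for start, end, label in spans:
    print(start, end, label, repr(text[start:end]))
```

With spans, slicing `text[start:end]` recovers each token, so the original text stays intact and downstream code can highlight or annotate it directly.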