SacreBLEU includes an implementation of TER, available via `-m ter`. The computation of HTER is exactly the same; you just need to use "targeted" references for the MT system you plan to evaluate (i.e. human post-edits of that system's output, possibly produced with the help of existing untargeted references).
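For example, a minimal sketch using the Python API (the sentences below are made up; `targeted_refs` stands for the human post-edits of the corresponding MT outputs):

```python
from sacrebleu.metrics import TER

# Raw MT outputs (hypothetical example sentences).
hypotheses = ["the cat sat in the mat", "he go to school yesterday"]

# Targeted references: a single reference stream containing the human
# post-edits of the corresponding MT outputs.
targeted_refs = [["the cat sat on the mat", "he went to school yesterday"]]

ter = TER()
hter = ter.corpus_score(hypotheses, targeted_refs)
print(hter.score)  # edit rate in percent; lower is better
```

The CLI equivalent is `sacrebleu targeted_refs.txt -i hyps.txt -m ter`.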
If you need to strictly follow the original HTER paper, you should also have a set of untargeted references and scale the final score by the ratio of the average length of the targeted references to the average length of the untargeted references.
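A sketch of that length adjustment, assuming whitespace tokenization (the helper below is illustrative, not part of sacrebleu):

```python
def length_adjusted_hter(hter_score, targeted_refs, untargeted_refs):
    """Rescale an HTER score by the ratio of average targeted to average
    untargeted reference length (whitespace-tokenized words)."""
    avg_targeted = sum(len(r.split()) for r in targeted_refs) / len(targeted_refs)
    avg_untargeted = sum(len(r.split()) for r in untargeted_refs) / len(untargeted_refs)
    return hter_score * avg_targeted / avg_untargeted
```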
Note that HTER computation is very costly because you need to create new targeted references for (each version of) each MT system you plan to evaluate. If you want to fairly compare several MT systems, you should create their targeted references at the same time with the same pool of annotators and make sure annotators are assigned randomly.
Note also that HTER was invented before the introduction of modern NMT systems, so it is unclear how well it correlates with human judgements of such systems. Also, it is well known that some systems produce lower-quality translations yet require fewer post-editing edits than other systems, so HTER (like BLEU) would be biased in their favour.
Is there an implementation of the Human-mediated Translation Edit Rate (HTER) algorithm?
Related paper: https://aclanthology.org/2006.amta-papers.25/