The message parsing is rudimentary: it assumes that any "h3" tag with an "a" child marks the start of a new paper.
If the paper starts elsewhere, I can end up with duplicates.
The fix is probably to check that the first block really does look like a paper title before treating it as the start of a paper.
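
A minimal sketch of what such a check might look like, using only Python's standard-library html.parser. This is not the project's actual parser: the names PaperStartFinder, looks_like_title, and paper_titles are hypothetical, and so is the "at least three words, not a bare URL" rule. For simplicity it accepts any "a" nested inside an "h3" rather than only a direct child, and it applies the title check to every candidate block, which covers the first one.

```python
from html.parser import HTMLParser


class PaperStartFinder(HTMLParser):
    """Collect the link text of every <h3> that contains an <a>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False    # currently inside an <h3>
        self.in_link = False  # currently inside an <a> within that <h3>
        self.parts = []       # text fragments of the current link
        self.candidates = []  # link text of each <h3><a> seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            self.in_link = True
            self.parts = []

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            # Join fragments and collapse runs of whitespace.
            self.candidates.append(" ".join("".join(self.parts).split()))
        elif tag == "h3":
            self.in_h3 = False
            self.in_link = False


def looks_like_title(text, min_words=3):
    """Hypothetical heuristic: a few words, and not a bare URL."""
    is_url = text.lower().startswith(("http://", "https://"))
    return len(text.split()) >= min_words and not is_url


def paper_titles(html):
    """Return link texts of <h3><a> blocks that pass the title check."""
    finder = PaperStartFinder()
    finder.feed(html)
    finder.close()
    return [t for t in finder.candidates if looks_like_title(t)]
```

With this in place, a short navigation link such as "See all" inside an "h3" would be skipped instead of being treated as the start of a paper. The threshold is a guess and would need tuning against real messages.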