Comparing emails for duplicate detection when key headers are missing may produce false positives #500

@jfarwer

Description

Duplicate message detection currently uses the following fields:

  • Message-ID
  • Subject
  • Sender
  • Recipient
  • Date

Duplicate detection is performed even when some or all of these fields are missing.

Current behaviour

If multiple emails have no sender, no recipient, no date, no message-ID, and no subject, they are detected as duplicates regardless of their body content. Only the first message will be imported.

If two messages have no message-ID and no date, duplicate detection falls back to comparing subject, sender, and recipient.

In both cases, distinct messages can be wrongly flagged as duplicates, so that only the first one is imported.
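The two cases above can be illustrated with a small model. This is only a sketch of the behaviour as described in this issue, not the project's actual code: the header names in `FIELDS`, the `build` helper, and `header_only_key` are hypothetical names chosen for the example.

```python
from email.message import EmailMessage

# Header names assumed to correspond to the five fields listed above.
FIELDS = ("Message-ID", "Subject", "From", "To", "Date")


def build(body, headers=None):
    """Hypothetical helper: build a message carrying only the given headers."""
    msg = EmailMessage()
    for name, value in (headers or {}).items():
        msg[name] = value
    msg.set_content(body)
    return msg


def header_only_key(msg):
    """Model of the described behaviour: compare header fields only.

    A missing header is returned as None, so two messages that both lack
    a field still compare equal on that field.
    """
    return tuple(msg.get(name) for name in FIELDS)


# Case 1: none of the five headers present, different bodies.
a = build("Invoice for March")
b = build("Server outage report")
case1_collides = header_only_key(a) == header_only_key(b)

# Case 2: no Message-ID and no Date, same subject, sender, and recipient.
common = {
    "Subject": "Daily report",
    "From": "cron@example.com",
    "To": "ops@example.com",
}
c = build("Disk usage: 40%", common)
d = build("Disk usage: 95%", common)
case2_collides = header_only_key(c) == header_only_key(d)

print(case1_collides, case2_collides)  # True True
```

Under this model, both pairs produce identical keys even though the bodies differ, which is the false positive reported here.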

Potential issue

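Subject, sender, and recipient alone are likely not reliable enough to identify duplicates. For example, recurring automated notifications often share all three fields while carrying different content.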

Suggestion

If both message-ID and date are missing, the duplicate detection could additionally compare the email body (or at least part of it) to reduce the risk of false positives.
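One possible shape for this suggestion is sketched below. It is an assumption-laden illustration rather than a proposed patch: `dedup_key` is a hypothetical name, the header names are assumed to map to the five fields listed above, the whole text body is hashed with SHA-256 after collapsing whitespace, and every message is assumed to have a text body.

```python
import hashlib
from email.message import EmailMessage


def dedup_key(msg):
    """Hypothetical duplicate key: headers, plus a body digest as fallback."""
    message_id = msg.get("Message-ID")
    date = msg.get("Date")
    key = (message_id, msg.get("Subject"), msg.get("From"), msg.get("To"), date)

    if message_id is None and date is None:
        # Weak header evidence: add a digest of the text body to the key.
        part = msg.get_body(preferencelist=("plain", "html"))
        text = part.get_content() if part is not None else ""
        # Collapse whitespace so trivial reflowing does not change the digest
        # (an assumption; stricter or looser normalisation is possible).
        normalised = " ".join(text.split())
        key += (hashlib.sha256(normalised.encode("utf-8")).hexdigest(),)
    return key


def make(body, headers):
    """Hypothetical helper for the usage example below."""
    msg = EmailMessage()
    for name, value in headers.items():
        msg[name] = value
    msg.set_content(body)
    return msg


common = {
    "Subject": "Daily report",
    "From": "cron@example.com",
    "To": "ops@example.com",
}
first = make("Disk usage: 40%", common)
second = make("Disk usage: 95%", common)
resent = make("Disk usage: 40%", common)

print(dedup_key(first) == dedup_key(second))  # False: bodies differ
print(dedup_key(first) == dedup_key(resent))  # True: same headers and body
```

Hashing only when both Message-ID and Date are missing keeps the common path unchanged. Comparing just a prefix of the body, as the suggestion allows, would be cheaper but would miss differences that occur later in the text.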
