Deduplicate members on re-import across all importers #131
Merged
Every importer (PluralKit, SimplyPlural, Tupperbox, PluralSpace, Prism, and the Sheaf native re-import) now matches each incoming member against the system's existing roster before writing, so re-importing the same export no longer doubles the member list. Matching is by PluralKit ID where present (exact) and otherwise by the name blind-index, scoped by is_custom_front so a member and a custom front sharing a name never collide. A new conflict_strategy option picks the behaviour on a match: skip (default, leave the existing member alone), update (overwrite the existing member's importable fields), or create (the old append-everything behaviour, kept as an escape hatch). The tier member cap now counts only the members an import would actually create, so re-importing into a near-full system no longer trips the cap on members that already exist. New members_skipped / members_updated counts surface on the import detail page. Deduplication is member-scoped: fronts, groups, journals, messages, polls, and reminders are still appended on re-import. Shared logic lives in sheaf/services/import_dedup.py with unit coverage, plus re-import skip/update/create integration tests and a regression test that the PluralKit member HID lands in pluralkit_id.
What
Adds member deduplication to every importer (PluralKit, SimplyPlural, Tupperbox, PluralSpace, Prism, and the Sheaf native re-import). Re-importing the same export no longer doubles the member list: each incoming member is matched against the system's existing roster before anything is written.
Matching is by PluralKit ID where present (exact, so PK round-trips cleanly) and otherwise by the name blind-index, scoped by is_custom_front so a member and a custom front that happen to share a name never merge. A new conflict_strategy option chooses the behaviour on a match: skip (default, leave the existing member untouched and add nothing), update (overwrite the existing member's importable fields from the export), or create (the old append-everything behaviour, kept as an explicit escape hatch).
The tier member cap now counts only the members an import would actually create, so re-importing into a near-full system no longer trips the cap on members that already exist. New members_skipped / members_updated counts surface on the import detail page (the counts grid renders them automatically). Each import flow gains an "If a member already exists" selector.
Scope
Deduplication is member-scoped. Fronts, groups, journals, messages, polls, and reminders are still appended on re-import, so re-importing those sections over existing data can still duplicate them. Custom-field values are the one dependent section made idempotent here, because the Sheaf native importer also dedupes field definitions and the (field, member) pair has a uniqueness constraint. Broadening dedup to the other dependent sections is tracked as a follow-up.
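As a rough illustration of why a uniqueness constraint on the (field, member) pair makes custom-field values safe to re-import, the sketch below keys an in-memory store by that pair. The store, the function name, and the choice to overwrite rather than skip an existing value are all assumptions made for illustration; none of them come from the Sheaf codebase.

```python
# Hypothetical in-memory stand-in for a table that has a uniqueness
# constraint on the (field, member) pair. Not Sheaf's actual storage.
FieldValues = dict[tuple[str, str], str]


def upsert_field_value(
    store: FieldValues, field_id: str, member_id: str, value: str
) -> bool:
    """Write a value keyed by (field, member).

    Returns True if a new entry was created and False if an existing
    entry was overwritten. Overwriting is an assumption; skipping the
    existing value would be equally idempotent.
    """
    key = (field_id, member_id)
    created = key not in store
    store[key] = value
    return created


def import_field_values(store: FieldValues, rows: list[tuple[str, str, str]]) -> int:
    """Import (field_id, member_id, value) rows; return how many were new."""
    return sum(upsert_field_value(store, f, m, v) for f, m, v in rows)
```

Because the key is the pair itself, running the same import twice leaves the store the same size, which is the property the other dependent sections do not yet have.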
Notes for review
The update strategy overwrites privacy from the source, so a PluralKit re-import with update would reset a member you had made public back to PluralKit's default. This is consistent with "refresh from source" but is a mild footgun; happy to exclude privacy from the update set if that reads better.
Validation
Shared logic lives in sheaf/services/import_dedup.py with a 13-test unit suite. Added re-import skip/update/create integration tests plus a regression test that the PluralKit member HID lands in pluralkit_id. Full importer suite plus the export/import parity round-trip run green (109 passed); backend ruff clean; frontend type-check, lint, and build clean.
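For reviewers who want the rules in one place, here is a minimal sketch of the matching, conflict_strategy, and cap-counting behaviour this description lays out. It is not the contents of import_dedup.py: the Member class, the function names, and the name_index field (a stand-in for the name blind-index) are hypothetical. It also assumes that the name lookup is tried when an incoming PluralKit ID finds no match, which the description does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    # Hypothetical model: name_index stands in for the name blind-index value.
    name_index: str
    is_custom_front: bool = False
    pluralkit_id: Optional[str] = None


def find_match(incoming: Member, roster: list[Member]) -> Optional[Member]:
    """Match by PluralKit ID first, then by name index within the same kind."""
    if incoming.pluralkit_id:
        for existing in roster:
            if existing.pluralkit_id == incoming.pluralkit_id:
                return existing
    # Assumption: fall back to the name index when no ID match was found.
    for existing in roster:
        if (
            existing.name_index == incoming.name_index
            and existing.is_custom_front == incoming.is_custom_front
        ):
            return existing
    return None


def classify(incoming: list[Member], roster: list[Member], conflict_strategy: str = "skip"):
    """Split incoming members into (to_create, to_update, skipped)."""
    if conflict_strategy not in ("skip", "update", "create"):
        raise ValueError(f"unknown conflict_strategy: {conflict_strategy!r}")
    to_create, to_update, skipped = [], [], []
    for member in incoming:
        # "create" never looks for a match: the old append-everything path.
        match = None if conflict_strategy == "create" else find_match(member, roster)
        if match is None:
            to_create.append(member)
        elif conflict_strategy == "update":
            to_update.append((match, member))
        else:
            skipped.append(member)
    return to_create, to_update, skipped


def exceeds_cap(existing_count: int, to_create: list[Member], cap: int) -> bool:
    """Only members that would actually be created count toward the cap."""
    return existing_count + len(to_create) > cap
```

With a roster of two and a cap of three, re-importing those two members plus one new member under skip creates one member and stays within the cap, whereas create would append all three and exceed it.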