This is a cautious, test-backed utility for identifying and moving duplicate files within a directory tree. It favors safety, transparency, and auditability over blind automation.
- Dry-run mode that logs everything but moves nothing
- Safe file movement with hash verification and collision-safe naming
- Grouping by file size and hash (fast and reliable)
- Structured logs with group IDs, reclaimable size, keeper strategy, and hashes
- CLI flags to discourage accidental file destruction
- Can be imported as a module or run from the command line
- No file deletion, only safe moves or logged intents
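The size-then-hash grouping listed above can be sketched roughly as follows. This is a minimal illustration, not dedoopsie's actual implementation: the names group_duplicates and file_hash, the choice of SHA-256, and the 1 MiB read chunk are all assumptions. Files are first bucketed by size, since files of different sizes cannot be identical, and only buckets with two or more members are hashed.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks (assumed hash choice)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root: Path) -> list[list[Path]]:
    """Hypothetical helper: group regular files under root by (size, hash)."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so a link and its target are not treated as duplicates
            if path.is_file() and not path.is_symlink():
                by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing it
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in paths:
            by_hash[file_hash(path)].append(path)
        # Keep only hash buckets that really contain more than one file
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

The size pass is cheap (one stat per file), so the expensive hashing is limited to files that could plausibly match.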
Dry run:
python -m dedoopsie.cli /path/to/scan

Wet run (actually moves duplicates):
DUDE_ARE_YOU_SURE=YES python -m dedoopsie.cli /path/to/scan \
  --wet --yes-really \
  --move-dir /some/target/path \
  --keeper longest

Options:
- --move-dir: where dupes will go (flat layout, names made unique)
- --keeper: choose which file to keep (first, oldest, newest, longest)
- --strict: verify the hash of the moved copy before unlinking the original
- --log: write structured CSV output
- --verbose: see hashing progress in real time
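A move that follows the --move-dir and --strict descriptions above might look like the sketch below. It is illustrative only: unique_destination, verified_move, the name_1.ext suffix scheme, and SHA-256 are assumptions, not dedoopsie's documented behavior. The file is copied into the flat target directory under a non-colliding name, optionally re-hashed, and only then is the original unlinked.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_destination(move_dir: Path, name: str) -> Path:
    """Pick a non-colliding path in a flat directory (hypothetical name_1.ext scheme)."""
    candidate = move_dir / name
    stem, suffix = Path(name).stem, Path(name).suffix
    counter = 1
    while candidate.exists():
        candidate = move_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def verified_move(src: Path, move_dir: Path, strict: bool = True) -> Path:
    """Copy src into move_dir, optionally verify the hash, then unlink the original."""
    move_dir.mkdir(parents=True, exist_ok=True)
    dest = unique_destination(move_dir, src.name)
    expected = sha256_of(src) if strict else None
    shutil.copy2(src, dest)  # copies contents plus metadata such as mtime
    if strict and sha256_of(dest) != expected:
        dest.unlink()  # discard the bad copy and leave the original untouched
        raise OSError(f"hash mismatch after copying {src} to {dest}")
    src.unlink()
    return dest
```

Copy-then-unlink keeps the original on disk until a verified copy exists. Note that the exists() check and the copy are not atomic, so this sketch assumes nothing else writes into the target directory concurrently.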
With --log, output is a CSV with the following columns:
GROUP_ID,ACTION,ORIGINAL_PATH,DEST_PATH,KEEPER_PATH,GROUP_SIZE,RECLAIMABLE,HASH,ERROR
Use it to debug, review, or roll back changes if needed. You'll thank yourself later.
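Rows with exactly these columns can be produced with the standard csv module. The sketch below is a guess at the shape of such a writer, not dedoopsie's code; write_log and the sample ACTION value used in practice are hypothetical.

```python
import csv
from typing import Iterable, Mapping, TextIO

# Column order copied from the log format above.
COLUMNS = [
    "GROUP_ID", "ACTION", "ORIGINAL_PATH", "DEST_PATH", "KEEPER_PATH",
    "GROUP_SIZE", "RECLAIMABLE", "HASH", "ERROR",
]


def write_log(rows: Iterable[Mapping[str, object]], out: TextIO) -> None:
    """Write a header plus one CSV row per action.

    Missing fields are written as empty strings; a key not in COLUMNS
    raises ValueError (DictWriter's default extrasaction="raise").
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to a real file, open it with newline="" (for example open("log.csv", "w", newline="")), as the csv module documentation recommends, so line endings are not doubled on Windows.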
As a module:
from pathlib import Path

from dedoopsie.core import scan_directory, find_duplicates, safe_move, select_keeper

files = scan_directory("/some/dir")
dupe_groups = find_duplicates(files)
for group in dupe_groups:
    keeper = select_keeper(group, "oldest")
    for file in group:
        if file != keeper:
            # Move every non-keeper into the quarantine directory
            safe_move(file, Path("/quarantine"))

This tool assumes:
- You're operating on a real system
- You care more about not breaking things than shaving microseconds
- Logging, reversibility, and testability matter
If your priorities differ, there are better and more efficient utilities for deduplication. This exists to help humans avoid stepping on rakes.
Run the test suite:
pytest

Dry run, safe move, collision handling, and keeper strategies are all covered.
License: BSD 2-Clause
Example summary for one duplicate group:
[GROUP 17] 3 files, size each: 5.32 MB
- Before: 15.96 MB | After: 5.32 MB | Reclaimable: 10.64 MB
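The numbers in that summary are simple arithmetic: n identical files of size s occupy n × s before, s after (only the keeper remains), so (n - 1) × s is reclaimable. Here 3 × 5.32 = 15.96 and 2 × 5.32 = 10.64. A formatter reproducing the line might look like this; summarize_group is hypothetical, and it assumes sizes are already expressed in MB (whether dedoopsie counts 10^6 or 2^20 bytes per MB is not stated here).

```python
def summarize_group(group_id: int, count: int, size_mb: float) -> str:
    """Hypothetical formatter for a duplicate-group summary."""
    before = count * size_mb              # every copy is on disk
    after = size_mb                       # only the keeper remains
    reclaimable = (count - 1) * size_mb   # space freed by moving the rest
    return (
        f"[GROUP {group_id}] {count} files, size each: {size_mb:.2f} MB\n"
        f"- Before: {before:.2f} MB | After: {after:.2f} MB | "
        f"Reclaimable: {reclaimable:.2f} MB"
    )
```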
To install, from the repo root:
pip install .
dedoopsie --help

Or without installing:
python -m dedoopsie.cli /your/target/path