[Feature] Duplicate photo detection #1968
-
It would also be useful to be able to detect updated versions of identical images. This is more applicable when publishing images from mirrorless cameras, where multiple versions might be created.
-
I have this scenario:
I would still want Immich to accept uploads of the same image in better quality, and some dedupe feature would be very nice in my current situation.
-
In #3816 it was mentioned that there is an open source tool available which can detect similar images, either by hash or by visual similarity, including where there are differing resolutions. If this had a manual approval process, it would be very useful for people dealing with multiple versions of the same image or rescales produced by messaging apps.
-
I think this would be a stunning feature.
-
Cameras quite often take photos in […]
Post-edit note: I realize this really, really isn't the right issue (discussion? I'm eternally confused by these things) to have picked; it seems #2479 fits my need much more. Still, I feel both ideas can be adapted as one.
-
It can be done via the digiKam application as well.
-
So if I have a bunch of photos in my external library (already loaded in the DB) and I run some dedupe tool on the directory, what will happen with the assets in the library? Do I have to rescan the entire library, or will Immich handle the missing photos fine?
-
With regard to duplicates, I think it would be a good idea to have an optional "hide" feature which would hide files that have the same hash as an existing file. These could stay in their respective folders, but only one photo would be shown, to avoid displaying the same picture twice. Any thoughts?
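The "hide" idea above can be sketched in a few lines. This is only an illustration under assumptions: the content hash is taken to be a SHA-1 of the file bytes, and `split_visible_hidden` and the sample paths are hypothetical names, not part of Immich. The first file seen with a given hash stays visible and later files with the same hash are marked hidden; nothing is deleted.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the file content as a hex string."""
    return hashlib.sha1(data).hexdigest()


def split_visible_hidden(files: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Return (visible, hidden) paths.

    The first path seen for each content hash stays visible;
    any later path with the same hash is hidden, not removed.
    """
    seen: set[str] = set()
    visible: list[str] = []
    hidden: list[str] = []
    for path, data in files.items():  # dicts preserve insertion order
        digest = sha1_hex(data)
        if digest in seen:
            hidden.append(path)
        else:
            seen.add(digest)
            visible.append(path)
    return visible, hidden


# Hypothetical sample data: two byte-identical files and one distinct file.
files = {"a/1.jpg": b"x", "b/1_copy.jpg": b"x", "c/2.jpg": b"y"}
visible, hidden = split_visible_hidden(files)
```

An exact hash only catches byte-identical copies; a re-encoded or resized version of the same photo would get a different hash and stay visible.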
-
dupeGuru is open source, and it does fuzzy-logic-based matching on contents, not just file names. There is a slider to adjust the match threshold as well. Can we port this code, or write a version that can work with Immich? Seems like an interesting project!
-
I use digiKam and AllDup or other external programs for deduplication. This works great, but currently it can be used only on external libraries. When you delete duplicates in the internal libraries, these deletions are not (yet) detected or accepted by Immich.
-
Found another library for photo deduplication. It is AI-based, so it may perform better in many cases. This is for the devs to consider using in the server.
-
Has there ever been any movement on this request? Just started using this wonderful app myself, and image duplication is a REAL pain.
-
Hello everyone, I'm excited to share that I've developed a Python script which launches a local Streamlit website designed for finding duplicates using either deep learning or hashing. What makes this project particularly interesting is that it can be integrated directly with Immich using the local server API. This should offer a seamless experience for those looking to streamline their workflows. The script is all set for testing and evaluation. Looking forward to your interest and feedback. Ciao
-
I feel like a more basic check on media ingestion could solve a good majority of use cases for people. Something as simple as calculating a pHash when a photo is uploaded or scanned from an external directory, and checking the DB for a matching pHash, would work in most cases. If a match is found, just mark the image as a "potential duplicate" and either ask the user to verify or just hide those potential dupes. Obviously this doesn't cover videos and is going to miss photos that have been edited or cropped in some way, but it will work for actual duplicates. If this seems like a good idea, I'd love to open a PR.
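A minimal sketch of that ingestion check, under stated assumptions: the perceptual hash is already computed elsewhere and passed in as a hex string, and an in-memory dictionary stands in for the DB lookup. `PhashIndex` and `ingest` are hypothetical names, not Immich APIs.

```python
from collections import defaultdict


class PhashIndex:
    """In-memory stand-in for a DB table keyed by perceptual hash."""

    def __init__(self) -> None:
        self._by_hash: dict[str, list[str]] = defaultdict(list)

    def ingest(self, asset_id: str, phash_hex: str) -> list[str]:
        """Record the asset and return IDs of earlier assets with the same hash.

        A non-empty result means the new asset is a "potential duplicate".
        """
        matches = list(self._by_hash[phash_hex])
        self._by_hash[phash_hex].append(asset_id)
        return matches


index = PhashIndex()
first = index.ingest("asset-1", "ff00aa55")   # no earlier asset with this hash
second = index.ingest("asset-2", "ff00aa55")  # matches asset-1
```

Because this only matches identical hash values, it behaves as described in the comment: it flags actual duplicates and misses edited or cropped versions.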
-
I've been exploring fuzzier duplicate detection, where we get a similarity percentage from a more sensitive pHash implementation. Similarity detection is a difficult task: for a family library of 100K+ images, comparing every image with every other image would mean 10 billion comparisons. Ideally some smart algorithm would speed things up by reducing the number of comparisons. I'm currently using the Postgres GIN index to do this for us.

Below is an SQL table which stores the result of a similarity comparison search on all library images. The SQL table is currently external to Immich, however the sha1 columns match the […]

The similarity calculation is currently using the Postgres […]

The pHash implementation uses the Python imagehash library.

    from PIL import Image
    import imagehash

    fp = 'some file path'
    # str() of the returned hash object gives its hex representation
    phash_16 = str(imagehash.phash(Image.open(fp), hash_size=16))

The hashes are then stored in an SQL table. Note the varchar […]

    -- gin_trgm_ops, similarity() and the % operator come from the pg_trgm extension
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    CREATE TABLE IF NOT EXISTS hashes
    (
        sha1      bytea
            constraint hashes_pk
                primary key,
        phash_16  varchar,
        file_size integer not null
    );
    CREATE INDEX phash_16_trgm_idx ON hashes USING GIN (phash_16 gin_trgm_ops);

Then we can ask Postgres to search for similar hashes like so (on 120K hashes […]):

    # cursor: an open DB-API cursor using the %s parameter style, e.g. psycopg2 (assumed)
    cursor.execute(
        '''SELECT sha1, phash_16, similarity(phash_16, %s) AS sml FROM hashes
        WHERE phash_16 %% %s AND sha1 != %s
        ORDER BY sml DESC, phash_16;''',
        (phash_16, phash_16, sha1),
    )

After getting some results, we can look up the sha1 in Immich:

    SELECT "originalFileName", "originalPath", checksum
    FROM assets
    WHERE "checksum" in (E'\\x<SHA1-HASH-HERE>');

I'd love to hear your thoughts, findings, or possible improvements to this :)
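As one possible point of comparison, and not what the trigram query above computes: perceptual hashes are commonly compared by Hamming distance, the number of bits that differ between two hashes. The sketch below assumes the hashes are equal-length hex strings; all function names are hypothetical. The last helper works out the brute-force cost mentioned in the comment.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count differing bits between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def similarity_percent(hash_a: str, hash_b: str) -> float:
    """Express the share of matching bits as a percentage."""
    total_bits = len(hash_a) * 4  # each hex digit encodes 4 bits
    return 100.0 * (1 - hamming_distance(hash_a, hash_b) / total_bits)


def unordered_pairs(n: int) -> int:
    """Number of distinct image pairs in a library of n images."""
    return n * (n - 1) // 2


print(hamming_distance("ff00", "ff01"))    # 1
print(similarity_percent("ff00", "ff01"))  # 93.75
print(unordered_pairs(100_000))            # 4999950000
```

For 100K images that is roughly 5 billion distinct pairs, or 10 billion if each pair is counted in both directions, which is why an index that prunes candidates matters.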
-
How does Immich behave when an external program or the user manually deletes pictures from the database? Is it left with a broken reference to a missing picture or does it take care of the issue somehow?
-
I would say ideally this would auto-stack duplicates. On top of that, though, there seems to be no way to define which image is the cover of a stack, and about half the time Immich uses the raw file as the stack cover image, which is a darker image.
-
👋 Hi there, I've decided to proceed with implementing the FAISS way for analyzing assets via the DB API in Immich. The plan is to interface directly with Immich's asset database to retrieve images for processing. Here are the key components of the implementation strategy:
1. Image Retrieval:
2. Image Preprocessing:
3. Feature Extraction and Vectorization:
4. FAISS Indexing:
5. Similarity Search and Duplicate Identification:
6. Results Storage:
Technical Details:
I'm excited about leveraging FAISS for this project due to its efficiency and scalability in handling similarity searches.
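Steps 3 to 5 can be illustrated without FAISS itself. The sketch below is a pure-Python brute-force stand-in, not the FAISS API: it assumes feature vectors have already been extracted for each asset, compares every pair by Euclidean (L2) distance, and reports pairs under a threshold. The function names, the toy vectors, and the threshold value are all hypothetical; a FAISS index would replace the pairwise loop with an index search.

```python
import math
from itertools import combinations


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_near_duplicates(
    features: dict[str, list[float]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, distance) for every pair within the threshold.

    Brute force over all pairs, sorted by ascending distance.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(features.items(), 2):
        dist = l2_distance(vec_a, vec_b)
        if dist <= threshold:
            pairs.append((id_a, id_b, dist))
    return sorted(pairs, key=lambda pair: pair[2])


# Toy 2-dimensional vectors; real feature vectors would be much longer.
features = {"a": [0.0, 0.0], "b": [0.0, 0.1], "c": [5.0, 5.0]}
result = find_near_duplicates(features, threshold=0.5)
```

Here only the pair ("a", "b") falls under the threshold. The brute-force loop grows quadratically with library size, which is the cost a similarity index is meant to avoid.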
-
It would be nice to have a "how duplicate is this" score. Sometimes I have 5 photos of the same scene taken within a minute. They won't be 100% identical, but maybe 90%? So it would be nice to be able to select the survivor, and the rest will be victims.
-
I would like to announce that I've uploaded a revised version of the Immich Duplicate Finder: https://github.com/vale46n1/immich_duplicate_finder. This latest update introduces an advanced image analysis enhancement, powered by Facebook AI Similarity Search (FAISS) and deep learning feature extraction. These improvements significantly enhance our ability to identify duplicate or similar images, thus streamlining asset management and improving data quality.
Key Features:
Next week, I plan to implement features to visually distinguish duplicates and group them together, similarly to Google Photos. Additionally, I aim to introduce functionality for identifying and categorizing animals and various objects into similar groups. These are enhancements that I hope could be implemented directly in Immich. Currently, I'm developing these features to address personal requirements, and I look forward to sharing more updates soon.
-
Duplicates, as mentioned in this thread, can have multiple reasons:
1 - can be addressed by a duplicate finder, with each user then manually deciding what to keep and what to delete.
2 - is far more complex, as Immich would need to track which files are identical and which users have them, and then keep track of whether one user deletes a file that another user wants to keep. Not a trivial task.
Another way would be deduplication at the file system level, e.g. via ZFS. For Immich all files and operations stay the same and all duplicates still appear in the file system, but the OS/FS decides which blocks are duplicates and stores them only once. This would also dramatically decrease storage requirements in such environments.
I might give the ZFS option a try: attach a ZFS volume to my Immich VM, make a symlink for the "common" directory, and move the files over to the ZFS drive. In theory that should work just fine, as for Immich "nothing changes".
-
The feature
I want a feature that can detect duplicate photos to help me organize my pictures. Ideally, the feature should be able to handle photos of different resolutions. Sometimes the same photo might be published in a lower resolution on a mobile platform, while a high-resolution version is backed up on a computer. I hope the feature can automatically detect duplicates and present the options to the user for further action.
Platform