A quick-and-dirty solution for deduplicating image datasets during preprocessing.

# Install

    pip install -r requirements.txt

# Usage

Set `recursive` to `False` to add only the images directly inside the given directory:

    from imageSimilarity import ImageSimilarity

    sim = ImageSimilarity()
    sim.add_image_directory("C:/datasets/sample_dir", recursive=False)
    print(sim.find_dupes())

Set `recursive` to `True` to also include images in subdirectories:

    from imageSimilarity import ImageSimilarity

    sim = ImageSimilarity()
    sim.add_image_directory("C:/datasets/sample_dir", recursive=True)
    print(sim.find_dupes())

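
To make the difference between the two `recursive` settings concrete, here is a minimal sketch of a directory scan written with the standard library's `pathlib`. It is not taken from `ImageSimilarity`: the helper `list_images` and the `IMAGE_EXTENSIONS` set are hypothetical names chosen for illustration, and the library may collect files differently.

```python
from pathlib import Path

# Assumed set of image extensions; the library's actual list may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def list_images(directory, recursive=False):
    """Return sorted image paths under `directory` (hypothetical helper)."""
    root = Path(directory)
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Under these assumptions, `list_images(path, recursive=True)` returns everything `list_images(path)` does, plus any images found in nested folders.
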
The `min_similarity` rate is between 0.0 and 1.0:

    from imageSimilarity import ImageSimilarity

    sim = ImageSimilarity()
    sim.add_image_directory("C:/datasets/sample_dir", recursive=True)
    print(sim.find_dupes(min_similarity=0.5))

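
For intuition about what a similarity rate between 0.0 and 1.0 can mean, the sketch below uses one common technique: an average hash compared by Hamming distance. This is an assumption for illustration only; this README does not say how `ImageSimilarity` scores images, and the functions `average_hash`, `hash_similarity`, and `is_duplicate` are hypothetical. Images are represented as small, non-empty 2D lists of grayscale values so that the example needs no extra dependencies.

```python
def average_hash(pixels):
    """Turn a 2D list of grayscale values into a list of 0/1 bits.

    A bit is 1 where the pixel is brighter than the image's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hash_similarity(hash_a, hash_b):
    """Return the fraction of matching bits, from 0.0 to 1.0."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    differing = sum(a != b for a, b in zip(hash_a, hash_b))
    return 1.0 - differing / len(hash_a)


def is_duplicate(pixels_a, pixels_b, min_similarity=0.5):
    """Treat two images as duplicates if their similarity meets the threshold."""
    score = hash_similarity(average_hash(pixels_a), average_hash(pixels_b))
    return score >= min_similarity
```

In this sketch, raising `min_similarity` toward 1.0 keeps only near-identical pairs, while lowering it admits looser matches.
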
More examples will be available in the `examples` directory.