
Photo De-Duplication - Software Preferences?

Buzby

Member
Joined
14 Apr 2023
Messages
646
Location
Glasgow, Scotland
Into a New Year and something that I have put off for far too long: cataloguing and de-duplicating my 1 TB or so of transportation images taken over the years. Many ill-advised cataloguing methods were tried, including titling and classifying by mode, year and location, so much so that I have ended up with one almighty non-standard mess that I’ve got to rectify. (Many images ended up being duplicated in different classifications, so could appear 5 times, just bloating the storage and database.)

I’ve looked at various photo cleaners on the MS store, but only after downloading do you find it’s a trial and you have to pay £35 per annum! Have you had this problem, how did you solve it and is there a software package for PC or Mac that you recommend? TIA
 

jfollows

Established Member
Joined
26 Feb 2011
Messages
6,055
Location
Wilmslow
If you're looking for identical duplicates, which I use for video files, then "rdfind" is a useful utility - https://github.com/pauldreik/rdfind. It compares identically-sized files by SHA sum (I think) to make sure they're really identical. I use it on my Mac (via Macports).
It’s free.

Code:
bash-5.2$ cat ~/find_duplicates
#!/opt/local/bin/bash
delete_option=""
delete_argument=$(echo "$1" | tr '[:upper:]' '[:lower:]')
if [[ "$delete_argument" == "delete" ]]; then delete_option="-deleteduplicates true"; fi
find /Volumes -maxdepth 2 -type d -name "<subdirectory>" -print0 2>/dev/null | xargs -0 rdfind -minsize 100000000 -outputname rdfind_results_$(date +"%d-%m-%Y").txt $delete_option
# Add
# -deleteduplicates true
# to the end of the command to delete the duplicate files
# Invoking this script with the first argument "delete" (any case, no quotes) will achieve the same thing

Replace <subdirectory> with a real subdirectory name - in my case it's the same name on multiple volumes. For photos, reduce the -minsize value as well (100000000 bytes, roughly 100 MB, suits my video files but would skip almost every photo). Essentially the "find" command defines what to look for and where - multiple directories on different volumes in my case - and then pipes the output to the utility, which looks to see which files are identical.
It can take a long time to run, so overnight could be a good option, first time without deleting anything to see what it reports.
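
For anyone curious about the general idea of comparing same-sized files by checksum, here is a minimal Python sketch. It is not rdfind's actual implementation, only an illustration of the approach described above; the function names, the choice of SHA-256 and the `min_size` default are assumptions made for this example. It only reports groups of identical files and never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots, min_size=1):
    """Return sorted groups of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    seen = set()  # resolved paths, so overlapping roots do not count a file twice
    for root in roots:
        for folder, _subdirs, names in os.walk(root):
            for name in names:
                path = os.path.join(folder, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                resolved = os.path.realpath(path)
                if resolved in seen:
                    continue
                seen.add(resolved)
                size = os.path.getsize(path)
                if size >= min_size:
                    by_size[size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return sorted(groups)
```

Calling `find_duplicates(["/path/to/photos"])` (a placeholder path) returns a list of lists, one per set of identical files. Grouping by size first means only files that could possibly match are ever read and hashed, which is what keeps this kind of scan tolerable on a large collection.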

I don’t like GUI interfaces, but if you don’t like the command line this probably won’t be for you.
 

Buzby

Member
Joined
14 Apr 2023
Messages
646
Location
Glasgow, Scotland
I gave up using bash years ago when inattention to detail caused much consternation! Ideally I’m after a downloadable package that will de-dupe and recover space - any thoughts?
 

Buzby

Member
Joined
14 Apr 2023
Messages
646
Location
Glasgow, Scotland
Problem solved! That was after downloading and evaluating some 6 different products, and the incredibly frustrating experience of developers (and Microsoft) seemingly conspiring to obfuscate the cost of each utility until AFTER it had been downloaded and the ‘trial’ had expired.

The Auslogics Duplicate File Finder is incredibly fast, creates a list of duplicates and invites you to select which to remove (but cleverly won’t let you remove all of them, one must be retained). I’ve managed to get rid of 7k dupes and my HDD is slowly recovering. The cost? Absolutely free - so if you are in a similar situation I can highly recommend it.
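
The "one must be retained" rule is easy to express in code. The following is a small Python sketch, not how Auslogics works internally: `plan_removals` is a hypothetical helper, and keeping the copy with the shortest path is just one arbitrary rule chosen for illustration. It only builds a plan and touches no files.

```python
def plan_removals(groups):
    """For each group of identical files, keep one path and list the rest.

    `groups` is a list of lists of paths already known to be duplicates.
    The keeper is the shortest path (ties broken alphabetically), so at
    least one copy of every file always survives.
    """
    plan = []
    for group in groups:
        if len(group) < 2:
            continue  # nothing to remove when there is only one copy
        ordered = sorted(group, key=lambda path: (len(path), path))
        plan.append((ordered[0], ordered[1:]))
    return plan
```

Reviewing the resulting (keep, remove) pairs before deleting anything mirrors the select-then-confirm step described above.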
 

ac6000cw

Established Member
Joined
10 May 2014
Messages
3,228
Location
Cambridge, UK
I used Auslogics Duplicate File Finder for a while, but I switched to AllDup recently, which is also very good (and also free). I also use FreeFileSync to synchronise my photo & video backups to external drives.
 

Buzby

Member
Joined
14 Apr 2023
Messages
646
Location
Glasgow, Scotland
Thanks for the pointers, I will have a look at them. Another tricky problem is how to catalogue stills and videos effectively. I came across an app that would allow you to ‘tag’ the content - however, rather than adding the tags to the image, it simply created a database only searchable on the device that made it - which I found out only after the first 250 were completed.

Is there a good way to list the file contents and make it easily retrievable? I’ve seen AI programmes that will ‘describe’ the content in text but it’s quite a laborious process!
 

etr221

Member
Joined
10 Mar 2018
Messages
1,082
My starting comment is perhaps that if I were you, I wouldn't start from here: it's not a good place. But it is where you are, so the only place you can... except where you'll be tomorrow, which is probably worse.

Secondly, this is perhaps a subject for photography fora, rather than this one. That is where you will find the right sort of expert.

It does sound like you need to have a good think and review of your workflow: a search for 'photography workflow tagging' will yield many 'best workflow' recommendations - look through them and use as a base for what you want to do, based on what you have, and what you want. Don't expect anyone else's ideas to match yours! Once you think you know what you want, give it a try for a small number (how small? how much are you happy redoing?), to see if it really works for you. Then reconsider, to ensure you are sure.

One thing you will have to think about (because a lot of your workflow will follow from it) is what software you will use. A lot of people use Adobe's offerings (Lightroom, Photoshop....), which cost money, but there are a lot of free alternatives: Darktable and GIMP are highly regarded alternatives to Lightroom and Photoshop; beyond them, https://alternativeto.net/software/photo-mechanic/?license=free lists a number of free options, or just search and see what you can find. (Google, etc. are your friend - and when you've found something, read all about it before you go and install it. Sometimes a barge pole is something useful to have to hand.)

On the more specific issue of tagging and recording, there are two options. The first is to include tags within the image file itself - this is what EXIF and IPTC contain, I believe - I'm no expert, so look them up (Exiftool seems to be good for manipulating Exif data). The second is an 'external' database, either completely separate (essentially an office, rather than photography, product), or included in some photo manipulation/management products (Darktable and Lightroom both have this): look to see whether they also write it to EXIF/IPTC.
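
As a rough illustration of the "tags inside the file" option, here is a Python sketch that calls ExifTool to add keywords to an image. It assumes ExifTool is installed and on the PATH; the helper names are made up for this example, and writing to both IPTC:Keywords and XMP-dc:Subject is an assumption about which fields your other software reads, so check that before tagging in bulk.

```python
import shutil
import subprocess


def build_keyword_command(image_path, keywords, exiftool="exiftool"):
    """Build an ExifTool command line that adds keywords to one image.

    The `-TAG+=value` form appends to a list-type tag rather than
    replacing what is already there.
    """
    command = [exiftool]
    for keyword in keywords:
        command.append(f"-IPTC:Keywords+={keyword}")
        command.append(f"-XMP-dc:Subject+={keyword}")
    command.append(image_path)
    return command


def add_keywords(image_path, keywords):
    """Run ExifTool on one image; raise if it is missing or reports an error."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on PATH")
    # By default ExifTool keeps the untouched file alongside as <name>_original.
    subprocess.run(build_keyword_command(image_path, keywords), check=True)
```

For example, `add_keywords("37401.jpg", ["Class 37", "Glasgow"])` (a made-up filename) would embed both keywords in the file, so they travel with the image rather than living in a database on one device. Try it on a handful of copies first.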
 
