substorm Posted January 3, 2011

First of all, I would like to say hello to everyone on this forum.

As to my question: does anybody know of a duplicate file detection utility that can reuse MD5 values for future comparisons?

Let me explain. Basically, I am trying to accomplish the following:

1) I have a "main folder" containing thousands of files.
2) I would like to scan this "main folder" only once and save all the MD5 hash values for future comparisons.
3) When I add new files to the "main folder", I would like to first check them for duplicates against the MD5 values from step 2.

I've seen lots of duplicate file search utilities, but they all need to rescan all the folders each time you add new files, which takes a very long time.

TIA
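For illustration, step 2 (scan once, save the hashes) could be sketched in Python roughly as follows. This is only a minimal sketch, not an existing utility: the function names `md5_of_file`, `build_index` and `save_index`, and the choice of a JSON file for storage, are assumptions made for the example.

```python
import hashlib
import json
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(main_folder):
    """Walk main_folder once and map each MD5 digest to the first path seen."""
    index = {}
    for root, _dirs, files in os.walk(main_folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            # setdefault keeps the first path, so existing duplicates collapse
            index.setdefault(md5_of_file(path), path)
    return index


def save_index(index, index_path):
    """Write the digest-to-path mapping to a JSON file (hypothetical format)."""
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(index, handle, indent=2)
```

A call such as `save_index(build_index("main folder"), "index.json")` would then be run once, with the index file kept outside the scanned folder.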
jaclaz Posted January 3, 2011

If I understood you right, you are looking for something that will compare "newly added" files against an MD5 list of the "old" files?

But how would the "newly added" files actually be "added"? I mean, maybe you also need another function (or two), like a "folder watching" utility and/or a "file adding" one.

Which kind of files are we talking about? I mean big ones like, say, videos, or smallish ones like, say, batch and text files? The performance of an OS with a folder containing thousands of files may slow down considerably when using a file manager like Explorer.

Something like this:

http://www.datamystic.com/filewatcher.html
http://leelusoft.blogspot.com/2010/07/watch-4-folder-22.html

which could "trigger" a batch or whatever that:

- calculates the MD5 checksum for the new files only, and
- compares them one by one to the saved list of MD5 values for the "old" files,

then:

- IF match, deletes the "new" file
- IF no match, doesn't do anything

jaclaz
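The "batch or whatever" outlined above could look roughly like this in Python. It is a sketch under assumptions, not a tested tool: the saved list is assumed to be a JSON file mapping MD5 digests to paths, `check_new_files` is a hypothetical name, and deletion is switched off by default so that a first run only reports matches. A matching MD5 makes a duplicate very likely but does not strictly prove the contents are identical.

```python
import hashlib
import json
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_new_files(new_paths, index_path, delete=False):
    """Compare new files to the saved MD5 list; return (duplicates, unique)."""
    with open(index_path, encoding="utf-8") as handle:
        known = json.load(handle)  # assumed layout: {md5 hex digest: old path}
    duplicates, unique = [], []
    for path in new_paths:
        digest = md5_of_file(path)  # only the new files are hashed
        if digest in known:
            duplicates.append((path, known[digest]))
            if delete:
                os.remove(path)  # "IF match, deletes the new file"
        else:
            unique.append(path)  # "IF no match, doesn't do anything"
    return duplicates, unique
```

A folder-watching utility like the ones linked above could call such a script with the paths of the files it has just noticed.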
substorm Posted January 4, 2011 Author (edited)

Hi Jaclaz,

That is exactly what I am looking for; however, my programming skills are not that great. Would you happen to know of some software that can already handle this? Thanks for outlining it so clearly.

Edited January 4, 2011 by substorm
Glenn9999 Posted January 4, 2011 (edited)

And don't forget the problem of removed files. That's why most software usually doesn't store or cache MD5 values: there isn't any guarantee that the scanned directory still contains all the files in the MD5 list, or that they haven't changed (and need their MD5 recalculated). I don't know whether the "file monitoring" examples handle this or not. Besides that, MD5 is pretty computationally intensive and time-consuming on many files or large files (I'm finding that out first-hand), so it doesn't work too well in a real-time situation.

That said, you might find CDCheck interesting, though it might not fit what you are looking for.

Edited January 4, 2011 by Glenn9999
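One common way to deal with removed or changed files is to store each file's size and modification time next to its MD5, and rehash only when those differ. The sketch below assumes that approach; `refresh_cache` and the cache layout are hypothetical, and the check is a heuristic: a file rewritten with the same size and timestamp would go unnoticed.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_cache(cache, folder):
    """Return (new_cache, rehashed_paths) for the current contents of folder.

    cache maps path -> {"size", "mtime_ns", "md5"} (assumed layout). Files
    that no longer exist drop out, because the new cache is built only from
    what os.walk finds now.
    """
    fresh, rehashed = {}, []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            info = os.stat(path)
            old = cache.get(path)
            if (old and old["size"] == info.st_size
                    and old["mtime_ns"] == info.st_mtime_ns):
                fresh[path] = old  # looks unchanged: reuse the stored MD5
            else:
                fresh[path] = {
                    "size": info.st_size,
                    "mtime_ns": info.st_mtime_ns,
                    "md5": md5_of_file(path),  # new or changed: hash again
                }
                rehashed.append(path)
    return fresh, rehashed
```

This still walks the directory on every run, but reading file metadata is far cheaper than hashing, so the expensive MD5 step is limited to files that are new or appear to have changed.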