
Duplicate file utility that can re-use MD5 values?



Posted

First of all I would like to say Hello to everyone on this forum.

As to my question:

Does anybody know of a duplicate file detection utility that can re-use MD5 values for future comparisons?

Let me explain. Basically, I am trying to accomplish the following:

1) I have a "main folder" containing thousands of files.

2) I would like to scan this "main folder" only once and save all the MD5 hash values for future comparisons.

3) When I add new files to the "main folder", I would like to first check for duplicates against the MD5 values from step 2.

I've seen lots of duplicate file search utilities, but they all need to rescan every folder each time new files are added, which takes a very long time.
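Step 2 above (scan once, save the hashes) could be sketched roughly like this in Python. This is only a minimal sketch under some assumptions: the function names and the JSON file used for storage are made up for illustration, and the saved file is assumed to live outside the "main folder" so it does not get hashed itself.

```python
import hashlib
import json
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(main_folder):
    """Scan main_folder once; map each MD5 digest to the first path found with it."""
    index = {}
    for path in sorted(Path(main_folder).rglob("*")):
        if path.is_file():
            index.setdefault(md5_of_file(path), str(path))
    return index


def save_index(index, index_file):
    """Write the digest-to-path mapping to a JSON file (hypothetical storage format)."""
    Path(index_file).write_text(json.dumps(index, indent=2), encoding="utf-8")


def load_index(index_file):
    """Read the mapping back; return an empty index if nothing was saved yet."""
    index_path = Path(index_file)
    if not index_path.exists():
        return {}
    return json.loads(index_path.read_text(encoding="utf-8"))
```

The slow full scan then happens only in `build_index`; later runs would call `load_index` and hash just the new files.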

TIA


Posted

If I understood you correctly, you are looking for something that will compare "newly added" files against a saved list of MD5 values for the "old" files?

But how would the "newly added" files actually be "added"?

I mean, maybe you also need another function (or two), like a "folder watching" utility and/or a "file adding" one. :unsure:

What kind of files are we talking about?

I mean large ones like, say, videos, or smallish ones like, say, batch and text files?

With thousands of files in a single folder, browsing it in a file manager like Explorer may become considerably slower.

Something like this:

http://www.datamystic.com/filewatcher.html

http://leelusoft.blogspot.com/2010/07/watch-4-folder-22.html

Which could "trigger" a batch or whatever that:

  • calculates the MD5 checksum for the new files only

and

  • compares them one by one to the saved list of MD5 for the "old" files

then

  • IF there is a match, deletes the "new" file
  • IF there is no match, does nothing
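The steps above could look something like this in Python. It is a rough sketch, not a finished tool: `check_new_files` and its parameters are hypothetical names, the saved MD5 list is assumed to be already loaded into a set (so the "one by one" comparison becomes a set lookup), and deletion is off by default so a first run only reports what it would remove.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_new_files(new_folder, known_hashes, delete_duplicates=False):
    """Split the files in new_folder into duplicates and unique files.

    known_hashes is a set of hex MD5 digests saved from an earlier scan.
    Digests of unique files are added to it, so later files in the same
    batch are checked against them as well.
    """
    duplicates, unique = [], []
    for path in sorted(Path(new_folder).iterdir()):
        if not path.is_file():
            continue
        digest = md5_of_file(path)
        if digest in known_hashes:
            duplicates.append(path)
            if delete_duplicates:
                path.unlink()  # IF match: delete the "new" file
        else:
            known_hashes.add(digest)  # IF no match: keep it and remember its hash
            unique.append(path)
    return duplicates, unique
```

A folder watcher could call `check_new_files` on an incoming folder whenever something arrives. Since two different files can in principle share an MD5 value, a cautious version would compare the bytes of the matching files before deleting anything.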

jaclaz

Posted (edited)

Hi Jaclaz,

That is exactly what I am looking for; however, my programming skills are not that great. Would you happen to know of some software that already handles this? Thanks for outlining it so clearly.

Edited by substorm
Posted (edited)

And don't forget the problem of removed files. That's why most software doesn't store or cache MD5 values: there is no guarantee that the scanned directory still contains every file in the MD5 list, or that those files haven't changed (and need their MD5 recalculated). I don't know whether the "file monitoring" examples handle this. Besides that, MD5 is fairly compute- and time-intensive across many files or large files (I'm finding that out first-hand), so it doesn't work too well in a real-time situation.

That said, you might find CDCheck interesting, though it might not be exactly what you are after.
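For what it's worth, one common way around the removed/changed-file problem is to store each file's size and modification time next to its MD5, and only recalculate when either differs. Here is a rough Python sketch of that idea; `refresh_cache` and the cache layout are assumptions made for illustration, and the check is a heuristic (a file rewritten with the same size and timestamp would be missed).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_cache(folder, cache):
    """Return (updated cache, list of re-hashed paths) for folder.

    cache maps a path to {"size", "mtime_ns", "md5"}. Entries for files that
    no longer exist are dropped, because only files found on disk are copied
    into the new cache.
    """
    fresh = {}
    rehashed = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            info = os.stat(path)
            old = cache.get(path)
            if (old is not None
                    and old["size"] == info.st_size
                    and old["mtime_ns"] == info.st_mtime_ns):
                fresh[path] = old  # unchanged: reuse the stored MD5
            else:
                fresh[path] = {
                    "size": info.st_size,
                    "mtime_ns": info.st_mtime_ns,
                    "md5": md5_of_file(path),  # new or changed: hash again
                }
                rehashed.append(path)
    return fresh, rehashed
```

Walking the folder and calling `os.stat` is much cheaper than hashing, so only new or modified files pay the MD5 cost.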

Edited by Glenn9999
