This subject seems to be a recurring question:
http://superuser.com/questions/48916/how-best-to-compare-huge-directory-trees
http://serverfault.com/questions/39534/best-way-to-compare-diff-a-full-directory-structure
Much as I appreciate the existing solutions, they don't do precisely what is asked, so they are better described as workarounds.
In this post I'd like to present a utility that can 1) create directory tree snapshots in a size-efficient manner and 2) quickly compare two snapshots, revealing the files present in only one of them.
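The post doesn't show how dirsnap.py actually stores a snapshot, so the following is only a minimal Python 3 sketch under an assumed format: a gzip-compressed text file with one full path per line, sorted. The function names write_snapshot and read_snapshot are hypothetical. Because most lines share long directory prefixes, gzip should shrink such a list considerably.

```python
import gzip
import os


def write_snapshot(root, out_path):
    """Hypothetical helper: walk `root` and store every file path, one per
    line, in a gzip-compressed text file. Returns the number of paths."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    paths.sort()  # deterministic order; neighbouring lines share prefixes
    # Assumption: file names never contain "\n", otherwise this format breaks.
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for p in paths:
            out.write(p + "\n")
    return len(paths)


def read_snapshot(snap_path):
    """Hypothetical helper: load a snapshot written by write_snapshot()
    back into a set of paths."""
    with gzip.open(snap_path, "rt", encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}
```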
This kind of functionality is useful both for tracking what is going on in your filesystem and (especially if you use the command line a lot and/or sometimes lose focus) for making sure you didn't accidentally delete any important files. In short, a dream tool for paranoid people like me who like to be in control of everything 😉
Usage is straightforward and fairly self-explanatory (mind the trailing slashes):
python2.7 dirsnap.py --snap /selected/directory/ --out snap1.out.gz
... a few days later ...
python2.7 dirsnap.py --snap /selected/directory/ --out snap2.out.gz
python2.7 dirsnap.py --compare snap1.out.gz snap2.out.gz
Normally this prints the comparison results to standard output, in a form similar to the following:
L /selected/directory/file_only_in_snap1
R /selected/directory/file_only_in_snap2
Here L marks file paths found only in the first snapshot file given, and R marks those found only in the second.
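Assuming a snapshot is simply a gzip-compressed list of paths (one per line), the comparison reduces to two set differences, which is one way to keep it fast even for large trees. This Python 3 sketch is not taken from dirsnap.py; load_paths and compare are hypothetical names.

```python
import gzip


def load_paths(snap_path):
    """Hypothetical helper: read a gzip snapshot (one path per line) into a set."""
    with gzip.open(snap_path, "rt", encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}


def compare(left_paths, right_paths):
    """Return 'L <path>' / 'R <path>' lines for paths found in only one snapshot."""
    only_left = left_paths - right_paths    # e.g. deleted since the first snapshot
    only_right = right_paths - left_paths   # e.g. created since the first snapshot
    lines = ["L " + p for p in sorted(only_left)]
    lines += ["R " + p for p in sorted(only_right)]
    return lines
```

For example, printing each line of compare(load_paths("snap1.out.gz"), load_paths("snap2.out.gz")) would give output in the L/R format described above. Set differences run in roughly linear time on average, so no sorting-based merge is needed.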
There are also options to hide Unix-style hidden files and/or to limit the search depth.
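The post doesn't name the actual command-line options, so the parameters skip_hidden and max_depth below are hypothetical. This Python 3 sketch shows one common way to implement both filters with os.walk: pruning the dirnames list in place, which os.walk respects in its default top-down mode.

```python
import os


def iter_files(root, skip_hidden=False, max_depth=None):
    """Yield file paths under `root`, optionally skipping dot-files/dot-dirs
    and limiting how many directory levels below `root` are visited
    (max_depth=0 means only files directly inside `root`)."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if skip_hidden:
            # Editing dirnames in place stops os.walk from descending into them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # deepest allowed level reached: do not descend
        for name in filenames:
            yield os.path.join(dirpath, name)
```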
If you'd like to add file date/size comparison, the code is very straightforward and only 150 lines long. Simple, but finally something that does exactly what I needed 😉
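As a hedged illustration of that extension, assume each snapshot records every file's size and modification time alongside its path; files present in both snapshots can then be flagged when either value differs. stat_snapshot, changed_files and the "C" marker are hypothetical names chosen for this Python 3 sketch, not part of dirsnap.py.

```python
import os


def stat_snapshot(root):
    """Map each file path under `root` to (size in bytes, mtime in whole seconds)."""
    snap = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow (possibly broken) symlinks
            snap[path] = (st.st_size, int(st.st_mtime))
    return snap


def changed_files(old, new):
    """Return 'C <path>' lines for paths in both snapshots whose size or mtime differ."""
    common = old.keys() & new.keys()
    return ["C " + p for p in sorted(common) if old[p] != new[p]]
```

To persist this in the snapshot file, each line could carry the size and mtime before the path (for example tab-separated); int() drops sub-second precision, which keeps the stored values compact.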