
Hashing use cases (why would I want to hash my files?)

Posted: Tue May 23, 2017 2:43 pm
by webfork
Background

Something that's been bothering me that I couldn't quite name was the ongoing submission of a TON of different hashing tools. Normally I like a variety, but this is a really crowded area and there are really only 3 reasons to use hashes:
  • Check VirusTotal - basically find out a given file's reputation. Algorithm required: I understand the service will check SHA1 hashes, but for the most part all this site cares about is SHA256.
  • File verification - making sure the file or files you downloaded from somewhere are really the file you were looking for. Algorithm required: SHA1 and MD5 seem to be the only ones anyone uses despite security concerns with both. EDIT: I've seen more and more posts with the far more secure SHA256, largely due to support from GitHub.
  • Change analysis - seeing if a file or files have changed. One example is checking whether a CD is corrupt. Algorithm required: CRC32, Blake2, or SHA1 are all that's necessary for this kind of analysis, especially if also checking against file size or even just file type. Anything else is probably overkill unless you're dealing with billions of files.
If developers come up with a hashing program that doesn't improve on existing tools that do one of the above tasks, they are wasting time. Still, I have little doubt I'll see at least 10 more MD5 file analyzers before the end of the year.

For reference, here are my programs of choice for the above requirements: 1. SigCheckGUI, 2. fHash, and 3. RapidCRC Unicode.
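For anyone who would rather script the file verification case than use a GUI tool, here is a minimal Python sketch. It is not how any of the programs named above work internally; the function names `file_digest` and `matches_published_hash` are made up for illustration, and SHA256 is only the assumed default.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to keep memory use low."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path, published_hex, algorithm="sha256"):
    """Compare a file's digest with a published value, ignoring case and stray whitespace."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

Passing `"md5"` or `"sha1"` as the algorithm covers the older hashes that many download pages still publish.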


Periphery (not quite hashing but in the neighborhood):
  • Duplicate check - this is usually done behind the scenes by a duplicate-checking program (like DoubleKiller). You rarely see or care about the actual hash of a given file. Algorithm required: usually CRC32 unless looking through millions of files, and then MD5 or SHA1.
  • Search for a file by the hash - this is one of the reasons I use SigcheckGUI on files I highlight here on the site: that makes it possible to find a file when you only have an old hash. Algorithm required: due to popularity, MD5 or SHA1, but I'm increasingly finding files by their SHA256 code.
  • Torrents - a series of segmented SHA1 hash files, they can definitely check files for file changes and do verification, assuming you trust the torrent. However, it's very inefficient compared to other methods. On the upside, the hashes are portable and everyone has a torrent client so it could certainly do the trick.
  • Data redundancy - e.g. Multipar - also like torrents, not really hash but can verify and fix broken files. Solely for file verification alone, it's hugely inefficient in terms of space and processor usage by comparison to other hashing tools.
Did I miss any?
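To make the duplicate check item concrete, here is a rough Python sketch of what such a tool might do behind the scenes. This is an assumption about the general approach, not a description of DoubleKiller: group files by size first, then hash only the files that share a size. The names `sha1_of` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1024 * 1024):
    """Return the SHA1 hex digest of a file, read in chunks."""
    hasher = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and SHA1 digest are identical."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_content[(size, sha1_of(path))].append(path)

    return [sorted(paths) for paths in by_content.values() if len(paths) > 1]
```

The size pre-filter is why the choice of hash rarely matters here: most files never get hashed at all.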

---

Re: Hashing use cases

Posted: Tue May 23, 2017 10:41 pm
by SYSTEM
webfork wrote:File verification - making sure the file or files you downloaded from somewhere are really the file you were looking for. Algorithm required: SHA1 and MD5 seem to be the only ones anyone uses despite security concerns with both.
I use a hash calculator to verify that files I download haven't been corrupted during the download. For that purpose, SHA1 and MD5 are perfectly sufficient.

Re: Hashing use cases

Posted: Wed May 24, 2017 5:00 am
by Midas
SYSTEM wrote:I use a hash calculator to verify that files I download haven't been corrupted during the download.

+1 :)

Re: Hashing use cases

Posted: Wed May 24, 2017 9:23 am
by JohnTHaller
The PortableApps.com Platform uses MD5 hashes for download verification on all apps in the updater/app store. The PortableApps.com Installer uses MD5 hashes for download verification in live installers where the app can't legally be repackaged as per publisher request or EULA. We also publish the MD5 sums to each app download page for end users that manually download. This is able to catch infected mirrors, incomplete downloads, corrupted mirroring, and a typical malicious actor inserting themselves into the network with a substituted fake file. The hash check serves as a nice additional security measure on top of the installer's built-in CRC self-check and code signing of the installer itself.

We're considering switching to SHA2 in a future release for even better security as that would also protect against a more advanced malicious actor custom-creating a badware file that matched everything including the MD5 sum, which is difficult but not impossible.

Re: Hashing use cases

Posted: Mon Jun 26, 2017 4:45 pm
by Midas
Quick note to link an extensive list of related tools posted to TPFC: viewtopic.php?t=6358 ...

For its sheer uniqueness, let me add mention of another, sadly but expectably non-portable, shell extension with hashing and VirusTotal capabilities.
http://kvsoft.at.ua/peid_tab/peid_tab_en.html wrote:PEiD Tab, a free utility that extends the possibilities of Windows Explorer by adding a function studies of PE files, which allows the compiler to know, and, consequently, the programming language used for writing the program, packer or kriptora.

Re: Hashing use cases

Posted: Tue Jun 27, 2017 7:25 pm
by webfork
Midas wrote:a free utility that extends the possibilities of Windows Explorer by adding a function studies of PE files, which allows the compiler to know, and, consequently, the programming language used for writing the program, packer or kriptora.
Yeah, that's what I was looking for when I came up with this. Great add.
Midas wrote:Quick note to link an extensive list of related tools posted to TPFC: viewtopic.php?t=6358
Good related post, thanks.

Re: Hashing use cases

Posted: Tue Nov 13, 2018 7:25 pm
by webfork
Two other possible uses of file hashing:

1. Unique password generator/manager (doesn't work on sites with case or special character requirements) - create or generate a random file, run a hash on it, anything from CRC32 to MD5 (which produces an 8 or 32 character password respectively), and use the value as the password. The caveat here is that if the file is ever accidentally corrupted or modified, the password of course cannot be recovered.

2. Unmanaged file server trick - A combination of verification and change detection, I use this on a file server (Windows shared folder) whose file permissions are very open. It's often unclear who did what. To address this, I've been adding a hash code to the filename (filename [A3JS0N].txt) via RapidCRC Unicode to ensure the last file I posted is the copy I posted. Also the odd text seems to throw people off and they leave it alone unless specifically pointed to the file.

Re: Hashing use cases

Posted: Sun Oct 11, 2020 7:26 pm
by webfork
Found another great trick this weekend ...

* Duplicate photo names - I frequently pull images down from my camera with sequential or semi-original filenames, e.g. IMG038.jpg, but when I try to restore from backup or combine different groups of photos, there's often overlap. I want to add those pictures to the same collection and avoid both overwrites and duplicate images. By integrating a hash into the filename, such as the CRC32 code in IMG_038 [A97131F8].jpg, I can safely avoid both issues. The rename of course also makes it possible to quickly check the files for errors and hopefully restore the correct file from backup.

As with the "Unmanaged File Server" case above, I used RapidCRC Unicode to do this in batch.
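For readers who want to script this rather than use RapidCRC Unicode, the following is a minimal Python sketch of the same idea. It assumes the tag format shown above (a space, then eight hex digits in square brackets, before the extension); the names `crc32_of`, `tagged_name`, and `tag_matches` are hypothetical and this is not RapidCRC's own code.

```python
import re
import zlib
from pathlib import Path

# Matches a trailing " [XXXXXXXX]" tag at the end of the filename stem.
TAG_PATTERN = re.compile(r" \[([0-9A-Fa-f]{8})\]$")


def crc32_of(path, chunk_size=1024 * 1024):
    """Return the CRC32 of a file as eight uppercase hex digits."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def tagged_name(path):
    """Return the path with the file's CRC32 inserted before the extension."""
    path = Path(path)
    return path.with_name(f"{path.stem} [{crc32_of(path)}]{path.suffix}")


def tag_matches(path):
    """Return True or False for a tagged file, or None if the name has no tag."""
    path = Path(path)
    match = TAG_PATTERN.search(path.stem)
    if match is None:
        return None
    return match.group(1).upper() == crc32_of(path)
```

Renaming is then a matter of `os.rename(path, tagged_name(path))`, and `tag_matches` can be run over a folder later to spot files that have changed since they were tagged.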

Re: Hashing use cases (why would I want to hash my files?)

Posted: Mon Oct 12, 2020 7:32 am
by Midas
A great tip to better manage my ever growing digital image collection. Thanks. :sunglasses:

Re: Hashing use cases (why would I want to hash my files?)

Posted: Sun Feb 21, 2021 11:09 pm
by webfork
Two more use cases ...
  • Source agnostic - It doesn't matter if you download it from a trusted website, file sharing network, obscure backup, or a thumb drive lost in an airport; all that matters is that the file has the right hash. This doesn't mean any medium is a trusted medium (an outdated browser can pick up malware just by visiting a website), but once you have the file and the hash matches, you can trust it. (Usually SHA256.)

    This was more important in the early days of the Internet when broadband was rare and sometimes you had to use whatever hosting was available, but still follows.
  • Search by Hash - For some popular files, you can enter the hash into a search engine and have a mirror appear.
    For example, searching for Notepad 2 Portable's MD5 hash reveals its home page and other sites. This mostly works on older hashes.

Re: Hashing use cases (why would I want to hash my files?)

Posted: Wed Feb 24, 2021 6:58 pm
by Andrew Lee
I personally use RapidCRC to prevent bit rot.

So I have 2 portable HDDs that I use to backup my file server (parent, grandparent) via robocopy.

From time to time (every other backup or so), I create a SFV that contains the hashes of all the files on that HDD.

Then occasionally, I use the SFV to verify all the files on that HDD to make sure there is no bit rot.
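The create-then-verify routine above can be sketched in a few lines of Python. This is only an illustration under some assumptions: it writes the common plain-text SFV layout of one "relative path, space, CRC32" entry per line, and the function names `write_sfv` and `verify_sfv` are hypothetical rather than anything RapidCRC provides.

```python
import os
import zlib


def crc32_of(path, chunk_size=1024 * 1024):
    """Return the CRC32 of a file as eight uppercase hex digits."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def write_sfv(root, sfv_path):
    """Record a CRC32 for every file under root, one 'relative/path CRC32' line each."""
    with open(sfv_path, "w", encoding="utf-8") as out:
        for folder, _subdirs, names in os.walk(root):
            for name in sorted(names):
                path = os.path.join(folder, name)
                if os.path.abspath(path) == os.path.abspath(sfv_path):
                    continue  # do not checksum the SFV file itself
                out.write(f"{os.path.relpath(path, root)} {crc32_of(path)}\n")


def verify_sfv(root, sfv_path):
    """Return the relative paths that are missing or whose CRC32 no longer matches."""
    problems = []
    with open(sfv_path, encoding="utf-8") as lines:
        for line in lines:
            line = line.rstrip("\n")
            if not line or line.startswith(";"):
                continue  # skip blank lines and SFV comment lines
            relative, expected = line.rsplit(" ", 1)
            path = os.path.join(root, relative)
            if not os.path.isfile(path) or crc32_of(path) != expected.upper():
                problems.append(relative)
    return problems
```

An empty list from `verify_sfv` means every recorded file is still present and unchanged; anything listed is a candidate for restoring from the other backup drive.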

Re: Hashing use cases (why would I want to hash my files?)

Posted: Thu Feb 25, 2021 5:14 am
by Midas
Andrew Lee wrote:From time to time (every other backup or so), I create a SFV that contains the hashes of all the files on that HDD.

Good to know. Your post solves a lingering blank spot I had about my hashing strategy... 8)

I also mostly use RapidCRC, along with an older copy of TeraCopy for quick verification (just double click the hash file).

Thus, I usually create sidecar files as I go along, on a case-by-case basis -- mostly MD5 for download integrity verification, SHA256 whenever security is required.

While it has served me well over time, I always felt I was missing a global and long-term strategy; your use case suggests a reasonable and easy path into one. Thank you.

Mostly for my own benefit, a couple of related links:

Re: Hashing use cases (why would I want to hash my files?)

Posted: Tue Mar 02, 2021 6:49 am
by Userfriendly
I've been seeing lots of things related to checksums/hashes lately. I'm getting a case of the Baader–Meinhof phenomenon, or frequency illusion.

This techspot article was posted today https://www.techspot.com/article/2199-what-is-checksum/

Which made me remember a registry context menu tweak https://www.tenforums.com/tutorials/786 ... -10-a.html

You can use PowerShell in Windows 10 to get the hash of files. No 3rd party app required.

Another use case for hashing that I don't see mentioned a lot is checking the integrity of files when you're overclocking or have faulty hardware. A CPU or RAM that has been OC'd or is defective can lead to instability, which usually means app crashes and BSODs, but it can also silently corrupt files, which can lead to hard-to-diagnose issues. Sometimes faulty or unstable overclocked RAM can corrupt files when you're just transferring them from drive to drive, because some data is usually cached in RAM before it reaches the storage device.

Say Disk A has pristine, good files. When you transfer them to Disk B, the hashes of those files could come out different, which means the data was corrupted along the way. A lot of the time you won't notice, because the data still works: pictures or videos can mostly be displayed but will have strange artifacts in them. Apps and games that get corrupted still run, but now for some reason they crash randomly and you don't know why.
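A quick way to check for this kind of transfer corruption is to hash both sides after copying. Below is a minimal Python sketch; `sha256_of` and `copy_and_verify` are hypothetical names, and SHA256 is just an assumed choice since any of the hashes discussed in this thread would catch random corruption.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then re-read both copies and report whether their digests match."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

One caveat: right after a copy, the operating system may serve the re-read from its cache rather than from the destination disk, so a match here is not proof the data on Disk B is good. Re-checking later, or after remounting the drive, is a stronger test.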

ECC memory exists to prevent issues like that, and it would be nice if it were standard at the consumer level. Blame Intel for that, as Linus points out https://arstechnica.com/gadgets/2021/01 ... sumer-pcs/

Re: Hashing use cases (why would I want to hash my files?)

Posted: Wed Mar 03, 2021 5:18 am
by Midas
Great input, Userfriendly. Making hashing a context menu entry is definitely a good idea.

Concerning PowerShell, my first attempt at automating hash creation via batch did in fact rely on it: viewtopic.php?p=91786#p91786 (it's also mentioned in an old article a couple of posts back over there).

As for the overclocking safety measures, I guess you'll probably be interested in a fully Open Source consumer grade PowerPC workstation that comes equipped with ECC RAM by default (brace yourself for the price tag! :twisted:):

My machine further came equipped with 64GB of registered ECC DDR4 RAM (running at 2666MHz)...
https://www.osnews.com/story/133093/


Oh, and here's a quick explanation of the Baader-Meinhof effect for the unaware (like myself):

https://en.wikipedia.org/wiki/Frequency_illusion

Re: Hashing use cases (why would I want to hash my files?)

Posted: Wed Mar 03, 2021 8:15 am
by Userfriendly
Midas wrote:As for the overclocking safety measures, I guess you'll probably be interested in a fully Open Source consumer grade PowerPC workstation that comes equipped with ECC RAM by default (brace yourself for the price tag! :twisted:):
Anyone with a Ryzen machine can most likely just drop in some DDR4 unbuffered ECC RAM and it will work. It's not too expensive at around $100 for 16GB. It's neat for anyone who wants to consider converting their Ryzen rig into a NAS file server or something. ECC memory is something data hoarders consider on top of their local and cloud storage backups. If your data is precious, better safe than sorry.