Hashing use cases (why would I want to hash my files?)

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas


#1 Post by webfork »

Background

Something that's been bothering me that I couldn't quite name was the ongoing submission of a TON of different hashing tools. Normally I like a variety, but this is a really crowded area and there are really only 3 reasons to use hashes:
  • Check VirusTotal - basically find out a given file's reputation. Algorithm required: I understand the service will also look up SHA1 hashes, but for the most part all the site cares about is SHA256.
  • File verification - making sure the file or files you downloaded from somewhere are really the files you were looking for. Algorithm required: SHA1 and MD5 seem to be the only ones anyone uses despite security concerns with both. EDIT: I've seen more and more posts with the far more secure SHA256 format, largely due to support from GitHub.
  • Change analysis - seeing if a file or files have changed, for example checking whether a CD is corrupt. Algorithm required: CRC32, Blake2, or SHA1 are all that's necessary for this kind of analysis, especially if also checking against file size or even just file type. Anything else is probably overkill unless you're dealing with billions of files.
If developers come up with a hashing program that doesn't improve on existing tools that do one of the above tasks, they are wasting time. Still, I have little doubt I'll see at least 10 more MD5 file analyzers before the end of the year.

For reference, here are my programs of choice for the above requirements: 1. SigCheckGUI, 2. fHash, and 3. RapidCRC Unicode.
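
For anyone who would rather script this than pick a GUI tool, here is a minimal Python sketch that reads a file once and produces the digests mentioned in the list above (CRC32, MD5, SHA1, SHA256). The function name file_digests and the 1 MB chunk size are illustrative choices, not something taken from any of the tools named here.

```python
import hashlib
import zlib
from pathlib import Path


def file_digests(path, chunk_size=1024 * 1024):
    """Return CRC32, MD5, SHA1 and SHA256 of a file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    crc = 0
    with Path(path).open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
            for hasher in hashers.values():
                hasher.update(chunk)
    digests = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    # CRC32 is conventionally shown as 8 uppercase hex digits.
    digests["crc32"] = f"{crc:08X}"
    return digests
```

Calling file_digests("some_download.zip") returns a dictionary, so the SHA256 value can be pasted into a VirusTotal search while the CRC32 or SHA1 value is kept for later change checks.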


Periphery (not quite hashing but in the neighborhood):
  • Duplicate check - this is usually done behind the scenes by a duplicate-checking program (like DoubleKiller). You rarely see or care about the actual hash of a given file. Algorithm required: usually CRC32 unless looking through millions of files, and then MD5 or SHA1.
  • Search for a file by the hash - this is one of the reasons I use SigCheckGUI on files I highlight here on the site: it makes it possible to find a file when you only have an old hash. Algorithm required: due to popularity, MD5 or SHA1, but I'm increasingly finding files by their SHA256 hash.
  • Torrents - a torrent carries a series of SHA1 hashes, one per file segment, so it can definitely check files for changes and do verification, assuming you trust the torrent. However, it's very inefficient compared to other methods. On the upside, the hashes are portable and everyone has a torrent client so it could certainly do the trick.
  • Data redundancy - e.g. MultiPar - like torrents, not really a hashing tool, but it can verify and fix broken files. For file verification alone, it's hugely inefficient in terms of space and processor usage compared to dedicated hashing tools.
Did I miss any?
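
To make the "behind the scenes" part of the duplicate check item concrete, here is a rough Python sketch of one common approach: group files by size first, then hash only the files that share a size. This is an assumption about how such tools might work in general, not a description of DoubleKiller's internals; find_duplicates and the choice of SHA1 are purely illustrative.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def _sha1_of(path):
    """Hash a file in chunks and return the SHA1 hex digest."""
    hasher = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and SHA1 both match."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        # A file with a unique size cannot have a duplicate, so skip hashing it.
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, _sha1_of(path))].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The size check is what keeps this cheap: most files never get hashed at all, which is also why the exact algorithm matters less here than in the verification cases.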

---

Last edited by webfork on Wed Dec 16, 2020 11:47 am, edited 1 time in total.

SYSTEM
Posts: 2041
Joined: Sat Jul 31, 2010 1:19 am
Location: Helsinki, Finland

Re: Hashing use cases

#2 Post by SYSTEM »

webfork wrote:File verification - making sure the file or files you downloaded from somewhere are really the file you were looking for. Algorithm required: SHA1 and MD5 seem to be the only ones anyone uses despite security concerns with both.
I use a hash calculator to verify that files I download haven't been corrupted during the download. For that purpose, SHA1 and MD5 are perfectly sufficient.

Midas
Posts: 6710
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: Hashing use cases

#3 Post by Midas »

SYSTEM wrote:I use a hash calculator to verify that files I download haven't been corrupted during the download.

+1 :)

JohnTHaller
Posts: 715
Joined: Wed Feb 10, 2010 4:44 pm
Location: New York, NY

Re: Hashing use cases

#4 Post by JohnTHaller »

The PortableApps.com Platform uses MD5 hashes for download verification on all apps in the updater/app store. The PortableApps.com Installer uses MD5 hashes for download verification in live installers where the app can't legally be repackaged as per publisher request or EULA. We also publish the MD5 sums to each app download page for end users that manually download. This is able to catch infected mirrors, incomplete downloads, corrupted mirroring, and a typical malicious actor inserting themselves into the network with a substituted fake file. The hash check serves as a nice additional security measure on top of the installer's built in CRC self check and code signing of the installer itself.

We're considering switching to SHA2 in a future release for even better security as that would also protect against a more advanced malicious actor custom-creating a badware file that matched everything including the MD5 sum, which is difficult but not impossible.
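
As a generic illustration of download verification against a published sum (this is not the PortableApps.com code, and the function name matches_published_hash is hypothetical), a check along these lines could be sketched in Python. The algorithm is a parameter, so the same sketch covers an MD5 sum today and a SHA2 sum later.

```python
import hashlib
import hmac


def matches_published_hash(path, published_hex, algorithm="sha256"):
    """Return True if the file's digest equals the published hex string."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    # Published sums are often copied with stray whitespace or in uppercase.
    expected = published_hex.strip().lower()
    return hmac.compare_digest(hasher.hexdigest(), expected)
```

For example, matches_published_hash("setup.paf.exe", md5_from_download_page, algorithm="md5") would return False for an incomplete or substituted download. The check is only as trustworthy as the place the published sum came from.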


Re: Hashing use cases

#5 Post by Midas »

Quick note to link an extensive list of related tools posted to TPFC: viewtopic.php?t=6358 ...

For its sheer uniqueness, let me add mention of another, sadly but expectably non-portable, shell extension with hashing and VirusTotal capabilities.
http://kvsoft.at.ua/peid_tab/peid_tab_en.html wrote:PEiD Tab, a free utility that extends the possibilities of Windows Explorer by adding a function studies of PE files, which allows the compiler to know, and, consequently, the programming language used for writing the program, packer or kriptora.


Re: Hashing use cases

#6 Post by webfork »

Midas wrote:a free utility that extends the possibilities of Windows Explorer by adding a function studies of PE files, which allows the compiler to know, and, consequently, the programming language used for writing the program, packer or kriptora.
Yeah, that's what I was looking for when I came up with this. Great add.
Midas wrote:Quick note to link an extensive list of related tools posted to TPFC: viewtopic.php?t=6358
Good related post, thanks.


Re: Hashing use cases

#7 Post by webfork »

Two other possible uses of file hashing:

1. Unique password generator/manager (doesn't work on sites with case or special character requirements) - create or generate a random file, run a hash somewhere between CRC32 and MD5 on it (which creates an 8 or 32 character password respectively), and use the value as the password. The caveat here is that if the file is ever accidentally corrupted or modified, the password of course cannot be recovered.

2. Unmanaged file server trick - A combination of verification and change detection, I use this on a file server (Windows shared folder) whose file permissions are very open. It's often unclear who did what. To address this, I've been adding a hash code to the filename (filename [A3JS0N].txt) via RapidCRC Unicode so I can confirm the file on the server is still the copy I posted. Also the odd text seems to throw people off and they leave it alone unless specifically pointed to the file.


Re: Hashing use cases

#8 Post by webfork »

Found another great trick this weekend ...

* Duplicate photo names - I frequently pull images down from my camera with sequential or semi-original filenames e.g. IMG038.jpg but when I try to restore from backup or combine different groups of photos, there's often overlap. I want to add those pictures to the same collection and avoid both overwrites and duplicate images. By integrating a hash into the filename, such as the CRC32 code in the filename IMG_038 [A97131F8].jpg, I can safely avoid both issues. The rename of course also makes it possible to quickly check the files for errors and hopefully restore the correct file from backup.

As with the "Unmanaged File Server" case above, I used RapidCRC Unicode to do this in batch.
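
For those without RapidCRC Unicode at hand, the same naming pattern ("name [CRC32].ext") can be approximated with a short Python sketch. The helper names tag_file and tag_matches are made up for this example, and the sketch makes no claim to reproduce RapidCRC's exact behavior.

```python
import re
import zlib
from pathlib import Path

# Matches a trailing " [8 hex digits]" at the end of the name without extension.
TAG_PATTERN = re.compile(r" \[([0-9A-Fa-f]{8})\]$")


def crc32_of(path):
    """Return the CRC32 of a file as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08X}"


def tag_file(path):
    """Rename 'IMG038.jpg' to 'IMG038 [XXXXXXXX].jpg' and return the new path."""
    path = Path(path)
    if TAG_PATTERN.search(path.stem):
        return path  # already tagged, leave it alone
    target = path.with_name(f"{path.stem} [{crc32_of(path)}]{path.suffix}")
    if target.exists():
        # Same name and same CRC32: most likely an exact duplicate.
        raise FileExistsError(f"{target} already exists")
    return path.rename(target)


def tag_matches(path):
    """Return True if the CRC32 in the filename still matches the content."""
    path = Path(path)
    match = TAG_PATTERN.search(path.stem)
    return match is not None and match.group(1).upper() == crc32_of(path)
```

Two different photos that both started life as IMG038.jpg end up with different tags and can sit in the same folder, while tag_matches gives a quick corruption check later on.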


Re: Hashing use cases (why would I want to hash my files?)

#9 Post by Midas »

A great tip to better manage my ever growing digital image collection. Thanks. :sunglasses:


Re: Hashing use cases (why would I want to hash my files?)

#10 Post by webfork »

Two more use cases ...
  • Source agnostic - It doesn't matter if you download it from a trusted website, file sharing network, obscure backup, or a thumb drive lost in an airport: all that matters is that the file has the right hash. This doesn't mean any medium is a trusted medium (outdated browsers can pick up malware just by visiting a website), but once you have the file and its hash checks out, you can trust it. (Usually SHA256.)

    This was more important in the early days of the Internet when broadband was rare and sometimes you had to use whatever hosting was available, but still follows.
  • Search by Hash - For some popular files, you can enter the hash into a search engine and have a mirror appear.
    For example, searching for Notepad 2 Portable's MD5 hash reveals its home page and other sites. This mostly works on older hashes.

Andrew Lee
Posts: 3052
Joined: Sat Feb 04, 2006 9:19 am

Re: Hashing use cases (why would I want to hash my files?)

#11 Post by Andrew Lee »

I personally use RapidCRC to prevent bit rot.

So I have 2 portable HDDs that I use to backup my file server (parent, grandparent) via robocopy.

From time to time (every other backup or so), I create a SFV that contains the hashes of all the files on that HDD.

Then occasionally, I use the SFV to verify all the files on that HDD to make sure there is no bit rot.
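
The create-then-verify routine described above can be sketched in Python as follows. It assumes the simple SFV layout of one "relative/path CRC32" entry per line with ";" comment lines; write_sfv and verify_sfv are illustrative names, and the forward-slash paths are a choice made for this sketch rather than a guarantee of what RapidCRC writes.

```python
import zlib
from pathlib import Path


def crc32_of(path):
    """Return the CRC32 of a file as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08X}"


def write_sfv(root, sfv_path):
    """Record the CRC32 of every file under root; return the number of entries."""
    root, sfv_path = Path(root), Path(sfv_path)
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip the SFV itself in case it is stored inside root.
        if path.is_file() and path.resolve() != sfv_path.resolve():
            lines.append(f"{path.relative_to(root).as_posix()} {crc32_of(path)}")
    sfv_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_sfv(root, sfv_path):
    """Return a list of (name, problem) for files that are missing or changed."""
    root = Path(root)
    problems = []
    for line in Path(sfv_path).read_text(encoding="utf-8").splitlines():
        line = line.rstrip()
        if not line or line.startswith(";"):
            continue
        name, expected = line.rsplit(" ", 1)
        target = root / name
        if not target.is_file():
            problems.append((name, "missing"))
        elif crc32_of(target) != expected.upper():
            problems.append((name, "crc mismatch"))
    return problems
```

An empty list from verify_sfv means every recorded file is still present and unchanged. Note that a file edited on purpose since the last write_sfv run will also show up as a mismatch, so the SFV needs refreshing after each backup.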


Re: Hashing use cases (why would I want to hash my files?)

#12 Post by Midas »

Andrew Lee wrote:From time to time (every other backup or so), I create a SFV that contains the hashes of all the files on that HDD.

Good to know. Your post solves a lingering blank spot I had about my hashing strategy... 8)

I also mostly use RapidCRC, along with an older copy of TeraCopy for quick verification (just double click the hash file).

Thus, I usually create sidecar files as I go along, on a case-by-case basis -- mostly MD5 for download integrity verification, SHA256 whenever security is required.

While it has served me well over time, I always felt I was missing a global and long-term strategy; your use case suggests a reasonable and easy path into one. Thank you.

Mostly for my own benefit, a couple of related links:

Userfriendly
Posts: 430
Joined: Tue Nov 27, 2012 11:41 pm

Re: Hashing use cases (why would I want to hash my files?)

#13 Post by Userfriendly »

I've been seeing lots of things related to checksums/hashes lately. I'm getting a case of the Baader–Meinhof phenomenon, or frequency illusion.

This techspot article was posted today https://www.techspot.com/article/2199-what-is-checksum/

Which made me remember a registry context menu tweak https://www.tenforums.com/tutorials/786 ... -10-a.html

You can use PowerShell in Windows 10 to get the hash of files. No 3rd party app required.

Another use case for hashing that I don't see mentioned a lot is checking the integrity of files when overclocking or when you have faulty hardware. A CPU or RAM that has been OC'd or is defective can lead to instability, which usually means app crashes and BSODs, but it can also silently corrupt files, which could lead to hard-to-diagnose issues. Sometimes faulty or unstable overclocked RAM can lead to file corruption when just transferring files from drive to drive, because some data is usually cached in RAM before it reaches the storage device.

Say Disk A has pristine good files: when you transfer them to Disk B, the hashes of those files could come out different, meaning the data was corrupted. A lot of the time you won't notice, as that data would still work: pictures or videos can mostly be displayed but will have strange artifacts in them. Apps and games that get corrupted still run but now for some reason crash randomly and you don't know why.
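
A minimal Python sketch of the Disk A to Disk B check: hash the source, copy it, then hash the copy and compare. The name copy_and_verify is made up for this example. One caveat that is an assumption about typical systems rather than something tested here: the read-back right after copying may be served from the operating system's cache instead of the disk, so re-running the hash later is a stronger test.

```python
import hashlib
import shutil


def sha256_of(path):
    """Hash a file in chunks and return the SHA256 hex digest."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then fail loudly if the copy's hash differs from the source's."""
    source_hash = sha256_of(source)
    # copy2 returns the path of the new file, even if destination is a folder.
    copied = shutil.copy2(source, destination)
    if sha256_of(copied) != source_hash:
        raise OSError(f"hash mismatch after copying {source} to {copied}")
    return source_hash
```

If this raises on a machine with an aggressive overclock, that is a fairly strong hint to test the RAM before trusting any further transfers.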

ECC memory exists to prevent issues like that, and it would be nice if it was standard at the consumer level. Blame Intel for that, as Linus points out https://arstechnica.com/gadgets/2021/01 ... sumer-pcs/


Re: Hashing use cases (why would I want to hash my files?)

#14 Post by Midas »

Great input, Userfriendly. Making hashing a context menu entry is definitely a good idea.

Concerning PowerShell, my first attempt at automating hash creation via batch did in fact rely on it: viewtopic.php?p=91786#p91786 (it's also mentioned in an old article a couple of posts back over there).

As for the overclocking safety measures, I guess you'll probably be interested in a fully Open Source consumer grade PowerPC workstation that comes equipped with ECC RAM by default (brace yourself for the price tag! :twisted:):

My machine further came equipped with 64GB of registered ECC DDR4 RAM (running at 2666MHz)...


Oh, and here's a quick explanation of the Baader–Meinhof effect for the unaware (like myself):



Re: Hashing use cases (why would I want to hash my files?)

#15 Post by Userfriendly »

Midas wrote: Wed Mar 03, 2021 5:18 am As for the overclocking safety measures, I guess you'll probably be interested in a fully Open Source consumer grade PowerPC workstation that comes equipped with ECC RAM by default (brace yourself for the price tag! :twisted:):
Anyone with a Ryzen machine can most likely just drop in some DDR4 unbuffered ECC RAM and it will work. It's not too expensive at around $100 for 16GB. It's neat for anyone who wants to consider converting their Ryzen rig into a NAS file server or something. ECC memory is something data hoarders consider on top of their local and cloud storage backups. If your data is precious, better safe than sorry.
