Many downloads come with an accompanying hash that lets us verify the downloaded file. This is where hash algorithms come in. Using a hashing algorithm, we can create a digital fingerprint of a file. It works much like matching a fingerprint to a person: the fingerprint is small, but it identifies its owner.
When you download something, the vendor often publishes a hash so you can verify that the download is complete and unaltered. After the download, we calculate the hash of the downloaded file ourselves and compare it with the published one. The two must match.
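The check described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the vendor publishes a SHA-256 digest as a hex string; the function names `file_sha256` and `verify_download` are made up for this example.

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, published_hex):
    """Return True if the file's digest equals the published one (hypothetical helper)."""
    computed = file_sha256(path)
    # Normalize whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(computed, published_hex.strip().lower())
```

A call such as `verify_download("installer.iso", "<hash from the vendor's page>")` would return `True` only when the two fingerprints are identical. The file name here is a placeholder.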
One-Way Functions in Hash Algorithms
Hashing algorithms rely on so-called one-way mathematical functions. These functions have a very particular property: they are easy to compute, but the result cannot feasibly be inverted back to the input. The idea is to have a mechanism that "describes" an object. This "description" is the fingerprint, a fixed-length value. Let's check out how one-way mathematical functions work:
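As a rough illustration of the one-way property, the sketch below hashes a value with a single call, while going the other way is only possible by guessing inputs and hashing each guess. It assumes SHA-256 and a deliberately tiny input space (4-digit PINs). The helper names `fingerprint` and `guess_pin` are invented for this example.

```python
import hashlib


def fingerprint(text):
    """Forward direction: one cheap call turns the input into a fixed-length value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_pin(target_hex):
    """Reverse direction: there is no 'unhash', so try every candidate in turn.

    This only works because a 4-digit PIN has just 10,000 possible values.
    """
    for n in range(10_000):
        candidate = f"{n:04d}"
        if fingerprint(candidate) == target_hex:
            return candidate
    return None


# A one-character change in the input gives an unrelated-looking fingerprint.
a = fingerprint("hello")
b = fingerprint("hellp")
differing = sum(x != y for x, y in zip(a, b))
print(f"{differing} of {len(a)} hex characters differ")
```

For realistic inputs such as files or long passphrases, the guessing loop would have far too many candidates to try, which is what makes the function one-way in practice.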
This specific property of one-way functions is what makes them suitable for generating fingerprints of digital objects. Hash functions also underpin the hash table, an associative-array data structure that maps keys to values by hashing each key. A rainbow table is something different: a precomputed table of hashes used to look up the inputs that produced them. The defining feature of a hash function is that it maps input of variable size to output of fixed size. This means that whatever you put into the hash algorithm, it will always produce a result of the same length.
Throughout the evolution of hash algorithms, there have been different standards and different output sizes:
MD5 produces 128-bit output and should not be used in production. Collision attacks are practical and fast, partly because of the short output and partly because of weaknesses in its design. This means that two different inputs that create the same output can be found in a short time. This is a big no-no in cryptography! It was created by Ronald Rivest, co-creator of the
SHA-1 (Secure Hash Algorithm) produces 160-bit output. It is also considered too weak to resist collision attacks; a practical collision was demonstrated in 2017.
SHA-2 is built around two core hash functions, SHA-256 and SHA-512. Each has a truncated variant: SHA-224 is derived from SHA-256 and produces 224 bits, while SHA-384 is derived from SHA-512 and produces 384 bits. SHA-256 and SHA-512 produce 256 and 512 bits of output respectively.
SHA-3 is based on the cryptographic algorithm "Keccak" and offers the same output sizes as SHA-2: 224, 256, 384, and 512 bits.
RIPEMD-160 produces 160-bit output. Its design is based on MD4, the algorithm that MD5 replaced. However, RIPEMD-160 still stands up against known attacks.
Whirlpool is built on a block cipher derived from the Advanced Encryption Standard (AES) and produces 512-bit output.
BLAKE3 produces 256-bit output by default and can also extend its output to any requested length.
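The output sizes listed above can be checked with Python's standard `hashlib` module. The sketch below covers only the algorithms that `hashlib` always provides. RIPEMD-160 and Whirlpool depend on how the underlying OpenSSL library was built, and BLAKE3 is not part of the standard library, so they are left out. The helper name `output_bits` is made up for this example.

```python
import hashlib

# Algorithms from the list above that hashlib guarantees on every platform.
ALGORITHMS = [
    "md5", "sha1",
    "sha224", "sha256", "sha384", "sha512",
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",
]


def output_bits(name):
    """Return the digest length of the named algorithm in bits."""
    # digest_size is reported in bytes, so multiply by 8.
    return hashlib.new(name).digest_size * 8


for name in ALGORITHMS:
    print(f"{name:10s} {output_bits(name)} bits")
```

On systems running in a restricted mode (for example FIPS), constructing an MD5 object may be refused. In that case `md5` would have to be removed from the list.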
Let's test how MD5 works in real life by running it on one of our test servers. We will use three texts of different lengths as input and check the output:
$ echo "Short text" | md5sum
74bd0714d535c21825f2c1df57b3d13b -
$ echo "A bit longer text than the previous one" | md5sum
86c9dfbfe2390f84c377f5a2cbca3c67 -
$ echo "Text longer than the both of the previous texts we tested" | md5sum
4adf60e3fbad28a3807ffc7def9c5fe6 -
As you can see, regardless of the length of the input, the length of the output is always fixed. In this case it is 32 hexadecimal characters, that is, 128 bits (the output size of MD5).
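The same experiment can be repeated in Python as a rough cross-check. The sketch assumes that `echo` appends a trailing newline to the text it pipes into `md5sum`, so the newline is added explicitly before hashing. The helper name `md5_like_echo` is made up for this example.

```python
import hashlib

texts = [
    "Short text",
    "A bit longer text than the previous one",
    "Text longer than the both of the previous texts we tested",
]


def md5_like_echo(text):
    """Hash the text plus the trailing newline that echo is assumed to add."""
    return hashlib.md5((text + "\n").encode("utf-8")).hexdigest()


digests = [md5_like_echo(t) for t in texts]

for text, digest in zip(texts, digests):
    # Each hex character encodes 4 bits, so 32 characters are 128 bits.
    print(f"{len(text):3d} chars in -> {digest} ({len(digest) * 4} bits out)")
```

Whatever the input length, each digest printed here is 32 hexadecimal characters long.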