Introduction to hash functions
Hash functions are unique because they use cryptographic primitives and principles but are not cryptographic algorithms. Unlike symmetric and asymmetric encryption algorithms, which are designed to protect data confidentiality and are reversible operations, hash functions are designed to protect data integrity and are one-way functions.
Requirements for collision resistance
A key requirement for hash functions to do their job is collision resistance. Collision resistance means that it is difficult (but not impossible) to find two inputs to a hash function that produce the same output. The existence of collision resistance within a hash function makes it a useful tool for ensuring the integrity of data. This is because, if the hash is trusted, the fact that a piece of data hashed to the desired value means that the data has probably not been modified, since it would be difficult to find another version with the same hash output.
In order for a hash function to be collision-resistant, it needs a few different properties. These include being a one-way function, having a large state space and being a non-local function.
By definition, a one-way function is one where it is possible to go from input to output and not from output to input. In general, this means destroying some information about the input to create the output. This makes it impossible to reverse the function since a number of inputs produce the same output.
Hash functions can take any binary string as input and produce a fixed-size output. With an infinite input space and finite output space, this means that an infinite number of input maps to the same output. This means that it is definitely possible to find two inputs that produce the same hash output, but hash functions are designed to make this as difficult as possible.
Large state space
Hash functions are designed so that an attacker needs to do a brute-force search through the space of possible inputs to find a collision. The number of inputs that an attacker needs to search to be guaranteed a collision is equal to the number of possible outputs of the hash function.
The reason for this is described in the Pigeonhole Principle. Given X pigeonholes and X+1 pigeons, it is impossible to find a way to place each pigeon in its own pigeonhole (as illustrated above). This means that the upper limit on the number of guesses that an attacker needs to make to find a collision is equal to one more than the number of potential outputs of the hash function.
To protect against brute-force guessing attacks, hash functions must have a large space of possible outputs, making brute-force attacks infeasible. For reference, a hash function with a 256-bit output space has 2256 possible outputs. The known universe has an estimated 2266 atoms, which is only about 1,024 times greater. An attacker may get lucky and find a collision early in the brute-force attack, but the probability is very low.
The final important component of hash function collision resistance is non-locality. Non-locality means that similar inputs produce very dissimilar outputs. Ideally, this would mean that flipping a single bit in the input to a hash function would produce outputs that differ in half of their bits. This makes the outputs as dissimilar as possible, since flipping almost all of the bits reduces the space of possible outputs as well. There are only 256 values that differ in one bit from a particular 256-bit hash value.
Hash function non-locality is designed to protect against hill climbing attacks. In a hill-climbing attack, the attacker makes a small change to an input, calculates the output and checks if they have moved in the “right” direction. If so, they retain the change. If not, they discard it and try another change.
If a hash function produced similar outputs from similar inputs, there would be some feedback that an attacker could use to identify if they are moving in the “right” direction. For example, if an attacker is targeting a hash value of zero, any change that decreases the hash output value likely moves them to their goal. A hill climbing attack is much faster and easier than a brute-force attack, so hash functions must minimize the existence of feedback that an attacker could use.
Real-world applications of hash functions
Hash functions are designed to be collision-resistant, one-way functions. Additionally, they are deterministic, meaning that they always produce the same output from the same input (no randomness involved).
A well-designed hash function can be used in a number of different applications. Some common examples include integrity checking (such as message authentication codes), as part of digital signatures and for secure password storage (storing and comparing hashes instead of passwords protects the password in the event of a breach). If a hash function is secure and the hash value being compared to is trusted, then hash functions make a great protection for data integrity.