Most tutorials on digital currencies (such as BitCoin) say that a hash function is a cryptographic function that takes an input and always returns a binary string of length $ 256 $ in such a way that a small change in the input would result in a totally different hash.

Obviously, in the simplest interpretation, the hash function must be injective. However, since the length of the incoming messages is unlimited while there is only $ 2 ^ {256} $ binary strings, it is obvious that an injective function can not exist.

So, what is the correct interpretation? I tried to formulate a possible interpretation and I looked it up in Google to see if there is a real mathematical definition of the hash function as in the BitCoin tutorials, but I had no luck.

Anyway, here is my interpretation:

Leave $ mathcal {M} $ denotes the set of messages. We can think of this set as binary strings of unlimited length, that is, $$ mathcal {M} = Large { large (a_1, a_2, a_3, cdots, a_n, cdots) large mid a_i = 0,1 ( forall i in mathbb {N}) Large } $$ We can define a distance in this set as follows:

$$ d (m_1, m_2) = sum_ {i = 1} ^ { infty} frac {a_i} {2 ^ i} $$

Then the statement becomes something like this:

$$ forall m_1 in mathcal {M}: , , mathbb {P} ( {{f (m_2) | f (m_2) neq f (m_1) }}) = 1 – frac {c} {2 ^ {256}} $$

Where $ mathbb {P} $ It is the uniform distribution in the set of binary chains of length. $ 256 $ Y $ c $ it's a constant that $ c ll 2 ^ {256} $. For example, $ c leq 2 ^ {10} $.

Are there alternative interpretations? A formal definition maybe?