July 17, 2019 • ☕️ 3 min read

**Hash functions** are used to map large data sets of elements of an arbitrary
length (*the keys*) to smaller data sets of elements of a fixed length
(*the fingerprints*).

The basic application of hashing is efficient testing of equality of keys by comparing their fingerprints.

A *collision* happens when two different keys have the same fingerprint. The way
in which collisions are handled is crucial in most applications of hashing.
Hashing is particularly useful in the construction of efficient practical algorithms.

A **rolling hash** (also known as recursive hashing or rolling checksum) is a hash
function where the input is hashed in a window that moves through the input.

A few hash functions allow a rolling hash to be computed very quickly: the new hash value is calculated from only the following data:

- the old hash value,
- the old value removed from the window,
- and the new value added to the window.
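
To make those three inputs concrete, here is a minimal sketch that uses the simplest rolling hash imaginable: the plain sum of the character codes in the window. `windowSums` is a hypothetical helper. The hash is deliberately weak (it ignores order). It only shows that each new value needs nothing more than the old hash, the outgoing code, and the incoming code.

```javascript
// Hypothetical, deliberately weak rolling hash: the sum of the character
// codes in a window of fixed size. It exists only to show the O(1) update.
function windowSums(text, windowSize) {
  const sums = []
  if (windowSize <= 0 || windowSize > text.length) return sums

  // Hash the first window from scratch.
  let hash = 0
  for (let i = 0; i < windowSize; i += 1) {
    hash += text.charCodeAt(i)
  }
  sums.push(hash)

  // Slide one position at a time using only the old hash,
  // the value leaving the window, and the value entering it.
  for (let i = windowSize; i < text.length; i += 1) {
    const removed = text.charCodeAt(i - windowSize)
    const added = text.charCodeAt(i)
    hash = hash - removed + added
    sums.push(hash)
  }
  return sums
}

console.log(windowSums('abcd', 2)) // → [195, 197, 199]
```

Each slide costs the same constant amount of work, however wide the window is.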

An ideal hash function for strings should depend both on the *multiset* of
the symbols present in the key and on the *order* of the symbols. The most common
family of such hash functions treats the symbols of a string as coefficients of
a *polynomial* with an integer variable `p` and computes its value modulo an
integer constant `M`.

The *Rabin–Karp string search algorithm* is often explained using a very simple
rolling hash function that uses only multiplications and
additions, the **polynomial rolling hash**:

$$
H(s_1, s_2, \ldots, s_k) = s_1 p^{k-1} + s_2 p^{k-2} + \cdots + s_k p^{0}
$$

where `p` is a constant, and $(s_1, \ldots, s_k)$ are the input
characters.
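
The formula translates directly into code. This is a sketch with hypothetical helper names and an arbitrarily chosen p = 31. It uses BigInt so the powers of p stay exact, and it sets the result beside an order-insensitive sum of character codes to show why the order of the symbols matters:

```javascript
// Direct translation of H(s_1, ..., s_k) = s_1 p^(k-1) + ... + s_k p^0.
// BigInt keeps every power of p exact; p = 31n is an arbitrary choice.
function polynomialHash(str, p = 31n) {
  const k = str.length
  let hash = 0n
  for (let i = 0; i < k; i += 1) {
    // Index i holds s_(i+1), whose weight is p^(k - 1 - i).
    const symbol = BigInt(str.charCodeAt(i))
    hash += symbol * p ** BigInt(k - 1 - i)
  }
  return hash
}

// Order-insensitive hash for comparison: just the sum of the codes.
function sumHash(str) {
  let sum = 0
  for (let i = 0; i < str.length; i += 1) sum += str.charCodeAt(i)
  return sum
}

console.log(sumHash('ab') === sumHash('ba')) // → true (anagrams collide)
console.log(polynomialHash('ab'), polynomialHash('ba')) // → 3105n 3135n
```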

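For example, we can convert short strings to key numbers by multiplying character codes by
powers of a constant. With the letter codes a = 1, b = 2, …, z = 26 and the constant 26,
the three-letter word `ace` turns into a number
by calculating:

$$
\text{key} = 1 \cdot 26^{2} + 3 \cdot 26^{1} + 5 \cdot 26^{0} = 676 + 78 + 5 = 759
$$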
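
The same calculation fits in a small function. This assumes lowercase input and the letter codes a = 1, b = 2, …, z = 26, which is what the values 1, 3, 5 for `a`, `c`, `e` imply. `wordToKey` is a hypothetical name.

```javascript
// Hypothetical helper: letter codes a = 1, ..., z = 26, lowercase only.
function wordToKey(word) {
  let key = 0
  for (let i = 0; i < word.length; i += 1) {
    const code = word.charCodeAt(i) - 'a'.charCodeAt(0) + 1
    // Weight each letter by 26 raised to its distance from the end.
    key += code * 26 ** (word.length - 1 - i)
  }
  return key
}

console.log(wordToKey('ace')) // → 759
```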

In order to avoid manipulating huge `H` values, all math is done modulo `M`:

$$
H(s_1, s_2, \ldots, s_k) = \left(s_1 p^{k-1} + s_2 p^{k-2} + \cdots + s_k p^{0}\right) \bmod M
$$

A careful choice of the parameters `M` and `p` is important to obtain “good”
properties of the hash function, i.e., a low collision rate.
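
Here is one illustration of a poor choice. If M divides p, every term except $s_k p^0$ is a multiple of M, so the hash depends only on the last character. The sketch below demonstrates this with the hypothetical parameters p = 26 and M = 13. For contrast it also uses M = 101, a prime that does not divide 26. These values demonstrate the failure and are not recommendations.

```javascript
// Polynomial hash mod M, evaluated term by term with exact BigInt math.
function polyHashMod(str, p, M) {
  const k = str.length
  let hash = 0n
  for (let i = 0; i < k; i += 1) {
    hash += BigInt(str.charCodeAt(i)) * p ** BigInt(k - 1 - i)
  }
  return hash % M
}

// M = 13 divides p = 26: only the last character survives the modulo.
console.log(polyHashMod('ab', 26n, 13n), polyHashMod('zb', 26n, 13n)) // → 7n 7n
// M = 101 shares no factor with 26: the first character still matters.
console.log(polyHashMod('ab', 26n, 101n), polyHashMod('zb', 26n, 101n)) // → 95n 38n
```

This is one reason M is commonly chosen to be a large prime.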

This approach has the desirable attribute of involving all the characters in the input string. The calculated key value can then be turned into an array index in the usual way, by taking it modulo the array size:

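```javascript
// Naive polynomial hash: weight each character code by a power of `base`,
// then reduce the total to an array index.
function hash(key, arraySize) {
  const base = 13
  let hash = 0
  for (let charIndex = 0; charIndex < key.length; charIndex += 1) {
    const charCode = key.charCodeAt(charIndex)
    // The first character gets the highest power, the last gets base^0.
    hash += charCode * base ** (key.length - charIndex - 1)
  }
  // Only the final sum is reduced, so `hash` can grow very large first.
  return hash % arraySize
}
```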

The `hash()` function is not as efficient as it might be. Other than the
character conversion, each iteration raises `base` to a power and then performs a
multiplication and an addition. **Horner’s method** removes the separate power
computation and leaves one multiplication and one addition per character:

$$
a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x^1 + a_0 = \left(\left(\left(a_4 x + a_3\right) x + a_2\right) x + a_1\right) x + a_0
$$
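
Horner's method works for any polynomial, not just for hashing. Here is a minimal sketch with hypothetical helper names. Coefficients are listed from the highest power down. It evaluates 2x² + 3x + 4 at x = 5 both ways:

```javascript
// Horner's method: one multiplication and one addition per coefficient.
// `coefficients` lists a_n first and a_0 last.
function horner(coefficients, x) {
  let result = 0
  for (const a of coefficients) {
    result = result * x + a
  }
  return result
}

// Naive evaluation for comparison: a separate power for every term.
function naivePoly(coefficients, x) {
  const n = coefficients.length - 1
  return coefficients.reduce((sum, a, i) => sum + a * x ** (n - i), 0)
}

console.log(horner([2, 3, 4], 5)) // → 69
console.log(naivePoly([2, 3, 4], 5)) // → 69
```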

In other words, starting from $H_0 = 0$ and consuming one character $s_i$ per step:

$$
H_i = \left(p \cdot H_{i-1} + s_i\right) \bmod M
$$
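
Tracing the recurrence on the `ace` example shows that it builds the same key one character at a time. The trace assumes the letter codes a = 1, …, z = 26 and p = 26. `hashPrefixes` is a hypothetical helper, and the moduli 1000 and 100 are arbitrary.

```javascript
// H_0 = 0, H_i = (p * H_(i-1) + s_i) mod M, with letter codes a = 1 ... z = 26.
function hashPrefixes(word, p, M) {
  const prefixes = []
  let h = 0
  for (let i = 0; i < word.length; i += 1) {
    const s = word.charCodeAt(i) - 96 // 'a' has char code 97
    h = (p * h + s) % M
    prefixes.push(h)
  }
  return prefixes
}

console.log(hashPrefixes('ace', 26, 1000)) // → [1, 29, 759]
console.log(hashPrefixes('ace', 26, 100)) // → [1, 29, 59]
```

With M = 1000 nothing gets reduced and the last value is 759. With M = 100 only the last step wraps around, giving 759 mod 100 = 59.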

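The first `hash()` function cannot handle long strings. The intermediate value of `hash`
quickly exceeds the largest integer a JavaScript number can represent exactly
(`Number.MAX_SAFE_INTEGER`, $2^{53} - 1$), and precision is lost after that.
Yet the final result only needs to be less than the array size.
In Horner’s method we can apply the modulo (%) operator at each step of the
calculation. Because $(a \cdot b + c) \bmod M = ((a \bmod M) \cdot b + c) \bmod M$,
this gives the same result as applying the modulo operator once at
the end, but keeps every intermediate value small and avoids the overflow.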

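```javascript
// Horner's method: one multiplication and one addition per character,
// with the modulo applied at every step to keep `hash` small.
function hash(key, arraySize) {
  const base = 13
  let hash = 0
  for (let charIndex = 0; charIndex < key.length; charIndex += 1) {
    const charCode = key.charCodeAt(charIndex)
    hash = (hash * base + charCode) % arraySize
  }
  // Already in the range 0 .. arraySize - 1.
  return hash
}
```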
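
One quick way to check that reducing at every step gives the same answer is to compare against an exact evaluation. The exact version uses BigInt, so no precision is lost, and reduces only once at the end. The function names, the test string, and the table size 1009 are made up for this check.

```javascript
// Exact reference: evaluate the whole polynomial with BigInt, reduce once.
function hashExact(key, arraySize) {
  const base = 13n
  let hash = 0n
  for (let i = 0; i < key.length; i += 1) {
    hash = hash * base + BigInt(key.charCodeAt(i))
  }
  return Number(hash % BigInt(arraySize))
}

// Horner with the modulo applied at every step, as in the function above.
function hashHorner(key, arraySize) {
  const base = 13
  let hash = 0
  for (let i = 0; i < key.length; i += 1) {
    hash = (hash * base + key.charCodeAt(i)) % arraySize
  }
  return hash
}

const longKey = 'the quick brown fox jumps over the lazy dog'
console.log(hashExact(longKey, 1009) === hashHorner(longKey, 1009)) // → true
console.log(hashHorner('abc', 100)) // → 66
```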

Polynomial hashing has a rolling property: the fingerprints can be updated efficiently when symbols are added or removed at the ends of the string (provided that an array of powers of p modulo M of sufficient length is stored). The popular Rabin–Karp pattern-matching algorithm is based on this property.
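
The rolling property is easiest to see in a minimal Rabin–Karp sketch. This is not code from the original article. The values base = 256 and M = 1000000007 are illustrative assumptions. Because the window length is fixed, a single precomputed power base^(m−1) mod M is enough here, so no full array of powers is needed. Matching fingerprints are confirmed character by character, since a collision is always possible.

```javascript
// Minimal Rabin–Karp search built on the polynomial rolling hash.
// base and mod are illustrative choices, not prescribed values.
function rabinKarp(text, pattern, base = 256, mod = 1000000007) {
  const m = pattern.length
  const n = text.length
  const matches = []
  if (m === 0 || m > n) return matches

  // highPow = base^(m-1) mod M: the weight of the character leaving the window.
  let highPow = 1
  for (let i = 1; i < m; i += 1) highPow = (highPow * base) % mod

  // Hash the pattern and the first window with Horner's method.
  let patternHash = 0
  let windowHash = 0
  for (let i = 0; i < m; i += 1) {
    patternHash = (patternHash * base + pattern.charCodeAt(i)) % mod
    windowHash = (windowHash * base + text.charCodeAt(i)) % mod
  }

  for (let start = 0; start + m <= n; start += 1) {
    // Equal fingerprints might be a collision, so verify the actual characters.
    if (windowHash === patternHash && text.startsWith(pattern, start)) {
      matches.push(start)
    }
    if (start + m < n) {
      // Roll: remove the outgoing character, shift by base, add the incoming one.
      windowHash = (windowHash - text.charCodeAt(start) * highPow) % mod
      if (windowHash < 0) windowHash += mod
      windowHash = (windowHash * base + text.charCodeAt(start + m)) % mod
    }
  }
  return matches
}

console.log(rabinKarp('abracadabra', 'abra')) // → [0, 7]
```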