# Hashing Algorithms

#### What is hashing?

Hashing is a method that reliably takes an input and converts it into a hash (output), the hash cannot be converted back into the input (usually).

#### Why Hash in the first place?

Hashing can also be used to quickly (Like really quickly) find where you previously stored something. This is what we will explore.

Hashmap: A table/array with a hash function
Hash Function: A function that takes input x and spits out output y –> Cannot convert y to x again.

Hash Function Properties:

• Can map any key to the table with index 0 -> M-1
• Should have fast computation
• Should distribute keys uniformly to the table
• Should try to avoid collisions

Hash Function Types:

Division Hash: h(k) = k % m / (% = modulus, m is a large prime number)
Example: Given table size m = 10
h(89) = 89 % 10 = 9 –> insert element 89 into slot 9

Multiplication Hash: h(k) = Truncate(m * frac(k*A)) Where 0 < A < 1, frac = get fraction portion

Simple Uniform Hash (SUH): Each item is equally likely to hash to a slot independently of other items.

Theorem: SUH with Chain based collision resolution takes O(1+@) to search

Rule of thumb: For chaining, use load factor @ ~1

Sounds good so far, what’s the issue?

Collisions man, collisions suck. A collision is when a hashing function hashes 2 separate elements into 1 slot on a table (Tables/arrays aren’t infinite in size ya know).
How do we solve them?
—-> Collision Resolution Algorithms

### Collision Resolution Algorithms

#### Chaining / (Open Hashing)

Chaining: Each table entry is a linked list
Collision Resolution: When a collision happens, just append to the linked list and carry on
Issue: When a collision happens, we append to a linked list. Usually, hashing has an O(1) constant lookup time but with open hashing, we run the risk of having to search through a linked list. If somehow our hashing function sucks and all the elements are in one table entry, then our lookup time becomes O(n). (BAD)
Ideally: For good hash functions, get n/m elements in each slot

### Closed Hashing/ (Open Addressed Hash)

Each slot can only hold one element, Load factor < 1, ideally Load Factor = 0.5

#### Linear Probing

Linear Probing: Collision Function is linear
Hashing Function: hi(k) = [h(k) + i] % m
Intuition: Oops my parking spot is taken, go to next available spot
Problem: Primary Clustering: Interference with neighboring hashes, can take significantly longer to find the element you are looking for.

Hashing Function: hi(k) = [h(k) + C0 + C1*i + C2*i2] % m
Intuition: Oops my parking spot is taken, go to next available spot but look for the next available spot in a quadratic fashion (I guess).
Problems:

• – Secondary Clustering: Interference between Items of the same hash.
• – May get a half-empty table with unreachable spots (Due to quadratic nature of collisions)
• – Need to use a lazy form of deleting

#### Double Hashing

Double Hashing: Use 2 hash functions
Hashing Function: hi(k) = [h1(k) + i * h2(k)] % m
Intuition: Oops this parking spot is full, let’s magically look in another spot for some unknown reason.
Problem: Overhead of computing 2 hashes.

#### Cuckoo Hashing

Cuckoo Hashing: Use 1 hash function, but 2 tables
Hashing Function: Can be any hash function
Intuition: Oops this parking spot is full, let’s tow their car to another lot. and repeat.
Algorithm for inserting (x): Apply the first hash on x, h1(x), place x in table 1 with h1(x). If the cell is occupied by y, move y to location h2(y) in table 2. Repeat RECURSIVELY
Guarantees: Constant lookup time given (num elements/size of table) < 1/2.
Note: The probability of a cycle is very low.
Problem: Can get stuck in an infinite loop (rare but possible). if this happens then rehash enough elements to break the loop.