**What is hashing? **

**What is hashing?**

Hashing is a method that reliably takes an input and converts it into a hash (output), the hash cannot be converted back into the input (usually).

**Why Hash in the first place?**

**Why Hash in the first place?**

To understand why we hash things, first ask yourself. How can facebook check that your password is correct without having it saved? The answer is Hashing.

Hashing can also be used to quickly (Like really quickly) find where you previously stored something. This is what we will explore.

**Let’s start with some definitions.**

**Let’s start with some definitions.**

Hashmap: A table/array with a hash function

Hash Function: A function that takes input x and spits out output y –> Cannot convert y to x again.

**Hash Function Properties**:

- Can map any key to the table with index 0 -> M-1
- Should have fast computation
- Should distribute keys uniformly to the table
- Should try to avoid collisions

**Hash Function Types:**

*: h(k) = k % m / (% = modulus, m is a large prime number)*

**Division Hash***Example*: Given table size m = 10

h(89) = 89 % 10 = 9 –>

**insert element 89 into slot 9**

*h(k) = Truncate(m * frac(k*A)) Where 0 < A < 1, frac = get fraction portion*

**Multiplication Hash:***Each item is equally likely to hash to a slot independently of other items.*

**Simple Uniform Hash (SUH):***: SUH with Chain based collision resolution takes O(1+@) to search*

**Theorem***: For chaining, use load factor @ ~1*

**Rule of thumb**Sounds good so far, what’s the issue?

* Collisions man, collisions suck.* A collision is when a hashing function hashes 2 separate elements into 1 slot on a table (Tables/arrays aren’t infinite in size ya know).

*—->*

**How do we solve them?**

*Collision Resolution Algorithms*### *Collision Resolution Algorithms*

*Collision Resolution Algorithms*

#### Chaining / (Open Hashing)

* Chaining:* Each table entry is a linked list

**: When a collision happens, just append to the linked list and carry on**

*Collision Resolution**: When a collision happens, we append to a linked list. Usually, hashing has an O(1) constant lookup time but with open hashing, we run the risk of having to search through a linked list. If somehow our hashing function sucks and all the elements are in one table entry, then our lookup time becomes O(n). (BAD)*

**Issue***For good hash functions, get n/m elements in each slot*

**Ideally:**### Closed Hashing/ (Open Addressed Hash)

Each slot can only hold one element, Load factor < 1, ideally Load Factor = 0.5

* Linear Probing *

** Linear Probing**: Collision Function is linear

*: h*

**Hashing Function**_{i}(k) = [h

^{‘}(k) + i] % m

*: Oops my parking spot is taken, go to next available spot*

**Intuition***Primary Clustering: Interference with neighboring hashes, can take significantly longer to find the element you are looking for.*

**Problem:*** Quadratic Probing *

** Quadratic Probing**: Collision Function is quadratic in nature

*: h*

**Hashing Function**_{i}(k) = [h

^{‘}(k) + C

_{0}+ C

_{1}*i + C

_{2}*i

^{2}] % m

*: Oops my parking spot is taken, go to next available spot but look for the next available spot in a quadratic fashion (I guess).*

**Intuition**

**Problems:**- – Secondary Clustering: Interference between Items of the same hash.
- – May get a half-empty table with unreachable spots (Due to quadratic nature of collisions)
- – Need to use a lazy form of deleting

* Double Hashing *

** Double Hashing**: Use 2 hash functions

*: h*

**Hashing Function**_{i}(k) = [h

_{1}(k) + i * h

_{2}(k)] % m

*: Oops this parking spot is full, let’s magically look in another spot for some unknown reason.*

**Intuition***Overhead of computing 2 hashes.*

**Problem:*** Cuckoo Hashing *

** Cuckoo Hashing:** Use 1 hash function, but 2 tables

*: Can be any hash function*

**Hashing Function***: Oops this parking spot is full, let’s tow their car to another lot. and repeat.*

**Intuition***Apply the first hash on x, h*

**Algorithm for inserting (x):**_{1}(x), place x in table 1 with h

_{1}(x). If the cell is occupied by y, move y to location h

_{2}(y) in table 2. Repeat

**RECURSIVELY**

*Constant lookup time given (num elements/size of table) < 1/2.*

**Guarantees:***The probability of a cycle is very low*

**Note:**

**.***Can get stuck in an infinite loop (rare but possible). if this happens then rehash enough elements to break the loop.*

**Problem:**