Hashing Data Structure

Why Hashing?

To get average-case O(1) insertion and O(1) search.

You take a set of elements that are sparsely populated across a large domain and use a mathematical hash function to map them, as uniformly as possible, onto a new but much smaller range of numbers.
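A minimal sketch of that mapping, assuming a simple modulo hash and an illustrative bucket count (neither is prescribed by these notes):

```python
# Map integers from a huge domain into a small fixed range of bucket
# indices. NUM_BUCKETS = 10 is an illustrative choice.
NUM_BUCKETS = 10

def hash_function(value: int) -> int:
    """Map any integer to a bucket index in [0, NUM_BUCKETS)."""
    return value % NUM_BUCKETS

# Sparse values from a large domain land in a small range of indices.
print(hash_function(7))           # 7
print(hash_function(1_000_003))   # 3
print(hash_function(-5))          # 5 (Python's modulo is non-negative here)
```

Any function with this shape works; the quality of the choice shows up later in how often values collide.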


How is it different from arrays and linked lists?

In arrays, insertion at the end is O(1) (amortized), but the worst case for finding an arbitrary element in an unsorted array is O(N). In a linked list, insertion is O(1) at the head and O(N) at the tail (without a tail pointer), and search is O(N). In stacks and queues, search is likewise O(N).

But how does Hashing actually work?

By borrowing the bucketing idea from Bucket Sort/Counting Sort: a function maps each value to a bucket index, and the value is stored in that bucket.
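The bucketing idea can be sketched as follows; the modulo hash and bucket count are illustrative assumptions:

```python
# Route each value into one of num_buckets lists, the same way
# bucket sort distributes its input before sorting each bucket.
def build_buckets(values, num_buckets=10):
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[v % num_buckets].append(v)  # hash = value mod bucket count
    return buckets

buckets = build_buckets([15, 7, 25, 3, 42])
# 15 and 25 collide in bucket 5; 7 -> bucket 7, 3 -> bucket 3, 42 -> bucket 2
print(buckets[5])  # [15, 25]
```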

Concepts of Hashing

  1. Range of collection => The collection can have any range; if we are storing the Integer type, values can range from -2,147,483,648 to 2,147,483,647.
  2. Size of collection => The number of elements we are pushing into the structure.
  3. Range of Bucket/Hashed Values => The range of bucket indices the hash function maps into.
  4. Hashing Function => A function chosen so that clashes (collisions) between elements are as rare as possible.
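The four concepts can be made concrete with illustrative values (none of these numbers are prescribed by the notes):

```python
# 1. Range of collection: the 32-bit signed integer domain.
COLLECTION_RANGE = (-2_147_483_648, 2_147_483_647)

# 2. Size of collection: here, 5 elements.
collection = [12, 99, 4_080, -7, 2_000_000_000]

# 3. Range of bucket/hashed values: indices 0..15.
NUM_BUCKETS = 16

# 4. Hashing function: maps the huge domain into the small bucket range.
def hash_fn(x: int) -> int:
    return x % NUM_BUCKETS

# Every element lands inside the bucket range.
assert all(0 <= hash_fn(x) < NUM_BUCKETS for x in collection)
```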

Searching in HashMap

Step 1: Value => Hash Function => Hashed Key (O(1))
Step 2: Go to the bucket for that hashed key and search its collection for the value (O(K), where K is the number of elements in the bucket).
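The two steps can be sketched as below, again assuming a modulo hash over chained buckets:

```python
NUM_BUCKETS = 10
buckets = [[] for _ in range(NUM_BUCKETS)]
for v in [15, 7, 25, 3]:
    buckets[v % NUM_BUCKETS].append(v)

def search(value: int) -> bool:
    key = value % NUM_BUCKETS      # Step 1: value -> hash function -> hashed key, O(1)
    return value in buckets[key]   # Step 2: linear scan of that bucket's collection, O(K)

print(search(25))  # True  (bucket 5 holds [15, 25])
print(search(8))   # False (bucket 8 is empty)
```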

No matter how good a hash function you design, because you are mapping a much larger domain onto a much smaller range, collisions can always happen. Collisions are costly because you must manage a sub-collection of elements per bucket, so the hash function should be designed to keep collisions as rare as possible.

How we handle collision is what defines the time complexity hit we get.

Time complexity with a Linked List as the collision-management tool: O(K)
Time complexity with a Binary Search Tree as the collision-management tool: O(log K)
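Both strategies can be sketched side by side. As an assumption, the O(log K) case below uses a sorted bucket with binary search as a stand-in for a balanced BST, which has the same search cost:

```python
import bisect

NUM_BUCKETS = 8

# Strategy 1: chain colliding values in an unsorted list.
# Search within a bucket is a linear scan, O(K).
list_buckets = [[] for _ in range(NUM_BUCKETS)]

def insert_list(v: int) -> None:
    list_buckets[v % NUM_BUCKETS].append(v)

def search_list(v: int) -> bool:
    return v in list_buckets[v % NUM_BUCKETS]   # O(K)

# Strategy 2: keep each bucket sorted and binary-search it.
# Search within a bucket is O(log K), like a balanced BST.
sorted_buckets = [[] for _ in range(NUM_BUCKETS)]

def insert_sorted(v: int) -> None:
    bisect.insort(sorted_buckets[v % NUM_BUCKETS], v)

def search_sorted(v: int) -> bool:
    b = sorted_buckets[v % NUM_BUCKETS]
    i = bisect.bisect_left(b, v)                # O(log K)
    return i < len(b) and b[i] == v

for v in [15, 7, 23, 31]:   # 15, 23, 31 all collide in bucket 7
    insert_list(v)
    insert_sorted(v)

print(search_list(23), search_sorted(23))   # True True
print(search_list(9), search_sorted(9))     # False False
```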