Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gyakobo/hashing-scheme-design
In this project I'll try to demonstrate a hashing scheme for storing and retrieving data by implementing a Modular Division hashing function.
hash-tables hashing hashing-function modular-division njit python3
- Host: GitHub
- URL: https://github.com/gyakobo/hashing-scheme-design
- Owner: Gyakobo
- License: mit
- Created: 2024-06-23T03:19:22.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-23T08:41:57.000Z (6 months ago)
- Last Synced: 2024-06-23T09:25:49.301Z (6 months ago)
- Topics: hash-tables, hashing, hashing-function, modular-division, njit, python3
- Language: Python
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Hashing Scheme Design
![image](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)
![image](https://img.shields.io/badge/windows%20terminal-4D4D4D?style=for-the-badge&logo=windows%20terminal&logoColor=white)

Author: [Andrew Gyakobo](https://github.com/Gyakobo)
In this project I'll try to demonstrate a hashing scheme for storing and retrieving data.
## Introduction
As stated above, we'll be designing a *hash function* that maps a variety of keys to table slots with few collisions between them.
In our case we'll be working to retrieve student records (admissions credentials, transcripts, DegreeWorks audits, etc.). We assume `12,000 students` and an address space of `15,000` slots.
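For context (this estimate is not part of the original project): if an idealized hash spread $n$ keys uniformly at random over $N$ slots, the expected number of collisions, counted as keys that land in an already-occupied slot, follows from the expected number of slots left empty, $N(1 - 1/N)^n$. With $n = 12{,}000$ and $N = 15{,}000$:

$$
\alpha = \frac{n}{N} = \frac{12{,}000}{15{,}000} = 0.8
$$

$$
E[\text{collisions}] = n - N\left(1 - \left(1 - \tfrac{1}{N}\right)^{n}\right) \approx n - N\left(1 - e^{-\alpha}\right) \approx 12{,}000 - 15{,}000\,(1 - 0.4493) \approx 3{,}740
$$

So even a well-behaved random hash would collide a few thousand times at this load factor; getting zero collisions depends on how the keys themselves are distributed.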
## Methodology
To start off, let's define our hash function. A straightforward and commonly used method is `Modular Division`:
$$h(k) = k \bmod N $$
Where:
* $k$ is the student ID.
* $N$ is the address space, `15,000`.

This method is simple and intuitive, and we can test empirically how many collisions it produces.
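The formula only picks a slot; to actually store and retrieve records, keys that share a slot need somewhere to go. Below is a minimal sketch assuming separate chaining (each slot holds a small list of records), a collision-resolution technique the original project does not implement. The class `ChainedHashTable` and its methods are hypothetical names used only for illustration.

```python
from typing import List, Optional, Tuple


class ChainedHashTable:
    """Hypothetical modular-division hash table using separate chaining."""

    def __init__(self, size: int) -> None:
        self.size = size
        # One bucket per slot; each bucket is a list of (student_id, record) pairs
        self.buckets: List[List[Tuple[int, str]]] = [[] for _ in range(size)]

    def slot(self, key: int) -> int:
        # h(k) = k mod N
        return key % self.size

    def insert(self, key: int, record: str) -> None:
        bucket = self.buckets[self.slot(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # same ID: overwrite the old record
                return
        bucket.append((key, record))  # new ID: chain it onto the bucket

    def lookup(self, key: int) -> Optional[str]:
        # Scan only the one bucket the key hashes to
        for existing_key, record in self.buckets[self.slot(key)]:
            if existing_key == key:
                return record
        return None  # ID not stored


table = ChainedHashTable(15000)
table.insert(12345, "transcript for student 12345")
table.insert(27345, "transcript for student 27345")  # 27345 mod 15000 == 12345

print(table.slot(12345), table.slot(27345))  # both keys share slot 12345
print(table.lookup(27345))  # still retrievable despite the collision
```

Keys that differ by a multiple of $N$ always collide under $h(k) = k \bmod N$; chaining keeps both records retrievable at the cost of a short scan of the shared bucket.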
1) To start off, I import the `sample` function from the `random` module to generate unique, uniformly random IDs, per the assignment's requirements. I then define two constants: `num_students` and `address_space`.
```python
from random import sample

# Constants
num_students = 12000
address_space = 15000
```
2) Next, I generate all the student IDs as unique random values in the range [1, 12000]. I then define the "Modular Division" hash function and a hash table of 15,000 slots. Note that the hash function maps every key into the range [0, 14999], i.e. [0, N - 1], regardless of whether the key is larger or smaller than the address space.
```python
# Generate 12,000 unique student IDs in [1, 12000]
# (sample() draws without replacement, so the IDs are unique)
student_ids = sample(range(1, num_students + 1), num_students)

def hash_function(key, N):
    # N - address space; the result is a slot index in [0, N - 1]
    return key % N

# Hash table with one slot per address; 0 marks an empty slot
hash_table = [0] * address_space
```
3) Lastly, let's test the program for collisions. We loop over all the randomly generated IDs, hash each one, and count a collision whenever its slot has already been taken.
```python
collisions = 0
for student_id in student_ids:
    hash_value = hash_function(student_id, address_space)
    if hash_table[hash_value] == 0:
        hash_table[hash_value] = 1  # first key to land in this slot
    else:
        collisions += 1  # slot already occupied
print(collisions)
```
## Results and Analysis
When it comes to the results of this simulation, I get a total of 0 collisions, which looks fantastic but has a simple explanation: every ID in [1, 12,000] is smaller than the divisor $N = 15{,}000$, so $h(k) = k$ and each ID lands in its own slot. However, when I tested the hash function on wider ID ranges, collisions started to appear and grew as the range extended past [1, num_students + 3000], i.e. beyond the 15,000-slot address space. To improve the algorithm, I would adjust the parameters to reduce collisions: for instance, use a prime modular divisor (such as 15013) for the hash function, or narrow the spread of the random IDs.
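The range experiment and the prime-divisor idea can be checked with a small sweep. This is a sketch, not the author's script: `count_collisions` is a hypothetical helper, the tested ranges are arbitrary choices, and a seeded `random.Random` is used so that runs are reproducible. Collisions are counted the same way as in step 3.

```python
import random


def hash_function(key: int, n: int) -> int:
    # Modular division hash, h(k) = k mod N
    return key % n


def count_collisions(max_id: int, num_students: int, address_space: int, seed: int = 0) -> int:
    """Sample unique IDs from [1, max_id] and count keys that hit an occupied slot."""
    rng = random.Random(seed)  # seeded so results are reproducible
    ids = rng.sample(range(1, max_id + 1), num_students)
    occupied = [False] * address_space
    collisions = 0
    for student_id in ids:
        slot = hash_function(student_id, address_space)
        if occupied[slot]:
            collisions += 1
        else:
            occupied[slot] = True
    return collisions


num_students = 12000
for max_id in (12000, 15000, 20000, 30000):
    for n in (15000, 15013):  # original divisor vs. the prime 15013
        print(f"IDs in [1, {max_id}], N={n}: {count_collisions(max_id, num_students, n)} collisions")
```

As long as every ID is below $N$, the count is zero for either divisor. For uniformly random IDs, a prime divisor is not expected to change the counts much; primes mainly help when keys share a common factor with $N$ (for example, IDs that are all multiples of 10 only reach 1,500 of the 15,000 slots). Note also that $N = 15013$ implies a table of 15,013 slots.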
## License
MIT