Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/robcyberlab/machine-learning-search
🔎Machine Learning Search🔍
https://github.com/robcyberlab/machine-learning-search
ai big-data data-mining data-science deep-learning inverted-index locality-sensitive-hashing machine-learning search-algorithm similarity-search
Last synced: 9 days ago
JSON representation
🔎Machine Learning Search🔍
- Host: GitHub
- URL: https://github.com/robcyberlab/machine-learning-search
- Owner: RobCyberLab
- Created: 2024-11-16T11:47:23.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-11-16T12:16:07.000Z (about 1 month ago)
- Last Synced: 2024-11-16T12:29:19.858Z (about 1 month ago)
- Topics: ai, big-data, data-mining, data-science, deep-learning, inverted-index, locality-sensitive-hashing, machine-learning, search-algorithm, similarity-search
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 🔎Machine Learning Search🔍
In this project, we will use the database `features.db` to search for similar items using the inverted index and locality-sensitive hashing (LSH) techniques.
Note: Due to privacy policies, I am not allowed to post the dataset publicly.
---
## Table of Contents đź“‹
1. [Familiarization with Map-Reduce](#1-familiarization-with-map-reduce-)
2. [Constructing the Inverted Index](#2-constructing-the-inverted-index-)
3. [Searching the Inverted Index](#3-searching-the-inverted-index-)
4. [Constructing LSH Groups](#4-constructing-lsh-groups-)
5. [Searching with LSH](#5-searching-with-lsh-)
6. [Counting Function Calls](#6-counting-function-calls-)---
## 1. Familiarization with Map-Reduce 🔄
Study the provided framework and the `dummyMapReduce.py` library, along with the example for counting words. Modify the given example so that the map method counts the occurrences of each word within the document and calls the `emit()` method only once for each word.
---
## 2. Constructing the Inverted Index 🔍
Using the previously built framework, create the `inverted.db` database, which contains the inverted index for the dataset.
---
## 3. Searching the Inverted Index 🔎
Implement the `search_inv()` function, which performs the search for similar items using the inverted index.
---
## 4. Constructing LSH Groups 🧩
Build the `lsh.db` database, which contains a table with the same number of rows as in `features.db`, with one column for each hash band. You can use constants `b=30` and `r=5` for this task.
---
## 5. Searching with LSH 🔑
Using the previous database, search for similar elements to a given item by implementing the `search_lsh()` function. Compare the results with those obtained from the inverted index. **Important**: It is essential to use the same minhash functions as those used when constructing the database.
---
## 6. Counting Function Calls 🧮
Measure how many times the distance calculation function is called on average for both types of searches.