{"id":22868482,"url":"https://github.com/robcyberlab/machine-learning-search","last_synced_at":"2025-03-31T10:50:49.497Z","repository":{"id":263131010,"uuid":"889440169","full_name":"RobCyberLab/Machine-Learning-Search","owner":"RobCyberLab","description":"🔎Machine Learning Search🔍","archived":false,"fork":false,"pushed_at":"2024-11-16T15:45:47.000Z","size":1709,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-06T15:53:29.192Z","etag":null,"topics":["ai","big-data","data-mining","data-science","deep-learning","inverted-index","locality-sensitive-hashing","machine-learning","search-algorithm","similarity-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RobCyberLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-16T11:47:23.000Z","updated_at":"2024-11-16T15:45:50.000Z","dependencies_parsed_at":"2024-11-16T12:29:48.846Z","dependency_job_id":"a8f1fa32-5cfd-45a3-83d3-e631b134d143","html_url":"https://github.com/RobCyberLab/Machine-Learning-Search","commit_stats":null,"previous_names":["robcyberlab/machine-learning-search"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobCyberLab%2FMachine-Learning-Search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobCyberLab%2FMachine-Learning-Search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobCyber
Lab%2FMachine-Learning-Search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobCyberLab%2FMachine-Learning-Search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RobCyberLab","download_url":"https://codeload.github.com/RobCyberLab/Machine-Learning-Search/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246458022,"owners_count":20780675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","big-data","data-mining","data-science","deep-learning","inverted-index","locality-sensitive-hashing","machine-learning","search-algorithm","similarity-search"],"created_at":"2024-12-13T12:35:21.210Z","updated_at":"2025-03-31T10:50:49.480Z","avatar_url":"https://github.com/RobCyberLab.png","language":"Python","readme":"# 🔎Machine Learning Search🔍\n\n\nIn this project, we will use the database `features.db` to search for similar items using the inverted index and locality-sensitive hashing (LSH) techniques.\n\nNote: Due to privacy policies, I am not allowed to post the dataset publicly.\n\n---\n\n## Table of Contents 📋\n1. [Familiarization with Map-Reduce](#1-familiarization-with-map-reduce-)\n2. [Constructing the Inverted Index](#2-constructing-the-inverted-index-)\n3. [Searching the Inverted Index](#3-searching-the-inverted-index-)\n4. [Constructing LSH Groups](#4-constructing-lsh-groups-)\n5. [Searching with LSH](#5-searching-with-lsh-)\n6. [Counting Function Calls](#6-counting-function-calls-)\n\n---\n\n## 1. 
Familiarization with Map-Reduce 🔄\n\nStudy the provided framework and the `dummyMapReduce.py` library, along with the word-count example. Modify the given example so that the map method counts the occurrences of each word within the document and calls the `emit()` method only once for each word.\n\n---\n\n## 2. Constructing the Inverted Index 🔍\n\nUsing the previously built framework, create the `inverted.db` database, which contains the inverted index for the dataset.\n\n---\n\n## 3. Searching the Inverted Index 🔎\n\nImplement the `search_inv()` function, which searches for similar items using the inverted index.\n\n---\n\n## 4. Constructing LSH Groups 🧩\n\nBuild the `lsh.db` database, which contains a table with the same number of rows as in `features.db`, with one column for each hash band. You can use constants `b=30` and `r=5` for this task.\n\n---\n\n## 5. Searching with LSH 🔑\n\nImplement the `search_lsh()` function, which uses the `lsh.db` database to find items similar to a given item. Compare the results with those obtained from the inverted index. **Important**: Use the same minhash functions as those used when constructing the database.\n\n---\n\n## 6. Counting Function Calls 🧮\n\nMeasure how many times, on average, the distance calculation function is called for each type of search.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobcyberlab%2Fmachine-learning-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobcyberlab%2Fmachine-learning-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobcyberlab%2Fmachine-learning-search/lists"}