{"id":25013061,"url":"https://github.com/miltiadiss/ceid_ne4338-multidimensional-data-structures","last_synced_at":"2026-05-07T13:16:07.037Z","repository":{"id":217397195,"uuid":"743769551","full_name":"miltiadiss/CEID_NE4338-Multidimensional-Data-Structures","owner":"miltiadiss","description":"This project implements multi-dimensional indices (k-d trees, quad trees, range trees, R-trees) for querying computer scientists' data by surname, awards, and publications, with education similarity measured using LSH, comparing the methods experimentally.","archived":false,"fork":false,"pushed_at":"2024-10-08T20:16:35.000Z","size":3454,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T04:12:26.123Z","etag":null,"topics":["jaccard-similarity","kdtree","lsh","octtree","rangetree","rtree","web-crawler"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miltiadiss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-16T00:21:15.000Z","updated_at":"2025-03-18T19:17:19.000Z","dependencies_parsed_at":"2024-06-05T22:08:07.343Z","dependency_job_id":"e4fe3541-606f-4e3e-9247-8ef6560f578e","html_url":"https://github.com/miltiadiss/CEID_NE4338-Multidimensional-Data-Structures","commit_stats":null,"previous_names":["miltiadiss/multidimensional-data-structures","miltiadiss/ceid_ne4338-multidimensional-data-structures"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/miltiadiss%2FCEID_NE4338-Multidimensional-Data-Structures","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miltiadiss%2FCEID_NE4338-Multidimensional-Data-Structures/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miltiadiss%2FCEID_NE4338-Multidimensional-Data-Structures/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miltiadiss%2FCEID_NE4338-Multidimensional-Data-Structures/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/miltiadiss","download_url":"https://codeload.github.com/miltiadiss/CEID_NE4338-Multidimensional-Data-Structures/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246273555,"owners_count":20750906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["jaccard-similarity","kdtree","lsh","octtree","rangetree","rtree","web-crawler"],"created_at":"2025-02-05T06:19:55.298Z","updated_at":"2026-05-07T13:16:06.871Z","avatar_url":"https://github.com/miltiadiss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Overview\nThis project is part of the **Multidimensional Data Structures \u0026 Computational Geometry** elective course in the Computer Engineering \u0026 Informatics Department of the University of Patras for the Winter Semester 2023-2024 (Semester 7).\n\n## Dataset\nThe Scientists Dataset used in this project is created with the aid of a web crawler that extracts data from this link: 
https://en.wikipedia.org/wiki/List_of_computer_scientists and creates a CSV file.\n\nEvery tuple of the final CSV file will have this format: (**Surname**:String, **#Awards**:Integer, **Education**:text-vector, **#DBLP_Record**). The three features **Surname**, **#Awards** and **#DBLP_Record** will be used for the indexing performed by the multidimensional structures. Also, we will apply **Locality Sensitive Hashing (LSH)** to the text vectors of the **Education** feature in order to find common semantic content between the different scientists.\n\n## Goals\nOur goal is to build the following Multidimensional Data Structures: **kd-Tree**, **Quad-Tree**, **R-Tree** and **Range Tree** and apply them to the initial dataset in order to answer spatial range, interval or similarity queries. For each query the user must enter:\n1. the range (A-Z) of the first letter of the scientists' **Surname**\n2. the minimum threshold of the scientists' **#Awards**\n3. the range of the scientists' **#DBLP_Record**.\n\nThen, we will use the **LSH** method to filter the results returned by the trees and keep only those scientists whose **Education** similarity exceeds a user-defined threshold. For the **LSH** we will split the text vectors into shingles of size **k=3**, and the shingle signatures will be placed in buckets using **15 bands** of **12 rows** each. The similarity can be calculated using the **Jaccard Coefficient**.\n\nFinally, we will compare the average-case complexity and speed of the four structures in order to find which one is the most efficient.\n\n![Screenshot 2024-10-01 165405](https://github.com/user-attachments/assets/ce8d2f0e-a551-4d02-9f9f-e76e6e3411f3)\n\nFurther information about the whole implementation can be found in the technical report in the **Documentation** folder. 
Also, the code for the different structures can be found in the **Trees** folder.\n\n## Programming Tools \u0026 Environment\nPython, PyCharm Community Edition 2024.1.1\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiltiadiss%2Fceid_ne4338-multidimensional-data-structures","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiltiadiss%2Fceid_ne4338-multidimensional-data-structures","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiltiadiss%2Fceid_ne4338-multidimensional-data-structures/lists"}