{"id":23879217,"url":"https://github.com/tsembp/epl231-groupassignment","last_synced_at":"2026-04-21T03:31:59.826Z","repository":{"id":263118242,"uuid":"889402452","full_name":"tsembp/EPL231-GroupAssignment","owner":"tsembp","description":"Search Engine Implementation using TrieNode/TrieTree data structure","archived":false,"fork":false,"pushed_at":"2024-12-02T12:20:22.000Z","size":79395,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-03T18:37:04.715Z","etag":null,"topics":["algorithms","algorithms-and-data-structures","data-structures","hashing","heaps","java","trie-tree"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tsembp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-16T09:35:19.000Z","updated_at":"2024-12-24T17:52:33.000Z","dependencies_parsed_at":"2024-11-30T13:39:29.287Z","dependency_job_id":null,"html_url":"https://github.com/tsembp/EPL231-GroupAssignment","commit_stats":null,"previous_names":["tsembp/epl231-groupassignment"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tsembp/EPL231-GroupAssignment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsembp%2FEPL231-GroupAssignment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsembp%2FEPL231-GroupAssignment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsembp%2FEPL231-GroupAssignment/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsembp%2FEPL231-GroupAssignment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tsembp","download_url":"https://codeload.github.com/tsembp/EPL231-GroupAssignment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsembp%2FEPL231-GroupAssignment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32075225,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-21T02:38:07.213Z","status":"ssl_error","status_checked_at":"2026-04-21T02:38:06.559Z","response_time":128,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","algorithms-and-data-structures","data-structures","hashing","heaps","java","trie-tree"],"created_at":"2025-01-03T22:34:35.078Z","updated_at":"2026-04-21T03:31:59.808Z","avatar_url":"https://github.com/tsembp.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EPL231 GroupAssignment (Group 28)\n\n## Project Overview\nThis project implements a search engine and data structure utilities using Java. The system uses Tries, Robin Hood Hashing, and a MinHeap to store and process data efficiently. 
The functionality includes loading dictionaries, validating word presence, processing scripts, and finding relevant words based on various criteria.\n\n## Components\n\n### Main Class: `Main`\n- Entry point of the application.\n- Accepts command-line arguments for dictionary and script file paths.\n- Steps:\n  1. Loads a dictionary into a Trie.\n  2. Validates the dictionary entries.\n  3. Processes a script file.\n  4. Starts the search engine.\n\n### Utility Classes\n\n#### 1. `BatchTrieTest`\n- Automates the process of testing Tries with varying input sizes.\n- Measures memory usage and writes averages to an output file.\n\n#### 2. `DictionaryLoader`\n- Handles the loading and validation of dictionary files.\n- Supports processing script files and cleaning input data.\n\n#### 3. `SearchEngine`\n- Provides user interaction to search for words and their alternatives.\n- Implements three criteria to find relevant words:\n  1. Matching words.\n  2. Words differing by two characters.\n  3. Valid words based on input length.\n- Uses a `MinHeap` to prioritize and display results.\n\n#### 4. `MinHeap`\n- Data structure to store strings along with their importance values.\n- Ensures efficient retrieval of the least important elements in order to update the heap with the most important elements.\n\n#### 5. `RobinHoodHashing`\n- Implements hash-based storage with Robin Hood hashing.\n- Includes insertion, search, and rehashing functionalities.\n\n#### 6. `Element`\n- Represents an element in a hash table.\n- Attributes:\n  - `key`: Character key.\n  - `importance`: Integer importance.\n  - `node`: Reference to a TrieNode.\n\n#### 7. `TrieNode` and `TrieNodeStatic`\n- Core data structures for the Trie:\n  - `TrieNode` supports dynamic insertion and search.\n  - `TrieNodeStatic` provides static implementation and memory calculations.\n- Stores words, validates presence, and computes memory usage.\n\n#### 8. 
`FixedLengthWordGenerator` and `RandomWordGenerator`\n- **FixedLengthWordGenerator**:\n  - Generates a specified number of random words, all of a fixed length.\n  - Saves the generated words into a text file in the specified directory.\n  - Ensures directories are created if they don't already exist.\n\n- **RandomWordGenerator**:\n  - Generates random words with lengths determined probabilistically.\n  - Uses cumulative probabilities to decide the word length.\n  - Saves the generated words to a text file in the specified directory.\n  - Supports flexible word generation for varying lengths.\n\n#### Word Length Distribution\n\nThe chart below illustrates the distribution of word lengths generated using the **RandomWordGenerator**, showing that the average word length is approximately 9.34:\n\n![Word Length Distribution](word-length-distribution-chart.png)\n\n## Execution\n\n### Requirements\n- Java 8 or later.\n- Input files (dictionary and script).\n\n### Compilation\n```bash\njavac *.java\n```\n\n### Running the Program\n```bash\njava Main \u003cdictionaryFile\u003e \u003cscriptFile\u003e\n```\nExample:\n```bash\njava Main \"./Dictionaries/Plan Example/planDict.txt\" \"./Dictionaries/Plan Example/planScript.txt\"\n```\n\n\n## File Structure\n- `Main.java`: Entry point.\n- `BatchTrieTest.java`: Automation of Trie tests.\n- `DictionaryLoader.java`: Handles dictionary operations.\n- `SearchEngine.java`: Search-related operations.\n- `MinHeap.java`: Min-Heap implementation.\n- `RobinHoodHashing.java`: Hash table implementation.\n- `Element.java`: Represents hash table elements.\n- `TrieNode.java` / `TrieNodeStatic.java`: Trie implementations.\n- `FixedLengthWordGenerator.java`: Generates fixed-length words.\n- `RandomWordGenerator.java`: Generates random words with varying lengths.\n\n## Memory Analysis\n- `BatchTrieTest` provides memory usage statistics for different Trie sizes.\n- Static memory usage calculations:\n  - `TrieNodeStatic`: 112 bytes per node.\n  - 
`TrieNode`: Dynamic; depends on the size of the Robin Hood hash table.\n\n## Limitations\n- Supports only the ASCII character set.\n- Dictionary and script files must follow specific formats.\n\n\n## Sources\n### Average English Word Length Distribution\n- https://www.thefreelibrary.com/On%2Bword-length%2Band%2Bdictionary%2Bsize.-a0189832222?utm_source=chatgpt.com\n### 466k English Word Dictionary\n- https://github.com/dwyl/english-words\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftsembp%2Fepl231-groupassignment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftsembp%2Fepl231-groupassignment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftsembp%2Fepl231-groupassignment/lists"}