{"id":28403190,"url":"https://github.com/manishrwt15/triebenchmark","last_synced_at":"2025-06-25T18:09:43.261Z","repository":{"id":293589809,"uuid":"984521673","full_name":"Manishrwt15/TrieBenchmark","owner":"Manishrwt15","description":"A benchmark study of Trie data structure performance on real-world datasets","archived":false,"fork":false,"pushed_at":"2025-05-18T06:07:14.000Z","size":2939,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-02T01:47:59.460Z","etag":null,"topics":["benchmark","data-structures","java","performance-analysis","trie"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Manishrwt15.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-16T04:24:36.000Z","updated_at":"2025-05-18T06:07:17.000Z","dependencies_parsed_at":"2025-05-16T05:40:24.106Z","dependency_job_id":null,"html_url":"https://github.com/Manishrwt15/TrieBenchmark","commit_stats":null,"previous_names":["manishrwt15/triebenchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Manishrwt15/TrieBenchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manishrwt15%2FTrieBenchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manishrwt15%2FTrieBenchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manishrwt15%2FTrieBenchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/Manishrwt15%2FTrieBenchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Manishrwt15","download_url":"https://codeload.github.com/Manishrwt15/TrieBenchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Manishrwt15%2FTrieBenchmark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261927585,"owners_count":23231379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","data-structures","java","performance-analysis","trie"],"created_at":"2025-06-01T17:11:01.982Z","updated_at":"2025-06-25T18:09:43.244Z","avatar_url":"https://github.com/Manishrwt15.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TrieBenchmark\n\nA high-performance Trie data structure implementation in Java, benchmarked with a large real-world English word dataset. 
This project evaluates the insert and search efficiency of a Trie at multiple dataset sizes, showcasing its suitability for search-heavy applications such as autocomplete and dictionary lookup.\n\n---\n\n## Features\n\n- Efficient insertion of large datasets (up to 370,105 words)\n- Fast search operations with near constant-time performance\n- Benchmarking across multiple dataset sizes (1K, 10K, 100K, full dataset)\n- Detailed performance metrics: insert time, search time, average search time per word\n- Simple and extensible Java implementation\n\n---\n\n## Dataset\n\nThe benchmarking uses a real-world English word list (`words.txt`) containing **370,105** words sourced from an open dataset.\n\n---\n\n## How to Run\n\n### Prerequisites\n\n- Java JDK 11 or higher installed\n- (Optional) Python 3 environment with `matplotlib` and `pandas` for graph plotting\n\n### Compile and Run Benchmark\n\n```bash\njavac Trie.java TrieBenchmark.java\njava TrieBenchmark\n```\n\n## Results Summary\n\n| Dataset Size | Insert Time (ms) | Search Time (ms) | Avg Search Time per Word (ms) |\n|--------------|------------------|------------------|-------------------------------|\n| 1,000        | 1                | 0.246334         | 0.000246                      |\n| 10,000       | 15               | 0.151625         | 0.000152                      |\n| 100,000      | 26               | 1.475125         | 0.001475                      |\n| 370,105      | 93               | 0.659375         | 0.000659                      |\n\n## Analysis\n\n- **Insert time** grows with dataset size, from 1 ms for 1,000 words to 93 ms for the full 370,105 words, indicating scalable insertion performance.\n- **Search time** remains low (under 1.5 ms in total) across all dataset sizes, demonstrating the efficiency of Trie for lookup operations.\n- The average search time per word is on the order of a microsecond or less (~0.0006 ms, or about 0.66 microseconds, for the largest dataset), which shows that Trie performs **near constant-time searches**: lookup cost depends on word length rather than on the number of stored words.\n- Interestingly, the search 
time at 10,000 words was lower than at 1,000 words (0.152 ms vs 0.246 ms), likely due to caching effects or runtime optimizations such as JVM JIT warm-up.\n- Overall, the Trie data structure is highly suitable for applications requiring fast and frequent searches, such as autocomplete, spell-checking, and dictionary implementations.\n\n## Plotting Graphs\n\nThe benchmark results were visualized using Python's `matplotlib` and `pandas` libraries to better understand the Trie performance trends.\n\n## Visualizations\n\nThe following graphs illustrate the performance of the Trie across various dataset sizes:\n\n### Insert Time vs Dataset Size\nShows how the time taken to insert words increases with dataset size.\n\n![Insert Time vs Dataset Size](results/insert_time_vs_dataset_size.png)\n\n---\n\n### Search Time vs Dataset Size\nIllustrates how the total search time changes with different dataset sizes.\n\n![Search Time vs Dataset Size](results/search_time_vs_dataset_size.png)\n\n---\n\n### Avg Search Time per Word vs Dataset Size\nDemonstrates the near-constant time performance of Trie searches.\n\n![Avg Search Time per Word vs Dataset Size](results/avg_time_per_word_vs_dataset_size.png)\n\n---\n\n\u003e All graphs are auto-generated using `matplotlib` and `pandas` from the benchmarking results. See [How to Generate Graphs](#how-to-generate-graphs) below for steps to regenerate them.\n\n### How to Generate Graphs:\n1. Make sure Python 3 is installed on your system.\n2. Create and activate a virtual environment to avoid permission issues:\n   ```bash\n   python3 -m venv venv\n   source venv/bin/activate\n   ```\n3. Install required Python libraries:\n   ```bash\n   pip install matplotlib pandas\n   ```\n4. Run the graph plotting script:\n   ```bash\n   python3 plot_graphs.py\n   ```\n5. 
The graphs will be saved in the `results/` folder as image files (e.g., PNG).\n\n### Note:\nIf you encounter errors while installing packages system-wide, using a virtual environment is highly recommended to keep dependencies isolated and manageable.\n\n## Author\n\n**Manish Rawat**\n\n- GitHub: [https://github.com/Manishrwt15](https://github.com/Manishrwt15)\n- Email: manishrwat15@gmail.com\n- LinkedIn: [https://www.linkedin.com/in/manish-rawat-b1b61b269/](https://www.linkedin.com/in/manish-rawat-b1b61b269/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanishrwt15%2Ftriebenchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanishrwt15%2Ftriebenchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanishrwt15%2Ftriebenchmark/lists"}