{"id":28180895,"url":"https://github.com/davidogalo/file_search_server","last_synced_at":"2025-05-16T03:11:41.895Z","repository":{"id":255365300,"uuid":"825915115","full_name":"DavidOgalo/File_Search_Server","owner":"DavidOgalo","description":"A Python-based server application designed to efficiently search through a large text file. This project demonstrates the implementation of a multi-threaded server with SSL support, capable of handling multiple client queries simultaneously. The project includes configuration management, logging, and benchmarking different file-search algorithms.","archived":false,"fork":false,"pushed_at":"2024-08-29T12:46:09.000Z","size":1187,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-08-29T15:09:33.959Z","etag":null,"topics":["error-handling","logging","multithreading","scripting","search-algorithms","ssl","unit-testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DavidOgalo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-08T18:42:54.000Z","updated_at":"2024-08-29T12:46:13.000Z","dependencies_parsed_at":"2024-08-29T15:25:51.458Z","dependency_job_id":null,"html_url":"https://github.com/DavidOgalo/File_Search_Server","commit_stats":null,"previous_names":["davidogalo/file_search_server"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidOgalo%2FFile_Search_Server","tags_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidOgalo%2FFile_Search_Server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidOgalo%2FFile_Search_Server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidOgalo%2FFile_Search_Server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DavidOgalo","download_url":"https://codeload.github.com/DavidOgalo/File_Search_Server/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254459113,"owners_count":22074606,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["error-handling","logging","multithreading","scripting","search-algorithms","ssl","unit-testing"],"created_at":"2025-05-16T03:11:35.658Z","updated_at":"2025-05-16T03:11:41.889Z","avatar_url":"https://github.com/DavidOgalo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# File_Search_Server\n\n## Overview\n\nThe `File_Search_Server` is a project designed to implement and benchmark various file search algorithms to efficiently search for strings within large text files. 
This README provides a comprehensive guide to the project, including setup instructions, detailed descriptions of the implemented algorithms, how to run the benchmarks, and how to interpret the results.\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Project Structure](#project-structure)\n- [Setup Instructions](#setup-instructions)\n- [Usage Instructions](#usage-instructions)\n- [Running as a Linux Service](#running-as-a-linux-service)\n- [Unit Testing](#unit-testing)\n- [Implemented Algorithms](#implemented-algorithms)\n- [Configuration](#configuration)\n- [Running the Benchmarks](#running-the-benchmarks)\n- [Analyzing the Results](#analyzing-the-results)\n- [Limitations and Recommendations](#limitations-and-recommendations)\n- [Future Enhancements](#future-enhancements)\n- [Conclusion](#conclusion)\n\n## Project Structure\n\nThe project is organized into the following directories and files:\n\n\nFile_Search_Server/\n│\n├── README.md\n├── requirements.txt\n├── setup.sh\n├── 200k.txt\n├── server.key\n├── server.csr\n├── server.crt\n├── server.py\n├── client.py\n├── config.ini\n├── file-search_algorithms.py\n├── benchmark_results.txt\n├── test-suite_server.py\n├── multiple-queries_simulation.py\n├── speed_report.py\n└── benchmark_chart.png\n\n\n## Setup Instructions\n\n### Prerequisites\n\nEnsure you have the following installed on your machine:\n- Python 3.7+\n- pip (Python package installer)\n\n### Installation\n\n1. Set up the project directory and a virtual environment.\n\n2. Install the required Python packages:\n\n   `pip install -r requirements.txt`\n\n## Usage Instructions\n\n### Running the Server\n\nTo start the server, run:\n```sh\npython server.py\n```\n\n### Running the Client\n\nTo interact with the server, run:\n```sh\npython client.py\n```\n## Running as a Linux Service\n\nTo run the File_Search_Server as a Linux daemon or service, follow these steps:\n\n1. Create a systemd service file:\n```sh\nsudo nano /etc/systemd/system/File_Search_Server.service\n```\n\n2. 
Add the following content to the service file:\n```ini\n[Unit]\nDescription=AS Introductory Task File Search Server\nAfter=network.target\n\n[Service]\nUser=yourusername\nWorkingDirectory=/path/to/File_Search_Server\nExecStart=/usr/bin/python3 /path/to/File_Search_Server/server.py\nRestart=always\n\n[Install]\nWantedBy=multi-user.target\n```\n\nReplace `/path/to/File_Search_Server` with the actual path to the project directory and `yourusername` with your Linux username.\n\n3. Reload the systemd daemon to recognize the new service:\n```sh\nsudo systemctl daemon-reload\n```\n4. Start the service:\n```sh\nsudo systemctl start File_Search_Server.service\n```\n5. Enable the service to start on boot:\n```sh\nsudo systemctl enable File_Search_Server.service\n```\n6. Check the status of the service:\n```sh\nsudo systemctl status File_Search_Server.service\n```\n\nThe File_Search_Server should now be running as a Linux service.\n\nYou can stop the service with:\n```sh\nsudo systemctl stop File_Search_Server.service\n```\nand restart it with:\n```sh\nsudo systemctl restart File_Search_Server.service\n```\n\n7. To query the server, navigate to the project directory that contains the `client.py` script:\n```sh\ncd /path/to/File_Search_Server\n```\n8. Run the client script:\n```sh\npython client.py\n```\n9. Enter the string you want to search for in the `200k.txt` file.\n\n\n## Unit Testing\n### Running the test suite server\n\nRun the test suite server script to execute all the unit test cases:\n```sh\npytest -vv test-suite_server.py\n```\nThis runs every test case and prints a summary of the test session results.\n\n\n## Implemented Algorithms\nThe following file search algorithms are implemented in this project:\n\n1. **Naive Search**:\n   - A straightforward approach that checks each position in the text for a match.\n\n2. **Binary Search**:\n   - Requires the data to be sorted; it divides the search interval in half repeatedly.\n\n3. 
**Knuth-Morris-Pratt (KMP) Algorithm**:\n   - Preprocesses the pattern so that characters already matched are not compared again after a mismatch.\n\n4. **Boyer-Moore Algorithm**:\n   - Skips sections of the text, making it efficient for large alphabets and long patterns.\n\n5. **Rabin-Karp Algorithm**:\n   - Uses hashing to compare the pattern against windows of the text; well suited to finding any one of a set of pattern strings.\n\n6. **Z Algorithm**:\n   - Computes the Z array, which is used for pattern matching in linear time.\n\n## Configuration\n\nThe `config.ini` file contains the following setting:\n\n- `REREAD_ON_QUERY`: If set to `True`, the data is re-read from the file on every query. If `False`, the data is read once and reused for all queries.\n\n## Running the Benchmarks\n### Benchmarking Search Algorithms\n\nTo benchmark the search algorithms, run the `file-search_algorithms.py` script:\n```sh\npython file-search_algorithms.py\n```\n\nThis script benchmarks each algorithm on the generated test files and saves the results to `benchmark_results.txt`.\n\n### Generating the Speed Report\n\nOnce you have the benchmark results, generate the speed report by running the `speed_report.py` script:\n```sh\npython speed_report.py\n```\n\nThis creates two charts of the benchmark results: `benchmark_chart_reread.png` for `REREAD_ON_QUERY = True` and `benchmark_chart_no_reread.png` for `REREAD_ON_QUERY = False`.\n\n## Analyzing the Results\n\nThe benchmark results are saved in `benchmark_results.txt` and can be visualized using the generated charts, `benchmark_chart_reread.png` and `benchmark_chart_no_reread.png`. 
The results include:\n\n- Execution time for each algorithm across different file sizes.\n- Comparative performance analysis of all algorithms with and without re-reading the file.\n\n\n### Performance Analysis\n\nFrom the generated charts, key observations include:\n- **Naive Search** shows a linear increase in execution time with file size.\n- **Binary Search** performs well for smaller datasets but requires sorted data.\n- **KMP Algorithm** and **Boyer-Moore Algorithm** demonstrate efficient performance for large datasets.\n- **Rabin-Karp Algorithm** is effective for multiple pattern searches but less efficient for very large datasets.\n- **Z Algorithm** outperforms the other algorithms in these benchmarks while running in linear time.\n- **Impact of REREAD_ON_QUERY**: Comparing the two charts shows how re-reading the file affects the performance of each algorithm.\n\n## Limitations and Recommendations\n\n### Identified Limitations\n\n1. **Memory Usage**: Higher memory consumption for certain algorithms (e.g., Rabin-Karp).\n2. **Execution Time**: Inefficiency of Naive Search and Binary Search for large datasets.\n3. **Scalability**: Performance degradation with high concurrency or extremely large datasets.\n4. **File Re-reading Overhead**: Additional overhead when `REREAD_ON_QUERY` is set to `True`, which can increase execution time.\n\n### Recommendations\n\n1. **Use Z Algorithm for Large Datasets**: Use the Z Algorithm when searching large datasets.\n2. **Optimize Memory Usage**: Optimize algorithms to reduce memory consumption.\n3. **Enhance Scalability**: Implement load balancing and optimize server code for better scalability.\n4. **Assess Impact of REREAD_ON_QUERY**: Choose the `REREAD_ON_QUERY` setting per use case, trading re-read overhead against up-to-date data.\n5. **Further Testing**: Conduct additional tests with diverse data and query patterns.\n\n## Future Enhancements\n\n1. **Advanced Query Options**: Support regex and fuzzy searching.\n2. 
**Real-Time Analytics**: Monitor server performance and client query statistics in real-time.\n3. **Scalability Improvements**: Explore distributed computing techniques.\n4. **Enhanced Benchmarking**: Incorporate more detailed benchmarking to evaluate the impact of file re-reading and other parameters on performance.\n\n## Conclusion\n\nThe `File_Search_Server` project provides an efficient solution for searching strings in large text files. By benchmarking various algorithms, we have identified the most suitable algorithm for different scenarios. The project offers a robust framework for further enhancements and optimizations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidogalo%2Ffile_search_server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidogalo%2Ffile_search_server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidogalo%2Ffile_search_server/lists"}