{"id":17018328,"url":"https://github.com/devprojectekla/back_end_indexer","last_synced_at":"2025-03-22T16:11:49.421Z","repository":{"id":232222066,"uuid":"783781610","full_name":"DevprojectEkla/back_end_indexer","owner":"DevprojectEkla","description":"This Rust program could be more extensively implemented as a real backend, I actually designed it as a basic library providing an API for indexing files using TF-IDF (Term Frequency-Inverse Document Frequency). It provides functionality to traverse directories, calculate TF-IDF scores, and index files for efficient searching and retrieval.","archived":false,"fork":false,"pushed_at":"2024-06-20T23:24:54.000Z","size":2194,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-27T16:47:45.765Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DevprojectEkla.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-08T15:00:10.000Z","updated_at":"2024-06-20T23:24:56.000Z","dependencies_parsed_at":"2024-06-21T10:14:00.190Z","dependency_job_id":"52376d51-9999-45cf-9d2e-77497bed0da1","html_url":"https://github.com/DevprojectEkla/back_end_indexer","commit_stats":null,"previous_names":["devprojectekla/back_end_indexer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DevprojectEkla%2Fback_end_indexer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/DevprojectEkla%2Fback_end_indexer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DevprojectEkla%2Fback_end_indexer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DevprojectEkla%2Fback_end_indexer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DevprojectEkla","download_url":"https://codeload.github.com/DevprojectEkla/back_end_indexer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244982059,"owners_count":20542300,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T06:45:33.498Z","updated_at":"2025-03-22T16:11:49.402Z","avatar_url":"https://github.com/DevprojectEkla.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rust Backend Application for File Indexing with TF-IDF\n\n## Description\n\nThis Rust program could be extended into a full CLI application, but I designed it as a basic library for indexing files using TF-IDF (Term Frequency-Inverse Document Frequency). It provides functionality to traverse directories, extract metadata, calculate TF-IDF scores, and index files for efficient searching and retrieval.  
\nI think of it as a backend for my Rust client application, and it could easily become one. For now it is an API that mainly provides the `Index` struct, whose `.index_all()` method a client app can call.\n\n\n\u003e   The code is loosely inspired by this public [repo](https://github.com/tsoding/seroost/) from tsoding.  \n\u003e   I added functionality to crawl multiple directories and parse every file in a directory tree, ignoring binaries.  \n\n## Features\n\n- **File Crawling:** Traverse directories recursively to discover files.\n- **Metadata Extraction:** Extract essential metadata such as file size, modification date, and type.\n- **TF-IDF Calculation:** Compute TF-IDF scores to represent the importance of terms in files.\n- **Indexing:** Build an index of files based on their contents and TF-IDF scores.\n- **Search Capability:** Implement search functionality using TF-IDF scores to retrieve files based on queries.\n\n## Technology Stack\n\n- **Language:** Rust  \n- **Libraries/Frameworks:**  \n  - serde: A framework for serializing and deserializing Rust data structures efficiently.  \n  - serde_json: Provides functions to parse JSON data and convert it into Rust data structures and vice versa.  \n  - walkdir: A simple Rust crate for iterating over directories recursively.  \n  - poppler: A Rust binding for the Poppler PDF rendering library, used for PDF file handling and text extraction.  
\n\n## Installation\n\n### Prerequisites\n\n- Rust installed on your system ([Install Rust](https://www.rust-lang.org/tools/install)).\n\n### Clone Repository\n\n```bash\ngit clone https://github.com/DevprojectEkla/back_end_indexer.git\ncd back_end_indexer\n```\n\n## Dependencies\n\nAdd the following dependencies to your `Cargo.toml` file:\n\n```toml\n[dependencies]\nrand = \"0.8.5\"\npoppler = \"0.3.2\"\nserde_json = \"1.0.108\"\nserde = { version = \"1.0.190\", features = [\"derive\"] }\nwalkdir = \"2.4.0\"\nxml-rs = \"0.8.19\"\n```\n\n### Build and Run\n\n\u003e  Here you can test the basic functions of the library,\n\u003e  such as indexing a directory or calculating TF-IDF scores.\n\u003e  Edit the `main.rs` file to do so.\n\n```bash\ncargo build --release\ncargo run\n```\n\n## Usage\n\n### How It Works\n\n1. **Indexing Files with TF-IDF:**\n   - The application starts by crawling the specified directories.\n   - For each file encountered, it extracts metadata and calculates TF-IDF scores for terms.\n   - Files are indexed based on their TF-IDF scores.\n\n### With the associated RustIndexer client app\n\n2. **Searching Files Using TF-IDF:**\n   - Users can search for files based on keywords or terms with high TF-IDF scores.\n   - Results are returned based on relevance to the search query using TF-IDF scores.\n\n## Future Improvements\n\n- Implement additional text preprocessing techniques for better TF-IDF calculations.\n- Enhance search functionality with advanced querying options.\n- Optimize the indexing process and search algorithms for large file sets.\n\n## Contribution\n\nContributions are welcome! 
If you find any issues or have suggestions for improvements, please submit a pull request or open an issue on [GitHub](https://github.com/DevprojectEkla/back_end_indexer).\n\n## License\n\nThis project is licensed under the [GPL-3.0 license](https://github.com/DevprojectEkla/back_end_indexer/blob/main/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevprojectekla%2Fback_end_indexer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevprojectekla%2Fback_end_indexer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevprojectekla%2Fback_end_indexer/lists"}