{"id":19693141,"url":"https://github.com/breadrock1/doc-searcher","last_synced_at":"2026-03-04T14:00:41.377Z","repository":{"id":197216505,"uuid":"697648319","full_name":"breadrock1/doc-searcher","owner":"breadrock1","description":"There is documents searcher project based on Rust and Opensearch technologies.","archived":false,"fork":false,"pushed_at":"2026-02-24T09:50:38.000Z","size":1570,"stargazers_count":4,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-02-24T14:48:33.328Z","etag":null,"topics":["elasticsearch","fulltext-search","opensearch","rag","rest-api","rust","semantic-search"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/breadrock1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-09-28T07:28:57.000Z","updated_at":"2026-02-24T08:58:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"d202b0c3-4e07-4737-9c93-b80091255695","html_url":"https://github.com/breadrock1/doc-searcher","commit_stats":{"total_commits":289,"total_committers":2,"mean_commits":144.5,"dds":0.4948096885813149,"last_synced_commit":"0db2fc6344fa266a99db613dcd26a54e8e73b46e"},"previous_names":["breadrock1/docsearcher"],"tags_count":25,"template":false,"template_full_name":null,"purl":"pkg:github/breadrock1/doc-searcher","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2Fdoc-searcher","tags_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2Fdoc-searcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2Fdoc-searcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2Fdoc-searcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/breadrock1","download_url":"https://codeload.github.com/breadrock1/doc-searcher/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2Fdoc-searcher/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30082988,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T13:22:36.021Z","status":"ssl_error","status_checked_at":"2026-03-04T13:20:45.750Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","fulltext-search","opensearch","rag","rest-api","rust","semantic-search"],"created_at":"2024-11-11T19:15:52.593Z","updated_at":"2026-03-04T14:00:41.289Z","avatar_url":"https://github.com/breadrock1.png","language":"Rust","readme":"[![Pull Request Actions](https://github.com/breadrock1/doc-searcher/actions/workflows/pull-request.yml/badge.svg)](https://github.com/breadrock1/doc-searcher/actions/workflows/pull-request.yml)\n\n[![Target - 
Linux](https://img.shields.io/badge/OS-Linux-blue?logo=linux\u0026logoColor=white)](https://www.linux.org/ \"Go to Linux homepage\")\n[![Target - MacOS](https://img.shields.io/badge/OS-MacOS-blue?logo=linux\u0026logoColor=white)](https://www.apple.com/ \"Go to Apple homepage\")\n[![Target - Windows](https://img.shields.io/badge/OS-Windows-blue?logo=linux\u0026logoColor=white)](https://www.microsoft.com/ \"Go to Microsoft homepage\")\n\n# Doc-Search Metaverse project\n\nDoc-Search is a simple and flexible document search application that leverages the capabilities of Rust and OpenSearch\nto provide efficient and effective full-text search in documents. This project aims to offer a straightforward solution for\nindexing and searching through a large corpus of documents with the speed and accuracy provided by OpenSearch.\n\nThe main goal is to implement a simple and powerful system for storing and indexing documents with search functionality \n(full-text, semantic and hybrid). I decided to use OpenSearch as the default search engine, but you may use your own solution \nby implementing several async traits, for example for Tantivy or Qdrant.\n\nThe principal scheme:\n![architecture.png](docs/architecture.png)\n\nDoc-Search includes the following sub-services:\n - Cache Service                  - API of a caching service such as Redis;\n - Metrics Service                - API exposing metrics for Prometheus monitoring;\n - Storage Service                - API (CRUD) of indexed folders and documents;\n - Searcher Service               - API of search functionality (full-text, semantic, hybrid);\n - Embeddings Service (removed)   - API of an embeddings service, if you would like to use your own model.\n\n#### Changelog:\n\n**OpenSearch instead of Elasticsearch**\nThe Searcher and Storage services currently share a common implementation based on OpenSearch.\n\n**Removed custom embeddings functionality**\nAfter switching from Elasticsearch to OpenSearch, the need for custom embeddings model integration has 
gone, \nbecause newer versions of OpenSearch provide an ML plugin with the necessary functionality (chunking and embedding).\nSo the Embeddings module has been removed from the code base. When I add Qdrant support, this functionality will be added to the\ninfrastructure along with the Qdrant client implementation.\n\n## Features\nService based: \n- **Rust Performance**: Benefit from the speed and safety of Rust;\n- **REST API**: Easy-to-use REST API for searching documents and managing indexing;\n- **Swagger**: Swagger documentation for all available endpoints;\n- **Remote logging**: Send error or warning messages or other metrics to a remote server;\n- **Docker Support**: Easy deployment with Docker and docker-compose;\n- **Caching Queries**: Store data in a cache service such as Redis or your own solution;\n\nSearching: \n- **Full-Text Search**: Quickly find documents by content using the chosen search engine;\n- **Semantic Search**: Fast semantic search via an external embeddings service;\n- **Hybrid Search**: Fast hybrid search via an external embeddings service;\n\n## Domain\n\nThe project has the following domains:\n\n```\ndomain\n   |----\u003e Document storage (core)\n   |        |----\u003e Index\n   |        |       |----\u003e Context: index management in vector storage\n   |        |       |----\u003e Services: IIndexStorage\n   |        |----\u003e Document\n   |                |----\u003e Context: splits a document into parts and stores them in vector storage\n   |                |----\u003e Services: IDocumentPartStorage\n   |\n   |----\u003e Document searching (core)\n   |        |----\u003e Founded document\n   |        |       |----\u003e Context: results of multiple search kinds\n   |        |       |----\u003e Services: ISearcher\n   |        |----\u003e Pagination\n   |                |----\u003e Context: pagination of found results\n   |                |----\u003e Services: IPAginator\n```\n\nAnd the following use cases:\n\n```\nusecase\n   |----\u003e 
Storage Use Case\n   |        |----\u003e CRUD of index and document\n   |        |----\u003e split large documents into parts for storage\n   |        |----\u003e upload file to storage and create new task processing event\n   |\n   |----\u003e Searching Use Case\n   |        |----\u003e searching document parts by multiple algorithms\n   |        |----\u003e paginate found document parts\n```\n\nThe context map:\n\n```\n+----------------+         +-----------------+\n| StorageUseCase | \u003c────── | SearcherUseCase |\n+----------------+         +-----------------+\n        |                           |\n        ▼                           ▼\n+----------------+         +-----------------+\n| Storage Domain |         | Searcher Domain |\n+----------------+         +-----------------+\n```\n\nContext data flow:\n\n```\nHTTP Request\n     │\n     ▼\nHTTP Handler (ServerState)\n     │\n     ▼\nServerAppState\n    ├── StorageUseCase (application)\n    │       │\n    │       ▼\n    │    Storage (domain)\n    │\n    └── SearcherUseCase (application)\n            │\n            ▼\n          Task (domain)\n```\n\n## Getting Started\n\nThese instructions will get you a copy of the project up and running on your local machine for development and testing purposes.\n\n### Prerequisites\n\n- Rust\n- Docker \u0026 docker-compose\n- Cache (Redis)\n- OpenSearch\n\n### Quick Start\n\n0. See the scripts in `docs/opensearch` for how to load the ML cluster into a single node and set up the infrastructure (ingest and search pipelines, model deployment).\n1. Clone the repository\n2. Run `cargo install --path .` to build the project\n3. Set up the `.env` file with service credentials\n4. Run `cargo run --bin init-infrastructure` to initialize the OpenSearch schemas\n5. 
Run `cargo run --bin launch` to launch the service\n\n### Project features\n\nFeatures for parsing and storing documents locally from the current service (not stable):\n - enable-unique-doc-id  - enables generation of a unique document id based on the index and document ids.\n\n[![Bread White - doc-search](https://img.shields.io/static/v1?label=Bread%20White\u0026message=author\u0026color=blue\u0026logo=github)](https://github.com/breadrock1/doc-searcher)\n\n[![stars - doc-search](https://img.shields.io/github/stars/breadrock1/doc-searcher?style=social)](https://github.com/breadrock1/doc-searcher)\n[![forks - doc-search](https://img.shields.io/github/forks/breadrock1/doc-searcher?style=social)](https://github.com/breadrock1/doc-searcher)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbreadrock1%2Fdoc-searcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbreadrock1%2Fdoc-searcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbreadrock1%2Fdoc-searcher/lists"}