https://github.com/breadrock1/doc-searcher
A document search project based on Rust and OpenSearch.
elasticsearch fulltext-search opensearch rag rest-api rust semantic-search
- Host: GitHub
- URL: https://github.com/breadrock1/doc-searcher
- Owner: breadrock1
- Created: 2023-09-28T07:28:57.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2026-02-24T09:50:38.000Z (13 days ago)
- Last Synced: 2026-02-24T14:48:33.328Z (13 days ago)
- Topics: elasticsearch, fulltext-search, opensearch, rag, rest-api, rust, semantic-search
- Language: Rust
- Homepage:
- Size: 1.5 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Doc-Search Metaverse project
Doc-Search is a simple and flexible document search application that uses Rust and OpenSearch
to provide efficient full-text search over documents. The project aims to offer a straightforward way to
index and search a large corpus of documents with the speed and accuracy of OpenSearch.
The main goal is to implement a simple yet powerful system for storing and indexing documents with search functionality
(full-text, semantic, and hybrid). OpenSearch is the default search engine, but you can plug in your own backend
by implementing several async traits, for example for Tantivy, Qdrant, or a custom solution.
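To illustrate what plugging in another engine could look like, here is a minimal sketch. The trait name `Searcher`, its `fulltext` method, the `FoundDocument` struct, and the in-memory engine are all hypothetical and are not taken from the project's code base; the real traits are defined in the repository. The sketch relies on native `async fn` in traits (Rust 1.75+) and a tiny std-only `block_on` helper so that it runs without extra crates.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical search hit; the project's real types may differ.
#[derive(Debug, Clone, PartialEq)]
struct FoundDocument {
    id: String,
    content: String,
}

/// Hypothetical async trait that a custom engine would implement.
trait Searcher {
    async fn fulltext(&self, index: &str, query: &str) -> Vec<FoundDocument>;
}

/// Toy in-memory engine standing in for Tantivy, Qdrant, or another backend.
struct InMemoryEngine {
    docs: Vec<(String, FoundDocument)>, // (index name, document)
}

impl Searcher for InMemoryEngine {
    async fn fulltext(&self, index: &str, query: &str) -> Vec<FoundDocument> {
        // Case-insensitive substring match within the requested index.
        let needle = query.to_lowercase();
        self.docs
            .iter()
            .filter(|(idx, doc)| {
                idx.as_str() == index && doc.content.to_lowercase().contains(&needle)
            })
            .map(|(_, doc)| doc.clone())
            .collect()
    }
}

/// Waker that does nothing; enough for futures that never wait on I/O.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls the future in a loop until it completes.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

/// Small data set for trying the engine out.
fn sample_engine() -> InMemoryEngine {
    let doc = |id: &str, content: &str| FoundDocument {
        id: id.to_string(),
        content: content.to_string(),
    };
    InMemoryEngine {
        docs: vec![
            ("notes".to_string(), doc("a", "Rust and OpenSearch")),
            ("notes".to_string(), doc("b", "Redis cache")),
        ],
    }
}
```

Calling `block_on(sample_engine().fulltext("notes", "rust"))` returns the single document with id `a`. In a real service these calls would run on an async runtime rather than on the busy-polling helper used here.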
The principal scheme:

Doc-Search includes the following sub-services:
- Cache Service - API of a caching service such as Redis;
- Metrics Service - API that exposes metrics for Prometheus monitoring;
- Storage Service - CRUD API for indexed folders and documents;
- Searcher Service - API for search functionality (full-text, semantic, hybrid);
- Embeddings Service (removed) - API of an embeddings service, for using your own model.
#### Changelog:
**OpenSearch instead of Elasticsearch**
The Searcher and Storage services currently share a common implementation based on OpenSearch.
**Removed custom embeddings functionality**
After switching from Elasticsearch to OpenSearch, integrating a custom embeddings model is no longer necessary,
because newer versions of OpenSearch provide an ML plugin with the required functionality (chunking and embedding).
The Embeddings module has therefore been removed from the code base. When Qdrant support is added, this functionality will return to the
infrastructure layer together with the Qdrant client implementation.
## Features
Service-based:
- **Rust Performance**: Benefit from the speed and safety of Rust;
- **REST API**: Easy-to-use REST API for searching documents and managing indexing;
- **Swagger**: Swagger documentation for all available endpoints;
- **Remote logging**: Send errors, warnings, and other metrics to a remote server;
- **Docker Support**: Easy deployment with Docker and docker-compose;
- **Caching Queries**: Store data in a cache service such as Redis or your own solution.
Searching:
- **Full-Text Search**: Quickly find documents by content using the chosen search engine;
- **Semantic Search**: Fast semantic search through an external embeddings service;
- **Hybrid Search**: Fast hybrid search through an external embeddings service.
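Hybrid search merges a full-text score and a semantic score for each document. The sketch below shows one common way to do that: min-max normalization followed by a weighted arithmetic mean. OpenSearch's normalization processor offers these techniques, but whether this project configures it that way is an assumption. The function names and the `fulltext_weight` parameter are hypothetical.

```rust
/// Min-max normalize scores into [0, 1]; if all scores are equal, each maps to 1.0.
fn min_max(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    scores
        .iter()
        .map(|s| if max > min { (s - min) / (max - min) } else { 1.0 })
        .collect()
}

/// Weighted arithmetic mean of the two normalized score lists.
/// Both slices must describe the same documents in the same order.
fn hybrid_scores(fulltext: &[f64], semantic: &[f64], fulltext_weight: f64) -> Vec<f64> {
    assert_eq!(fulltext.len(), semantic.len(), "score lists must align");
    let (ft, sem) = (min_max(fulltext), min_max(semantic));
    ft.iter()
        .zip(sem.iter())
        .map(|(f, s)| fulltext_weight * f + (1.0 - fulltext_weight) * s)
        .collect()
}
```

Normalizing first matters because full-text relevance scores are unbounded, while vector similarity scores typically fall into a fixed range. Without normalization, one of the two signals would dominate the sum.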
## Domain
The project has the following domains:
```
domain
|----> Document storage (core)
|      |----> Index
|      |      |----> Context: index management in vector storage
|      |      |----> Services: IIndexStorage
|      |----> Document
|             |----> Context: splits a document into parts and stores them in vector storage
|             |----> Services: IDocumentPartStorage
|
|----> Document searching (core)
|      |----> Found document
|      |      |----> Context: results of multiple kinds of search
|      |      |----> Services: ISearcher
|      |----> Pagination
|             |----> Context: pagination of found results
|             |----> Services: IPAginator
```
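As a rough illustration of the Pagination domain above, the following sketch slices a list of found results into pages. It uses simple offset-based paging; the `Page` struct and `paginate` function are hypothetical, and the project may use a different mechanism, such as server-side scroll identifiers.

```rust
/// Hypothetical page of results; not the project's actual type.
#[derive(Debug, PartialEq)]
struct Page<T> {
    items: Vec<T>,
    page: usize,        // 1-based page number
    total_pages: usize, // number of pages for the given page size
}

/// Return the requested 1-based page; pages past the end are empty.
fn paginate<T: Clone>(results: &[T], page: usize, page_size: usize) -> Page<T> {
    assert!(page >= 1 && page_size >= 1, "page and page_size start at 1");
    // Ceiling division: a partially filled last page still counts.
    let total_pages = (results.len() + page_size - 1) / page_size;
    let start = (page - 1).saturating_mul(page_size).min(results.len());
    let end = start.saturating_add(page_size).min(results.len());
    Page {
        items: results[start..end].to_vec(),
        page,
        total_pages,
    }
}
```

For example, `paginate(&[1, 2, 3, 4, 5], 2, 2)` yields the items `[3, 4]` with `total_pages` equal to 3.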
And there are the following use cases:
```
usecase
|----> Storage Use Case
|      |----> CRUD of index and document
|      |----> split large document into parts to store
|      |----> upload file to storage and create new task processing event
|
|----> Searching Use Case
|      |----> searching document parts by multiple algorithms
|      |----> paginate found document parts results
```
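The "split large document into parts" step of the storage use case can be pictured with the sketch below. It is only an illustration of the idea: the function name and the whitespace-based strategy are assumptions, and according to the changelog the actual chunking can be handled by the OpenSearch ML plugin.

```rust
/// Split text into parts of at most `max_chars` characters, breaking on whitespace.
/// A single word longer than `max_chars` is kept whole as its own oversized part.
fn split_into_parts(text: &str, max_chars: usize) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut current_len = 0; // length of `current` in characters, not bytes

    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // Close the current part when this word (plus a space) would overflow it.
        if !current.is_empty() && current_len + 1 + word_len > max_chars {
            parts.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    if !current.is_empty() {
        parts.push(current);
    }
    parts
}
```

Each resulting part would then be stored as a separate document part, so that search hits can point to a specific fragment of a large file.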
The context map:
```
+----------------+         +-----------------+
| StorageUseCase | <────── | SearcherUseCase |
+----------------+         +-----------------+
        |                           |
        ▼                           ▼
+----------------+         +-----------------+
| Storage Domain |         | Searcher Domain |
+----------------+         +-----------------+
```
Context data flow:
```
HTTP Request
     │
     ▼
HTTP Handler (ServerState)
     │
     ▼
ServerAppState
     ├── StorageUseCase (application)
     │        │
     │        ▼
     │   Storage (domain)
     │
     └── SearcherUseCase (application)
              │
              ▼
         Task (domain)
```
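The flow above can be sketched in a few lines of Rust. The type names `ServerAppState`, `StorageUseCase`, and `SearcherUseCase` come from the diagrams; everything else (the fields, the `Request` enum, the `handle` function, and the in-memory storage) is a hypothetical stand-in, and the real handlers are HTTP endpoints rather than a plain function.

```rust
use std::collections::BTreeMap;

/// Application-layer use case wrapping the storage domain (in-memory stand-in).
#[derive(Default)]
struct StorageUseCase {
    documents: BTreeMap<String, String>, // document id -> content
}

impl StorageUseCase {
    fn store(&mut self, id: &str, content: &str) {
        self.documents.insert(id.to_string(), content.to_string());
    }
}

/// Application-layer use case for searching. Here it reads from the storage
/// use case, which is one possible reading of the arrow in the context map.
#[derive(Default)]
struct SearcherUseCase;

impl SearcherUseCase {
    fn search(&self, storage: &StorageUseCase, query: &str) -> Vec<String> {
        storage
            .documents
            .iter()
            .filter(|(_, content)| content.contains(query))
            .map(|(id, _)| id.clone())
            .collect()
    }
}

/// Shared state that a handler receives; it owns both use cases.
#[derive(Default)]
struct ServerAppState {
    storage: StorageUseCase,
    searcher: SearcherUseCase,
}

/// Stand-in for an incoming HTTP request.
enum Request {
    Store { id: String, content: String },
    Search { query: String },
}

/// Stand-in for an HTTP handler: routes the request to the matching use case.
fn handle(state: &mut ServerAppState, request: Request) -> String {
    match request {
        Request::Store { id, content } => {
            state.storage.store(&id, &content);
            format!("stored {id}")
        }
        Request::Search { query } => {
            let ids = state.searcher.search(&state.storage, &query);
            format!("found {}", ids.join(","))
        }
    }
}
```

Keeping the use cases behind one state object means handlers stay thin: they translate a request into a use-case call and format the result.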
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
### Prerequisites
- Rust
- Docker & docker-compose
- Cache (Redis)
- OpenSearch
### Quick Start
0. See the scripts in `docs/opensearch` for how to load the ML cluster onto a single node and set up the infrastructure: ingest and search pipelines and model deployment.
1. Clone the repository.
2. Run `cargo install --path .` to build the project.
3. Set up the `.env` file with the service credentials.
4. Run `cargo run --bin init-infrastructure` to initialize the Elasticsearch schemas.
5. Run `cargo run --bin launch` to launch the service.
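To make the `.env` step more concrete, the sketch below shows how a Rust service might read such credentials from the process environment. The variable names `OPENSEARCH_URL` and `REDIS_URL` and the `Settings` struct are placeholders invented for this example; check the repository for the keys the project actually expects. The sketch also assumes that the values from `.env` have already been exported into the environment.

```rust
use std::env;

/// Hypothetical settings; the variable names used below are placeholders,
/// not the keys the project actually reads.
#[derive(Debug, PartialEq)]
struct Settings {
    opensearch_url: String,
    redis_url: String,
}

/// Build settings from any key lookup, so the logic can be exercised
/// without touching the real process environment.
fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> Result<Settings, String> {
    let required = |key: &str| lookup(key).ok_or_else(|| format!("missing variable {key}"));
    Ok(Settings {
        opensearch_url: required("OPENSEARCH_URL")?,
        redis_url: required("REDIS_URL")?,
    })
}

/// Read the settings from the process environment.
fn settings_from_env() -> Result<Settings, String> {
    settings_from(|key| env::var(key).ok())
}
```

Failing fast with the name of the missing variable makes a misconfigured `.env` file easier to diagnose than a connection error later on.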
### Features of project
Features for parsing and storing documents locally from the current service (not stable):
- `enable-unique-doc-id` - enables generating a unique document ID based on the index and document IDs.
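One way to derive a unique document ID from the index and document IDs is to hash the pair, as in the sketch below. This is an assumption about what such a feature could do, not the project's implementation, and `unique_doc_id` is a hypothetical name. Note that the algorithm behind `DefaultHasher` is not guaranteed to stay the same across Rust releases, so IDs that are persisted should use a fixed, documented hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a 16-character hexadecimal ID from the index name and the document ID.
fn unique_doc_id(index: &str, doc_id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    // Hashing the pair as a tuple keeps ("ab", "c") distinct from ("a", "bc"),
    // which plain string concatenation would not.
    (index, doc_id).hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}
```

As with any Cargo feature, it would be enabled at build time, for example with `cargo build --features enable-unique-doc-id`.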