https://github.com/pletessier/taranis
Similarity search engine built around Faiss library
https://github.com/pletessier/taranis
Last synced: 4 months ago
JSON representation
Similarity search engine built around Faiss library
- Host: GitHub
- URL: https://github.com/pletessier/taranis
- Owner: pletessier
- License: bsd-3-clause
- Created: 2019-04-22T12:01:24.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T05:18:33.000Z (about 2 years ago)
- Last Synced: 2024-07-23T05:49:01.749Z (7 months ago)
- Language: Python
- Size: 104 KB
- Stars: 76
- Watchers: 5
- Forks: 9
- Open Issues: 5
-
Metadata Files:
- Readme: README.adoc
- License: LICENSE
Awesome Lists containing this project
README
= Taranis
Taranis is a similarity search engine built around https://github.com/facebookresearch/faiss[Faiss] library.
It allows you to find the most similar vectors (a common mathematical and simplified representation of an image or a sound) of a query vector among hundreds of millions, or billions if you have enough RAM.== Why ?
Many computer scientists are now able to use machine learning frameworks to classify images without even having to understand how it works. It is very easy to obtain a probability of belonging to a class.
In a production environment, regularly adding images or classes causes a bottleneck since you must constantly re-learn your model. One solution is to use a non-evolving model that simply produces an N-dimensional vector for each input image. These vectors can then be compressed, indexed and/or searched by similarity with an external and incremental system.== What is it ?
Taranis is similarity search engine (Think Elasticsearch, but for vectors, not text documents). Taranis is in fact just a wrapper around the https://github.com/facebookresearch/faiss[Faiss library]. It aims at providing what is missing in such a scientific library:
* Data persitency: reliable storage of raw and compressed vectors (Faiss stores the data in RAM, and can persist to disk on demand, but the writing is not incremental).
* Web services: Taranis provides a gRPC server, and some rest API endpoints for management/monitoring.
* Packaging in a ready-to-run Docker image.[.center.text-center]
[#archi]
.Global Architecture
image::doc/archi.svg[Archi,,512,align="center"]== Quick Start
.Start the Taranis server with dependencies (Mongo + Redis)
```
docker-compose up
```== Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
=== Prerequisites
==== Databases
* MongoDB 4.0+ (for storing databases and indices informations, and raw features)
* Redis 5.0+ (for storing Faiss indices and encoded vectors, inverted lists, etc.)==== C$$++$$ (11+) dependencies
* cmake 3.7+
* openblas
* tacopie 3.2.0: https://github.com/Cylix/tacopie[https://github.com/Cylix/tacopie]
* cpp_redis 4.3.1: https://github.com/cpp-redis/cpp_redis[https://github.com/cpp-redis/cpp_redis]
* faiss 1.5.3: https://github.com/facebookresearch/faiss[https://github.com/facebookresearch/faiss]
* pybind11 2.2.4: https://github.com/pybind/pybind11[https://github.com/pybind/pybind11]
* fmtlib 5.3.0: https://github.com/fmtlib/fmt[https://github.com/fmtlib/fmt]==== Python (3.7) dependencies
See list in requirements.txt
.install dependencies with:
```
pip install -r requirements.txt
```=== Installing
.from sources:
```
mkdir build
cd build
cmake ..
make
```.in a Docker image:
```
docker build -t pletessier/taranis .
```== Deployment
```
docker-compose up -d
```A default configuration file is provided inside the the Docker image.
There are 4 complementary ways to configure Taranis:* One or more configuration files (yaml|tomljson|ini|xml) provided by the `--config-file /my/config/file/path` command line arg.
* One or more configuration directories provided with the `--config-path /my/config/directory` command line arg. Example: providing the path `/run/secrets`, Taranis will read every files in all subdirectories of `/run/secrets` and associate a key (the subpath) to a value (file content). If there is a file `/run/secrets/db/redis/password` containing the text `notagoodpassword`, the configuration will be: `db.redis.password=notagoodpassword`
* Every environment variables starting with `$$TARANIS__$$` will be parsed. For instance, `$$TARANIS__DB__REDIS__HOST$$=my-redis-host` is equivalent to `db.redis.host=my-redis-host`.
* Every additional command lines provided with the arg `--additional-config` or `-C`, such as `-C db.redis.host=my-redis-host`.== Tests
NOTE: Explain how to run the Taranis client code with a 1M vectors dataset.
== Author
I am **Pierre Letessier**, R&D engineer. Taranis is a personal project developed only in my free time, so please be indulgent !
== Contributing
Feedbacks, tests, benchmarks, issues and pull requests are welcome. For pull requests, please fork and create a new branch before to submit it.
== License
This project is licensed under the BSD 3 License - see the LICENSE file for details
== Acknowledgments
Thanks to Matthijs Douze for answering my questions about Faiss.