https://github.com/m1guelpf/tinyvector

A tiny embedding database in pure Rust.

Host: GitHub
URL: https://github.com/m1guelpf/tinyvector
Owner: m1guelpf
License: mit
Created: 2023-07-03T16:53:18.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2023-12-28T08:45:39.000Z (7 months ago)
Last Synced: 2024-05-10T12:04:01.156Z (2 months ago)
Topics: embeddings, embeddings-similarity, machine-learning, rust, search-engines, similarity-search, vector-database, vector-search
Language: Rust
Homepage: https://crates.io/crates/tinyvector
Size: 121 KB
Stars: 340
Watchers: 8
Forks: 17
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

my-awesome-stars - tinyvector
awesome-stars - m1guelpf/tinyvector - A tiny embedding database in pure Rust. (Rust)
awesome-stars - m1guelpf/tinyvector - A tiny embedding database in pure Rust. (rust)

README

tinyvector - a tiny embedding database in pure Rust

## ✨ Features
- **Tiny**: It's in the name. It's literally just an axum server. Extremely easy to customize, around 600 lines of code.
- **Fast**: Tinyvector _should_ have comparable speed to advanced vector databases on small to medium datasets, and slightly better accuracy.
- **Vertically Scales**: Tinyvector stores all indexes in memory for fast querying. Very easy to scale up to 100 million+ vector dimensions without issue.
- **Open Source**: MIT Licensed, free forever.
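
As a rough back-of-the-envelope check on the in-memory claim, the sketch below estimates raw storage. It assumes each dimension is stored as a 4-byte `f32` (an assumption, not something stated above) and ignores ids, metadata, and allocator overhead. The function name `raw_vector_bytes` is hypothetical.

```rust
/// Bytes needed to hold `count` vectors of `dims` dimensions each,
/// assuming every dimension is an `f32` (4 bytes). Overhead is ignored.
fn raw_vector_bytes(count: u64, dims: u64) -> u64 {
    count * dims * std::mem::size_of::<f32>() as u64
}
```

Under those assumptions, `raw_vector_bytes(100_000, 1_000)` (100 million dimensions in total) comes to 400,000,000 bytes, roughly 400 MB.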

### Soon
- **Powerful Queries**: Allow filtering by the provided vector metadata without slowing the search down.
- **Integrated Models**: Soon you won't have to bring your own vectors, just generate them on the server automatically. Aiming to support SBert, Hugging Face models, OpenAI, Cohere, etc.
- **TypeScript/Python Libraries**: Should be able to auto-generate pretty good clients using the included OpenAPI schema.

## 🚀 Getting Started

### 🐳 Docker

We provide a lightweight Docker container that you can run anywhere. It only takes one command to get up and running with the latest changes:

```sh
docker run \
  -p 8000:8000 \
  ghcr.io/m1guelpf/tinyvector:edge
```

> **Note**
> When running via Docker Compose or Kubernetes, make sure to bind a volume to `/tinyvector/storage` for persistence. This is handled automatically in the command above.

### 🛠️ Building from scratch

You can build tinyvector from the latest tagged release by running `cargo install tinyvector` (you might need to [install Rust](https://rustup.rs/) first). Then, run `tinyvector` to start up the server.

You can also build it from the latest commit by cloning the repo and running `cargo build --release`, and run it with `./target/release/tinyvector`.

## 💡 Why use tinyvector?

Most vector databases are overkill for simple setups. For example:
- Using embeddings to chat with your documents. Most document search is nowhere close to what you'd need to justify accelerating search speed with [HNSW](https://github.com/nmslib/hnswlib) or [FAISS](https://github.com/facebookresearch/faiss).
- Doing search for your website or store. Unless you're selling 1,000,000 items, you don't need Pinecone.
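
To illustrate why an approximate index is often unnecessary at this scale, here is a minimal sketch of exact brute-force search: score every stored vector against the query and keep the best `k`. This is not taken from tinyvector's source; the `Entry` struct and the `cosine` and `top_k` functions are hypothetical names used only for illustration.

```rust
/// Hypothetical stored item: an id plus its embedding.
struct Entry {
    id: String,
    vector: Vec<f32>,
}

/// Cosine similarity; returns 0.0 if either vector has zero magnitude.
/// Assumes both slices have the same length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every entry against the query and return the `k` best
/// matches as (id, score) pairs, most similar first.
fn top_k<'a>(entries: &'a [Entry], query: &[f32], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = entries
        .iter()
        .map(|e| (e.id.as_str(), cosine(&e.vector, query)))
        .collect();
    // Sort descending by score; `total_cmp` gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

The cost grows linearly with the number of stored vectors, which is usually acceptable for the small collections described above, and the results are exact rather than approximate.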

## 🧩 Embeddings?

Embeddings are a way to compare similar things, in the same way humans compare similar things, by converting text into a small list of numbers. Similar pieces of text will have similar numbers; different pieces will have very different numbers.

Read OpenAI's [explanation](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
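
One common way to measure how "similar" two lists of numbers are is cosine similarity. The sketch below is a generic illustration, not tinyvector's own code, and the function name `cosine_similarity` is hypothetical. It returns a value near 1 for vectors pointing in the same direction and near 0 for unrelated ones.

```rust
/// Cosine similarity between two embeddings of equal length.
/// Returns `None` if the lengths differ or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    // Dot product of the two vectors.
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    // Euclidean length of each vector.
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

For instance, `cosine_similarity(&[1.0, 0.0], &[1.0, 0.0])` yields `Some(1.0)`, while the perpendicular pair `&[1.0, 0.0]` and `&[0.0, 1.0]` yields `Some(0.0)`.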

## 🙏 Acknowledgements

- Will Depue's [tinyvector](https://twitter.com/willdepue/status/1675796236304252928) (python+sqlite+numpy) inspired me to build a vector database from scratch (and borrow the name). Will also contributed plenty of ideas to optimize performance.

## 📄 License

This project is open-sourced under the MIT license. See [the License file](LICENSE) for more information.