ArcMind Vector DB
- Host: GitHub
- URL: https://github.com/arcmindai/arcmindvector
- Owner: arcmindai
- License: mit
- Created: 2023-11-11T03:19:22.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-05T06:30:14.000Z (9 months ago)
- Last Synced: 2024-08-02T06:19:11.745Z (6 months ago)
- Topics: ai, approximate-nearest-neighbor-search, internetcomputer, nearest-neighbor-search, retrieval-augmented-generation, rust-lang, similarity-search, smart-contracts, vector-database
- Language: Rust
- Homepage: https://arcmindai.app
- Size: 233 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-internet-computer - ArcMind Vector DB - A Vector DB with similarity search supporting text, image, and audio embeddings, based on k-d tree, useful for AI applications like recommendation and Retrieval-Augmented Generation. (Decentralized AI / Solana)
README
# Arcmind Vector DB
Arcmind Vector DB is a high-performance, flexible, and ergonomic vector similarity search database for the [Internet Computer](https://internetcomputer.org). It is designed to be a general-purpose vector similarity search database that can be used for a wide range of AI-powered applications, including recommendation systems, search engines, [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) (RAG), and long-term memory of Autonomous AI agents like [ArcMind AI](https://github.com/arcmindai/arcmindai).
## Architecture
Sequence Flow Diagram
![ArcMind Vector DB](/diagram/architecture.png)
## Prerequisites
- Install the Rust toolchain using Rustup. Follow https://www.rust-lang.org/tools/install
- Install cargo-audit:
```
cargo install cargo-audit
```
- Install the dfx SDK. Follow https://github.com/dfinity/sdk
## Quick Start
If you want to test your project locally, you can use the following commands:
```bash
# Start the replica, running in the background
dfx start --background

# Set the CONTROLLER_PRINCIPAL environment variable from the current dfx identity
export CONTROLLER_PRINCIPAL=$(dfx identity get-principal)

# Deploy the arcmindvectordb canister to the local replica
./scripts/provision.sh
```
The provision script deploys an `arcmindvectordb` canister.
## API
See [Candid](/src/arcmindvectordb/arcmindvectordb.did) for the full API.
## Interacting with the canisters
Sample shell scripts are provided to interact with the canisters in the [interact](/interact/) directory.
Sample embeddings content and their embedding vectors are provided in the [embeddings](/embeddings/) directory.
### Add a vector to the VectorStore
Open and edit:
```bash
./interact/add_vector.sh
```
Try adding multiple vectors of different topics to the VectorStore.
### Search the VectorStore
Then search for similar vectors, using one of the vectors you added as the input.
The search should return that same vector as the closest match, followed by other similar vectors on the same topic.
This shows how embeddings with many dimensions capture semantic meaning.
Open and edit:
```bash
./interact/search_vector.sh
```
Note that the same embedding model must be used for adding and searching vectors.
It is recommended that you use a single embedding model per VectorStore for consistent results.
The embeddings in /embeddings/ were generated using the [OpenAI text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings/embedding-models) model with its [Embedding API](https://platform.openai.com/docs/api-reference/embeddings).
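The search behaviour described in this section can be illustrated with a small, self-contained Rust sketch. It is not the canister's code or API: `VectorStore`, `add`, `search`, and `squared_distance` are hypothetical names, the three-dimensional vectors are toy stand-ins for real embeddings, and a brute-force scan with squared Euclidean distance is used in place of whatever index the canister maintains. It shows two things: querying with a stored vector returns that vector first, at distance zero, and a vector whose length does not match the store (as would happen when mixing embedding models with different output sizes) is rejected.

```rust
/// Hypothetical in-memory store for illustration only; not the canister's API.
#[derive(Debug)]
struct VectorStore {
    dim: usize,
    items: Vec<(String, Vec<f32>)>,
}

impl VectorStore {
    /// Creates an empty store that only accepts vectors of length `dim`.
    fn new(dim: usize) -> Self {
        Self { dim, items: Vec::new() }
    }

    /// Adds a labelled vector, rejecting one whose length differs from `dim`.
    fn add(&mut self, label: &str, vector: Vec<f32>) -> Result<(), String> {
        if vector.len() != self.dim {
            return Err(format!("expected {} dimensions, got {}", self.dim, vector.len()));
        }
        self.items.push((label.to_string(), vector));
        Ok(())
    }

    /// Returns up to `k` (label, squared distance) pairs, closest first.
    /// This is a brute-force scan over every stored vector.
    fn search(&self, query: &[f32], k: usize) -> Result<Vec<(String, f32)>, String> {
        if query.len() != self.dim {
            return Err(format!("expected {} dimensions, got {}", self.dim, query.len()));
        }
        let mut scored: Vec<(String, f32)> = self
            .items
            .iter()
            .map(|(label, v)| (label.clone(), squared_distance(v, query)))
            .collect();
        scored.sort_by(|a, b| a.1.total_cmp(&b.1));
        scored.truncate(k);
        Ok(scored)
    }
}

/// Squared Euclidean distance between two equal-length slices.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let mut store = VectorStore::new(3);
    store.add("cats", vec![0.9, 0.1, 0.0]).unwrap();
    store.add("dogs", vec![0.8, 0.2, 0.1]).unwrap();
    store.add("stocks", vec![0.0, 0.1, 0.9]).unwrap();

    // Query with the exact "cats" vector: it comes back first, then "dogs".
    let hits = store.search(&[0.9, 0.1, 0.0], 2).unwrap();
    for (label, dist) in &hits {
        println!("{label}: {dist:.3}");
    }

    // A vector with the wrong number of dimensions is rejected.
    assert!(store.add("bad", vec![1.0, 2.0]).is_err());
}
```

A linear scan costs time proportional to the number of stored vectors per query, which is why vector databases typically keep a spatial index such as a k-d tree instead; the ranking for exact nearest-neighbour search is the same either way.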
## Setting up GitHub Actions CI/CD
Get each string using the commands below, then add it to GitHub Secrets.
Note: Replace `default` with the name of the identity you need.
### DFX_IDENTITY
```
awk 'NF {sub(/\r/, ""); printf "%s\\r\\n",$0;}' ~/.config/dfx/identity/default/identity.pem
```
### DFX_WALLETS
```
cat ~/.config/dfx/identity/default/wallets.json
```
## Roadmap
- [x] Backend - Research and implement primary canister as long-term VectorStore with Nearest Neighbours distance metric, embedding API and indexing
- [x] Backend - Integrate with ArcMind AI Autonomous Agent for long-term memory
- [ ] Doc - Add documentation for the VectorStore API
- [ ] Backend - Self-hosted machine learning models for generating text (NLP), image and audio embeddings
- [ ] Backend - Scalable storage buckets for large-scale vector data beyond the canister storage limit
## License
See the [License](LICENSE) file for license rights and limitations (MIT).
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for details about how to contribute to this project.
## Authors
Code & Architecture: Henry Chan, Twitter: [@kinwo](https://twitter.com/kinwo)
## References
- [Internet Computer](https://internetcomputer.org)
- [Cloudflare - What is a Vector Database?](https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/)
- [RAG](https://arxiv.org/abs/2005.11401)
- [Open-source vector similarity search for Postgres](https://github.com/pgvector/pgvector)
- [Spotify Annoy Library - Approximate Nearest Neighbors in C++/Python](https://github.com/spotify/annoy)
- [What is Similarity Search?](https://www.pinecone.io/learn/what-is-similarity-search/)
- [Semantic Search: Measuring Meaning From Jaccard to Bert](https://www.pinecone.io/learn/semantic-search/)
- [A high-performance, flexible, ergonomic k-d tree Rust library](https://github.com/sdd/kiddo)
- [K-d tree](https://en.wikipedia.org/wiki/K-d_tree)
- [DeepLearning.AI course - Building Applications with Vector Databases](https://www.deeplearning.ai/short-courses/building-applications-vector-databases/)