{"id":26123019,"url":"https://github.com/babycommando/entity-db","last_synced_at":"2026-01-19T21:56:30.841Z","repository":{"id":270407000,"uuid":"910277051","full_name":"babycommando/entity-db","owner":"babycommando","description":"EntityDB is an in-browser vector database wrapping indexedDB and Transformers.js over WebAssembly","archived":false,"fork":false,"pushed_at":"2025-05-08T01:39:56.000Z","size":9614,"stargazers_count":152,"open_issues_count":2,"forks_count":9,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-08T02:35:01.053Z","etag":null,"topics":["db","dbbrowser","embeddings","idb","indexed-db","indexeddb","indexeddb-wrapper","transformers","vector-database","wasm","webassembly"],"latest_commit_sha":null,"homepage":"https://entity-db-landing.vercel.app/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/babycommando.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-30T21:30:14.000Z","updated_at":"2025-05-08T01:39:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"b0035df7-6968-462d-aee1-1324d10ee36e","html_url":"https://github.com/babycommando/entity-db","commit_stats":null,"previous_names":["babycommando/entity-db"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/babycommando/entity-db","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babycommando%2Fentity-db","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babycommando%2Fentity-db/tags","releases_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/babycommando%2Fentity-db/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babycommando%2Fentity-db/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/babycommando","download_url":"https://codeload.github.com/babycommando/entity-db/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/babycommando%2Fentity-db/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28579362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T17:42:58.221Z","status":"ssl_error","status_checked_at":"2026-01-19T17:40:54.158Z","response_time":67,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["db","dbbrowser","embeddings","idb","indexed-db","indexeddb","indexeddb-wrapper","transformers","vector-database","wasm","webassembly"],"created_at":"2025-03-10T15:19:15.283Z","updated_at":"2026-01-19T21:56:30.836Z","avatar_url":"https://github.com/babycommando.png","language":"JavaScript","funding_links":[],"categories":["SDKs \u0026 Libraries"],"sub_categories":[],"readme":"# EntityDB - Decentralized Ai Memory\n\n### Storing Vector Embeddings In The Browser wrapping indexedDB and 
Transformers.js\n\n![EntityDB](https://raw.githubusercontent.com/babycommando/entity-db/refs/heads/main/cover.gif)\n\n## Demo: [See EntityDB in action!](https://entity-db-landing.vercel.app/)\n\n## Overview\n\n**EntityDB** is a powerful, lightweight in-browser database designed for storing and querying vectors. It integrates seamlessly with [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API) for persistent storage and [Transformers.js](https://github.com/xenova/transformers) to generate embeddings from text, allowing you to build fast and efficient search systems with state-of-the-art models. Whether you're building search engines, recommendation systems, Ai memory or any app requiring vector similarity, EntityDB has got you covered.\n\n## Installation\n\nTo install **EntityDB** in your project, run:\n\n```bash\nnpm install @babycommando/entity-db\n```\n\n```bash\nyarn add @babycommando/entity-db\n```\n\n```bash\npnpm add @babycommando/entity-db\n```\n\n```bash\nbun add @babycommando/entity-db\n```\n\n## Features\n\n- **In-browser**: Runs entirely in the browser using IndexedDB for local storage.\n- **Seamless Integration with Transformers**: Easily generate text embeddings with Hugging Face models via Transformers.js.\n- **Cosine Similarity Search**: Efficient querying based on cosine similarity between vectors.\n- **Flexible**: Supports both automatic embedding generation and manual insertion of pre-computed embeddings.\n- **Lightweight**: No need for a server-side component or complex setup.\n\n![EntityDB_in_action](sc-db.png)\n\n## Usage\n\n### Importing the Library\n\n```js\nimport { EntityDB } from \"@babycommando/entity-db\";\n\n// Initialize the VectorDB instance\nconst db = new EntityDB({\n  vectorPath: \"db_name\",\n  model: \"Xenova/all-MiniLM-L6-v2\", // a HuggingFace embeddings model\n});\n```\n\n### Inserting Data with Automatic Embedding Generation\n\nYou can insert data by simply passing a text field. 
The library will automatically generate embeddings using the specified transformer model.\n\n```js\nawait db.insert({\n  text: \"This is a sample text to embed\",\n});\n```\n\n### Inserting Manual Vectors\n\nIf you already have precomputed vectors, you can insert them directly into the database.\n\n```js\nawait db.insertManualVectors({\n  text: \"Another sample\",\n  embedding: [0.1, 0.2, 0.3, ...] // your precomputed embedding\n});\n```\n\n### Querying (Cosine Similarity)\n\nYou can query the database by providing a text, and EntityDB will return the most similar results based on cosine similarity.\n\n```js\nconst results = await db.query(\"Find similar texts based on this query\");\nconsole.log(results);\n```\n\n### Querying Manual Vectors\n\nIf you have precomputed vectors and want to query them directly, use the queryManualVectors method.\n\n```js\n  const queryVector = [0.1, 0.2, 0.3, ...]; // your precomputed query vector\n  const results = await db.queryManualVectors(queryVector);\n  console.log(results);\n```\n\n### Updating a Vector in the Database\n\nIf you need to update an existing vector in the database:\n\n```js\nawait db.update(\"1234\", {\n  vector: [0.4, 0.5, 0.6], // Updated vector data\n  metadata: { name: \"Updated Item\" }, // Additional updated data\n});\n```\n\n### Deleting Data\n\nYou can delete a vector by its key.\n\n```js\nawait db.delete(1);\n```\n\n---\n\n## Experimental: Binary Vectors\n\nWhile querying vectors by cosine similarity is already extremely fast, sometimes you want to go faster than light. Binary vectors are simplified versions of dense vectors where each value is turned into either a 0 or a 1 by comparing it to the middle value (median). This makes them smaller to store and faster to compare, which helps when working with a lot of data.\n\nNote that this simplification can reduce the quality of the results because some detailed information in the original dense vector is lost. Use it for very long searches. 
For example, the set of vectors produced by _all-MiniLM-L6-v2_:\n\n`[ -0.020319879055023193,  0.07605013996362686, 0.020568927749991417, ...]`\nafter being binarized becomes:\n`[ 0, 1, 1, ...]`\n\nFor better JS processing, the binary vectors are packed into 64-bit integers (e.g., using BigUint64Array).\nEach 64-bit integer represents 64 binary values, and we use the XOR operation on 64-bit integers to find mismatched bits and then count the 1s.\n\n### Inserting Data and Generating Binarized Embeddings\n\nTo insert data to be vectorized and then binarized, use _insertBinary_.\nNote: to query over binarized vectors, use _queryBinary_ or _queryBinarySIMD_.\n\n```js\nawait db.insertBinary({\n  text: \"This is a sample text to embed and binarize\",\n});\n```\n\n### (Very Fast) Query Binary Embeddings Using Hamming Distance Over Native JS (64 bits at a time max)\n\n![EntityDB_in_action_binary_wasm](binary_js.png)\n\nTo query over binarized vectors use _queryBinary_.\nWhile cosine similarity measures the angle between two vectors in a multi-dimensional space, Hamming distance counts the number of positions where two binary vectors differ. It measures dissimilarity as a simple count of mismatches. For binarized vectors Hamming really is the tool for the job.\n\n```js\nconst results = await db.queryBinary(\"Find similar texts based on this query\");\nconsole.log(results);\n```\n\nExample of a binary hamming distance query over BigUint64Array (64 bits processed at a time using pure JS):\n![EntityDB_in_action_binary](sc-binary.png)\n\n### (Insanely Fast) Query Binary Embeddings Using Hamming Distance Over WebAssembly SIMD (+128 bits at a time, shorter CPU cycles)\n\n![EntityDB_in_action_binary_wasm](binary_wasm.png)\n\nThe WebAssembly SIMD implementation processes 128 bits per iteration (via v128.xor) compared to 64 bits per iteration in the JavaScript implementation using BigUint64Array. 
This alone gives a theoretical 2x speedup.\n\nHowever, SIMD instructions execute XOR, popcount, and similar operations on multiple data lanes in parallel. This reduces the number of CPU cycles required for the same amount of work compared to sequential bitwise operations in JavaScript. SIMD in WebAssembly is likely 2x to 4x faster or more over big vectors.\n\nCheck haming_distance_simd.wat for the WASM source code. Compiled using wat2wasm.\n\n```js\nconst results = await db.queryBinarySIMD(\n  \"Find similar texts based on this query\"\n);\nconsole.log(results);\n```\n\nExample of a binary hamming distance query over WebAssembly SIMD (+128 bits at a time, shorter CPU cycles):\n![EntityDB_in_action_binary_wasm](sc-binary-wasm.png)\nThe logs show offsets (0, 16, 32), which means the code processes 128 bits (16 bytes) at a time. Since the total vector is 384 bits, it takes 3 steps (384 ÷ 128 = 3), confirming 128-bit SIMD processing.\n\n---\n\n#### For Next.js\n\nIf you're using Next.js, you may need to configure Webpack to work with Transformers.js. Add the following to your next.config.js file:\n\n```js\n  webpack: (config) =\u003e {\n    // Override the default webpack configuration\n    config.resolve.alias = {\n      ...config.resolve.alias,\n      \"onnxruntime-node$\": false, // Disable onnxruntime-node for browser environments\n      \"sharp$\": false, // optional - Disable sharp package (used by some image processing packages)\n    };\n\n    return config;\n  },\n```\n\n## Contributing\n\nFeel free to fork the repository, create issues, and submit pull requests. We welcome contributions and suggestions!\n\n## License\n\nThis project is licensed under the Apache License 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbabycommando%2Fentity-db","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbabycommando%2Fentity-db","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbabycommando%2Fentity-db/lists"}