{"id":37063249,"url":"https://github.com/atasoglu/sqlite-vec-client","last_synced_at":"2026-04-20T23:01:21.946Z","repository":{"id":314068566,"uuid":"1051297640","full_name":"atasoglu/sqlite-vec-client","owner":"atasoglu","description":"A lightweight Python client around sqlite-vec for CRUD and similarity search.","archived":false,"fork":false,"pushed_at":"2025-11-01T12:48:09.000Z","size":167,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-14T09:09:12.473Z","etag":null,"topics":["embeddings","similarity-search","sqlite","sqlite-extension","sqlite-vec","vector-database"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/sqlite-vec-client/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/atasoglu.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY_IMPROVEMENTS.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-05T18:47:53.000Z","updated_at":"2025-11-01T12:47:26.000Z","dependencies_parsed_at":"2025-09-10T12:28:32.139Z","dependency_job_id":"8aa0c071-2b1f-4258-9bc5-601eac40488b","html_url":"https://github.com/atasoglu/sqlite-vec-client","commit_stats":null,"previous_names":["atasoglu/sqlite-vec-client"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/atasoglu/sqlite-vec-client","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atasoglu%2Fsqlite-vec-client","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atasoglu%2Fsqlite-vec-client/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atasoglu%2Fsqlite-vec-client/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atasoglu%2Fsqlite-vec-client/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/atasoglu","download_url":"https://codeload.github.com/atasoglu/sqlite-vec-client/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atasoglu%2Fsqlite-vec-client/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32069440,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T21:26:33.338Z","status":"ssl_error","status_checked_at":"2026-04-20T21:26:22.081Z","response_time":94,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","similarity-search","sqlite","sqlite-extension","sqlite-vec","vector-database"],"created_at":"2026-01-14T07:04:09.162Z","updated_at":"2026-04-20T23:01:21.793Z","avatar_url":"https://github.com/atasoglu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sqlite-vec-client\n\n[![PyPI version](https://img.shields.io/pypi/v/sqlite-vec-client)](https://pypi.org/project/sqlite-vec-client/)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)\n[![CI](https://github.com/atasoglu/sqlite-vec-client/actions/workflows/test.yml/badge.svg)](https://github.com/atasoglu/sqlite-vec-client/actions/workflows/test.yml)\n\nA lightweight Python client around [sqlite-vec](https://github.com/asg017/sqlite-vec) that lets you store texts, JSON metadata, and float32 embeddings in SQLite and run fast similarity search.\n\n## Features\n- **Simple API**: One class, `SQLiteVecClient`, for CRUD and search.\n- **Vector index via sqlite-vec**: Uses a `vec0` virtual table under the hood.\n- **Automatic sync**: Triggers keep the base table and vector index aligned.\n- **Typed results**: Clear return types for results and searches.\n- **Filtering helpers**: Fetch by `rowid`, `text`, or `metadata`.\n- **Pagination \u0026 sorting**: List records with `limit`, `offset`, and order.\n- **Bulk operations**: Efficient `update_many()`, `get_all()` generator, and transaction support.\n- **Backup tooling**: High-level `backup()` and `restore()` helpers for disaster recovery workflows.\n\n## Requirements\n- Python 3.9+\n- [SQLite version 3.41 or higher](https://alexgarcia.xyz/sqlite-vec/python.html#updated-sqlite)\n- [The sqlite-vec extension](https://github.com/asg017/sqlite-vec)\n\n## Installation\nInstall from PyPI:\n\n```bash\npip install sqlite-vec-client\n```\n\nOr:\n\n```bash\ngit clone https://github.com/atasoglu/sqlite-vec-client\ncd sqlite-vec-client\npip install .\n```\n\n## Quick start\n```python\nfrom sqlite_vec_client import SQLiteVecClient\n\n# Initialize a client bound to a specific table in a database file\nclient = SQLiteVecClient(table=\"documents\", db_path=\"./example.db\")\n\n# Create schema (base table + vec index); choose embedding dimension and distance\nclient.create_table(dim=384, distance=\"cosine\")\n\n# Add some texts with embeddings (one embedding per text)\ntexts = [\"hello world\", \"lorem ipsum\", \"vector databases\"]\nembs = [\n    [0.1, 0.2, 0.3, *([0.0] * 381)],\n    [0.05, 0.04, 0.03, *([0.0] * 381)],\n    [0.2, 0.1, 0.05, *([0.0] * 381)],\n]\nrowids = client.add(texts=texts, embeddings=embs)\n\n# Similarity search returns (rowid, text, distance)\nquery_emb = [0.1, 0.2, 0.3, *([0.0] * 381)]\nhits = client.similarity_search(embedding=query_emb, top_k=3)\n\n# Fetch full rows (rowid, text, metadata, embedding)\nrows = client.get_many(rowids)\n\nclient.close()\n```\n\n## Export/Import\n\nExport and import data in JSON or CSV formats for backups, migrations, and data sharing:\n\n```python\n# Export to JSON (includes embeddings)\ncount = client.export_to_json(\"backup.jsonl\")\n\n# Export to CSV (human-readable, optional embeddings)\ncount = client.export_to_csv(\"data.csv\", include_embeddings=False)\n\n# Export filtered data\ncount = client.export_to_json(\n    \"important.jsonl\",\n    filters={\"priority\": \"high\"}\n)\n\n# Import from JSON\ncount = client.import_from_json(\"backup.jsonl\")\n\n# Import from CSV\ncount = client.import_from_csv(\"data.csv\")\n\n# Backup and restore workflow\nclient.export_to_json(\"backup.jsonl\")\n# ... data loss ...\nclient.import_from_json(\"backup.jsonl\")\n```\n\nSee [examples/export_import_example.py](examples/export_import_example.py) for more examples.\n\n### Quick backup \u0026 restore helpers\n\n```python\n# Create a JSONL backup\nclient.backup(\"backup.jsonl\")\n\n# Restore later (optionally skip duplicates)\nclient.restore(\"backup.jsonl\", skip_duplicates=True)\n\n# Work with CSV\nclient.backup(\"backup.csv\", format=\"csv\", include_embeddings=True)\nclient.restore(\"backup.csv\", format=\"csv\", skip_duplicates=True)\n```\n\n## Metadata Filtering\n\nEfficiently filter records by metadata fields using SQLite's JSON functions:\n\n```python\n# Filter by single field\nresults = client.filter_by_metadata({\"category\": \"python\"})\n\n# Filter by multiple fields\nresults = client.filter_by_metadata({\"category\": \"python\", \"year\": 2024})\n\n# Nested JSON paths\nresults = client.filter_by_metadata({\"author.name\": \"Alice\"})\n\n# Count matching records\ncount = client.count_by_metadata({\"category\": \"python\"})\n\n# Combined similarity search + metadata filtering\nhits = client.similarity_search_with_filter(\n    embedding=query_vector,\n    filters={\"category\": \"python\"},\n    top_k=5\n)\n\n# Pagination\nresults = client.filter_by_metadata(\n    {\"category\": \"python\"},\n    limit=10,\n    offset=0\n)\n```\n\nSee [examples/metadata_filtering.py](examples/metadata_filtering.py) and [examples/advanced_metadata_queries.py](examples/advanced_metadata_queries.py) for more examples.\n\n## Bulk Operations\n\nThe client provides optimized methods for bulk operations:\n\n```python\n# Bulk update multiple records\nupdates = [\n    (rowid1, \"new text\", {\"key\": \"value\"}, None),\n    (rowid2, None, {\"updated\": True}, new_embedding),\n]\ncount = client.update_many(updates)\n\n# Memory-efficient iteration over all records\nfor rowid, text, metadata, embedding in client.get_all(batch_size=100):\n    process(text)\n\n# Atomic transactions\nwith client.transaction():\n    client.add(texts, embeddings)\n    client.update_many(updates)\n    client.delete_many(old_ids)\n```\n\nSee [examples/batch_operations.py](examples/batch_operations.py) for more examples.\n\n## How it works\n`SQLiteVecClient` stores data in `{table}` and mirrors embeddings in `{table}_vec` (a `vec0` virtual table). SQLite triggers keep both in sync when rows are inserted, updated, or deleted. Embeddings are serialized as packed float32 bytes for compact storage.\n\n## Logging\n\nThe library includes built-in logging support using Python's standard logging module. By default, logging is set to WARNING level.\n\n**Configure log level via environment variable:**\n```bash\nexport SQLITE_VEC_CLIENT_LOG_LEVEL=DEBUG  # Linux/macOS\nset SQLITE_VEC_CLIENT_LOG_LEVEL=DEBUG     # Windows\n```\n\n**Or programmatically:**\n```python\nimport logging\nfrom sqlite_vec_client import get_logger\n\nlogger = get_logger()\nlogger.setLevel(logging.DEBUG)  # DEBUG, INFO, WARNING, ERROR, CRITICAL\n```\n\n**Available log levels:**\n- `DEBUG`: Detailed information for diagnosing issues\n- `INFO`: General informational messages about operations\n- `WARNING`: Warning messages (default)\n- `ERROR`: Error messages\n- `CRITICAL`: Critical error messages\n\nSee [examples/logging_example.py](examples/logging_example.py) for a complete example.\n\n## Testing\n\nThe project has comprehensive test coverage (91%+) with 75 tests covering:\n- Unit tests for utilities and validation\n- Integration tests for all client operations\n- Security tests for SQL injection prevention\n- Edge cases and error handling\n\nSee [TESTING.md](TESTING.md) for detailed testing documentation.\n\n## Development\n\n### Setup\n\nInstall development dependencies:\n```bash\npip install -r requirements-dev.txt\npre-commit install\n```\n\n### Testing\n\nThe project uses pytest with comprehensive test coverage (89%+).\n\n**Run all tests:**\n```bash\npytest\n```\n\n**Run with verbose output:**\n```bash\npytest -v\n```\n\n**Run specific test categories:**\n```bash\npytest -m unit          # Unit tests only\npytest -m integration   # Integration tests only\n```\n\n**Coverage (terminal + XML for CI):**\n```bash\npytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml\n```\nThe CI workflow uploads the generated `coverage.xml` as an artifact for downstream dashboards.\n\n**Run specific test file:**\n```bash\npytest tests/test_client.py\npytest tests/test_validation.py\npytest tests/test_security.py\npytest tests/test_utils.py\n```\n\n### Code Quality\n\n**Format code:**\n```bash\nruff format .\n```\n\n**Lint code:**\n```bash\nruff check .\n```\n\n**Type checking:**\n```bash\nmypy sqlite_vec_client/\n```\n\n**Run all quality checks:**\n```bash\nruff check . \u0026\u0026 ruff format . \u0026\u0026 mypy sqlite_vec_client/ \u0026\u0026 pytest\n```\n\n### Benchmarks\n\n**Run benchmarks:**\n```bash\npython -m benchmarks\n```\n\n**Configure benchmarks:**\nEdit [benchmarks/config.yaml](benchmarks/config.yaml) to customize:\n- Dataset sizes (default: 100, 1000, 10000, 50000)\n- Embedding dimension (default: 384)\n- Distance metric (default: cosine)\n- Database modes (file, memory)\n- Similarity search iterations and top-k values\n\n## Documentation\n\n- [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines\n- [CHANGELOG.md](CHANGELOG.md) - Version history\n- [TESTING.md](TESTING.md) - Testing documentation\n- [Docs site (MkDocs)](docs/index.md) - Serve locally with `mkdocs serve`\n- [Examples](examples/) - Usage examples\n  - [basic_usage.py](examples/basic_usage.py) - Basic CRUD operations\n  - [metadata_filtering.py](examples/metadata_filtering.py) - Metadata filtering and queries\n  - [advanced_metadata_queries.py](examples/advanced_metadata_queries.py) - Advanced metadata filtering with nested paths\n  - [export_import_example.py](examples/export_import_example.py) - Export/import data in JSON and CSV formats\n  - [transaction_example.py](examples/transaction_example.py) - Transaction management with all CRUD operations\n  - [batch_operations.py](examples/batch_operations.py) - Bulk operations\n  - [logging_example.py](examples/logging_example.py) - Logging configuration\n- [Benchmarks](benchmarks/) - Performance benchmarks\n\n## Contributing\n\nContributions are very welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n## License\n\nMIT - See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatasoglu%2Fsqlite-vec-client","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fatasoglu%2Fsqlite-vec-client","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatasoglu%2Fsqlite-vec-client/lists"}