{"id":20425486,"url":"https://github.com/davidmstraub/sifts","last_synced_at":"2025-04-12T18:55:35.019Z","repository":{"id":248872645,"uuid":"830120728","full_name":"DavidMStraub/sifts","owner":"DavidMStraub","description":"Simple full text \u0026 vector search engine Python library","archived":false,"fork":false,"pushed_at":"2024-08-23T09:09:15.000Z","size":88,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-07T13:40:27.458Z","etag":null,"topics":["full-text-search","pgvector","postgresql","python","semantic-search","sqlite","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DavidMStraub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-17T16:20:52.000Z","updated_at":"2025-02-01T19:02:14.000Z","dependencies_parsed_at":"2024-07-21T18:39:08.631Z","dependency_job_id":null,"html_url":"https://github.com/DavidMStraub/sifts","commit_stats":null,"previous_names":["davidmstraub/sifts"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidMStraub%2Fsifts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidMStraub%2Fsifts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidMStraub%2Fsifts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidMStraub%2Fsifts/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/DavidMStraub","download_url":"https://codeload.github.com/DavidMStraub/sifts/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248618273,"owners_count":21134200,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["full-text-search","pgvector","postgresql","python","semantic-search","sqlite","vector-database"],"created_at":"2024-11-15T07:13:30.626Z","updated_at":"2025-04-12T18:55:34.981Z","avatar_url":"https://github.com/DavidMStraub.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sifts \u0026ndash; Simple Full Text \u0026 Semantic Search\n\n🔎 Sifts is a simple but powerful Python package for managing and querying document collections with support for both SQLite and PostgreSQL databases.\n\nIt is designed to efficiently handle full-text search and vector search, making it ideal for applications that involve large-scale text data retrieval.\n\n\n\n## Features\n\n- **Dual Database Support**: Sifts works with both SQLite and PostgreSQL, offering the simplicity of SQLite for lightweight applications and the scalability of PostgreSQL for larger, production environments.\n- **Full-Text Search (FTS)**: Perform advanced text search queries with full-text search support.\n- **Vector Search**: Integrate with embedding models to perform vector-based similarity searches, perfect for applications involving natural language processing.\n- **Flexible Querying**: Supports complex queries with filtering, ordering, and pagination.\n\n## Background\n\nThe main 
idea of Sifts is to leverage the built-in full-text search capabilities in SQLite and PostgreSQL and to make them available via a unified, Pythonic API. You can use SQLite for small projects or development and trivially switch to PostgreSQL to scale your application.\n\nFor vector search, cosine similarity is computed in PostgreSQL via the pgvector extension, while with SQLite similarity is calculated in memory.\n\nSifts does not come with a server mode as it's meant as a library to be imported by other apps. The original motivation for its development was to replace whoosh as search backend in [Gramps Web](https://www.grampsweb.org/), which is based on Flask.\n\n\n## Installation\n\nYou can install Sifts via pip:\n\n```bash\npip install sifts\n```\n\n## Usage\n\n### Full-text search\n\n```python\nimport sifts\n\n# by default, creates a new SQLite database in the working directory\ncollection = sifts.Collection(name=\"my_collection\")\n\n# Add docs to the index. Can also update and delete.\ncollection.add(\n    documents=[\"Lorem ipsum dolor\", \"sit amet\"],\n    metadatas=[{\"foo\": \"bar\"}, {\"foo\": \"baz\"}], # optional, can filter on these\n    ids=[\"doc1\", \"doc2\"], # unique for each doc. 
Uses UUIDs if omitted\n)\n\nresults = collection.query(\n    \"Lorem\",\n    # limit=2,  # optionally limit the number of results\n    # where={\"foo\": \"bar\"},  # optional filter\n    # order_by=\"foo\",  # sort by metadata key (rather than rank)\n)\n```\n\nThe API is inspired by [chroma](https://github.com/chroma-core/chroma).\n\n\n### Full-text search syntax\n\nSifts supports the following search syntax:\n\n- Search for individual words\n- Search for multiple words (will match documents where all words are present)\n- `and` operator\n- `or` operator\n- `*` wildcard (in SQLite, supported anywhere in the search term, in PostgreSQL only at the end of the search term)\n\nThe search syntax is the same regardless of backend.\n\n### Vector search (semantic search)\n\nSifts can also be used as a vector store for semantic search engines or retrieval-augmented generation (RAG) with large language models (LLMs).\n\nSimply pass the `embedding_function` to the `Collection` factory to enable vector storage and set `vector_search=True` in the query method. 
For instance, using the [Sentence Transformers](https://sbert.net/) library,\n\n```python\nfrom sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer(\"intfloat/multilingual-e5-small\")\n\ndef embedding_function(queries: list[str]):\n    return model.encode(queries)\n\ncollection = sifts.Collection(\n    db_url=\"sqlite:///vector_store.db\",\n    name=\"my_vector_store\",\n    embedding_function=embedding_function\n)\n\n# Adding vector data to the collection\ncollection.add([\"This is a test sentence.\", \"Another example query.\"])\n\n# Querying the collection with semantic search\nresults = collection.query(\"Find similar sentences.\", vector_search=True)\n```\n\nPostgreSQL collections require installing and enabling the `pgvector` extension.\n\n\n### Updating and Deleting Documents\n\nDocuments can be updated or deleted using their IDs.\n\n```python\n# Update a document\ncollection.update(ids=[\"document_id\"], contents=[\"Updated content\"])\n\n# Delete a document\ncollection.delete(ids=[\"document_id\"])\n```\n\n## Contributing\n\nContributions are welcome! Feel free to create an [issue](https://github.com/DavidMStraub/sifts/issues) if you encounter problems or have an improvement suggestion, and even better submit a PR along with it!\n\n## License\n\nSifts is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n\n---\n\nHappy Sifting! 🚀\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidmstraub%2Fsifts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidmstraub%2Fsifts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidmstraub%2Fsifts/lists"}