{"id":18710785,"url":"https://github.com/apify/actor-vector-database-integrations","last_synced_at":"2025-11-03T16:30:10.373Z","repository":{"id":243640369,"uuid":"798139497","full_name":"apify/actor-vector-database-integrations","owner":"apify","description":"Transfer data from Apify Actors to vector databases (Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate)","archived":false,"fork":false,"pushed_at":"2025-04-07T09:51:26.000Z","size":110314,"stargazers_count":7,"open_issues_count":1,"forks_count":6,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-11T22:11:25.759Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apify.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-09T07:06:29.000Z","updated_at":"2025-04-07T09:51:34.000Z","dependencies_parsed_at":"2024-07-23T09:08:03.742Z","dependency_job_id":"ec337a4d-6064-4b99-8c6d-df2a2924ba6e","html_url":"https://github.com/apify/actor-vector-database-integrations","commit_stats":null,"previous_names":["apify/store-vector-db","apify/actor-vector-database-integrations"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-vector-database-integrations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-vector-database-integrations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-
vector-database-integrations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-vector-database-integrations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apify","download_url":"https://codeload.github.com/apify/actor-vector-database-integrations/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248560576,"owners_count":21124682,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T12:35:40.345Z","updated_at":"2025-11-03T16:30:10.368Z","avatar_url":"https://github.com/apify.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Apify Vector Database Integrations\n\n#### Vector database integrations (Actors)\n\n| Actor                       | Actor badge |\n|-----------------------------|---------------------|\n| [Chroma](https://apify.com/apify/chroma-integration) | [![Chroma integration](https://apify.com/actor-badge?actor=apify/chroma-integration)](https://apify.com/apify/chroma-integration) |\n| [Milvus](https://apify.com/apify/milvus-integration) | [![Milvus integration](https://apify.com/actor-badge?actor=apify/milvus-integration)](https://apify.com/apify/milvus-integration) |\n| [OpenSearch](https://apify.com/apify/opensearch-integration) | [![OpenSearch integration](https://apify.com/actor-badge?actor=apify/opensearch-integration)](https://apify.com/apify/opensearch-integration) |\n| [PGVector](https://apify.com/apify/pgvector-integration) | [![PGVector 
integration](https://apify.com/actor-badge?actor=apify/pgvector-integration)](https://apify.com/apify/pgvector-integration) |\n| [Pinecone](https://apify.com/apify/pinecone-integration) | [![Pinecone integration](https://apify.com/actor-badge?actor=apify/pinecone-integration)](https://apify.com/apify/pinecone-integration) |\n| [Qdrant](https://apify.com/apify/qdrant-integration) | [![Qdrant integration](https://apify.com/actor-badge?actor=apify/qdrant-integration)](https://apify.com/apify/qdrant-integration) |\n| [Weaviate](https://apify.com/apify/weaviate-integration) | [![Weaviate integration](https://apify.com/actor-badge?actor=apify/weaviate-integration)](https://apify.com/apify/weaviate-integration) |\n\nThe Apify Vector Database Integrations facilitate the transfer of data from Apify Actors to a vector database. \nThis process includes data processing, optional splitting into chunks, embedding computation, and data storage.\n\nThese integrations support incremental updates, ensuring that only changed data is updated. \nThis reduces unnecessary embedding computation and storage operations, making it ideal for search and retrieval-augmented generation (RAG) use cases.\n\nThis repository contains Actors for different vector databases. \n\n## How does it work?\n\n1. Retrieve a dataset as output from an Actor.\n2. _[Optional]_ Split text data into chunks using [langchain](https://python.langchain.com).\n3. _[Optional]_ Update only changed data.\n4. Compute embeddings, e.g. using [OpenAI](https://platform.openai.com/docs/guides/embeddings) or [Cohere](https://cohere.com/embeddings).\n5. Save data into the database.\n\n## Supported Vector Embeddings\n- [OpenAI](https://platform.openai.com/docs/guides/embeddings)\n- [Cohere](https://cohere.com/embeddings)\n\n## How to add a new integration (an example for PG-Vector)?\n\n1. 
Add the database to [docker-compose.yaml](docker-compose.yaml) for local testing (if the database is available in docker).\n\n```\nversion: '3.8'\n\nservices:\n  pgvector-container:\n    image: pgvector/pgvector:pg16\n    environment:\n      - POSTGRES_PASSWORD=password\n      - POSTGRES_DB=apify\n    ports:\n      - \"5432:5432\"\n```\n\n1. Add the PostgreSQL dependency to `pyproject.toml`:\n   ```bash\n   poetry add --group=pgvector \"langchain_postgres\"\n   ```\n   and mark the `pgvector` group as optional (in `pyproject.toml`):\n   ```toml\n   [tool.poetry.group.pgvector]\n   optional = true\n   ```\n   \n1. Create a new Actor in the `actors` directory, e.g. `actors/pgvector` and add the following files: \n   - `README.md` - the Actor documentation\n   - `.actor/actor.json` - the Actor definition\n   - `.actor/input_schema.json` - the Actor input schema\n\n1. Create a pydantic model for the Actor input schema. Edit the Makefile to generate the model from the input schema:\n   ```bash\n    datamodel-codegen --input $(DIRS_WITH_ACTORS)/pgvector/.actor/input_schema.json --output $(DIRS_WITH_CODE)/src/models/pgvector_input_model.py  --input-file-type jsonschema  --field-constraints\n   ```\n   and then run\n   ```bash\n   make pydantic-model\n   ```\n1. Import the created model in `src/models/__init__.py`:\n   ```python\n   from .pgvector_input_model import PgvectorIntegration\n   ```\n1. Create a new module (`pgvector.py`) in the `vector_stores` directory, e.g. `vector_stores/pgvector`, and implement the `PGVectorDatabase` class with all required methods.\n1. Add `pgvector` to `SupportedVectorStores` in `constants.py`:\n   ```python\n      class SupportedVectorStores(str, enum.Enum):\n          pgvector = \"pgvector\"\n   ```\n\n1. Add PGVector into `entrypoint.py`:\n   ```python\n      if actor_type == SupportedVectorStores.pgvector.value:\n          await run_actor(PgvectorIntegration(**actor_input), actor_input)\n   ```\n\n1. 
Add `PGVectorDatabase` and `PgvectorIntegration` to `_types.py`:\n   ```python\n       ActorInputsDb: TypeAlias = ChromaIntegration | PgvectorIntegration | PineconeIntegration | QdrantIntegration\n       VectorDb: TypeAlias = ChromaDatabase | PGVectorDatabase | PineconeDatabase | QdrantDatabase\n   ```\n\n1. Add `PGVectorDatabase` to `vector_stores/vcs.py`:\n   ```python\n       if isinstance(actor_input, PgvectorIntegration):\n           from .vector_stores.pgvector import PGVectorDatabase\n\n           return PGVectorDatabase(actor_input, embeddings)\n   ```\n\n1. Add the `PGVectorDatabase` fixture to `tests/conftest.py`:\n   ```python\n      @pytest.fixture()\n      def db_pgvector(crawl_1: list[Document]) -\u003e PGVectorDatabase:\n          db = PGVectorDatabase(\n              actor_input=PgvectorIntegration(\n                  postgresSqlConnectionStr=os.getenv(\"POSTGRESQL_CONNECTION_STR\"),\n                  postgresCollectionName=INDEX_NAME,\n                  embeddingsProvider=\"OpenAI\",\n                  embeddingsApiKey=os.getenv(\"OPENAI_API_KEY\"),\n                  datasetFields=[\"text\"],\n              ),\n              embeddings=embeddings,\n          )\n\n          db.unit_test_wait_for_index = 0\n\n          db.delete_all()\n          # Insert initially crawled objects\n          db.add_documents(documents=crawl_1, ids=[d.metadata[\"id\"] for d in crawl_1])\n\n          yield db\n\n          db.delete_all()\n   ```\n\n1. Add the `db_pgvector` fixture to `tests/test_vector_stores.py`:\n   ```python\n      DATABASE_FIXTURES = [\"db_pinecone\", \"db_chroma\", \"db_qdrant\", \"db_pgvector\"]\n   ```\n1. Update README.md in the `actors/pgvector` directory\n\n1. Add `pgvector` to the README.md in the root directory\n\n1. Run tests\n   ```bash\n   make test\n   ```\n\n1. Run the Actor locally\n   ```bash\n   export ACTOR_PATH_IN_DOCKER_CONTEXT=actors/pgvector\n   apify run -p\n   ```\n\n1. 
Set up the Actor on the Apify platform at `https://console.apify.com`\n\n   Build configuration\n   ```\n   Git URL: https://github.com/apify/store-vector-db\n   Branch: master\n   Folder: actors/pgvector\n   ```\n\n1. Test the Actor on the Apify platform","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-vector-database-integrations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapify%2Factor-vector-database-integrations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-vector-database-integrations/lists"}