https://github.com/rhecosystemappeng/populate-vectors-pipeline
Populate vectors into a vector DB from three different sources: an S3 bucket, a code repository, and a list of URLs
Last synced: over 1 year ago
- Host: GitHub
- URL: https://github.com/rhecosystemappeng/populate-vectors-pipeline
- Owner: RHEcosystemAppEng
- Created: 2025-01-28T09:46:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-28T10:38:51.000Z (over 1 year ago)
- Last Synced: 2025-01-28T11:33:47.743Z (over 1 year ago)
- Language: Python
- Size: 6.84 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Populate Vectors Pipeline
This repository compiles a `pipeline.yaml` that populates vectors from one of three sources, selected by the user:
* S3 Bucket
* Code Repository
* List of URLs
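The source choice above can be pictured as a single dispatch function that turns the user's selection into a list of PDFs to process. This is a minimal sketch under stated assumptions, not the repository's code. The `Source` enum, its string values, `collect_pdfs`, and its parameters are hypothetical. The S3 branch assumes the bucket has already been listed into object keys (for example with boto3), and the repository branch assumes a local clone.

```python
from enum import Enum
from pathlib import Path
from typing import Iterable, List, Optional, Union


class Source(str, Enum):
    # The three source types listed above; the string values are hypothetical.
    S3 = "s3"
    REPO = "repo"
    URLS = "urls"


def _is_pdf(name: str) -> bool:
    # Drop any query string (e.g. "?raw=true") before checking the extension.
    return name.split("?", 1)[0].lower().endswith(".pdf")


def collect_pdfs(
    source: Union[Source, str],
    *,
    s3_keys: Iterable[str] = (),
    repo_dir: Optional[str] = None,
    urls: Iterable[str] = (),
) -> List[str]:
    """Return the PDF locations to process for the user's chosen source."""
    source = Source(source)  # raises ValueError for an unknown choice
    if source is Source.S3:
        # Assumption: the bucket was already listed into object keys (e.g. with boto3).
        return sorted(k for k in s3_keys if _is_pdf(k))
    if source is Source.REPO:
        # Assumption: the code repository was already cloned to repo_dir.
        if repo_dir is None:
            raise ValueError("repo_dir is required for the repo source")
        return sorted(
            str(p) for p in Path(repo_dir).rglob("*") if p.is_file() and _is_pdf(p.name)
        )
    # Source.URLS: keep the user's order, dropping links that are not PDFs.
    return [u for u in urls if _is_pdf(u)]


if __name__ == "__main__":
    # prints ['https://example.com/guide.pdf']
    print(collect_pdfs("urls", urls=["https://example.com/guide.pdf", "https://example.com/index.html"]))
```

Validating the choice through the enum means an unsupported source fails immediately with a `ValueError` instead of silently producing an empty list.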
Currently, only PDF files are processed; the pipeline can be extended to handle other data types as needed.
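One way to leave room for other data types is a registry that maps file extensions to text extractors. The sketch below is hypothetical: `LOADERS`, `register_loader`, and `load_document` do not come from the repository. The PDF extractor assumes the third-party `pypdf` package, which may not be what the pipeline actually uses. Supporting a new type then means registering one more function, as the `.txt` loader shows.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: lower-case file extension -> function returning plain text.
LOADERS: Dict[str, Callable[[Path], str]] = {}


def register_loader(extension: str):
    """Decorator that registers a text extractor for one file extension."""
    def decorator(func: Callable[[Path], str]) -> Callable[[Path], str]:
        LOADERS[extension.lower()] = func
        return func
    return decorator


@register_loader(".pdf")
def load_pdf(path: Path) -> str:
    # Assumption: pypdf is used only for illustration; it is imported lazily
    # so the registry stays importable when pypdf is not installed.
    from pypdf import PdfReader
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


@register_loader(".txt")
def load_txt(path: Path) -> str:
    # Example extension: plain-text files need no third-party parser.
    return path.read_text(encoding="utf-8")


def load_document(path: str) -> str:
    """Dispatch to the loader registered for the file's extension."""
    p = Path(path)
    loader = LOADERS.get(p.suffix.lower())
    if loader is None:
        raise ValueError(f"no loader registered for {p.suffix or 'files without an extension'}")
    return loader(p)
```

Unknown extensions raise a `ValueError` rather than being skipped, so an unsupported file in a source is visible instead of silently missing from the vector DB.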
### Upload the pipeline as a job
To upload the compiled pipeline using a job, use the [ml-pipeline-importer-runner](https://github.com/RHEcosystemAppEng/ml-pipeline-importer-runner) repository.
### How to execute
1. Install the dependencies: `pip install -r requirements.txt`
2. Run the script: `python3 ./main.py`