https://github.com/ptthanh02/airflow-vector-search-pipeline
A pipeline designed for intelligent, semantic document searching
- Host: GitHub
- URL: https://github.com/ptthanh02/airflow-vector-search-pipeline
- Owner: ptthanh02
- License: mit
- Created: 2025-03-05T14:26:25.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-05T14:31:03.000Z (7 months ago)
- Last Synced: 2025-03-05T15:31:03.760Z (7 months ago)
- Language: Jupyter Notebook
- Size: 1.46 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🚀 Airflow Vector Search Pipeline
## Project Overview
This project implements a data pipeline for intelligent document search: Apache Airflow orchestrates ingestion of documents from MongoDB into QdrantDB, where they can be retrieved by vector similarity.
### Key Technologies
- Apache Airflow
- MongoDB
- QdrantDB (Vector Database)
- Python
- Docker Compose

## 🌟 Features
- Automated data ingestion pipeline
- Vector-based semantic search (see the search sketch after this list)
- Scalable microservices architecture
- Easy deployment with Docker
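The semantic search feature can be exercised directly against QdrantDB. Below is a minimal sketch, assuming a sentence-transformers embedding model and a collection named `documents`; neither is pinned by this repository, so adjust the names to match the actual DAG.

```python
# Illustration only: the embedding model and collection name are assumptions,
# not taken from this repository.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dimensional embeddings
client = QdrantClient(host="localhost", port=6333)

# Embed the natural-language query and search the collection by vector similarity
query_vector = model.encode("how to reset a password").tolist()
hits = client.search(collection_name="documents", query_vector=query_vector, limit=5)
for hit in hits:
    print(hit.score, hit.payload)
```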
## 🛠 Prerequisites

- Docker
- Docker Compose
- Python 3.9+

## 🚦 Getting Started
### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/intelligent-document-search.git
cd intelligent-document-search
```

### 2. Load Docker Images
```bash
docker load -i mongo.tar
docker load -i qdrant.tar
docker load -i postgres.tar
docker load -i redis.tar
docker load -i python3911.tar
```

### 3. Start the Data Pipeline
```bash
cd PhamTienThanh_12345678
docker compose up --build
```

## 🔍 Access Points
- **Airflow UI**: http://localhost:8080
  - Username: airflow
  - Password: airflow
- **QdrantDB Dashboard**: http://localhost:6333/dashboard
- **MongoDB Compass Connection**:
  - URL: mongodb://localhost:27017
  - Username: admin
  - Password: admin
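For a quick connectivity check from Python, the services can be reached with the usual client libraries using the credentials listed above. This is a minimal sketch, assuming `pymongo` and `qdrant-client` are installed; the repository itself does not prescribe these exact calls.

```python
# Minimal connectivity check using the default credentials and ports listed above
# (assumption: pymongo and qdrant-client are installed in your environment).
from pymongo import MongoClient
from qdrant_client import QdrantClient

mongo = MongoClient("mongodb://admin:admin@localhost:27017")
print(mongo.list_database_names())      # should list the pipeline's databases

qdrant = QdrantClient(host="localhost", port=6333)
print(qdrant.get_collections())         # should list the vector collections
```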
## 📝 Pipeline Workflow

The Airflow DAG performs the following steps:
1. Initialize collection in QdrantDB
2. Insert random data into MongoDB
3. Transfer data to QdrantDB
4. Count and verify data
5. Perform vector-based search
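A condensed sketch of what such a DAG could look like is shown below, using the Airflow 2.x TaskFlow API. The collection name, vector size, and Docker Compose service hostnames (`qdrant`, `mongo`) are assumptions for illustration and may differ from the DAG actually shipped in this repository.

```python
# Hypothetical sketch of the five-step DAG described above -- collection name,
# vector size, and service hostnames are assumptions, not the repository's code.
import random
from datetime import datetime

from airflow.decorators import dag, task
from pymongo import MongoClient
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

VECTOR_SIZE = 384
COLLECTION = "documents"


@dag(schedule=None, start_date=datetime(2025, 3, 1), catchup=False, tags=["vector-search"])
def vector_search_pipeline():

    @task
    def init_qdrant_collection():
        # Step 1: (re)create the target collection in QdrantDB
        client = QdrantClient(host="qdrant", port=6333)
        client.recreate_collection(
            collection_name=COLLECTION,
            vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
        )

    @task
    def insert_random_data():
        # Step 2: insert toy documents with random vectors into MongoDB
        docs = [
            {"_id": i, "text": f"document {i}",
             "vector": [random.random() for _ in range(VECTOR_SIZE)]}
            for i in range(100)
        ]
        MongoClient("mongodb://admin:admin@mongo:27017")["demo"]["documents"].insert_many(docs)

    @task
    def transfer_to_qdrant():
        # Step 3: copy the documents from MongoDB into the Qdrant collection
        docs = MongoClient("mongodb://admin:admin@mongo:27017")["demo"]["documents"].find()
        points = [PointStruct(id=d["_id"], vector=d["vector"], payload={"text": d["text"]})
                  for d in docs]
        QdrantClient(host="qdrant", port=6333).upsert(collection_name=COLLECTION, points=points)

    @task
    def count_and_verify():
        # Step 4: confirm how many points landed in Qdrant
        count = QdrantClient(host="qdrant", port=6333).count(collection_name=COLLECTION).count
        print(f"Qdrant now holds {count} points")

    @task
    def vector_search():
        # Step 5: run a similarity search against the collection
        hits = QdrantClient(host="qdrant", port=6333).search(
            collection_name=COLLECTION,
            query_vector=[random.random() for _ in range(VECTOR_SIZE)],
            limit=5,
        )
        for hit in hits:
            print(hit.score, hit.payload)

    (
        init_qdrant_collection()
        >> insert_random_data()
        >> transfer_to_qdrant()
        >> count_and_verify()
        >> vector_search()
    )


vector_search_pipeline()
```

Dropped into Airflow's `dags/` folder, a file like this would show up in the Airflow UI listed above and could be triggered manually.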