https://github.com/rootz491/ai-image-tagging-pipeline
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/rootz491/ai-image-tagging-pipeline
- Owner: rootz491
- Created: 2025-07-08T19:49:47.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-09T00:27:20.000Z (11 months ago)
- Last Synced: 2025-07-21T19:29:07.301Z (11 months ago)
- Language: Python
- Size: 28.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AI Image Tagging Pipeline
A scalable system for automatically tagging images using AI models (BLIP and CLIP), built with a clean architecture approach.
## Architecture Overview
This project follows clean architecture principles, separating concerns into distinct layers:
- **Presentation Layer**: Express.js API endpoints and Bull Board UI
- **Application Layer**: Queue and job processing logic
- **Domain Layer**: Image classification, captioning, and embedding services
- **Infrastructure Layer**: Redis for job queue, Docker for containerization
## Project Structure
```
/ai-image-tagging-pipeline
├── backend/ # Node.js API service
│ ├── server.js # Express server with endpoints
│ ├── queue.js # Queue management abstraction
│ ├── Dockerfile # Container definition
│ ├── package.json # Dependencies
│ └── .env # Environment configuration
├── worker/ # Python AI processing worker
│ ├── main.py # Worker entry point
│ ├── caption.py # BLIP image captioning service
│ ├── embed.py # CLIP image embedding service
│ ├── classify.py # Image classification rules
│ ├── Dockerfile # Container definition
│ └── requirements.txt # Python dependencies
├── uploads/ # Shared volume for image storage
└── docker-compose.yml # Multi-container orchestration
```
## Technology Stack
- **Backend API**: Node.js with Express
- **Queue**: Bull (Redis-based queue)
- **Worker**: Python 3.10
- **AI Models**:
  - BLIP (Bootstrapping Language-Image Pre-training) for image captioning
  - CLIP (Contrastive Language-Image Pre-training) for image embeddings
- **Containerization**: Docker and Docker Compose
## Features
- **Image Captioning**: Generate descriptive captions for images using BLIP
- **Vector Embeddings**: Create vector embeddings using CLIP for similarity search
- **Automatic Classification**: Categorize images based on caption content
- **Scalable Processing**: Distributed architecture with worker processes
- **Monitoring**: Bull Board UI for job queue monitoring
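As a rough illustration of the similarity search that the CLIP embeddings enable, the sketch below compares embedding vectors with cosine similarity. It is a minimal, dependency-free example and not code from this repository: the function names `cosine_similarity` and `most_similar`, the file names, and the toy 3-dimensional vectors are assumptions made for illustration (real CLIP embeddings have hundreds of dimensions).

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 candidates: Dict[str, Sequence[float]]) -> str:
    """Return the name of the candidate whose embedding is closest to `query`."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


if __name__ == "__main__":
    # Hypothetical embeddings keyed by image name (toy values for illustration).
    library = {"dog.jpg": [1.0, 0.0, 0.0], "car.jpg": [0.0, 1.0, 0.0]}
    print(most_similar([0.9, 0.1, 0.0], library))  # prints "dog.jpg"
```

A linear scan like this may be enough for a small image collection; a larger one would likely need a vector index, which is outside the scope of this sketch.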
## Getting Started
### Prerequisites
- Docker and Docker Compose
- Node.js (for local development)
- Python 3.10 (for local development)
### Installation & Setup
1. Clone the repository:
```bash
git clone https://github.com/rootz491/ai-image-tagging-pipeline.git
cd ai-image-tagging-pipeline
```
2. Make the management script executable:
```bash
chmod +x manage.sh
```
3. Choose your environment setup:
**Development Setup (with hot-reload)**:
```bash
./manage.sh start dev
```
**Production Setup**:
```bash
./manage.sh start prod
```
Or use Docker Compose directly:
```bash
docker-compose -f docker-compose.dev.yml up --build # Development
docker-compose -f docker-compose.prod.yml up --build # Production
```
4. The services will be available at:
   - Backend API: http://localhost:3000
   - Bull Board Queue UI: http://localhost:3000/admin/queues
### Usage
1. Upload an image or specify an image path via the API:
```bash
curl -X POST http://localhost:3000/upload \
-H "Content-Type: application/json" \
-d '{"imagePath": "/uploads/example.jpg"}'
```
2. The image will be processed asynchronously, and the results will be saved as a JSON file alongside the image.
3. The resulting JSON will contain:
   - Caption: A descriptive text of the image
   - Tags: Keywords extracted from the caption
   - Category: Automatic classification based on tags
   - Embedding: Vector representation of the image (for similarity search)
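To consume a result from Python, something like the following sketch could work. It rests on two assumptions that this README does not confirm: that the result file has the same name as the image with a `.json` suffix (for example `example.json` next to `example.jpg`), and that the JSON keys are the lowercase names `caption`, `tags`, `category`, and `embedding`. Check an actual output file and adjust both before relying on it.

```python
import json
from pathlib import Path

# ASSUMPTION: lowercase key names; the real output may use different ones.
REQUIRED_KEYS = {"caption", "tags", "category", "embedding"}


def result_path_for(image_path: str) -> Path:
    """Guess the result file location for an image.

    ASSUMPTION: the result sits next to the image and only the suffix changes.
    """
    return Path(image_path).with_suffix(".json")


def load_result(image_path: str) -> dict:
    """Load the tagging result for an image and check the expected keys exist."""
    with result_path_for(image_path).open(encoding="utf-8") as fh:
        result = json.load(fh)
    missing = REQUIRED_KEYS.difference(result)
    if missing:
        raise KeyError(f"result is missing keys: {sorted(missing)}")
    return result
```

Because processing is asynchronous, the file may not exist immediately after the upload request returns; a caller would need to wait or poll until it appears.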
## Development
### Development Container (VSCode)
This project includes a development container configuration for VSCode. This allows you to develop inside a Docker container with all dependencies pre-installed.
1. Install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension for VSCode (formerly named "Remote - Containers").
2. Open the project in VSCode and click the green button in the bottom-left corner, or run the "Dev Containers: Reopen in Container" command (named "Remote-Containers: Reopen in Container" in older versions of the extension).
3. VSCode will build the development container and connect to it. All code changes will be immediately reflected in the running application.
4. The services will be available at:
   - Backend API: http://localhost:3000
   - Bull Board Queue UI: http://localhost:3000/admin/queues
### Development Environment Features
- **Hot Reloading**:
  - Backend: Code changes are automatically applied using Nodemon
  - Worker: Code changes trigger automatic restarts using Watchdog
- **Volume Mounting**:
  - Your local codebase is mounted into the containers
  - Changes made locally are immediately reflected in the running services
- **Shared Dependencies**:
  - Node modules are stored in a Docker volume for performance
  - Python packages are pre-installed in the container
### Hot-Reloading
The development environment supports hot-reloading, which automatically reloads your code when changes are detected:
- **Backend**: Uses Nodemon to watch for JavaScript file changes
- **Worker**: Uses Watchdog to monitor Python file changes
If you encounter issues with hot-reloading, see the `DEV_GUIDE.md` file for troubleshooting steps.
You can also manually trigger a reload:
```bash
./manage.sh reload dev backend # Reload backend service
./manage.sh reload dev worker # Reload worker service
```
### Manual Local Setup
1. Install backend dependencies:
```bash
cd backend
npm install
```
2. Install worker dependencies (the path assumes you are still in `backend/` from step 1):
```bash
cd ../worker
pip install -r requirements.txt
```
3. Start Redis locally or use Docker:
```bash
docker run -p 6379:6379 redis:latest
```
### Environment Variables
- `REDIS_URL`: Redis connection string (default: "redis://localhost:6379")
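A minimal sketch of how a Python process might read this variable and fall back to the documented default. The helper name `redis_settings` and the returned dictionary shape are hypothetical; the worker's actual configuration code may look different.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

DEFAULT_REDIS_URL = "redis://localhost:6379"  # default documented above


def redis_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the Redis connection settings from the environment."""
    url = env.get("REDIS_URL", DEFAULT_REDIS_URL)
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported REDIS_URL scheme: {parsed.scheme!r}")
    return {
        "url": url,
        "host": parsed.hostname or "localhost",
        "port": parsed.port or 6379,
    }


if __name__ == "__main__":
    print(redis_settings())
```

Inside Docker Compose the host part is usually the Redis service name rather than `localhost`, for example `redis://redis:6379` if the service is called `redis` (an assumption; check the Compose files for the actual name).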
## Version Control
### .gitignore
The project includes a comprehensive `.gitignore` file that excludes:
- **Python artifacts**: `__pycache__`, `.pyc`, `.pyo`, etc.
- **Node.js artifacts**: `node_modules`, npm logs, etc.
- **Environment files**: `.env` files containing sensitive information
- **Model caches**: Large pre-trained model files and caches
- **Generated data**: Uploaded images and generated results
- **Temporary files**: Logs, cache files, and temporary data
- **IDE-specific files**: Editor configurations (except shared VS Code settings)
This ensures that only the necessary source code and configuration files are included in the repository, keeping it clean and efficient.
## Extending the Pipeline
### Adding New Classification Categories
Edit the `classify.py` file to add new rules for image categorization.
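The README does not show the contents of `classify.py`, so the following is only a sketch of what keyword-based rules over a caption could look like. The `CATEGORY_KEYWORDS` table, the category names, and the `classify` signature are all hypothetical; adapt the idea to whatever structure the real file uses.

```python
import re
from typing import Dict, Set

# HYPOTHETICAL rule table: category name -> keywords that trigger it.
# Adding a category means adding one entry here.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "animal": {"dog", "cat", "bird", "horse"},
    "food": {"pizza", "cake", "salad", "sandwich"},
    "vehicle": {"car", "bus", "bicycle", "train"},
}


def classify(caption: str,
             rules: Dict[str, Set[str]] = CATEGORY_KEYWORDS,
             default: str = "uncategorized") -> str:
    """Return the first category whose keywords appear in the caption."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    for category, keywords in rules.items():
        if words & keywords:  # any shared word counts as a match
            return category
    return default


if __name__ == "__main__":
    print(classify("a dog sitting on a couch"))  # prints "animal"
```

Rules are checked in insertion order, so when a caption matches several categories the earliest entry wins; reorder the table or add a scoring step if that is not the behavior you want.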
### Using Different AI Models
1. Modify `caption.py` to use alternative image captioning models
2. Update `embed.py` to use different embedding models
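One way to keep model swaps contained is to have the pipeline depend on plain callables rather than on a specific model. The sketch below illustrates that idea only; `TaggingResult`, `tag_image`, and the stub functions are hypothetical names and do not describe how `main.py` is actually written.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Each step is just a function, so a different model only needs a new function.
Captioner = Callable[[str], str]              # image path -> caption
Embedder = Callable[[str], Sequence[float]]   # image path -> embedding
Classifier = Callable[[str], str]             # caption -> category


@dataclass
class TaggingResult:
    caption: str
    category: str
    embedding: List[float]


def tag_image(image_path: str, captioner: Captioner,
              embedder: Embedder, classifier: Classifier) -> TaggingResult:
    """Run caption -> classify -> embed using whichever implementations are passed in."""
    caption = captioner(image_path)
    return TaggingResult(
        caption=caption,
        category=classifier(caption),
        embedding=list(embedder(image_path)),
    )


if __name__ == "__main__":
    # Stub implementations standing in for real BLIP/CLIP calls.
    result = tag_image(
        "/uploads/example.jpg",
        captioner=lambda path: "a dog on a couch",
        embedder=lambda path: [0.1, 0.2, 0.3],
        classifier=lambda caption: "animal" if "dog" in caption else "other",
    )
    print(result)
```

With this shape, replacing BLIP or CLIP would mean writing a new function with the same signature and passing it in, and the stubs make the pipeline testable without downloading any model.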
## License
[MIT License](LICENSE)
## Queue Processing
This project uses BullMQ for job queue management, with:
- **Backend**: Node.js Bull implementation for job creation
- **Worker**: Python BullMQ implementation (python-bullmq) for job processing
- **Monitoring**: Bull Board UI for queue visualization
This architecture ensures compatibility between Node.js and Python services while providing reliable job processing with features like:
- Job progress tracking
- Concurrency control
- Error handling and retries
- Queue monitoring
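Queue libraries in the Bull family typically handle retries through per-job options, so the worker itself rarely needs a retry loop. The sketch below is not that API; it is a library-free illustration of what "retry with exponential backoff" means, using the hypothetical helper `run_with_retries`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], attempts: int = 3,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `task`, retrying on any exception with exponentially growing delays.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    The last failure is re-raised once `attempts` is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable so the behavior can be tested without real waiting; in a real worker, delegating retries to the queue keeps failed attempts visible in Bull Board.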
## Production Deployment
The production setup is optimized for performance and stability:
1. Deploy using the production Docker Compose file:
```bash
docker-compose -f docker-compose.prod.yml up -d
```
2. The production environment includes:
   - Resource limits for containers
   - Restart policies for container reliability
   - Optimized builds without development dependencies
   - Production-ready environment variables
3. Scaling the worker service:
```bash
docker-compose -f docker-compose.prod.yml up -d --scale worker=3
```
4. Monitoring the services:
```bash
docker-compose -f docker-compose.prod.yml ps
docker-compose -f docker-compose.prod.yml logs -f
```