https://github.com/emotu/llm-classifier
LLM (RAG) powered apis that classify businesses by crawling their website and extracts relevant industry, scopes and business activities
https://github.com/emotu/llm-classifier
ai fastapi llm openai python rag
Last synced: 2 months ago
JSON representation
LLM (RAG) powered apis that classify businesses by crawling their website and extracts relevant industry, scopes and business activities
- Host: GitHub
- URL: https://github.com/emotu/llm-classifier
- Owner: emotu
- Created: 2025-01-08T00:52:38.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-08T01:11:20.000Z (over 1 year ago)
- Last Synced: 2025-03-15T22:24:39.113Z (over 1 year ago)
- Topics: ai, fastapi, llm, openai, python, rag
- Language: Python
- Homepage:
- Size: 990 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Business Activity Classifier API
An intelligent API service that uses Large Language Models to classify businesses according to the European NACE Rev. 2 statistical classification system.
## Features
- Automated business classification using LLMs
- Company information extraction from websites
- Policy document generation
- Asynchronous background processing
- Vector similarity search for accurate classifications
- Email notifications
## Tech Stack
- **Framework**: FastAPI
- **Database**: PostgreSQL with asyncpg
- **LLM Providers**:
- OpenAI
- Google Vertex AI
- **Key Libraries**:
- Pydantic for data validation
- SQLAlchemy for ORM
- LangChain for LLM orchestration
- Rich for CLI formatting
- Typer for CLI commands
## Project Structure
### Core Components
- `/api` - FastAPI routes and endpoints
- `/models` - Database models and Pydantic schemas
- `/services` - Business logic and LLM integration
- `/tasks` - Background job processors
- `/utils` - Helper functions and utilities
### Key Features Explained
- **Company Classification**: Utilizes LLMs to classify companies into NACE Rev. 2 categories.
- **Policy Generation**: Generates ISO-compliant policy documents for companies.
- **Background Processing**: Asynchronous tasks for policy generation and email notifications.
- **Vector Search**: Efficient similarity search for accurate classifications.
- **Email Notifications**: Sends policy documents via email to company representatives.
## Requirements
### System Requirements
- Python 3.13+
- PostgreSQL 15+
- Google Cloud SDK (for GCP features)
- Node.js 18+ (for frontend development)
### Python Dependencies
- fastapi>=0.104.1
- pydantic>=2.5.2
- sqlalchemy>=2.0.23
- asyncpg>=0.29.0
- langchain>=0.0.350
- openai>=1.3.7
- google-cloud-aiplatform>=1.36.4
- jinja2>=3.1.2
- python-multipart>=0.0.6
- rich>=13.7.0
- typer>=0.9.0
- uvicorn>=0.24.0
- weasyprint>=60.1
- resend>=0.6.0
### Development Dependencies
- black>=23.11.0
- isort>=5.12.0
- mypy>=1.7.1
- pytest>=7.4.3
- pytest-asyncio>=0.21.1
- pytest-cov>=4.1.0
- ruff>=0.1.6
### Infrastructure Requirements
- Docker and Docker Compose for containerization
- Google Cloud Platform account for:
- Cloud SQL (PostgreSQL)
- Cloud Storage
- Vertex AI (optional)
- OpenAI API key (if using OpenAI)
- Resend API key for email functionality
### Getting Started
1. Clone the repository
2. Install dependencies with `uv sync`
3. Set up your environment variables (see `app/config.py` for reference)
4. Run the application with `uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4`
### Configuration
- **Environment Variables**: Set in `.env.local` or `.env`
- **Database**: Configured in `app/config.py`
- **LLM Providers**: Configured in `app/config.py`
- **Email**: Configured in `app/config.py`
- **GCP**: Configured in `app/config.py`