{"id":24534431,"url":"https://github.com/emotu/llm-classifier","last_synced_at":"2026-04-11T21:45:29.536Z","repository":{"id":271476331,"uuid":"913580617","full_name":"emotu/llm-classifier","owner":"emotu","description":"LLM (RAG) powered apis that classify businesses by crawling their website and extracts relevant industry, scopes and business activities","archived":false,"fork":false,"pushed_at":"2025-01-08T01:11:20.000Z","size":1014,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-15T22:24:39.113Z","etag":null,"topics":["ai","fastapi","llm","openai","python","rag"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emotu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-08T00:52:38.000Z","updated_at":"2025-01-08T01:26:35.000Z","dependencies_parsed_at":"2025-01-08T02:21:25.749Z","dependency_job_id":"68b814cf-df34-4f16-926d-816fa16de34c","html_url":"https://github.com/emotu/llm-classifier","commit_stats":null,"previous_names":["emotu/llm-classifier"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/emotu/llm-classifier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emotu%2Fllm-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emotu%2Fllm-classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emotu%2Fllm-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emotu%2Fllm-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/emotu","download_url":"https://codeload.github.com/emotu/llm-classifier/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emotu%2Fllm-classifier/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31696743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-11T21:17:31.016Z","status":"ssl_error","status_checked_at":"2026-04-11T21:17:24.556Z","response_time":54,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","fastapi","llm","openai","python","rag"],"created_at":"2025-01-22T11:17:09.687Z","updated_at":"2026-04-11T21:45:29.498Z","avatar_url":"https://github.com/emotu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Business Activity Classifier API\n\nAn intelligent API service that uses Large Language Models to classify businesses according to the European NACE Rev. 2 statistical classification system.\n\n## Features\n\n- Automated business classification using LLMs\n- Company information extraction from websites\n- Policy document generation\n- Asynchronous background processing\n- Vector similarity search for accurate classifications\n- Email notifications\n\n## Tech Stack\n\n- **Framework**: FastAPI\n- **Database**: PostgreSQL with asyncpg\n- **LLM Providers**: \n  - OpenAI\n  - Google Vertex AI\n- **Key Libraries**:\n  - Pydantic for data validation\n  - SQLAlchemy for ORM\n  - LangChain for LLM orchestration\n  - Rich for CLI formatting\n  - Typer for CLI commands\n\n## Project Structure\n\n### Core Components\n\n- `/api` - FastAPI routes and endpoints\n- `/models` - Database models and Pydantic schemas\n- `/services` - Business logic and LLM integration\n- `/tasks` - Background job processors\n- `/utils` - Helper functions and utilities\n\n### Key Features Explained\n\n- **Company Classification**: Utilizes LLMs to classify companies into NACE Rev. 2 categories.\n- **Policy Generation**: Generates ISO-compliant policy documents for companies.\n- **Background Processing**: Asynchronous tasks for policy generation and email notifications.\n- **Vector Search**: Efficient similarity search for accurate classifications.\n- **Email Notifications**: Sends policy documents via email to company representatives.\n\n## Requirements\n\n### System Requirements\n- Python 3.13+\n- PostgreSQL 15+\n- Google Cloud SDK (for GCP features)\n- Node.js 18+ (for frontend development)\n\n### Python Dependencies\n- fastapi\u003e=0.104.1\n- pydantic\u003e=2.5.2\n- sqlalchemy\u003e=2.0.23\n- asyncpg\u003e=0.29.0\n- langchain\u003e=0.0.350\n- openai\u003e=1.3.7\n- google-cloud-aiplatform\u003e=1.36.4\n- jinja2\u003e=3.1.2\n- python-multipart\u003e=0.0.6\n- rich\u003e=13.7.0\n- typer\u003e=0.9.0\n- uvicorn\u003e=0.24.0\n- weasyprint\u003e=60.1\n- resend\u003e=0.6.0\n\n### Development Dependencies\n- black\u003e=23.11.0\n- isort\u003e=5.12.0\n- mypy\u003e=1.7.1\n- pytest\u003e=7.4.3\n- pytest-asyncio\u003e=0.21.1\n- pytest-cov\u003e=4.1.0\n- ruff\u003e=0.1.6\n\n### Infrastructure Requirements\n- Docker and Docker Compose for containerization\n- Google Cloud Platform account for:\n  - Cloud SQL (PostgreSQL)\n  - Cloud Storage\n  - Vertex AI (optional)\n- OpenAI API key (if using OpenAI)\n- Resend API key for email functionality\n\n\n### Getting Started\n\n1. Clone the repository\n2. Install dependencies with `uv sync`\n3. Set up your environment variables (see `app/config.py` for reference)\n4. Run the application with `uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4`\n\n### Configuration\n\n- **Environment Variables**: Set in `.env.local` or `.env`\n- **Database**: Configured in `app/config.py`\n- **LLM Providers**: Configured in `app/config.py`\n- **Email**: Configured in `app/config.py`\n- **GCP**: Configured in `app/config.py`\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femotu%2Fllm-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femotu%2Fllm-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femotu%2Fllm-classifier/lists"}