{"id":31079595,"url":"https://github.com/carlosyazid/ai-data-challenge","last_synced_at":"2026-04-08T16:02:14.940Z","repository":{"id":311073766,"uuid":"1042374106","full_name":"CarlosYazid/Ai-Data-Challenge","owner":"CarlosYazid","description":"Challenge Project","archived":false,"fork":false,"pushed_at":"2025-08-26T15:51:45.000Z","size":36,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-16T10:55:12.312Z","etag":null,"topics":["fastapi","langchain","openai","streamlit","supabase"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CarlosYazid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-21T23:18:58.000Z","updated_at":"2025-08-26T15:51:49.000Z","dependencies_parsed_at":"2025-09-16T10:28:28.301Z","dependency_job_id":"319108f6-e454-4af3-bd69-f310512d5755","html_url":"https://github.com/CarlosYazid/Ai-Data-Challenge","commit_stats":null,"previous_names":["carlosyazid/challenge-de-clasificacion-biomedica-con-ia","carlosyazid/ai-data-challenge"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CarlosYazid/Ai-Data-Challenge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CarlosYazid%2FAi-Data-Challenge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CarlosYazid%2FAi-Data-Challenge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CarlosYazid%2FAi-Data-Challenge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CarlosYazid%2FAi-Data-Challenge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CarlosYazid","download_url":"https://codeload.github.com/CarlosYazid/Ai-Data-Challenge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CarlosYazid%2FAi-Data-Challenge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31562697,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","langchain","openai","streamlit","supabase"],"created_at":"2025-09-16T10:28:03.599Z","updated_at":"2026-04-08T16:02:14.918Z","avatar_url":"https://github.com/CarlosYazid.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🔬 Scientific Paper Classifier\n\nAn AI-powered application that automatically classifies scientific papers into medical categories using advanced language models and retrieval-augmented generation (RAG).\n\n![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)\n![Python](https://img.shields.io/badge/python-3.8+-blue.svg)\n![FastAPI](https://img.shields.io/badge/FastAPI-0.116.1-green.svg)\n![Streamlit](https://img.shields.io/badge/Streamlit-1.37.0-red.svg)\n\n## 📋 Table of Contents\n\n- [Overview](#overview)\n- [Features](#features)\n- [Architecture](#architecture)\n- [Categories](#categories)\n- [Installation](#installation)\n- [Configuration](#configuration)\n- [Usage](#usage)\n- [API Documentation](#api-documentation)\n- [Project Structure](#project-structure)\n- [Technologies](#technologies)\n- [Contributing](#contributing)\n- [License](#license)\n\n## 🎯 Overview\n\nThe Scientific Paper Classifier is a comprehensive AI system designed to categorize biomedical research papers into specific medical domains. The system combines the power of Large Language Models (LLMs) with vector databases and web search capabilities to provide accurate, contextual classifications with detailed explanations.\n\n### Key Components\n\n- **Backend API**: FastAPI-based service with intelligent agent classification\n- **Frontend Interface**: Modern Streamlit web application\n- **AI Agent**: LangChain/LangGraph agent with multiple tools\n- **Vector Database**: Supabase with embedding-based retrieval\n- **Web Search**: Tavily integration for real-time research\n\n## ✨ Features\n\n### 🤖 Intelligent Classification\n- Multi-modal AI agent using OpenAI GPT models\n- Vector similarity search for context-aware classification\n- Real-time web search for current medical research\n- Confidence scoring and detailed rationale\n\n### 🎨 Modern Interface\n- Clean, professional Streamlit frontend\n- Category-specific color coding and icons\n- Real-time API status monitoring\n- Interactive examples and guidance\n\n### 🔧 Robust Architecture\n- RESTful API with FastAPI\n- Asynchronous processing for scalability\n- Comprehensive error handling\n- Modular, maintainable codebase\n\n## 🏗️ Architecture\n\n```mermaid\ngraph TB\n    A[User Interface\u003cbr/\u003eStreamlit] --\u003e B[FastAPI Backend]\n    B --\u003e C[Agent Service]\n    C --\u003e D[OpenAI LLM]\n    C --\u003e E[Vector Database\u003cbr/\u003eSupabase]\n    C --\u003e F[Web Search\u003cbr/\u003eTavily]\n    E --\u003e G[Embedding Model\u003cbr/\u003eOpenAI]\n    \n    subgraph \"Tools\"\n        E\n        F\n    end\n    \n    subgraph \"AI Stack\"\n        D\n        G\n        H[LangChain/LangGraph]\n    end\n    \n    C --\u003e H\n```\n\n## 🏥 Categories\n\nThe system classifies papers into four main medical categories:\n\n| Category | Icon | Description | Color |\n|----------|------|-------------|-------|\n| **Cardiovascular** | ❤️ | Heart, blood vessels, circulatory system | Red |\n| **Neurological** | 🧠 | Brain, nervous system, neurological disorders | Green |\n| **Hepatorenal** | 🫀 | Liver, kidney, hepatic and renal systems | Orange |\n| **Oncological** | 🎗️ | Cancer, tumors, oncology research | Purple |\n\n## 🚀 Installation\n\n### Prerequisites\n\n- Python 3.8+\n- OpenAI API key\n- Supabase account and database\n- Tavily API key\n\n### Backend Setup\n\n1. **Clone the repository**\n   ```bash\n   git clone https://github.com/CarlosYazid/scientific-paper-classifier.git\n   cd scientific-paper-classifier\n   ```\n\n2. **Install backend dependencies**\n   ```bash\n   cd backend\n   pip install -r requirements.txt\n   ```\n\n3. **Configure environment variables**\n   ```bash\n   cp .env.example .env\n   # Edit .env with your API keys and configuration\n   ```\n\n4. **Run the FastAPI server**\n   ```bash\n   cd src\n   uvicorn main:app --reload --host 0.0.0.0 --port 8000\n   ```\n\n### Frontend Setup\n\n1. **Install frontend dependencies**\n   ```bash\n   cd frontend\n   pip install -r requirements.txt\n   ```\n\n2. **Run the Streamlit application**\n   ```bash\n   streamlit run app.py --server.port 8501\n   ```\n\n3. **Access the application**\n   - Frontend: http://localhost:8501\n   - API Docs: http://localhost:8000/docs\n\n## ⚙️ Configuration\n\n### Environment Variables\n\nCreate a `.env` file in the backend directory:\n\n```env\n# Database Configuration\nDATABASE_URL=your_supabase_url\nDATABASE_KEY=your_supabase_anon_key\nMAX_RESULT_RAG=10\nEMBEDDING_MODEL=text-embedding-3-small\nTHRESHOLD=0.7\n\n# OpenAI Configuration\nOPENAI_API_KEY=your_openai_api_key\nMODEL=gpt-4o-mini\n\n# Tavily Configuration\nTAVILY_API_KEY=your_tavily_api_key\nMAX_RESULT_TAVILY=5\n```\n\n### Database Setup\n\nThe system requires a Supabase database with:\n- Vector embeddings table for paper storage\n- `match_documents` function for similarity search\n- Proper indexing for performance\n\n## 📖 Usage\n\n### Web Interface\n\n1. **Access the Streamlit app** at http://localhost:8501\n2. **Enter paper details**:\n   - Title: Complete paper title\n   - Abstract: Full abstract text\n3. **Click \"Classify Paper\"** to get results\n4. **Review the output**:\n   - Category classification\n   - Confidence score (0-1)\n   - Detailed rationale\n\n### API Usage\n\n```python\nimport requests\n\n# Classification endpoint\nurl = \"http://localhost:8000/classify/\"\ndata = {\n    \"title\": \"Effects of ACE inhibitors on cardiovascular outcomes\",\n    \"abstract\": \"Background: Heart failure remains a leading cause...\"\n}\n\nresponse = requests.post(url, json=data)\nresult = response.json()\n\nprint(f\"Category: {result['category']}\")\nprint(f\"Confidence: {result['confidence']:.2%}\")\nprint(f\"Rationale: {result['rationale']}\")\n```\n\n## 📚 API Documentation\n\n### Endpoints\n\n#### `POST /classify/`\n\nClassifies a scientific paper into medical categories.\n\n**Request Body:**\n```json\n{\n  \"title\": \"string\",\n  \"abstract\": \"string\"\n}\n```\n\n**Response:**\n```json\n{\n  \"category\": \"Cardiovascular|Neurological|Hepatorenal|Oncological\",\n  \"confidence\": 0.95,\n  \"rationale\": \"Detailed explanation of the classification decision\"\n}\n```\n\n#### `GET /`\n\nHealth check endpoint.\n\n**Response:**\n```json\n{\n  \"status\": \"Ok\"\n}\n```\n\n## 📁 Project Structure\n\n```\nscientific-paper-classifier/\n├── backend/\n│   ├── core/\n│   │   ├── __init__.py\n│   │   └── settings.py\n│   ├── db/\n│   │   ├── __init__.py\n│   │   └── main.py\n│   ├── models/\n│   │   ├── __init__.py\n│   │   └── agent.py\n│   ├── routes/\n│   │   ├── __init__.py\n│   │   └── agent.py\n│   ├── services/\n│   │   ├── __init__.py\n│   │   └── agent.py\n│   └── main.py\n├── frontend/\n│   └── main.py\n├── requirements.txt\n├── .gitignore\n├── LICENSE\n└── README.md\n```\n\n## 🛠️ Technologies\n\n### Backend\n- **FastAPI**: Modern, fast web framework\n- **LangChain/LangGraph**: AI agent framework\n- **OpenAI**: Language model and embeddings\n- **Supabase**: Vector database and storage\n- **Tavily**: Web search API\n- **Pydantic**: Data validation and settings\n\n### Frontend\n- **Streamlit**: Interactive web applications\n- **Requests**: HTTP client for API calls\n\n## 🤝 Contributing\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n### Contribution Guidelines\n\n- Follow PEP 8 style guidelines\n- Write comprehensive tests\n- Update documentation\n- Ensure backward compatibility\n\n## 📋 Roadmap\n\n- [ ] **Multi-language support** for international papers\n- [ ] **Batch processing** for multiple papers\n- [ ] **Advanced analytics** and reporting dashboard\n- [ ] **Custom category** training and fine-tuning\n- [ ] **Integration APIs** for research platforms\n- [ ] **Performance optimization** and caching\n\n## 🐛 Known Issues\n\n- Large abstracts may take longer to process\n- Rate limiting on API calls may affect performance\n- Vector database requires proper indexing for optimal speed\n\n## 📄 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## 🙋‍♂️ Support\n\nFor support and questions:\n\n- **Issues**: GitHub Issues\n- **Discussions**: GitHub Discussions\n- **Email**: [contact@carlospadilla.co]\n\n---\n\n**⭐ If you find this project helpful, please consider giving it a star!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcarlosyazid%2Fai-data-challenge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcarlosyazid%2Fai-data-challenge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcarlosyazid%2Fai-data-challenge/lists"}