{"id":50010481,"url":"https://github.com/nikhil-kadapala/checkthat","last_synced_at":"2026-05-19T22:05:52.756Z","repository":{"id":278520466,"uuid":"925274301","full_name":"Nikhil-Kadapala/checkthat","owner":"Nikhil-Kadapala","description":"This Project transforms noisy social media posts into concise, fact-checkable statements, through iterative refinement pipelines with custom G-Eval (by DeepEval) feedback.","archived":false,"fork":false,"pushed_at":"2026-05-01T06:23:08.000Z","size":113513,"stargazers_count":0,"open_issues_count":14,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-01T08:23:53.907Z","etag":null,"topics":["check-worthyness","claim-detection","claim-normalization","fact-checking","fact-verification","information-retrieval"],"latest_commit_sha":null,"homepage":"https://www.checkthat-ai.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nikhil-Kadapala.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-31T15:11:35.000Z","updated_at":"2026-05-01T06:22:09.000Z","dependencies_parsed_at":"2025-05-09T05:22:01.126Z","dependency_job_id":"3cafbca9-11e3-42af-85d2-4e34a77b52ea","html_url":"https://github.com/Nikhil-Kadapala/checkthat","commit_stats":null,"previous_names":["nikhil-kadapala/clef2025-checkthat-lab-task2","checkthat-ai/checkthat-ai","nikhil-kadapala/checkthat"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Nikhil-Kada
pala/checkthat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nikhil-Kadapala%2Fcheckthat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nikhil-Kadapala%2Fcheckthat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nikhil-Kadapala%2Fcheckthat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nikhil-Kadapala%2Fcheckthat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nikhil-Kadapala","download_url":"https://codeload.github.com/Nikhil-Kadapala/checkthat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nikhil-Kadapala%2Fcheckthat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33235085,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-19T15:49:41.270Z","status":"ssl_error","status_checked_at":"2026-05-19T15:49:22.917Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["check-worthyness","claim-detection","claim-normalization","fact-checking","fact-verification","information-retrieval"],"created_at":"2026-05-19T22:05:48.727Z","updated_at":"2026-05-19T22:05:52.748Z","avatar_url":"https://github.com/Nikhil-Kadapala.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Claim Extraction and Normalization\n\nThis project is a part of \u003ca href=\"https://checkthat.gitlab.io/clef2025/task2/\" rel=\"noopener nofollow ugc\"\u003eCLEF-CheckThat! Lab's Task2 (2025)\u003c/a\u003e. Given a noisy, unstructured social media post, the task is to simplify it into a concise form. 
This system leverages advanced AI models to transform complex, informal claims into clear, normalized statements suitable for fact-checking and analysis.\n\n## 🎥 Demo Video \n\nhttps://github.com/user-attachments/assets/0363cfca-8a38-4abf-b04f-244f92cd878e\n\n![Claim Normalization Demo](https://img.shields.io/badge/Status-Live%20Demo-brightgreen?style=for-the-badge)\n\n## 🌐 Live Demo\n\n- **🚀 Web Application**: [https://nikhil-kadapala.github.io/clef2025-checkthat-lab-task2/](https://nikhil-kadapala.github.io/clef2025-checkthat-lab-task2/)\n- **⚡ API Backend**: Available via the web application interface\n- **🐍 Python SDK**: In development - will be released soon for programmatic access\n\n## 🚀 Features\n\n### 💬 Web Application\n- **Interactive Chat Interface**: Real-time claim normalization with streaming responses\n- **Batch Evaluation**: Upload datasets for comprehensive evaluation with multiple models\n- **Model Support**: GPT-4, Claude, Gemini, Llama, and Grok models\n- **Real-time Progress**: WebSocket-based live evaluation tracking\n- **Self-Refine \u0026 Cross-Refine**: Advanced refinement algorithms\n- **METEOR Scoring**: Automatic evaluation with detailed metrics\n- **Modern UI**: Responsive design with dark theme\n\n### 🔌 API Features\n- **RESTful Endpoints**: Clean API for claim normalization\n- **WebSocket Support**: Real-time evaluation progress updates\n- **Multiple Models**: Support for 8+ AI models\n- **Streaming Responses**: Efficient real-time text generation\n- **CORS Configured**: Ready for cross-origin requests\n\n## 🧠 How It Works\n\nThe system follows a sophisticated pipeline to normalize social media claims:\n\n1. **Input Processing**: Receives noisy, unstructured social media posts\n2. **Model Selection**: Chooses from multiple AI models (GPT-4, Claude, Gemini, etc.)\n3. 
**Normalization**: Applies selected prompting strategy:\n   - **Zero-shot**: Direct claim normalization\n   - **Few-shot**: Example-based learning\n   - **Chain-of-Thought**: Step-by-step reasoning\n   - **Self-Refine**: Iterative improvement process\n   - **Cross-Refine**: Multi-model collaborative refinement\n4. **Evaluation**: Automated METEOR scoring for quality assessment\n5. **Output**: Clean, normalized claims ready for fact-checking\n\n## 🛠️ Technology Stack\n\n- **Frontend**: [React](https://reactjs.org/) + [TypeScript](https://www.typescriptlang.org/) + [Vite](https://vitejs.dev/) + [Tailwind CSS](https://tailwindcss.com/)\n- **Backend**: [FastAPI](https://fastapi.tiangolo.com/) + [WebSocket](https://websockets.readthedocs.io/) + [Streaming](https://fastapi.tiangolo.com/advanced/streaming-response/)\n- **AI Models**: [OpenAI GPT](https://openai.com/api/), [Anthropic Claude](https://www.anthropic.com/), [Google Gemini](https://ai.google.dev/), [Meta Llama](https://llama.meta.com/), [xAI Grok](https://grok.x.ai/)\n- **Evaluation**: [METEOR](https://aclanthology.org/W05-0909/) scoring (nltk) with [pandas](https://pandas.pydata.org/) + [numpy](https://numpy.org/)\n- **Deployment**: [GitHub Pages](https://pages.github.com/) + [Render](https://render.com/)\n\n## 📋 Prerequisites\n\n- **Node.js** (v18 or higher)\n- **Python** (v3.8 or higher)\n- **API Keys** for chosen models (OpenAI, Anthropic, Gemini, xAI)\n\n## 🚀 Getting Started: Development and Local Testing\n\nFollow these steps to get the application running locally for development and testing.\n\n### Option 1: Automated Setup (Recommended)\n\nThe project includes automation scripts that handle the entire setup process:\n\n#### ⚡ **Quick Start**\n```bash\n# 1. Clone the repository\ngit clone \u003crepository-url\u003e\ncd clef2025-checkthat-lab-task2\n\n# 2. Set up environment variables (see below)\n\n# 3. Run automated setup\n./setup-project.sh\n\n# 4. 
Start the application\n./run-project.sh\n```\n\n\u003e **📝 Note**: You only need to run `./setup-project.sh` once for initial setup. After that, use `./run-project.sh` to start the application.\n\n#### 🔧 **Environment Variables**\nSet these before running the setup script:\n\n```bash\n# Linux/macOS:\nexport OPENAI_API_KEY=\"your-openai-key\"\nexport ANTHROPIC_API_KEY=\"your-anthropic-key\"\nexport GEMINI_API_KEY=\"your-gemini-key\"\nexport GROK_API_KEY=\"your-grok-key\"\n\n# Windows (PowerShell):\n$env:OPENAI_API_KEY=\"your-openai-key\"\n$env:ANTHROPIC_API_KEY=\"your-anthropic-key\"\n$env:GEMINI_API_KEY=\"your-gemini-key\"\n$env:GROK_API_KEY=\"your-grok-key\"\n```\n\n#### 📝 **What the Scripts Do**\n\n**`setup-project.sh`**:\n- ✅ Detects your OS (Linux/macOS/Windows)\n- ✅ Terminates conflicting processes on port 5173\n- ✅ Installs Node.js dependencies for the frontend\n- ✅ Fixes npm vulnerabilities automatically\n- ✅ Creates Python virtual environment\n- ✅ Installs Python dependencies with `uv` (faster) or falls back to `pip`\n- ✅ Handles cross-platform compatibility\n\n**`run-project.sh`**:\n- 🚀 Starts both frontend and backend simultaneously\n- 🎯 Frontend runs on `http://localhost:5173`\n- 🎯 Backend runs on `http://localhost:8000`\n- 🛑 Graceful shutdown with `Ctrl+C`\n- 📊 Shows process IDs for monitoring\n\n### Option 2: Manual Setup\n\nIf you prefer manual setup or encounter issues with the scripts:\n\n#### 1. Install Dependencies\n\n**Backend:**\n```bash\n# Create and activate virtual environment\npython -m venv .venv\nsource .venv/bin/activate  # Linux/macOS\n# .venv\\Scripts\\activate  # Windows\n\n# Install Python dependencies\npip install -r requirements.txt\n```\n\n**Frontend:**\n```bash\ncd src/app\nnpm install\n```\n\n#### 2. 
Run Development Servers\n\n**Backend \u0026 Frontend:**\n```bash\n# Start both servers\n./run-project.sh\n```\n\n_Alternatively, you can run them separately:_\n```bash\n# Terminal 1: Backend\ncd src/api \u0026\u0026 python main.py\n\n# Terminal 2: Frontend  \ncd src/app \u0026\u0026 npm run dev\n```\n\nOpen your browser and navigate to `http://localhost:5173` to see the application.\n\n## 🤖 Supported Models\n\n| Provider | Model | Free Tier | API Key Required |\n|----------|-------|-----------|------------------|\n| Together.ai | Llama 3.3 70B | ✅ | ❌ |\n| OpenAI | GPT-4o, GPT-4.1 | ❌ | ✅ |\n| Anthropic | Claude 3.7 Sonnet | ❌ | ✅ |\n| Google | Gemini 2.5 Pro, Flash | ❌ | ✅ |\n| xAI | Grok 3 | ❌ | ✅ |\n\n## 📊 Evaluation Methods\n\n- **Zero-shot**: Direct claim normalization\n- **Few-shot**: Example-based learning  \n- **Zero-shot-CoT**: Chain-of-thought reasoning\n- **Few-shot-CoT**: Examples with reasoning\n- **Self-Refine**: Iterative improvement\n- **Cross-Refine**: Multi-model refinement\n\n## 🧪 Example\n\n```\nInput: \"The government is hiding something from us!\"\n\nOutput: \"Government transparency concerns have been raised by citizens regarding public information access.\"\n\nMETEOR Score: 0.847\n```\n\n## 🖥️ Usage\n\n### Web Application\n```bash\n# Start both frontend and backend\n./run-project.sh\n```\n\nVisit `http://localhost:5173` to access the interactive web interface.\n\n### API Usage\n\n#### Start the API Server\n```bash\ncd src/api\npython main.py\n```\n\n#### Example API Call\n```bash\ncurl -X POST \"http://localhost:8000/chat\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"user_query\": \"The government is hiding something from us!\",\n    \"model\": \"meta-llama/Llama-3.3-70B-Instruct-Turbo-Free\"\n  }'\n```\n\n### CLI Tool (Legacy)\n```bash\n# Default usage (Llama 3.3 70B, Zero-shot)\npython src/claim_norm.py\n\n# Custom configuration\npython src/claim_norm.py -m OpenAI -p Zero-Shot-CoT -it 1\n```\n\n## 🌐 Deployment\n\n### 
Production Mode\n\nFor production deployment, you can run the application in production mode:\n\n```bash\n# Set environment variables\nexport OPENAI_API_KEY=\"your-openai-key\"\nexport ANTHROPIC_API_KEY=\"your-anthropic-key\"\nexport GEMINI_API_KEY=\"your-gemini-key\" \nexport GROK_API_KEY=\"your-grok-key\"\n\n# Build frontend for production\ncd src/app\nnpm run build\n\n# Start backend in production mode\ncd ../api\npython main.py\n```\n\n\u003e **📝 Note**: Docker support is not currently implemented. The application runs natively using Python and Node.js.\n\n### Frontend (GitHub Pages)\nThe frontend is automatically deployed to GitHub Pages:\n\n1. Build: `npm run deploy`\n2. Commit and push changes\n3. GitHub Pages serves from `/docs` folder\n\n### Backend (Cloud Hosting)\nThe FastAPI backend is deployed to a cloud hosting service with standard configuration:\n\n- **Runtime**: Python FastAPI application\n- **Environment**: Production environment with API keys configured\n- **Features**: CORS enabled, WebSocket support, streaming responses\n\n## 🔧 Development\n\n### Project Architecture\n- **Frontend**: React + TypeScript + Vite + Tailwind CSS\n- **Backend**: FastAPI + WebSocket + Streaming\n- **Evaluation**: METEOR scoring with pandas/numpy\n- **Models**: Multiple AI providers with unified interface\n\n### Code Quality\n- TypeScript for type safety\n- ESLint + Prettier for code formatting\n- Python type hints\n- Error handling and logging\n\n## 🛠️ Troubleshooting\n\n### Script Issues\n\n#### Permission Denied\n```bash\n# Make scripts executable\nchmod +x setup-project.sh run-project.sh\n```\n\n#### Windows Script Execution\n```bash\n# Use Git Bash or WSL\nbash setup-project.sh\nbash run-project.sh\n\n# Or install WSL if not available\nwsl --install\n```\n\n#### Port Already in Use\nThe scripts automatically handle port conflicts, but if you encounter issues:\n```bash\n# Kill processes on port 5173 (frontend)\n# Linux/macOS:\nlsof -ti:5173 | xargs kill -9\n\n# 
Windows:\nnetstat -ano | findstr :5173\ntaskkill /PID \u003cPID\u003e /F\n\n# Kill processes on port 8000 (backend)\n# Linux/macOS:\nlsof -ti:8000 | xargs kill -9\n\n# Windows:\nnetstat -ano | findstr :8000\ntaskkill /PID \u003cPID\u003e /F\n```\n\n### Common Issues\n\n#### API Keys Not Working\n- Ensure environment variables are set before running scripts\n- Check for typos in environment variable names\n- Verify API keys are valid and have sufficient quota\n\n#### Frontend Build Errors\n- Clear node_modules and reinstall dependencies\n- Check Node.js version (requires v18+)\n- Update npm to latest version: `npm install -g npm@latest`\n\n#### Backend Import Errors\n- Ensure virtual environment is activated\n- Install missing dependencies: `pip install -r requirements.txt`\n- Check Python version (requires v3.8+)\n\n## 📁 Project Structure\n\n```\nclef2025-checkthat-lab-task2/\n├── src/\n│   ├── api/                        # FastAPI backend (deployed to Render)\n│   │   └── main.py                 # Main API server with WebSocket support\n│   ├── app/                        # Full-stack web application\n│   │   ├── client/                 # React frontend application\n│   │   │   ├── src/\n│   │   │   │   ├── components/     # React components\n│   │   │   │   ├── contexts/       # React contexts\n│   │   │   │   ├── lib/           # Utility libraries\n│   │   │   │   └── pages/         # Page components\n│   │   │   └── index.html         # Main HTML template\n│   │   ├── server/                 # Development server (Express + Vite)\n│   │   ├── shared/                 # Shared types and utilities\n│   │   │   ├── types.ts           # TypeScript type definitions\n│   │   │   ├── prompts.ts         # Prompt templates\n│   │   │   └── schema.ts          # Database schema\n│   │   ├── vite.config.ts         # Vite configuration\n│   │   └── package.json           # Frontend dependencies\n│   ├── utils/                      # ML utilities and model interfaces\n│   │   
├── evaluate.py            # Evaluation logic with METEOR scoring\n│   │   ├── self_refine.py         # Self-refinement algorithms\n│   │   ├── get_model_response.py  # Model API orchestration\n│   │   ├── prompts.py             # Python prompt templates\n│   │   ├── gpt.py                 # OpenAI GPT integration\n│   │   ├── llama.py               # Llama model integration\n│   │   ├── claude.py              # Anthropic Claude integration\n│   │   ├── gemini.py              # Google Gemini integration\n│   │   └── grok.py                # xAI Grok integration\n│   ├── data/                       # Dataset results and cache\n│   └── claim_norm.py              # CLI tool (legacy interface)\n├── data/                          # Main datasets\n│   ├── dev.csv                    # Development dataset\n│   ├── test.csv                   # Test dataset\n│   ├── dev_data.jsonl             # JSONL format development data\n│   └── dev_data_fixed.jsonl       # Corrected development data\n├── docs/                          # Production build (GitHub Pages)\n├── requirements.txt               # Python dependencies\n└── README.md                      # Project documentation\n```\n\n## 🤝 Contributing\n\nContributions are welcome! Please:\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests if applicable\n5. Submit a pull request\n\n## 📄 License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.\n\n## 📚 Citation\n\nIf you use this project in your research, please cite:\n\n```bibtex\n@misc{nkadapala-clef2025-checkthat-task2,\n  title={Claim Extraction and Normalization for CLEF-CheckThat! Lab Task 2},\n  author={Nikhil Kadapala},\n  year={2025},\n  url={https://github.com/nikhil-kadapala/clef2025-checkthat-lab-task2}\n}\n```\n\n\n\nI'll help you build a comprehensive Python SDK for claim normalization with a refinement loop using DeepEval's G-Eval metric. 
Let me first explore your current codebase to understand the existing structure and then provide you with a detailed implementation plan.\n\nBased on my analysis of your codebase, I can see you have a solid foundation with LLM integrations, claim normalization logic, and partial DeepEval integration. However, there's a syntax error in your `refine.py` file and several areas that need enhancement for a production-ready SDK. Here's my comprehensive implementation plan:\n\n## 🎯 Implementation Plan: Claim Normalization SDK with DeepEval G-Eval\n\n### Phase 1: Core SDK Architecture (Week 1)\n\n#### 1.1 Fix Current Issues\nFirst, let's fix the syntax error in `api/services/refine.py`:\n\n```python\n# Line 74-76 needs completion\nif feedback is not None and feedback.score \u003e 0.5:\n    # Add refinement logic here\n    refine_user_prompt = f\"\"\"\n        ## Original Query\n        {original_query}\n        \n        ## Current Response\n        {current_claim}\n        \n        ## Feedback\n        {feedback}\n        \n        ## Task\n        Please refine the response based on the feedback provided.\n    \"\"\"\n    current_claim = self._refine_claim(refine_user_prompt, self.refine_sys_prompt)\n```\n\n#### 1.2 SDK Core Structure\nCreate a new SDK module structure:\n\n```\nclef_claim_norm_sdk/\n├── __init__.py\n├── core/\n│   ├── __init__.py\n│   ├── normalizer.py         # Main ClaimNormalizer class\n│   ├── refiner.py           # Enhanced refinement logic\n│   ├── evaluator.py         # DeepEval G-Eval integration\n│   └── session_manager.py   # Session and state management\n├── models/\n│   ├── __init__.py\n│   ├── llm_providers.py     # LLM provider abstractions\n│   ├── schemas.py           # Pydantic models\n│   └── config.py            # Configuration management\n├── utils/\n│   ├── __init__.py\n│   ├── decorators.py        # Custom decorators\n│   ├── logging.py           # Structured logging\n│   ├── metrics.py           # Performance 
metrics\n│   └── exceptions.py        # Custom exceptions\n├── api/\n│   ├── __init__.py\n│   ├── client.py            # SDK client interface\n│   └── async_client.py      # Async client interface\n└── examples/\n    ├── basic_usage.py\n    ├── advanced_refinement.py\n    └── batch_processing.py\n```\n\n#### 1.3 Main SDK Client Interface\n\n```python\n# clef_claim_norm_sdk/api/client.py\nfrom typing import List, Optional, Dict, Any, Union\nfrom dataclasses import dataclass\nfrom ..core.normalizer import ClaimNormalizer\nfrom ..core.refiner import ClaimRefiner\nfrom ..core.evaluator import GEvalEvaluator\nfrom ..models.config import SDKConfig, ModelConfig, RefinementConfig\nfrom ..utils.decorators import retry, log_execution, monitor_performance\n\n@dataclass\nclass ClaimResult:\n    \"\"\"Result container for claim processing\"\"\"\n    original_claim: str\n    normalized_claim: str\n    confidence_score: float\n    refinement_iterations: int\n    evaluation_metrics: Dict[str, float]\n    metadata: Dict[str, Any]\n\nclass ClaimNormalizationSDK:\n    \"\"\"\n    Main SDK interface for claim normalization with refinement and evaluation.\n    \n    Example:\n        \u003e\u003e\u003e from clef_claim_norm_sdk import ClaimNormalizationSDK\n        \u003e\u003e\u003e sdk = ClaimNormalizationSDK(\n        ...     model_provider=\"openai\",\n        ...     model_name=\"gpt-4\",\n        ...     api_key=\"your-api-key\"\n        ... 
)\n        \u003e\u003e\u003e result = sdk.normalize_claim(\"The government is hiding something!\")\n        \u003e\u003e\u003e print(result.normalized_claim)\n    \"\"\"\n    \n    def __init__(\n        self,\n        model_provider: str = \"openai\",\n        model_name: str = \"gpt-4\",\n        api_key: Optional[str] = None,\n        config: Optional[SDKConfig] = None,\n        **kwargs\n    ):\n        self.config = config or SDKConfig.from_dict(kwargs)\n        self.normalizer = ClaimNormalizer(\n            model_provider=model_provider,\n            model_name=model_name,\n            api_key=api_key\n        )\n        self.evaluator = GEvalEvaluator(\n            model_provider=model_provider,\n            model_name=model_name,\n            api_key=api_key\n        )\n        self.refiner = ClaimRefiner(\n            normalizer=self.normalizer,\n            evaluator=self.evaluator,\n            config=self.config.refinement\n        )\n    \n    @retry(max_attempts=3, backoff_factor=2)\n    @log_execution\n    @monitor_performance\n    def normalize_claim(\n        self,\n        claim: str,\n        enable_refinement: bool = True,\n        custom_criteria: Optional[List[str]] = None\n    ) -\u003e ClaimResult:\n        \"\"\"\n        Normalize a single claim with optional refinement.\n        \n        Args:\n            claim: Raw claim text to normalize\n            enable_refinement: Whether to use refinement loop\n            custom_criteria: Custom evaluation criteria for G-Eval\n            \n        Returns:\n            ClaimResult with normalized claim and metrics\n        \"\"\"\n        # Implementation here\n        pass\n    \n    @log_execution\n    @monitor_performance\n    def normalize_batch(\n        self,\n        claims: List[str],\n        batch_size: int = 10,\n        enable_refinement: bool = True,\n        progress_callback: Optional[callable] = None\n    ) -\u003e List[ClaimResult]:\n        \"\"\"\n        Process multiple 
claims in batches.\n        \n        Args:\n            claims: List of raw claims to normalize\n            batch_size: Number of claims to process simultaneously\n            enable_refinement: Whether to use refinement loop\n            progress_callback: Optional callback for progress updates\n            \n        Returns:\n            List of ClaimResult objects\n        \"\"\"\n        # Implementation here\n        pass\n```\n\n### Phase 2: Enhanced G-Eval Integration (Week 2)\n\n#### 2.1 G-Eval Evaluator Implementation\n\n```python\n# clef_claim_norm_sdk/core/evaluator.py\nfrom typing import List, Dict, Any, Optional\nimport asyncio\nfrom deepeval.metrics import GEval\nfrom deepeval.test_case import LLMTestCase, LLMTestCaseParams\nfrom deepeval.models import GPTModel, AnthropicModel, GeminiModel\nfrom ..utils.decorators import async_retry, log_execution\nfrom ..utils.exceptions import EvaluationError\n\nclass GEvalEvaluator:\n    \"\"\"\n    G-Eval based evaluator for claim quality assessment.\n    \n    Supports multiple evaluation criteria and custom scoring.\n    \"\"\"\n    \n    DEFAULT_CRITERIA = [\n        \"Clarity: Is the claim clear and unambiguous?\",\n        \"Verifiability: Can this claim be fact-checked with reliable sources?\",\n        \"Specificity: Is the claim specific rather than vague?\",\n        \"Neutrality: Is the claim presented in a neutral, factual manner?\",\n        \"Completeness: Does the claim contain sufficient context to be understood?\"\n    ]\n    \n    def __init__(\n        self,\n        model_provider: str,\n        model_name: str,\n        api_key: str,\n        criteria: Optional[List[str]] = None,\n        threshold: float = 0.7\n    ):\n        self.model_provider = model_provider\n        self.model_name = model_name\n        self.api_key = api_key\n        self.criteria = criteria or self.DEFAULT_CRITERIA\n        self.threshold = threshold\n        self._setup_model()\n    \n    def _setup_model(self):\n   
     \"\"\"Initialize the evaluation model based on provider\"\"\"\n        if self.model_provider.lower() == \"openai\":\n            self.model = GPTModel(model=self.model_name, api_key=self.api_key)\n        elif self.model_provider.lower() == \"anthropic\":\n            self.model = AnthropicModel(model=self.model_name, api_key=self.api_key)\n        elif self.model_provider.lower() == \"gemini\":\n            self.model = GeminiModel(model=self.model_name, api_key=self.api_key)\n        else:\n            raise ValueError(f\"Unsupported model provider: {self.model_provider}\")\n    \n    @async_retry(max_attempts=3)\n    @log_execution\n    async def evaluate_claim_quality(\n        self,\n        original_claim: str,\n        normalized_claim: str,\n        custom_criteria: Optional[List[str]] = None\n    ) -\u003e Dict[str, float]:\n        \"\"\"\n        Evaluate normalized claim quality using G-Eval.\n        \n        Args:\n            original_claim: Original raw claim\n            normalized_claim: Processed claim to evaluate\n            custom_criteria: Override default evaluation criteria\n            \n        Returns:\n            Dictionary with evaluation scores\n        \"\"\"\n        try:\n            criteria = custom_criteria or self.criteria\n            evaluation_results = {}\n            \n            for criterion in criteria:\n                metric = GEval(\n                    name=f\"Claim_Quality_{criterion.split(':')[0]}\",\n                    criteria=criterion,\n                    evaluation_params=[\n                        LLMTestCaseParams.INPUT,\n                        LLMTestCaseParams.ACTUAL_OUTPUT\n                    ],\n                    model=self.model\n                )\n                \n                test_case = LLMTestCase(\n                    input=original_claim,\n                    actual_output=normalized_claim\n                )\n                \n                # Run evaluation\n                
await metric.a_measure(test_case)\n                \n                criterion_name = criterion.split(':')[0].lower()\n                evaluation_results[criterion_name] = metric.score\n            \n            # Calculate overall score\n            evaluation_results['overall_score'] = sum(evaluation_results.values()) / len(evaluation_results)\n            \n            return evaluation_results\n            \n        except Exception as e:\n            raise EvaluationError(f\"G-Eval evaluation failed: {str(e)}\")\n    \n    def meets_threshold(self, scores: Dict[str, float]) -\u003e bool:\n        \"\"\"Check if evaluation scores meet the quality threshold\"\"\"\n        return scores.get('overall_score', 0) \u003e= self.threshold\n```\n\n#### 2.2 Enhanced Refinement Loop\n\n```python\n# clef_claim_norm_sdk/core/refiner.py\nfrom typing import Optional, Dict, Any, List\nimport asyncio\nfrom ..models.config import RefinementConfig\nfrom ..utils.decorators import log_execution, monitor_performance\nfrom ..utils.exceptions import RefinementError\n\nclass ClaimRefiner:\n    \"\"\"\n    Advanced claim refinement with G-Eval feedback loop.\n    \"\"\"\n    \n    def __init__(\n        self,\n        normalizer,\n        evaluator,\n        config: RefinementConfig\n    ):\n        self.normalizer = normalizer\n        self.evaluator = evaluator\n        self.config = config\n    \n    @log_execution\n    @monitor_performance\n    async def refine_claim(\n        self,\n        original_claim: str,\n        initial_normalized: str,\n        custom_criteria: Optional[List[str]] = None\n    ) -\u003e Dict[str, Any]:\n        \"\"\"\n        Refine claim through iterative G-Eval feedback.\n        \n        Args:\n            original_claim: Original raw claim\n            initial_normalized: Initial normalization attempt\n            custom_criteria: Custom evaluation criteria\n            \n        Returns:\n            Dictionary with final claim and refinement 
metadata\n        \"\"\"\n        current_claim = initial_normalized\n        iteration = 0\n        refinement_history = []\n        \n        # Score the initial normalization before any refinement\n        scores = await self.evaluator.evaluate_claim_quality(\n            original_claim, current_claim, custom_criteria\n        )\n        refinement_history.append({\n            'iteration': iteration,\n            'claim': current_claim,\n            'scores': scores\n        })\n        \n        # Refine until the quality threshold is met or the iteration budget is spent\n        while (\n            iteration \u003c self.config.max_iterations\n            and not self.evaluator.meets_threshold(scores)\n        ):\n            # Generate refinement feedback\n            feedback = self._generate_feedback(scores, original_claim, current_claim)\n            \n            # Apply refinement\n            current_claim = await self.normalizer.refine_with_feedback(\n                original_claim, current_claim, feedback\n            )\n            iteration += 1\n            \n            # Re-evaluate so the recorded scores always describe the current claim\n            scores = await self.evaluator.evaluate_claim_quality(\n                original_claim, current_claim, custom_criteria\n            )\n            refinement_history.append({\n                'iteration': iteration,\n                'claim': current_claim,\n                'scores': scores\n            })\n        \n        return {\n            'final_claim': current_claim,\n            'iterations': iteration,\n            'final_scores': scores,\n            'history': refinement_history,\n            'converged': self.evaluator.meets_threshold(scores)\n        }\n    \n    def _generate_feedback(\n        self,\n        scores: Dict[str, float],\n        original_claim: str,\n        current_claim: str\n    ) -\u003e str:\n        \"\"\"Generate specific feedback based on G-Eval scores\"\"\"\n        feedback_parts = []\n        \n        for criterion, score in scores.items():\n            if criterion == 'overall_score':\n                continue\n                \n            if score \u003c self.config.improvement_threshold:\n                feedback_parts.append(\n                    f\"Improve {criterion}: Current score {score:.2f} is below 
threshold. \"\n                    f\"Focus on making the claim more {criterion.lower()}.\"\n                )\n        \n        return \"\\n\".join(feedback_parts)\n```\n\n### Phase 3: Production-Ready Features (Week 3)\n\n#### 3.1 Configuration Management\n\n```python\n# clef_claim_norm_sdk/models/config.py\nfrom pydantic import BaseModel, Field, validator\nfrom typing import Dict, List, Optional, Any\nfrom enum import Enum\n\nclass ModelProvider(str, Enum):\n    OPENAI = \"openai\"\n    ANTHROPIC = \"anthropic\"\n    GEMINI = \"gemini\"\n    XAI = \"xai\"\n\nclass RefinementConfig(BaseModel):\n    \"\"\"Configuration for refinement process\"\"\"\n    max_iterations: int = Field(default=3, ge=1, le=10)\n    improvement_threshold: float = Field(default=0.6, ge=0.0, le=1.0)\n    convergence_threshold: float = Field(default=0.8, ge=0.0, le=1.0)\n    early_stopping: bool = True\n\nclass SDKConfig(BaseModel):\n    \"\"\"Main SDK configuration\"\"\"\n    # Model settings\n    model_provider: ModelProvider = ModelProvider.OPENAI\n    model_name: str = \"gpt-4\"\n    temperature: float = Field(default=0.1, ge=0.0, le=2.0)\n    max_tokens: int = Field(default=1000, ge=1)\n    \n    # Refinement settings\n    refinement: RefinementConfig = Field(default_factory=RefinementConfig)\n    \n    # Evaluation settings\n    evaluation_criteria: Optional[List[str]] = None\n    quality_threshold: float = Field(default=0.7, ge=0.0, le=1.0)\n    \n    # Performance settings\n    request_timeout: int = Field(default=30, ge=1)\n    max_retries: int = Field(default=3, ge=0, le=10)\n    batch_size: int = Field(default=10, ge=1, le=100)\n    \n    # Logging and monitoring\n    enable_logging: bool = True\n    log_level: str = \"INFO\"\n    enable_metrics: bool = True\n    \n    @classmethod\n    def from_dict(cls, config_dict: Dict[str, Any]) -\u003e 'SDKConfig':\n        \"\"\"Create config from dictionary\"\"\"\n        return cls(**config_dict)\n    \n    @classmethod\n    def 
from_file(cls, config_path: str) -\u003e 'SDKConfig':\n        \"\"\"Load config from YAML/JSON file\"\"\"\n        import json\n        import yaml\n        from pathlib import Path\n        \n        path = Path(config_path)\n        if path.suffix.lower() == '.json':\n            with open(path) as f:\n                config_dict = json.load(f)\n        elif path.suffix.lower() in ['.yml', '.yaml']:\n            with open(path) as f:\n                config_dict = yaml.safe_load(f)\n        else:\n            raise ValueError(f\"Unsupported config file format: {path.suffix}\")\n        \n        return cls.from_dict(config_dict)\n```\n\n#### 3.2 Custom Decorators and Utilities\n\n```python\n# clef_claim_norm_sdk/utils/decorators.py\nimport functools\nimport time\nimport asyncio\nimport logging\nfrom typing import Any, Callable, Optional\nfrom .exceptions import RetryExhaustedError, SDKError\nfrom .metrics import performance_monitor\n\nlogger = logging.getLogger(__name__)\n\ndef retry(max_attempts: int = 3, backoff_factor: float = 2, exceptions: tuple = (Exception,)):\n    \"\"\"Retry decorator with exponential backoff\"\"\"\n    def decorator(func: Callable) -\u003e Callable:\n        @functools.wraps(func)\n        def wrapper(*args, **kwargs):\n            last_exception = None\n            \n            for attempt in range(max_attempts):\n                try:\n                    return func(*args, **kwargs)\n                except exceptions as e:\n                    last_exception = e\n                    if attempt \u003c max_attempts - 1:\n                        wait_time = backoff_factor ** attempt\n                        logger.warning(f\"Attempt {attempt + 1} failed: {e}. 
Retrying in {wait_time}s...\")\n                        time.sleep(wait_time)\n                    else:\n                        logger.error(f\"All {max_attempts} attempts failed\")\n            \n            raise RetryExhaustedError(f\"Failed after {max_attempts} attempts\") from last_exception\n        return wrapper\n    return decorator\n\ndef async_retry(max_attempts: int = 3, backoff_factor: float = 2, exceptions: tuple = (Exception,)):\n    \"\"\"Async retry decorator with exponential backoff\"\"\"\n    def decorator(func: Callable) -\u003e Callable:\n        @functools.wraps(func)\n        async def wrapper(*args, **kwargs):\n            last_exception = None\n            \n            for attempt in range(max_attempts):\n                try:\n                    return await func(*args, **kwargs)\n                except exceptions as e:\n                    last_exception = e\n                    if attempt \u003c max_attempts - 1:\n                        wait_time = backoff_factor ** attempt\n                        logger.warning(f\"Attempt {attempt + 1} failed: {e}. 
Retrying in {wait_time}s...\")\n                        await asyncio.sleep(wait_time)\n                    else:\n                        logger.error(f\"All {max_attempts} attempts failed\")\n            \n            raise RetryExhaustedError(f\"Failed after {max_attempts} attempts\") from last_exception\n        return wrapper\n    return decorator\n\ndef log_execution(func: Callable) -\u003e Callable:\n    \"\"\"Log function execution with timing\"\"\"\n    @functools.wraps(func)\n    def wrapper(*args, **kwargs):\n        start_time = time.time()\n        logger.info(f\"Starting {func.__name__}\")\n        \n        try:\n            result = func(*args, **kwargs)\n            execution_time = time.time() - start_time\n            logger.info(f\"Completed {func.__name__} in {execution_time:.2f}s\")\n            return result\n        except Exception as e:\n            execution_time = time.time() - start_time\n            logger.error(f\"Failed {func.__name__} after {execution_time:.2f}s: {e}\")\n            raise\n    return wrapper\n\ndef monitor_performance(func: Callable) -\u003e Callable:\n    \"\"\"Monitor function performance metrics\"\"\"\n    @functools.wraps(func)\n    def wrapper(*args, **kwargs):\n        return performance_monitor.track_execution(func, *args, **kwargs)\n    return wrapper\n\ndef validate_inputs(**validation_rules):\n    \"\"\"Validate function inputs\"\"\"\n    def decorator(func: Callable) -\u003e Callable:\n        @functools.wraps(func)\n        def wrapper(*args, **kwargs):\n            # Perform validation based on rules\n            for param_name, rule in validation_rules.items():\n                if param_name in kwargs:\n                    value = kwargs[param_name]\n                    if not rule(value):\n                        raise ValueError(f\"Invalid value for {param_name}: {value}\")\n            return func(*args, **kwargs)\n        return wrapper\n    return decorator\n```\n\n### Phase 4: API and 
Distribution (Week 4)\n\n#### 4.1 API Client for Remote Usage\n\n```python\n# clef_claim_norm_sdk/api/async_client.py\nimport aiohttp\nimport asyncio\nfrom typing import List, Dict, Any, Optional, AsyncGenerator\nfrom ..models.schemas import ClaimResult, BatchProcessRequest\nfrom ..utils.exceptions import APIError\n\nclass AsyncClaimNormalizationClient:\n    \"\"\"\n    Async client for remote API access to claim normalization service.\n    \"\"\"\n    \n    def __init__(\n        self,\n        base_url: str,\n        api_key: Optional[str] = None,\n        timeout: int = 30\n    ):\n        self.base_url = base_url.rstrip('/')\n        self.api_key = api_key\n        self.timeout = aiohttp.ClientTimeout(total=timeout)\n    \n    async def __aenter__(self):\n        self.session = aiohttp.ClientSession(timeout=self.timeout)\n        return self\n    \n    async def __aexit__(self, exc_type, exc_val, exc_tb):\n        await self.session.close()\n    \n    async def normalize_claim(\n        self,\n        claim: str,\n        enable_refinement: bool = True,\n        custom_criteria: Optional[List[str]] = None\n    ) -\u003e ClaimResult:\n        \"\"\"Normalize a single claim via API\"\"\"\n        url = f\"{self.base_url}/api/v1/normalize\"\n        \n        payload = {\n            \"claim\": claim,\n            \"enable_refinement\": enable_refinement,\n            \"custom_criteria\": custom_criteria\n        }\n        \n        headers = {\"Content-Type\": \"application/json\"}\n        if self.api_key:\n            headers[\"Authorization\"] = f\"Bearer {self.api_key}\"\n        \n        async with self.session.post(url, json=payload, headers=headers) as response:\n            if response.status == 200:\n                data = await response.json()\n                return ClaimResult(**data)\n            else:\n                error_msg = await response.text()\n                raise APIError(f\"API request failed: {response.status} - {error_msg}\")\n    \n 
   async def normalize_batch_stream(\n        self,\n        claims: List[str],\n        batch_size: int = 10,\n        enable_refinement: bool = True\n    ) -\u003e AsyncGenerator[ClaimResult, None]:\n        \"\"\"Stream batch normalization results\"\"\"\n        import json\n        \n        url = f\"{self.base_url}/api/v1/normalize/batch/stream\"\n        \n        payload = {\n            \"claims\": claims,\n            \"batch_size\": batch_size,\n            \"enable_refinement\": enable_refinement\n        }\n        \n        headers = {\"Content-Type\": \"application/json\"}\n        if self.api_key:\n            headers[\"Authorization\"] = f\"Bearer {self.api_key}\"\n        \n        async with self.session.post(url, json=payload, headers=headers) as response:\n            if response.status == 200:\n                # Assumes the endpoint streams newline-delimited JSON, one result per line\n                async for line in response.content:\n                    if line.strip():\n                        # Parse each streamed line on its own instead of re-reading the whole body\n                        data = json.loads(line)\n                        yield ClaimResult(**data)\n            else:\n                error_msg = await response.text()\n                raise APIError(f\"Stream request failed: {response.status} - {error_msg}\")\n```\n\n#### 4.2 Package Setup for Distribution\n\n```python\n# setup.py\nfrom setuptools import setup, find_packages\n\nwith open(\"README.md\", \"r\", encoding=\"utf-8\") as fh:\n    long_description = fh.read()\n\nwith open(\"requirements.txt\", \"r\", encoding=\"utf-8\") as fh:\n    requirements = [line.strip() for line in fh if line.strip() and not line.startswith(\"#\")]\n\nsetup(\n    name=\"clef-claim-norm-sdk\",\n    version=\"1.0.0\",\n    author=\"Your Name\",\n    author_email=\"your.email@domain.com\",\n    description=\"Professional SDK for claim normalization with G-Eval refinement\",\n    long_description=long_description,\n    long_description_content_type=\"text/markdown\",\n    url=\"https://github.com/yourusername/clef-claim-norm-sdk\",\n    packages=find_packages(),\n    classifiers=[\n        \"Development Status :: 4 - 
Beta\",\n        \"Intended Audience :: Developers\",\n        \"Intended Audience :: Science/Research\",\n        \"License :: OSI Approved :: MIT License\",\n        \"Operating System :: OS Independent\",\n        \"Programming Language :: Python :: 3\",\n        \"Programming Language :: Python :: 3.8\",\n        \"Programming Language :: Python :: 3.9\",\n        \"Programming Language :: Python :: 3.10\",\n        \"Programming Language :: Python :: 3.11\",\n        \"Topic :: Scientific/Engineering :: Artificial Intelligence\",\n        \"Topic :: Text Processing :: Linguistic\",\n    ],\n    python_requires=\"\u003e=3.8\",\n    install_requires=requirements,\n    extras_require={\n        \"dev\": [\n            \"pytest\u003e=7.0.0\",\n            \"pytest-asyncio\u003e=0.21.0\",\n            \"pytest-cov\u003e=4.0.0\",\n            \"black\u003e=22.0.0\",\n            \"ruff\u003e=0.0.200\",\n            \"mypy\u003e=1.0.0\",\n        ],\n        \"docs\": [\n            \"sphinx\u003e=5.0.0\",\n            \"sphinx-rtd-theme\u003e=1.0.0\",\n            \"myst-parser\u003e=0.18.0\",\n        ],\n    },\n    entry_points={\n        \"console_scripts\": [\n            \"clef-claim-norm=clef_claim_norm_sdk.cli:main\",\n        ],\n    },\n    include_package_data=True,\n    package_data={\n        \"clef_claim_norm_sdk\": [\"data/*.json\", \"templates/*.txt\"],\n    },\n)\n```\n\n### Phase 5: Testing and Quality Assurance\n\n#### 5.1 Comprehensive Testing Suite\n\n```python\n# tests/test_sdk_integration.py\nimport pytest\nimport asyncio\nfrom clef_claim_norm_sdk import ClaimNormalizationSDK\nfrom clef_claim_norm_sdk.models.config import SDKConfig, RefinementConfig\n\n@pytest.mark.asyncio\nclass TestSDKIntegration:\n    \n    @pytest.fixture\n    def sdk_config(self):\n        return SDKConfig(\n            model_provider=\"openai\",\n            model_name=\"gpt-4\",\n            refinement=RefinementConfig(\n                max_iterations=2,\n               
 improvement_threshold=0.6\n            )\n        )\n    \n    @pytest.fixture\n    def sdk(self, sdk_config):\n        return ClaimNormalizationSDK(\n            config=sdk_config,\n            api_key=\"test-api-key\"\n        )\n    \n    async def test_single_claim_normalization(self, sdk):\n        \"\"\"Test basic claim normalization\"\"\"\n        result = await sdk.normalize_claim(\n            \"The government is hiding alien technology!\"\n        )\n        \n        assert result.normalized_claim\n        assert result.confidence_score \u003e= 0\n        assert result.refinement_iterations \u003e= 0\n        assert 'overall_score' in result.evaluation_metrics\n    \n    async def test_batch_processing(self, sdk):\n        \"\"\"Test batch claim processing\"\"\"\n        claims = [\n            \"The government is hiding alien technology!\",\n            \"COVID vaccines contain microchips\",\n            \"Climate change is a hoax\"\n        ]\n        \n        results = await sdk.normalize_batch(claims, batch_size=2)\n        \n        assert len(results) == len(claims)\n        for result in results:\n            assert result.normalized_claim\n            assert isinstance(result.evaluation_metrics, dict)\n    \n    async def test_custom_criteria(self, sdk):\n        \"\"\"Test custom evaluation criteria\"\"\"\n        custom_criteria = [\n            \"Scientific accuracy: Is the claim scientifically plausible?\",\n            \"Evidence requirement: Does the claim require specific evidence?\"\n        ]\n        \n        result = await sdk.normalize_claim(\n            \"Water boils at 100°C at sea level\",\n            custom_criteria=custom_criteria\n        )\n        \n        assert 'scientific_accuracy' in result.evaluation_metrics\n        assert 'evidence_requirement' in result.evaluation_metrics\n```\n\n### Summary and Next Steps\n\nThis implementation plan provides:\n\n1. 
**🏗️ Robust Architecture**: Modular design with clear separation of concerns\n2. **🔧 Industry Standards**: Proper decorators, error handling, logging, and monitoring\n3. **📊 G-Eval Integration**: Comprehensive evaluation with custom criteria support\n4. **🔄 Advanced Refinement**: Iterative improvement loop with convergence detection\n5. **🌐 API Ready**: Both local SDK and remote API client capabilities\n6. **📦 Distribution Ready**: Proper packaging for PyPI publication\n7. **🧪 Testing Coverage**: Comprehensive test suite for reliability\n8. **📚 Documentation**: Clear examples and API documentation\n\n**Implementation Timeline**: 4 weeks to complete all phases\n\n**Next Immediate Steps**:\n1. Fix the syntax error in `refine.py`\n2. Set up the new SDK project structure\n3. Implement the core `ClaimNormalizer` class\n4. Integrate G-Eval evaluation\n5. Create comprehensive tests","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikhil-kadapala%2Fcheckthat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnikhil-kadapala%2Fcheckthat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikhil-kadapala%2Fcheckthat/lists"}