# LLM Text Evaluation Framework

[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Streamlit](https://img.shields.io/badge/Streamlit-1.48+-red.svg)](https://streamlit.io/)
[![SQLModel](https://img.shields.io/badge/SQLModel-0.0.24-lightblue.svg)](https://sqlmodel.tiangolo.com/)
[![NLTK](https://img.shields.io/badge/NLTK-3.9.1-yellow.svg)](https://www.nltk.org/)
[![Sentence Transformers](https://img.shields.io/badge/SentenceTransformers-5.1.0-purple.svg)](https://www.sbert.net/)
[![Docker](https://img.shields.io/badge/Docker-Ready-blue.svg)](https://docker.com/)
[![UV](https://img.shields.io/badge/UV-Package%20Manager-green.svg)](https://docs.astral.sh/uv/)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

A production-ready Streamlit web application for LLM response evaluation and benchmarking. It scores responses across seven weighted criteria and includes an interactive analytics dashboard, persistent evaluation history, and Docker deployment.
It is designed for AI researchers, developers, and organizations that want to systematically assess and improve their language model outputs using detailed metrics and visual insights.

![LLM Text Evaluation Framework - Home](media/home.png)
![LLM Text Evaluation Framework - History](media/history.png)

## 📌 Features

### **Evaluation Metrics**

Scores responses across seven weighted criteria:

| Criterion        | Description                                                                    |
| ---------------- | ------------------------------------------------------------------------------ |
| Relevance        | Measures how semantically similar the LLM response is to the expected response |
| Accuracy         | Assesses the factual correctness and precision of the content                  |
| Completeness     | Checks how well the response covers all expected content points                |
| Coherence        | Evaluates the logical flow, readability, and sentence structure                |
| Creativity       | Measures originality and unique expression while maintaining relevance         |
| Tone             | Assesses appropriateness, consistency, and professional language use           |
| Intent Alignment | Evaluates how well the response matches the user's intended purpose            |
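Relevance is described as semantic similarity between the LLM response and the expected response, and the project uses Sentence Transformers for semantic similarity. The README does not show the scoring code, so the following is only a sketch under a common assumption: both texts are embedded as vectors (with Sentence Transformers this is typically `SentenceTransformer(model_name).encode(text)`) and the embeddings are compared by cosine similarity. `cosine_similarity` and `relevance_score` are hypothetical helpers, and clamping negative similarities to 0 is an illustrative choice that may differ from what `ai/evaluator_utils.py` actually does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def relevance_score(response_embedding, expected_embedding):
    """Hypothetical helper: clamp cosine similarity into [0, 1] for use as a score."""
    return max(0.0, cosine_similarity(response_embedding, expected_embedding))


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for real sentence embeddings,
    # which typically have hundreds of dimensions.
    expected = [0.2, 0.7, 0.1]
    similar = [0.25, 0.65, 0.05]
    opposed = [-0.6, -0.1, 0.8]
    print(round(relevance_score(similar, expected), 3))
    print(relevance_score(opposed, expected))  # → 0.0 (negative similarity is clamped)
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing sentence embeddings.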
## 🛠 Tech Stack

* **Python 3.10+**
* **Streamlit** (UI)
* **SQLite** (local storage)
* **Sentence Transformers** (semantic similarity)
* **NLTK & TextStat** (text quality analysis)
* **Plotly** (interactive charts)

## 🚀 Quick Start

### **Prerequisites**
- Python 3.10 or higher
- Docker (optional, for containerized deployment)
- UV package manager (recommended) or pip

### **1. Clone the Repository**

```bash
git clone https://github.com/ysskrishna/llm-text-evaluation-framework.git
cd llm-text-evaluation-framework
```

### **2.1 Run using Docker Compose (Recommended for Production)**
```bash
docker-compose up --build
```

### **2.2 Run using UV (Recommended for Development)**
```bash
# Install UV if you haven't already
pip install uv

# Install dependencies
uv sync

# Run the application
uv run streamlit run main.py
```

### **3. Open in Browser**

Visit **[http://localhost:8501](http://localhost:8501)**

## 📂 Project Structure

```
llm-text-evaluation-framework/
├── ai/                         # AI evaluation logic
│   ├── evaluator.py            # Main evaluation functions
│   └── evaluator_utils.py      # Utility functions for scoring algorithms
├── components/                 # Streamlit UI components
│   ├── evaluation_result.py    # Results display with charts and analytics
│   └── sidebar.py              # Sidebar navigation
├── core/                       # Core application logic
│   ├── config.py               # Configuration settings and weights
│   └── database.py             # Database initialization and setup
├── models/                     # Data models
│   ├── enums.py                # Evaluation criteria enums
│   └── models.py               # SQLModel data models
├── pages/                      # Streamlit pages
│   └── 1_history.py            # Evaluation history and analytics dashboard
├── repositories/               # Data access layer
│   └── evaluation.py           # Evaluation CRUD operations
├── main.py                     # Main application entry point
├── pyproject.toml              # Project configuration and dependencies
├── Dockerfile                  # Docker container configuration
├── docker-compose.yml          # Docker Compose setup
└── README.md                   # This file
```
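The criterion weights live in `core/config.py` and the main evaluation functions in `ai/evaluator.py`, but this README does not show how the seven criterion scores are combined into one result. A weighted sum is the usual approach and is assumed in the sketch that follows. `overall_score` is a hypothetical helper, not the project's API; the enum's string values are guesses; and each criterion score is assumed to lie on a common 0 to 1 scale. The weights are the example values from the Configuration section, which sum to 1.0, so the overall score stays on the same scale as the individual scores.

```python
import math
from enum import Enum


class EvaluationCriteria(str, Enum):
    # Hypothetical stand-in for the enum in models/enums.py: member names
    # match the configuration example, but the string values are assumptions.
    RELEVANCE = "relevance"
    ACCURACY = "accuracy"
    COHERENCE = "coherence"
    COMPLETENESS = "completeness"
    CREATIVITY = "creativity"
    TONE = "tone"
    ALIGNMENT_WITH_INTENT = "alignment_with_intent"


# Example weights from the Configuration section (they sum to 1.0).
EVALUATION_CRITERIA_WEIGHTS = {
    EvaluationCriteria.RELEVANCE: 0.25,
    EvaluationCriteria.ACCURACY: 0.25,
    EvaluationCriteria.COHERENCE: 0.15,
    EvaluationCriteria.COMPLETENESS: 0.15,
    EvaluationCriteria.CREATIVITY: 0.05,
    EvaluationCriteria.TONE: 0.10,
    EvaluationCriteria.ALIGNMENT_WITH_INTENT: 0.05,
}


def overall_score(scores, weights=EVALUATION_CRITERIA_WEIGHTS):
    """Hypothetical helper: weighted sum of per-criterion scores in [0, 1]."""
    total_weight = sum(weights.values())
    # Reject weight tables that no longer sum to 1.0 after editing.
    if not math.isclose(total_weight, 1.0):
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    missing = [c.value for c in weights if c not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    return sum(weights[c] * scores[c] for c in weights)


if __name__ == "__main__":
    example = {
        EvaluationCriteria.RELEVANCE: 0.9,
        EvaluationCriteria.ACCURACY: 0.8,
        EvaluationCriteria.COHERENCE: 0.7,
        EvaluationCriteria.COMPLETENESS: 0.6,
        EvaluationCriteria.CREATIVITY: 0.5,
        EvaluationCriteria.TONE: 1.0,
        EvaluationCriteria.ALIGNMENT_WITH_INTENT: 0.9,
    }
    print(round(overall_score(example), 3))  # → 0.79
```

Raising an error when the weights do not sum to 1.0 is a design choice of this sketch; dividing by the total weight instead would be an equally reasonable way to keep the result on a 0 to 1 scale.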
## ⚙️ Configuration

### **Customizing Evaluation Weights**
Edit `EVALUATION_CRITERIA_WEIGHTS` in `core/config.py` to adjust the weight of each scoring criterion, for example:

```python
# EvaluationCriteria is defined in models/enums.py
from models.enums import EvaluationCriteria

EVALUATION_CRITERIA_WEIGHTS = {
    EvaluationCriteria.RELEVANCE: 0.25,        # Increase relevance weight
    EvaluationCriteria.ACCURACY: 0.25,         # Increase accuracy weight
    EvaluationCriteria.COHERENCE: 0.15,        # Adjust as needed
    EvaluationCriteria.COMPLETENESS: 0.15,
    EvaluationCriteria.CREATIVITY: 0.05,
    EvaluationCriteria.TONE: 0.10,
    EvaluationCriteria.ALIGNMENT_WITH_INTENT: 0.05
}
```

### **Database Configuration**
Evaluations are stored in a local SQLite database file, `llm_evaluations.db`.

## 🚀 Future Enhancements
- [ ] **Export results** - to Excel, CSV, or PDF reports
- [ ] **Batch testing** - evaluate hundreds of responses at once
- [ ] **Better analytics** - more charts and insights
- [ ] **Database scalability** - support PostgreSQL for large deployments
- [ ] **API access** - integrate with your own tools and workflows
- [ ] **Settings page** - configure evaluation weights and other settings via the UI

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📜 License

This project is released under the **MIT License**.
See [LICENSE](LICENSE) for details.

**Author:** [Siva Sai Krishna](https://github.com/ysskrishna)