LLM Evaluation and Observability System for Football Content
- Host: GitHub
- URL: https://github.com/benitomartin/llm-observability-opik
- Owner: benitomartin
- License: MIT
- Created: 2025-05-31T14:18:06.000Z
- Default Branch: main
- Last Pushed: 2025-06-19T20:15:56.000Z
- Last Synced: 2025-06-19T21:24:56.378Z
- Topics: bertscore, comet-ml, cosine-similarity, evaluation-metrics, hallucination, huggingface-transformers, mongodb, openai, pre-commit, python, zenml
- Language: Python
- Homepage: https://decodingml.substack.com/p/your-ai-football-assist-eval-guide
- Size: 1.18 MB
- Stars: 16
- Watchers: 0
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Football Teams AI Evaluation and Observability
A modular pipeline for evaluating and observing LLM performance on football teams' content.

---

A **complete, observable LLM pipeline** for evaluating large language model (LLM) performance on football teams' content. This project uses Wikipedia data about football teams to **generate summaries, create QA datasets**, and **evaluate model responses** using state-of-the-art tools:
- **ZenML** for pipeline orchestration and experiment tracking
- **MongoDB** for structured storage and vector-based retrieval
- **Opik** for LLM evaluation and observability

Designed for **research**, **benchmarking**, and **experimentation**, the system is fully configurable, supports **semantic search**, and enables fine-grained analysis of LLM behavior across a range of evaluation metrics.
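The stages that follow in the overview can be sketched as plain functions. The names and stub bodies below are illustrative placeholders only, not the repository's actual ZenML steps:

```python
# Toy sketch of the crawl -> summarize -> evaluate flow.
# Function names and bodies are hypothetical stand-ins.

def etl(team: str) -> dict:
    """Crawl and parse a Wikipedia article (stubbed here)."""
    return {"team": team, "text": f"Article about {team}."}

def summarize(article: dict) -> str:
    """Generate a summary (an LLM call in the real pipeline)."""
    return article["text"][:40]

def evaluate(summary: str, reference: str) -> float:
    """Score a summary against its source (BERTScore etc. in practice)."""
    ref_words = set(reference.split())
    overlap = set(summary.split()) & ref_words
    return len(overlap) / max(len(ref_words), 1)

article = etl("FC Example")
summary = summarize(article)
score = evaluate(summary, article["text"])  # 1.0 for this trivial stub
```

In the real pipelines, `etl` crawls Wikipedia into MongoDB, `summarize` calls an OpenAI model, and `evaluate` uses embedding-based metrics rather than word overlap.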
## Overview
- **ETL Pipeline**: Crawl, parse, and ingest Wikipedia articles into MongoDB.
- **Summarization Pipeline**: Generate summaries for each article using LLMs.
- **Evaluation Pipelines**: Score summaries and QA datasets using BERTScore, cosine similarity, answer relevancy, and hallucinations.
- **Experiment Tracking**: Integrated with **ZenML** and **Opik** for experiment and metric visualization.
- **Configurable**: Customize settings via YAML and environment variables.

## Project Structure
```text
├── .github # CI pipeline
├── src
│ ├── configs/ # Configs, prompts, and settings
│ ├── data/ # Evaluation data and crawled team data
│ ├── evaluation/ # Summary & QA evaluation scripts
│ ├── infra/ # MongoDB vector index utilities
│ ├── pipelines/ # ZenML pipeline entrypoints
│ ├── search/ # Search observability utility
│ ├── steps/ # ZenML steps: ETL, dataset, summaries
├── tests/ # Unit tests
├── .pre-commit-config.yaml # Pre-commit hooks
├── LICENSE # License
├── Makefile # Makefile commands
├── README.md # Project description
├── pyproject.toml # Project dependencies
```

## Getting Started
### Prerequisites
- [Python 3.12+](https://www.python.org/downloads/release/python-3120/)
- [uv](https://github.com/astral-sh/uv)
- [ZenML](https://zenml.io/)
- [OpenAI](https://openai.com/)
- [Opik](https://www.comet.com/site/products/opik/)
- [MongoDB](https://www.mongodb.com/)

### Installation
1. **Clone the repository**
```bash
git clone https://github.com/benitomartin/llm-observability-opik.git
cd llm-observability-opik
```

2. **Install dependencies**
```bash
uv sync --all-groups
source ./.venv/bin/activate
```

3. **Create a MongoDB Account**
Create an account at [MongoDB](https://www.mongodb.com/) and a free cluster:
- Get your `MONGODB_URI` and update your `.env` file accordingly
- The project uses two collections:
- One for storing Wikipedia articles and summaries
- Another with vector indexes and embeddings for similarity search
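   As an illustration, a minimal Atlas vector index definition for the embeddings collection might look like the following (the field name `embedding` and the dimension count are assumptions for this sketch, not values taken from this repository):

   ```json
   {
     "fields": [
       {
         "type": "vector",
         "path": "embedding",
         "numDimensions": 1536,
         "similarity": "cosine"
       }
     ]
   }
   ```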
4. **Create an Opik/Comet ML Account**
Create an account at [Opik](https://www.comet.com/site/products/opik/), which is the evaluation platform from Comet ML:
- Get your `COMET_API_KEY`, add it to your `.env` file, and configure Opik with the following command and the [official configuration guide](https://www.comet.com/docs/opik/tracing/sdk_configuration):
```bash
opik configure
```

5. **Configure environment**
Copy `.env.example` to `.env` and update with your credentials (`COMET_API_KEY`, `MONGODB_URI` and `OPENAI_API_KEY`):
```bash
cp .env.example .env
```

## Usage
To list all available make commands, run:
```bash
make help
```

### Pipelines
- **Start ZenML Locally**
```bash
make zenml-login
```

- **ETL pipeline**
```bash
make run-etl-pipeline
```

- **Summarization pipeline**
```bash
make run-summarization-pipeline
```

### Evaluation
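The evaluation commands below rely on embedding-based metrics. Cosine similarity, the simplest of them, scores two vectors by the angle between them; here is a self-contained standard-library sketch (the real pipeline applies it to model embeddings, not toy vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```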
- **Evaluate summaries with Opik**
Evaluate the summaries using BERTScore and Cosine Similarity:
```bash
make run-evaluate-summaries
```

- **Evaluate QA dataset**
Evaluate a synthetic Q&A dataset for Hallucinations and Answer Relevancy:
```bash
make run-evaluate-dataset
```

### Search Tracing
Run a single query for testing:
```bash
make run-tracing
```

### Testing
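Unit tests live under `tests/`. A minimal pytest-style test looks like the following (the helper under test is hypothetical, not a function from this repository):

```python
# test_example.py: a hypothetical pytest-style unit test.

def normalize_team_name(name: str) -> str:
    """Hypothetical helper: collapse whitespace and title-case a team name."""
    return " ".join(name.split()).title()

def test_normalize_team_name():
    assert normalize_team_name("  real   madrid ") == "Real Madrid"
```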
- **Run the test suite**
```bash
make run-tests
```

### Dev Tools
- **Lint, format, type-check and clean up**
```bash
make all
```

## Configuration
You can configure the following:
- MongoDB connection
- OpenAI API and model names
- Evaluation dataset paths

Edit:
- `src/configs/settings.py`
- `src/configs/config.yaml`

## Experiment Tracking
- **ZenML Dashboard**: [http://127.0.0.1:8237](http://127.0.0.1:8237)
- **Opik**: [Opik](https://www.comet.com/site/products/opik/)
- **MongoDB**: [MongoDB](https://www.mongodb.com/)

## 📄 License
[MIT License](LICENSE)