https://github.com/seekai-786/rag_model_ct

A prototype RAG-based AI project that allows cricket fans to explore and ask questions about previous editions of the Champions Trophy using PDFs as data. Built using ChromaDB and Hugging Face models, this solution enables real-time querying without retraining the model — combining fast retrieval with powerful generative responses.
https://github.com/seekai-786/rag_model_ct

aiforsports artificial-intelligence championstrophy cricket cricket-fan-engagement cricketstats customknowledgebase database huggingface langchain llm mistral neural-network nlp rag rag-implementation retrieval-augmented-generation semantic-search sportsanalytics vector

Last synced: 6 months ago
JSON representation

Host: GitHub
URL: https://github.com/seekai-786/rag_model_ct
Owner: SeekAI-786
License: apache-2.0
Created: 2025-02-03T19:03:22.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-04-12T06:55:28.000Z (6 months ago)
Last Synced: 2025-04-12T07:36:32.272Z (6 months ago)
Topics: aiforsports, artificial-intelligence, championstrophy, cricket, cricket-fan-engagement, cricketstats, customknowledgebase, database, huggingface, langchain, llm, mistral, neural-network, nlp, rag, rag-implementation, retrieval-augmented-generation, semantic-search, sportsanalytics, vector
Language: Jupyter Notebook
Homepage:
Size: 40 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

## 🏏 CT RAG MODEL — AI-Powered Cricket Fan Engagement using RAG

This project is a **prototype Retrieval-Augmented Generation (RAG) solution**, designed to **empower AI models to learn from new data without the need for hours of retraining**. It was built **before the Champions Trophy**, with the goal of allowing fans to **explore and learn about previous editions** of the tournament.

## Purpose

Traditional language models cannot adapt quickly to new information. This project solves that by integrating the **RAG (Retrieval-Augmented Generation)** approach, where:

1. **New documents are ingested (e.g., PDFs)** and broken into smaller chunks.
2. The chunks are stored using **ChromaDB**, a powerful vector store.
3. When a user asks a question, the system retrieves **relevant chunks only**, significantly reducing noise.
4. A **free LLM from Hugging Face** is used to generate responses based on the retrieved content.
5. The final response is then **cleaned and refined through a powerful LLM** for better user experience.

## How RAG Based Model Works

RAG combines the **retrieval abilities of search engines** with the **generative power of language models**. Instead of asking the model to remember everything, we:
- Store data in vector format (embeddings)
- Use semantic similarity to **retrieve only relevant info**
- Feed this to a model for more accurate and up-to-date answers

This ensures:
- **Faster response times**
- **No model retraining required**
- **Up-to-date answers from the latest documents**

## Features

- Use any **PDF** as your data source
- Automatically chunk the document into smaller text segments
- Save and manage vector data using **ChromaDB**
- Ask questions and get accurate answers using **free Hugging Face models**
- Final answers are **cleaned via an LLM** for clarity
- Easily define **custom save paths** for managing data
- Fully customizable — use your own models or datasets

## Model Used

- **Embedding Model**: `all-MiniLM-L6-v2` (Hugging Face)
- **LLM for response generation**: `mistralai/Mistral-7B-Instruct-v0.1` (Hugging Face, free version used)

You can swap these with other models from Hugging Face based on your requirements.

## How to Use

1. **Install Dependencies**

```bash
pip install langchain chromadb sentence-transformers huggingface_hub pypdf
```

2. **Add your PDF**

Place your file in the `./data` folder or any path you define.

3. **Run the notebook**

Open `CT_RAG_MODEL.ipynb` and run step by step:
- PDF loading and chunking
- Embedding and saving to ChromaDB
- Query and get answer
- Clean final response using LLM

4. **Ask questions like:**
> “Who was the player of the match in the final of CT 2017?”

## Project Context

This was built **ahead of the Champions Trophy** to offer fans a **historical insight into past editions** using AI-powered tools. It’s not designed for real-time use with the current tournament data but can easily be extended with new documents.

## Let's Collaborate!

I'm looking forward to collaborating with developers, cricket fans, or AI enthusiasts who want to:
- Improve this RAG prototype
- Add multilingual support
- Build a frontend for casual fans
- Deploy this as an interactive web app

Drop your suggestions or feel free to fork and play with the code!

## Disclaimer

This is a prototype. It was built quickly for experimentation and idea validation during the **Champions Trophy hype**. Contributions are welcome to make it production-ready.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/seekai-786/rag_model_ct

Awesome Lists containing this project

README