# Resources for Evaluation of LLMs / Generative AI
This repository contains the slides and some of the notebooks used in my LLM Evaluation workshops.

Some of the notebooks do require an OpenAI API key.
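For those notebooks, a minimal sketch of supplying the key is below. `OPENAI_API_KEY` is the standard environment variable the OpenAI client reads; the placeholder value is obviously not a real key, and individual notebooks may instead prompt for the key directly, so check each one.

```python
import os

# Set the key before importing/initializing the OpenAI client.
# "sk-your-key-here" is a placeholder -- substitute your own key,
# or export OPENAI_API_KEY in your shell / Colab secrets instead.
os.environ.setdefault("OPENAI_API_KEY", "sk-your-key-here")

# Most notebooks (and the openai library itself) then pick it up from the environment.
key = os.environ["OPENAI_API_KEY"]
print(key.startswith("sk-"))
```

In Colab, the equivalent is usually done via `getpass` or the built-in Secrets panel so the key never appears in the notebook output.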

These notebooks are intended to illustrate key points of the talk; please don't take them to production. If you want to dig deeper or run into issues, go to the source of each project.

**I will run an updated workshop in April 2025, so look for updates here.**

## About the workshop

![image](workshop_one_pager.png)

## Notebook links

Prompting a Chatbot: [Colab notebook](https://colab.research.google.com/github/minimaxir/chatgpt_api_test/blob/main/glados_chatbot.ipynb)

Testing Properties of a System: [Guidance AI](https://github.com/guidance-ai/guidance/blob/main/notebooks/testing_lms.ipynb)

Langtest tutorials from John Snow Labs: [Colab Notebooks](http://langtest.org/docs/pages/tutorials/tutorials)

LLM Evaluation Harness from EleutherAI: [GitHub](LLM_evaluation_harness_for_Arc_Easy_and_SST.ipynb) or [Colab notebook](https://colab.research.google.com/drive/1lPHO8wosT72jkhfBbcESsSD56IvpYk9u#scrollTo=asj6HXacKfc_)

Ragas showing a model as an evaluator: [GitHub](ragas_quickstart.ipynb) or [Colab notebook](https://colab.research.google.com/drive/1i78-peTBdhK5y4ZskFzC_NtLRaqvySXM)

Ragas using LangFuse: [Colab notebook](https://colab.research.google.com/github/langfuse/langfuse-docs/blob/main/cookbook/evaluation_of_rag_with_ragas.ipynb)

Evaluate LLMs and RAG, a practical example using LangChain and Hugging Face: [GitHub](https://github.com/philschmid/evaluate-llms/blob/main/notebooks/01-getting-started.ipynb)

MLFlow Automated Evaluation: [Blog](https://www.databricks.com/blog/announcing-mlflow-28-llm-judge-metrics-and-best-practices-llm-evaluation-rag-applications-part)

LLM Grader on AWS: [Video](https://youtu.be/HUuO9eJbOTk?si=9tI6Na10QhMFkKHe) and [Notebook](https://github.com/fhuthmacher/LLMevaluation/blob/main/LLMInformationExtraction.ipynb)

Argilla for Annotation: [Spaces](https://huggingface.co/spaces/argilla/llm-eval) (login: `admin`, password: `12345678`)

LLM AutoEval for RunPod by Maxime Labonne: [Colab](https://colab.research.google.com/drive/1Igs3WZuXAIv9X0vwqiE90QlEPys8e8Oa)

## Conference Presentations
Generative AI Summit, Austin (Oct 2023) - [Slides](presentation_slides/EvaluatingLLMs_GenAI_Oct2023_Shah.pdf)

ODSC West, San Francisco (Nov 2023) - [Slides](presentation_slides/EvaluatingLLMs_ODSC_Nov2023_Shah.pdf)

Arize Holiday Conference (Dec 2023) - [Slides](presentation_slides/EvaluatingLLMs_Arize_December2023.pdf)

Data Innovation Conference (Apr 2024) - [Slides](presentation_slides/DataInnovation_Apr_2024.pdf)

## Videos
Evaluation for Large Language Models and Generative AI - A Deep Dive - [YouTube](https://youtu.be/iQl03pQlYWY)

Constructing an Evaluation Approach for Generative AI Models - [YouTube](https://youtu.be/PtXOQDHPddE?si=PQ4N1B2mX2d_9PwC&t=147)

Large Language Models (LLMs) Can Explain Their Predictions - [YouTube](https://youtu.be/9RFz3cQ9NqE?si=IvhEgOFZugQTr5Ku) & [Slides](presentation_slides/ExplanationsLLMs_Jan2024.pdf)

## Additional Resources
Josh Tobin's evaluation talk: [YouTube](https://youtu.be/r-HUnht-Gns?si=5vU3RzXf7Jkprwn1)

[Awesome-LLMOps](https://github.com/tensorchord/Awesome-LLMOps?tab=readme-ov-file#llmops)

LLM Evaluation Tooling Review: [Atla blog](https://www.atla-ai.com/post/llm-evaluation-tooling-review)