https://github.com/rajshah4/LLM-Evaluation
Sample notebooks and prompts for LLM evaluation
- Host: GitHub
- URL: https://github.com/rajshah4/LLM-Evaluation
- Owner: rajshah4
- Created: 2023-10-20T00:22:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-08T18:31:25.000Z (about 1 year ago)
- Last Synced: 2024-03-08T20:13:19.293Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 11.2 MB
- Stars: 60
- Watchers: 5
- Forks: 20
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-hacking-lists - rajshah4/LLM-Evaluation - Sample notebooks and prompts for LLM evaluation (Jupyter Notebook)
README
# Resources for Evaluation of LLMs / Generative AI
This repository includes the slides and some of the notebooks used in my evaluation workshops. Some of the notebooks require an OpenAI API key.
These notebooks are intended to illustrate key points of the talk; please don't take them to production. If you want to dig deeper or run into issues, go to the source repository for each of these projects.
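For the OpenAI-backed notebooks, the key is typically supplied through the `OPENAI_API_KEY` environment variable; a minimal sketch (exact handling varies per notebook):

```python
# Minimal sketch: supply an OpenAI API key before running the OpenAI-backed notebooks.
# Exact handling varies per notebook; many read the OPENAI_API_KEY environment variable.
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
```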
**I will do an updated workshop in April 2025, so look for updates here.**
## About the workshop

## Notebook links
Prompting a Chatbot: [Colab notebook](https://colab.research.google.com/github/minimaxir/chatgpt_api_test/blob/main/glados_chatbot.ipynb)
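The linked notebook builds a persona-driven chatbot on the ChatGPT API; a rough equivalent with the current `openai` Python SDK (>= 1.0) looks like the sketch below. The model name and system prompt are placeholders, not the notebook's exact code.

```python
# Rough sketch of persona-style chatbot prompting with the openai SDK (>= 1.0).
# The model name and system prompt are placeholders, not the notebook's exact code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "You are GLaDOS, a sarcastic AI. Answer in character."},
        {"role": "user", "content": "How do I evaluate a language model?"},
    ],
)
print(response.choices[0].message.content)
```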
Testing Properties of a System: [Guidance AI](https://github.com/guidance-ai/guidance/blob/main/notebooks/testing_lms.ipynb)
Langtest tutorials from John Snow Labs: [Colab Notebooks](http://langtest.org/docs/pages/tutorials/tutorials)
LLM Evaluation Harness from EleutherAI: [Github](LLM_evaluation_harness_for_Arc_Easy_and_SST.ipynb) or [Colab notebook](https://colab.research.google.com/drive/1lPHO8wosT72jkhfBbcESsSD56IvpYk9u#scrollTo=asj6HXacKfc_)
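For reference, the harness can also be driven from Python; a minimal sketch assuming lm-evaluation-harness >= 0.4 (task names such as `sst` vs. `sst2` differ across harness versions, and the notebook may use the CLI instead):

```python
# Minimal sketch of EleutherAI's lm-evaluation-harness Python API (>= 0.4).
# Task names differ across versions (e.g. "sst" vs. "sst2"); the linked notebook
# may drive the harness through its CLI instead.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face backend
    model_args="pretrained=gpt2",  # small model as a placeholder
    tasks=["arc_easy", "sst2"],
    batch_size=8,
)
print(results["results"])
```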
Ragas showing Model as an evaluator: [Github](ragas_quickstart.ipynb) or [Colab notebook](https://colab.research.google.com/drive/1i78-peTBdhK5y4ZskFzC_NtLRaqvySXM)
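The Ragas quickstart scores a small question/answer/context dataset with an LLM judge; roughly as in the sketch below (API circa ragas 0.1, so metric and column names may have changed, and an `OPENAI_API_KEY` is needed for the default judge).

```python
# Rough sketch of the Ragas "model as evaluator" quickstart (API circa ragas 0.1;
# column and metric names may differ in newer releases). Uses OpenAI as the judge
# by default, so OPENAI_API_KEY must be set.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question": ["Where is the Eiffel Tower?"],
    "answer": ["The Eiffel Tower is in Paris."],
    "contexts": [["The Eiffel Tower is a wrought-iron tower in Paris, France."]],
})
scores = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(scores)
```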
Ragas using LangFuse: [Colab notebook](https://colab.research.google.com/github/langfuse/langfuse-docs/blob/main/cookbook/evaluation_of_rag_with_ragas.ipynb)
Evaluate LLMs and RAG, a practical example using Langchain and Hugging Face: [Github](https://github.com/philschmid/evaluate-llms/blob/main/notebooks/01-getting-started.ipynb)
MLFlow Automated Evaluation: [Blog](https://www.databricks.com/blog/announcing-mlflow-28-llm-judge-metrics-and-best-practices-llm-evaluation-rag-applications-part)
LLM Grader on AWS: [Video](https://youtu.be/HUuO9eJbOTk?si=9tI6Na10QhMFkKHe) and [Notebook](https://github.com/fhuthmacher/LLMevaluation/blob/main/LLMInformationExtraction.ipynb)
Argilla for Annotation: [Spaces](https://huggingface.co/spaces/argilla/llm-eval) (login: admin, password: 12345678)
LLM AutoEval for RunPod by Maxime Labonne: [Colab notebook](https://colab.research.google.com/drive/1Igs3WZuXAIv9X0vwqiE90QlEPys8e8Oa)
## Conference Presentations
Generative AI Summit, Austin (Oct 2023) - [Slides](presentation_slides/EvaluatingLLMs_GenAI_Oct2023_Shah.pdf)
ODSC West, San Francisco (Nov 2023) - [Slides](presentation_slides/EvaluatingLLMs_ODSC_Nov2023_Shah.pdf)
Arize Holiday Conference (Dec 2023) - [Slides](presentation_slides/EvaluatingLLMs_Arize_December2023.pdf)
Data Innovation Conference (Apr 2024) - [Slides](presentation_slides/DataInnovation_Apr_2024.pdf)
## Videos
Evaluation for Large Language Models and Generative AI - A Deep Dive - [YouTube](https://youtu.be/iQl03pQlYWY)
Constructing an Evaluation Approach for Generative AI Models - [YouTube](https://youtu.be/PtXOQDHPddE?si=PQ4N1B2mX2d_9PwC&t=147)
Large Language Models (LLMs) Can Explain Their Predictions - [YouTube](https://youtu.be/9RFz3cQ9NqE?si=IvhEgOFZugQTr5Ku) & [Slides](presentation_slides/ExplanationsLLMs_Jan2024.pdf)
## Additional Resources
Josh Tobin's Evaluation talk - [YouTube](https://youtu.be/r-HUnht-Gns?si=5vU3RzXf7Jkprwn1)
[Awesome-LLMOps](https://github.com/tensorchord/Awesome-LLMOps?tab=readme-ov-file#llmops)
LLM Evaluation [Tooling Review](https://www.atla-ai.com/post/llm-evaluation-tooling-review)