![Project Banner](https://github.com/Anmolian/Prompt_Eval_LLM_Judge/blob/main/AI.jpg?raw=true)
# Evaluating Prompt Strategies with an LLM Judge

## Technologies Used: LLMs, Prompt Design, OpenAI API

- Designed and implemented seven prompt strategies (Zero Shot, Few Shot, Chain of Thought, etc.) to systematically test LLM-generated responses on 150 queries from the TREC ’24 RAG Track dataset (MS MARCO V2.1); a sketch of two strategies follows this list.
- Developed an LLM Judge, an automated evaluation framework that scored over 1,050 responses on Relevance, Correctness, Coherence, Conciseness, and Consistency using the GPT-4o-mini API (see the judging sketch below).
- Engineered a Python-based pipeline to automate response generation, scoring, and visualisation, revealing that structured prompting techniques like Chain of Thought achieved the highest average normalised score (9.36/10), improving LLM performance on complex reasoning tasks.
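
A minimal sketch of how two of the strategies might be wired up with the `openai` Python client; the wrapper functions, example query, and generation model below are illustrative assumptions, not the repo's actual code:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def zero_shot(query: str) -> str:
    # Zero Shot: the query is sent as-is, with no examples or guidance.
    return query

def chain_of_thought(query: str) -> str:
    # Chain of Thought: ask the model to reason step by step first.
    return (
        f"{query}\n\n"
        "Think through the problem step by step, then give a final answer."
    )

def generate(prompt: str, model: str = "gpt-4o-mini") -> str:
    # One API call per (strategy, query) pair; the README does not name
    # the generation model, so gpt-4o-mini here is a placeholder.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

answer = generate(chain_of_thought("Why does the sky appear blue?"))
print(answer)
```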

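The judging step can be sketched the same way; the rubric wording, the JSON reply format, and the `judge` helper are assumptions for illustration, with only the five criteria and the GPT-4o-mini judge model taken from the description above:

```python
import json
from openai import OpenAI

client = OpenAI()

CRITERIA = ["Relevance", "Correctness", "Coherence", "Conciseness", "Consistency"]

JUDGE_TEMPLATE = (
    "You are an impartial judge. Rate the answer to the query on each "
    "criterion from 1 (poor) to 10 (excellent). Reply with JSON only, "
    'e.g. {{"Relevance": 8, ...}}.\n\n'
    "Criteria: {criteria}\nQuery: {query}\nAnswer: {answer}"
)

def judge(query: str, answer: str) -> dict[str, int]:
    prompt = JUDGE_TEMPLATE.format(
        criteria=", ".join(CRITERIA), query=query, answer=answer
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # the judge model named in the README
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # keep scoring as repeatable as possible
    )
    # Assumes the model complies with the JSON-only instruction; a real
    # pipeline would add validation and retries here.
    return json.loads(reply.choices[0].message.content)

scores = judge("Why does the sky appear blue?", "Because of Rayleigh scattering ...")
print(scores)  # e.g. {"Relevance": 9, "Correctness": 10, ...}
```
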
---

*Image credit: [Designed by Freepik](http://www.freepik.com/)*