Prompt Design & LLM Judge
https://github.com/anmolian/prompt_eval_llm_judge
- Host: GitHub
- URL: https://github.com/anmolian/prompt_eval_llm_judge
- Owner: Anmolian
- License: MIT
- Created: 2025-02-10T21:46:55.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-10T21:59:17.000Z (3 months ago)
- Last Synced: 2025-02-10T22:32:09.854Z (3 months ago)
- Topics: contrastive-cot-prompting, cot-prompting, few-shot-prompting, llm-judge, llms, one-shot-prompting, prompt-engineering, role-playing-prompting, self-consistency-prompting, trec-rag-2024, zero-shot-prompting
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

# Evaluating Prompt Strategies with an LLM Judge

## Technologies Used: LLMs, Prompt Design, OpenAI API
- Designed and implemented seven prompt strategies (Zero-Shot, Few-Shot, Chain-of-Thought, etc.) to systematically test LLM-generated responses on 150 queries from the TREC 2024 RAG Track dataset (MS MARCO V2.1); a sketch of such templates follows this list.
- Developed an LLM Judge, an automated evaluation framework that scored the resulting 1,050 responses (150 queries × 7 strategies) on Relevance, Correctness, Coherence, Conciseness, and Consistency using the GPT-4o-mini API (see the judge sketch below).
- Engineered a Python-based pipeline to automate response generation, scoring, and visualisation, revealing that structured prompting techniques like Chain-of-Thought achieved the highest average normalised score (9.36/10), improving LLM performance on complex reasoning tasks.
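
As a rough illustration of the strategy templates mentioned above, the sketch below shows how three of the seven strategies might be expressed in Python. The template wording, the `PROMPT_TEMPLATES` dict, and the `build_prompt` helper are hypothetical illustrations, not the repository's actual code.

```python
# Hypothetical templates for three of the seven strategies; the wording is
# illustrative only and does not reflect the repository's real prompts.
PROMPT_TEMPLATES = {
    "zero_shot": "Answer the following query.\n\nQuery: {query}",
    "one_shot": (
        "Query: What is retrieval-augmented generation?\n"
        "Answer: A technique that grounds LLM answers in retrieved documents.\n\n"
        "Query: {query}\nAnswer:"
    ),
    "chain_of_thought": (
        "Answer the following query. Think through the problem step by step "
        "before giving your final answer.\n\nQuery: {query}"
    ),
}

def build_prompt(strategy: str, query: str) -> str:
    """Fill the chosen strategy's template with the user query."""
    return PROMPT_TEMPLATES[strategy].format(query=query)
```

With templates like these, running all strategies over all 150 queries is a simple nested loop, which is what yields the 1,050 responses the judge then scores.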
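The judging step can be sketched in a similarly hedged way: one Chat Completions call asking GPT-4o-mini to return per-criterion scores as JSON. Only the model name, the five criteria, and the use of the OpenAI API come from the README; the `JUDGE_PROMPT` wording and the `judge` helper are assumptions for illustration.

```python
import json
from openai import OpenAI  # official openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The five criteria named in the README.
CRITERIA = ["Relevance", "Correctness", "Coherence", "Conciseness", "Consistency"]

# Hypothetical rubric prompt; the repository's actual judge prompt may differ.
JUDGE_PROMPT = (
    "You are an impartial judge. Score the response to the query on each "
    "criterion from 1 to 10. Reply with a JSON object mapping criterion to score.\n\n"
    "Criteria: {criteria}\nQuery: {query}\nResponse: {response}"
)

def judge(query: str, response: str) -> dict[str, int]:
    """Ask GPT-4o-mini to score one response; returns {criterion: score}."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                criteria=", ".join(CRITERIA), query=query, response=response
            ),
        }],
        response_format={"type": "json_object"},  # force valid JSON output
        temperature=0,  # deterministic scoring for consistency across runs
    )
    return json.loads(completion.choices[0].message.content)
```

Averaging the five scores per response and normalising to a 10-point scale is one plausible way to arrive at aggregate figures like the 9.36/10 reported for Chain-of-Thought.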
---

*Image credit: [Designed by Freepik](http://www.freepik.com/)*