https://github.com/paradite/eval-data
Prompts and evaluation data for LLMs on real world coding and writing tasks
ai benchmark eval evaluation llm prompt prompt-engineering
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/paradite/eval-data
- Owner: paradite
- Created: 2024-02-29T11:05:03.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-06-08T07:15:01.000Z (7 months ago)
- Last Synced: 2025-06-08T08:20:43.869Z (7 months ago)
- Topics: ai, benchmark, eval, evaluation, llm, prompt, prompt-engineering
- Language: TypeScript
- Homepage: https://eval.16x.engineer/
- Size: 434 KB
- Stars: 14
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Eval data
Evaluation data for LLMs and prompts on real-world coding and writing tasks.
Built by [16x Prompt](https://prompt.16x.engineer/) and [16x Eval](https://eval.16x.engineer/).
## Coding Projects
### Next.js
[emoji-todo](/projects/emoji-todo/) - A simple Next.js TODO app with emojis.
### SQL
[sql](/projects/sql/) - SQL code snippets.
### Python
[python-script](/projects/python-script/) - Python script code snippets.
### Benchmark Visualization
[visualization](/projects/visualization/) - Coding a visualization of benchmark results.
### TypeScript Narrowing
[typescript-narrowing](/projects/typescript-narrowing/) - Coding TypeScript narrowing tests.
## Writing Projects
### AI Timeline
[ai-timeline](/projects/ai-timeline/) - Writing an AI Timeline.
## Model Evaluation Results
[model-eval-results](/model-eval-results/) - Raw results exported from [16x Eval](https://eval.16x.engineer/) for model evaluations.
## 16x Eval
I am building a local desktop app to evaluate models and prompts.
See [16x Eval website](https://eval.16x.engineer/) for more information.