https://github.com/kennethleungty/deepseek-r1-ollama-simple-evals
Run and Evaluate DeepSeek-R1 Distilled Models Locally with Ollama and OpenAI's simple-evals
https://github.com/kennethleungty/deepseek-r1-ollama-simple-evals
deepseek deepseek-r1 large-language-models llama llama3 llm ollama openai qwen qwen2-5 simple-evals
Last synced: 6 months ago
JSON representation
Run and Evaluate DeepSeek-R1 Distilled Models Locally with Ollama and OpenAI's simple-evals
- Host: GitHub
- URL: https://github.com/kennethleungty/deepseek-r1-ollama-simple-evals
- Owner: kennethleungty
- License: mit
- Created: 2025-03-05T14:09:33.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-04-21T12:55:34.000Z (6 months ago)
- Last Synced: 2025-04-23T16:16:26.153Z (6 months ago)
- Topics: deepseek, deepseek-r1, large-language-models, llama, llama3, llm, ollama, openai, qwen, qwen2-5, simple-evals
- Language: Jupyter Notebook
- Homepage:
- Size: 1.65 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Evaluating DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI's simple-evals
## Context
- The recent launch of the DeepSeek-R1 model sent ripples across the global AI community. It delivered breakthroughs on par with the reasoning models from Meta and OpenAI, achieving this in a fraction of the time and at a significantly lower cost.
- Beyond the headlines and online buzz, how can we assess the model's reasoning abilities using recognized benchmarks?
- DeepSeek's user interface makes it easy to explore its capabilities, but using it programmatically offers deeper insights and more seamless integration into real-world applications.
- Understanding how to run such models locally also provides enhanced control and offline access.
- In this project, we will explore how to use Ollama and OpenAI's simple-evals to evaluate the reasoning capabilities of DeepSeek-R1's distilled models based on the famous GPQA-Diamond benchmark.**More details coming soon!**