https://github.com/kennethleungty/deepseek-r1-ollama-simple-evals

Run and Evaluate DeepSeek-R1 Distilled Models Locally with Ollama and OpenAI's simple-evals
# Evaluating DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI's simple-evals

## Context
- The recent launch of DeepSeek-R1 sent ripples across the global AI community. It delivered reasoning performance on par with leading models such as OpenAI's o1, reportedly at a fraction of the training time and cost.
- Beyond the headlines and online buzz, how can we assess the model's reasoning abilities against recognized benchmarks?
- DeepSeek's chat interface makes it easy to explore the model's capabilities, but using it programmatically offers deeper insight and smoother integration into real-world applications.
- Running such models locally also gives us greater control and offline access.
- In this project, we explore how to use Ollama and OpenAI's simple-evals to evaluate the reasoning capabilities of DeepSeek-R1's distilled models on GPQA-Diamond, a benchmark of graduate-level, "Google-proof" science questions.

**More details coming soon!**