https://github.com/oumi-ai/oumi
Everything you need to build state-of-the-art foundation models, end-to-end.
- Host: GitHub
- URL: https://github.com/oumi-ai/oumi
- Owner: oumi-ai
- License: apache-2.0
- Created: 2024-05-07T17:45:15.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-01-29T00:23:55.000Z (3 months ago)
- Last Synced: 2025-01-29T00:24:35.994Z (3 months ago)
- Language: Python
- Homepage: https://oumi.ai
- Size: 6.43 MB
- Stars: 19
- Watchers: 8
- Forks: 0
- Open Issues: 9
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
Awesome Lists containing this project
- AiTreasureBox - oumi-ai/oumi - Everything you need to build state-of-the-art foundation models, end-to-end. (Repos)
- awesome - oumi-ai/oumi - Everything you need to build state-of-the-art foundation models, end-to-end. (Python)
- StarryDivineSky - oumi-ai/oumi
- trackawesomelist - oumi-ai/oumi (⭐5.8k) - Open-source platform that streamlines the entire lifecycle of foundation models, from data preparation and training to evaluation and deployment (Recently Updated / [Feb 06, 2025](/content/2025/02/06/README.md))
- awesome-LLM-resourses - Oumi - Everything you need to build state-of-the-art foundation models, end-to-end. (Fine-Tuning)
README

[Documentation](https://oumi.ai/docs/en/latest/index.html)
[Blog](https://oumi.ai/blog)
[Discord](https://discord.gg/oumi)
[PyPI](https://badge.fury.io/py/oumi)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Tests](https://github.com/oumi-ai/oumi/actions/workflows/pretest.yaml)
[GPU Tests](https://github.com/oumi-ai/oumi/actions/workflows/gpu_tests.yaml)
[GitHub](https://github.com/oumi-ai/oumi)
[Code style: black](https://github.com/psf/black)
[pre-commit](https://github.com/pre-commit/pre-commit)
[oumi.ai](https://oumi.ai)

### Everything you need to build state-of-the-art foundation models, end-to-end.
Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from data preparation and training to evaluation and deployment. Whether you're developing on a laptop, launching large-scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.
With Oumi, you can:
- Train and fine-tune models from 10M to 405B parameters using state-of-the-art techniques (SFT, LoRA, QLoRA, DPO, and more)
- Work with both text and multimodal models (Llama, Qwen, Phi, and others)
- Synthesize and curate training data with LLM judges
- Deploy models efficiently with popular inference engines (vLLM, SGLang)
- Evaluate models comprehensively across standard benchmarks
- Run anywhere - from laptops to clusters to clouds (AWS, Azure, GCP, Lambda, and more)
- Integrate with both open models and commercial APIs (OpenAI, Anthropic, Vertex AI, Together, Parasail, ...)

All with one consistent API, production-grade reliability, and all the flexibility you need for research.
Learn more at [oumi.ai](https://oumi.ai/docs), or jump right in with the [quickstart guide](https://oumi.ai/docs/en/latest/get_started/quickstart.html).
## Getting Started
| **Notebook** | **Try in Colab** | **Goal** |
|----------|--------------|-------------|
| **Getting Started: A Tour** | | Quick tour of core features: training, evaluation, inference, and job management |
| **Model Finetuning Guide** | | End-to-end guide to LoRA tuning with data prep, training, and evaluation |
| **Model Distillation** | | Guide to distilling large models into smaller, efficient ones |
| **Model Evaluation** | | Comprehensive model evaluation using Oumi's evaluation framework |
| **Remote Training** | | Launch and monitor training jobs on cloud (AWS, Azure, GCP, Lambda, etc.) platforms |
| **LLM-as-a-Judge** | | Filter and curate training data with built-in judges |
| **vLLM Inference Engine** | | Fast inference at scale with the vLLM engine |
## Usage
### Installation
Installing oumi in your environment is straightforward:
```shell
# Install the package (CPU & NPU only)
pip install oumi  # For local development & testing

# OR, with GPU support (Requires Nvidia or AMD GPU)
pip install oumi[gpu]  # For GPU training

# To get the latest version, install from the source
pip install git+https://github.com/oumi-ai/oumi.git
```
For more advanced installation options, see the [installation guide](https://oumi.ai/docs/en/latest/get_started/installation.html).
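To verify the install, a minimal smoke test might look like the sketch below (assuming a fresh virtual environment; `oumi --help` prints the available subcommands):

```shell
# Create an isolated environment and install the package.
python -m venv .venv && source .venv/bin/activate
pip install oumi

# Smoke test: list the CLI subcommands and the installed package metadata.
oumi --help
pip show oumi
```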
### Oumi CLI
You can quickly use the `oumi` command to train, evaluate, and infer models using one of the existing [recipes](/configs/recipes):
```shell
# Training
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml

# Evaluation
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml

# Inference
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml --interactive
```
For more advanced options, see the [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html), [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html), [inference](https://oumi.ai/docs/en/latest/user_guides/infer/infer.html), and [llm-as-a-judge](https://oumi.ai/docs/en/latest/user_guides/judge/judge.html) guides.
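Because each stage is a standalone CLI call driven by a YAML recipe, the three commands compose into a simple pipeline script. A sketch using the same quickstart recipes (plain shell; nothing here is Oumi-specific beyond the commands shown above):

```shell
#!/usr/bin/env bash
# Train, evaluate, then chat with the SmolLM-135M quickstart recipes.
# Stop at the first failing stage.
set -euo pipefail

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml --interactive
```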
### Running Jobs Remotely
You can run jobs remotely on cloud platforms (AWS, Azure, GCP, Lambda, etc.) using the `oumi launch` command:
```shell
# GCP
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml

# AWS
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_aws_job.yaml

# Azure
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_azure_job.yaml

# Lambda
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_lambda_job.yaml
```
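The four job configs differ only in their cloud suffix, so a local shell variable is enough to parameterize the target cloud (`CLOUD` below is an illustrative shell variable, not an Oumi flag):

```shell
# Select one of the quickstart job configs above via a plain shell variable.
CLOUD=gcp  # one of: gcp, aws, azure, lambda
oumi launch up -c "configs/recipes/smollm/sft/135m/quickstart_${CLOUD}_job.yaml"
```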
**Note:** Oumi is in beta and under active development. The core features are stable, but some advanced features might change as the platform improves.
## Why use Oumi?
If you need a comprehensive platform for training, evaluating, or deploying models, Oumi is a great choice.
Here are some of the key features that make Oumi stand out:
- **Zero Boilerplate**: Get started in minutes with ready-to-use recipes for popular models and workflows. No need to write training loops or data pipelines.
- **Enterprise-Grade**: Built and validated by teams training models at scale.
- **Research Ready**: Perfect for ML research, with easily reproducible experiments and flexible interfaces for customizing each component.
- **Broad Model Support**: Works with most popular model architectures - from tiny models to the largest ones, text-only to multimodal.
- **SOTA Performance**: Native support for distributed training techniques (FSDP, DDP) and optimized inference engines (vLLM, SGLang).
- **Community First**: 100% open source with an active community. No vendor lock-in, no strings attached.

## Examples & Recipes
Explore the growing collection of ready-to-use configurations for state-of-the-art models and training workflows:
**Note:** These configurations are not an exhaustive list of what's supported; they are simply examples to get you started. You can find a more exhaustive list of supported [models](https://oumi.ai/docs/en/latest/resources/models/supported_models.html) and datasets ([supervised fine-tuning](https://oumi.ai/docs/en/latest/resources/datasets/sft_datasets.html), [pre-training](https://oumi.ai/docs/en/latest/resources/datasets/pretraining_datasets.html), [preference tuning](https://oumi.ai/docs/en/latest/resources/datasets/preference_datasets.html), and [vision-language finetuning](https://oumi.ai/docs/en/latest/resources/datasets/vl_sft_datasets.html)) in the Oumi documentation.
### DeepSeek R1 Family
| Model | Example Configurations |
|-------|------------------------|
| DeepSeek R1 671B | [Inference (Together AI)](configs/recipes/deepseek_r1/inference/671b_together_infer.yaml) |
| Distilled Llama 8B | [FFT](/configs/recipes/deepseek_r1/sft/distill_llama_8b/full_train.yaml) • [LoRA](/configs/recipes/deepseek_r1/sft/distill_llama_8b/lora_train.yaml) • [QLoRA](/configs/recipes/deepseek_r1/sft/distill_llama_8b/qlora_train.yaml) • [Inference](configs/recipes/deepseek_r1/inference/distill_llama_8b_infer.yaml) • [Evaluation](/configs/recipes/deepseek_r1/evaluation/distill_llama_8b/eval.yaml) |
| Distilled Llama 70B | [FFT](/configs/recipes/deepseek_r1/sft/distill_llama_70b/full_train.yaml) • [LoRA](/configs/recipes/deepseek_r1/sft/distill_llama_70b/lora_train.yaml) • [QLoRA](/configs/recipes/deepseek_r1/sft/distill_llama_70b/qlora_train.yaml) • [Inference](configs/recipes/deepseek_r1/inference/distill_llama_70b_infer.yaml) • [Evaluation](/configs/recipes/deepseek_r1/evaluation/distill_llama_70b/eval.yaml) |
| Distilled Qwen 1.5B | [FFT](/configs/recipes/deepseek_r1/sft/distill_qwen_1_5b/full_train.yaml) • [LoRA](/configs/recipes/deepseek_r1/sft/distill_qwen_1_5b/lora_train.yaml) • [Inference](configs/recipes/deepseek_r1/inference/distill_qwen_1_5b_infer.yaml) • [Evaluation](/configs/recipes/deepseek_r1/evaluation/distill_qwen_1_5b/eval.yaml) |
| Distilled Qwen 32B | [LoRA](/configs/recipes/deepseek_r1/sft/distill_qwen_32b/lora_train.yaml) • [Inference](configs/recipes/deepseek_r1/inference/distill_qwen_32b_infer.yaml) • [Evaluation](/configs/recipes/deepseek_r1/evaluation/distill_qwen_32b/eval.yaml) |

### Llama Family
| Model | Example Configurations |
|-------|------------------------|
| Llama 3.1 8B | [FFT](/configs/recipes/llama3_1/sft/8b_full/train.yaml) • [LoRA](/configs/recipes/llama3_1/sft/8b_lora/train.yaml) • [QLoRA](/configs/recipes/llama3_1/sft/8b_qlora/train.yaml) • [Pre-training](/configs/recipes/llama3_1/pretraining/8b/train.yaml) • [Inference (vLLM)](configs/recipes/llama3_1/inference/8b_rvllm_infer.yaml) • [Inference](/configs/recipes/llama3_1/inference/8b_infer.yaml) • [Evaluation](/configs/recipes/llama3_1/evaluation/8b_eval.yaml) |
| Llama 3.1 70B | [FFT](/configs/recipes/llama3_1/sft/70b_full/train.yaml) • [LoRA](/configs/recipes/llama3_1/sft/70b_lora/train.yaml) • [QLoRA](/configs/recipes/llama3_1/sft/70b_qlora/train.yaml) • [Inference](/configs/recipes/llama3_1/inference/70b_infer.yaml) • [Evaluation](/configs/recipes/llama3_1/evaluation/70b_eval.yaml) |
| Llama 3.1 405B | [FFT](/configs/recipes/llama3_1/sft/405b_full/train.yaml) • [LoRA](/configs/recipes/llama3_1/sft/405b_lora/train.yaml) • [QLoRA](/configs/recipes/llama3_1/sft/405b_qlora/train.yaml) |
| Llama 3.2 1B | [FFT](/configs/recipes/llama3_2/sft/1b_full/train.yaml) • [LoRA](/configs/recipes/llama3_2/sft/1b_lora/train.yaml) • [QLoRA](/configs/recipes/llama3_2/sft/1b_qlora/train.yaml) • [Inference (vLLM)](/configs/recipes/llama3_2/inference/1b_vllm_infer.yaml) • [Inference (SGLang)](/configs/recipes/llama3_2/inference/1b_sglang_infer.yaml) • [Inference](/configs/recipes/llama3_2/inference/1b_infer.yaml) • [Evaluation](/configs/recipes/llama3_2/evaluation/1b_eval.yaml) |
| Llama 3.2 3B | [FFT](/configs/recipes/llama3_2/sft/3b_full/train.yaml) • [LoRA](/configs/recipes/llama3_2/sft/3b_lora/train.yaml) • [QLoRA](/configs/recipes/llama3_2/sft/3b_qlora/train.yaml) • [Inference (vLLM)](/configs/recipes/llama3_2/inference/3b_vllm_infer.yaml) • [Inference (SGLang)](/configs/recipes/llama3_2/inference/3b_sglang_infer.yaml) • [Inference](/configs/recipes/llama3_2/inference/3b_infer.yaml) • [Evaluation](/configs/recipes/llama3_2/evaluation/3b_eval.yaml) |
| Llama 3.3 70B | [FFT](/configs/recipes/llama3_3/sft/70b_full/train.yaml) • [LoRA](/configs/recipes/llama3_3/sft/70b_lora/train.yaml) • [QLoRA](/configs/recipes/llama3_3/sft/70b_qlora/train.yaml) • [Inference (vLLM)](/configs/recipes/llama3_3/inference/70b_vllm_infer.yaml) • [Inference](/configs/recipes/llama3_3/inference/70b_infer.yaml) • [Evaluation](/configs/recipes/llama3_3/evaluation/70b_eval.yaml) |
| Llama 3.2 Vision 11B | [SFT](/configs/recipes/vision/llama3_2_vision/sft/11b_full/train.yaml) • [Inference (vLLM)](/configs/recipes/vision/llama3_2_vision/inference/11b_rvllm_infer.yaml) • [Inference (SGLang)](/configs/recipes/vision/llama3_2_vision/inference/11b_sglang_infer.yaml) • [Evaluation](/configs/recipes/vision/llama3_2_vision/evaluation/11b_eval.yaml) |

### Vision Models
| Model | Example Configurations |
|-------|------------------------|
| Llama 3.2 Vision 11B | [SFT](/configs/recipes/vision/llama3_2_vision/sft/11b_full/train.yaml) • [LoRA](/configs/recipes/vision/llama3_2_vision/sft/11b_lora/train.yaml) • [Inference (vLLM)](/configs/recipes/vision/llama3_2_vision/inference/11b_rvllm_infer.yaml) • [Inference (SGLang)](/configs/recipes/vision/llama3_2_vision/inference/11b_sglang_infer.yaml) • [Evaluation](/configs/recipes/vision/llama3_2_vision/evaluation/11b_eval.yaml) |
| LLaVA 7B | [SFT](/configs/recipes/vision/llava_7b/sft/train.yaml) • [Inference (vLLM)](configs/recipes/vision/llava_7b/inference/vllm_infer.yaml) • [Inference](/configs/recipes/vision/llava_7b/inference/infer.yaml) |
| Phi3 Vision 4.2B | [SFT](/configs/recipes/vision/phi3/sft/train.yaml) • [Inference (vLLM)](configs/recipes/vision/phi3/inference/vllm_infer.yaml) |
| Qwen2-VL 2B | [SFT](/configs/recipes/vision/qwen2_vl_2b/sft/train.yaml) • [Inference (vLLM)](configs/recipes/vision/qwen2_vl_2b/inference/vllm_infer.yaml) • [Inference (SGLang)](configs/recipes/vision/qwen2_vl_2b/inference/sglang_infer.yaml) • [Inference](configs/recipes/vision/qwen2_vl_2b/inference/infer.yaml) • [Evaluation](configs/recipes/vision/qwen2_vl_2b/evaluation/eval.yaml) |
| SmolVLM-Instruct 2B | [SFT](/configs/recipes/vision/smolvlm/sft/gcp_job.yaml) |

### Even more options
This section lists all the language models that can be used with Oumi. Thanks to the integration with the [🤗 Transformers](https://github.com/huggingface/transformers) library, you can easily use any of these models for training, evaluation, or inference.
Models prefixed with a checkmark (✅) have been thoroughly tested and validated by the Oumi community, with ready-to-use recipes available in the [configs/recipes](configs/recipes) directory.
#### Instruct Models
| Model | Size | Paper | HF Hub | License | Open [^1] | Recommended Parameters |
|-------|------|-------|---------|----------|------|------------------------|
| ✅ SmolLM-Instruct | 135M/360M/1.7B | [Blog](https://huggingface.co/blog/smollm) | [Hub](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) | Apache 2.0 | | |
| ✅ DeepSeek R1 Family | 1.5B/8B/32B/70B/671B | [Blog](https://api-docs.deepseek.com/news/news250120) | [Hub](https://huggingface.co/deepseek-ai/DeepSeek-R1) | MIT | | |
| ✅ Llama 3.1 Instruct | 8B/70B/405B | [Paper](https://arxiv.org/abs/2407.21783) | [Hub](https://huggingface.co/meta-llama/Llama-3.1-70b-instruct) | [License](https://llama.meta.com/llama3/license/) | | |
| ✅ Llama 3.2 Instruct | 1B/3B | [Paper](https://arxiv.org/abs/2407.21783) | [Hub](https://huggingface.co/meta-llama/Llama-3.2-3b-instruct) | [License](https://llama.meta.com/llama3/license/) | | |
| ✅ Llama 3.3 Instruct | 70B | [Paper](https://arxiv.org/abs/2407.21783) | [Hub](https://huggingface.co/meta-llama/Llama-3.3-70b-instruct) | [License](https://llama.meta.com/llama3/license/) | | |
| ✅ Phi-3.5-Instruct | 4B/14B | [Paper](https://arxiv.org/abs/2404.14219) | [Hub](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | [License](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/LICENSE) | | |
| Qwen2.5-Instruct | 0.5B-70B | [Paper](https://arxiv.org/abs/2309.16609) | [Hub](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | [License](https://github.com/QwenLM/Qwen/blob/main/LICENSE) | | |
| OLMo 2 Instruct | 7B | [Paper](https://arxiv.org/abs/2402.00838) | [Hub](https://huggingface.co/allenai/OLMo-2-1124-7B) | Apache 2.0 | | |
| MPT-Instruct | 7B | [Blog](https://www.mosaicml.com/blog/mpt-7b) | [Hub](https://huggingface.co/mosaicml/mpt-7b-instruct) | Apache 2.0 | | |
| Command R | 35B/104B | [Blog](https://cohere.com/blog/command-r7b) | [Hub](https://huggingface.co/CohereForAI/c4ai-command-r-plus) | [License](https://cohere.com/c4ai-cc-by-nc-license) | | |
| Granite-3.1-Instruct | 2B/8B | [Paper](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) | [Hub](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) | Apache 2.0 | | |
| Gemma 2 Instruct | 2B/9B | [Blog](https://ai.google.dev/gemma) | [Hub](https://huggingface.co/google/gemma-2-2b-it) | [License](https://ai.google.dev/gemma/terms) | | |
| DBRX-Instruct | 130B MoE | [Blog](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) | [Hub](https://huggingface.co/databricks/dbrx-instruct) | Apache 2.0 | | |
| Falcon-Instruct | 7B/40B | [Paper](https://arxiv.org/abs/2306.01116) | [Hub](https://huggingface.co/tiiuae/falcon-7b-instruct) | Apache 2.0 | | |

#### Vision-Language Models
| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|---------|----------|------|---------------------|
| ✅ Llama 3.2 Vision | 11B | [Paper](https://arxiv.org/abs/2407.21783) | [Hub](https://huggingface.co/meta-llama/Llama-3.2-11b-vision) | [License](https://llama.meta.com/llama3/license/) | | |
| ✅ LLaVA-1.5 | 7B | [Paper](https://arxiv.org/abs/2310.03744) | [Hub](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | [License](https://ai.meta.com/llama/license) | | |
| ✅ Phi-3 Vision | 4.2B | [Paper](https://arxiv.org/abs/2404.14219) | [Hub](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) | [License](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/LICENSE) | | |
| ✅ BLIP-2 | 3.6B | [Paper](https://arxiv.org/abs/2301.12597) | [Hub](https://huggingface.co/Salesforce/blip2-opt-2.7b) | MIT | | |
| ✅ Qwen2-VL | 2B | [Blog](https://qwenlm.github.io/blog/qwen2-vl/) | [Hub](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) | [License](https://github.com/QwenLM/Qwen/blob/main/LICENSE) | | |
| ✅ SmolVLM-Instruct | 2B | [Blog](https://huggingface.co/blog/smolvlm) | [Hub](https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct) | Apache 2.0 | | |

#### Base Models
| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|---------|----------|------|---------------------|
| ✅ SmolLM2 | 135M/360M/1.7B | [Blog](https://huggingface.co/blog/smollm) | [Hub](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) | Apache 2.0 | | |
| ✅ Llama 3.2 | 1B/3B | [Paper](https://arxiv.org/abs/2407.21783) | [Hub](https://huggingface.co/meta-llama/Llama-3.2-3b) | [License](https://llama.meta.com/llama3/license/) | | |
| ✅ Llama 3.1 | 8B/70B/405B | [Paper](https://arxiv.org/abs/2407.21783) | [Hub](https://huggingface.co/meta-llama/Llama-3.1-70b) | [License](https://llama.meta.com/llama3/license/) | | |
| ✅ GPT-2 | 124M-1.5B | [Paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | [Hub](https://huggingface.co/gpt2) | MIT | | |
| DeepSeek V2 | 7B/13B | [Blog](https://www.deepseek.com/blogs/deepseek-v2) | [Hub](https://huggingface.co/deepseek-ai/deepseek-llm-7b-v2) | [License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/LICENSE-MODEL) | | |
| Gemma2 | 2B/9B | [Blog](https://ai.google.dev/gemma) | [Hub](https://huggingface.co/google/gemma2-7b) | [License](https://ai.google.dev/gemma/terms) | | |
| GPT-J | 6B | [Blog](https://www.eleuther.ai/artifacts/gpt-j) | [Hub](https://huggingface.co/EleutherAI/gpt-j-6b) | Apache 2.0 | | |
| GPT-NeoX | 20B | [Paper](https://arxiv.org/abs/2204.06745) | [Hub](https://huggingface.co/EleutherAI/gpt-neox-20b) | Apache 2.0 | | |
| Mistral | 7B | [Paper](https://arxiv.org/abs/2310.06825) | [Hub](https://huggingface.co/mistralai/Mistral-7B-v0.1) | Apache 2.0 | | |
| Mixtral | 8x7B/8x22B | [Blog](https://mistral.ai/news/mixtral-of-experts/) | [Hub](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | Apache 2.0 | | |
| MPT | 7B | [Blog](https://www.mosaicml.com/blog/mpt-7b) | [Hub](https://huggingface.co/mosaicml/mpt-7b) | Apache 2.0 | | |
| OLMo | 1B/7B | [Paper](https://arxiv.org/abs/2402.00838) | [Hub](https://huggingface.co/allenai/OLMo-7B-hf) | Apache 2.0 | | |

#### Reasoning Models
| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|---------|----------|------|---------------------|
| Qwen QwQ | 32B | [Blog](https://qwenlm.github.io/blog/qwq-32b-preview/) | [Hub](https://huggingface.co/Qwen/QwQ-32B-Preview) | [License](https://github.com/QwenLM/Qwen/blob/main/LICENSE) | | |

#### Code Models
| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|---------|----------|------|---------------------|
| ✅ Qwen2.5 Coder | 0.5B-32B | [Blog](https://qwenlm.github.io/blog/qwen2.5/) | [Hub](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | [License](https://github.com/QwenLM/Qwen/blob/main/LICENSE) | | |
| DeepSeek Coder | 1.3B-33B | [Paper](https://arxiv.org/abs/2401.02954) | [Hub](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct) | [License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/LICENSE-MODEL) | | |
| StarCoder 2 | 3B/7B/15B | [Paper](https://arxiv.org/abs/2402.19173) | [Hub](https://huggingface.co/bigcode/starcoder2-15b) | [License](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement) | | |

#### Math Models
| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|---------|----------|------|---------------------|
| DeepSeek Math | 7B | [Paper](https://arxiv.org/abs/2401.02954) | [Hub](https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct) | [License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/LICENSE-MODEL) | | |

## Documentation
To learn more about all the platform's capabilities, see the [Oumi documentation](https://oumi.ai/docs).
## Join the Community!
Oumi is a community-first effort. Whether you are a developer, a researcher, or a non-technical user, all contributions are very welcome!
- To contribute to the `oumi` repository, please check [`CONTRIBUTING.md`](https://github.com/oumi-ai/oumi/blob/main/CONTRIBUTING.md) for guidance on sending your first pull request.
- Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project!
- If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.

## Acknowledgements
Oumi makes use of [several libraries](https://oumi.ai/docs/en/latest/about/acknowledgements.html) and tools from the open-source community. We would like to acknowledge and deeply thank the contributors of these projects!
## Citation
If you find Oumi useful in your research, please consider citing it:
```bibtex
@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}
```

## License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.
[^1]: Open models are defined as models with fully open weights, training code, and data, and a permissive license. See [Open Source Definitions](https://opensource.org/ai) for more information.