An open API service indexing awesome lists of open source software.

https://github.com/cheahjs/free-llm-api-resources

A list of free LLM inference resources accessible via API.
https://github.com/cheahjs/free-llm-api-resources

ai claude gemini llama llm openai

Last synced: 4 months ago
JSON representation

A list of free LLM inference resources accessible via API.

Awesome Lists containing this project

README

          

# Free LLM API resources

This lists various services that provide free access or credits towards API-based LLM usage.

> [!NOTE]
> Please don't abuse these services, else we might lose them.

> [!WARNING]
> This list explicitly excludes any services that are not legitimate (eg reverse engineers an existing chatbot)

- [Free Providers](#free-providers)
- [OpenRouter](#openrouter)
- [Google AI Studio](#google-ai-studio)
- [NVIDIA NIM](#nvidia-nim)
- [Mistral (La Plateforme)](#mistral-la-plateforme)
- [Mistral (Codestral)](#mistral-codestral)
- [HuggingFace Inference Providers](#huggingface-inference-providers)
- [Vercel AI Gateway](#vercel-ai-gateway)
- [Cerebras](#cerebras)
- [Groq](#groq)
- [Cohere](#cohere)
- [GitHub Models](#github-models)
- [Cloudflare Workers AI](#cloudflare-workers-ai)
- [Google Cloud Vertex AI](#google-cloud-vertex-ai)
- [Providers with trial credits](#providers-with-trial-credits)
- [Fireworks](#fireworks)
- [Baseten](#baseten)
- [Nebius](#nebius)
- [Novita](#novita)
- [AI21](#ai21)
- [Upstage](#upstage)
- [NLP Cloud](#nlp-cloud)
- [Alibaba Cloud (International) Model Studio](#alibaba-cloud-international-model-studio)
- [Modal](#modal)
- [Inference.net](#inferencenet)
- [Hyperbolic](#hyperbolic)
- [SambaNova Cloud](#sambanova-cloud)
- [Scaleway Generative APIs](#scaleway-generative-apis)

## Free Providers

### [OpenRouter](https://openrouter.ai)

**Limits:**

[20 requests/minute
50 requests/day
Up to 1000 requests/day with $10 lifetime topup](https://openrouter.ai/docs/api-reference/limits)

Models share a common quota.

- [Gemma 3 12B Instruct](https://openrouter.ai/google/gemma-3-12b-it:free)
- [Gemma 3 27B Instruct](https://openrouter.ai/google/gemma-3-27b-it:free)
- [Gemma 3 4B Instruct](https://openrouter.ai/google/gemma-3-4b-it:free)
- [Hermes 3 Llama 3.1 405B](https://openrouter.ai/nousresearch/hermes-3-llama-3.1-405b:free)
- [Llama 3.1 405B Instruct](https://openrouter.ai/meta-llama/llama-3.1-405b-instruct:free)
- [Llama 3.2 3B Instruct](https://openrouter.ai/meta-llama/llama-3.2-3b-instruct:free)
- [Llama 3.3 70B Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct:free)
- [Mistral Small 3.1 24B Instruct](https://openrouter.ai/mistralai/mistral-small-3.1-24b-instruct:free)
- [Qwen 2.5 VL 7B Instruct](https://openrouter.ai/qwen/qwen-2.5-vl-7b-instruct:free)
- [allenai/molmo-2-8b:free](https://openrouter.ai/allenai/molmo-2-8b:free)
- [arcee-ai/trinity-large-preview:free](https://openrouter.ai/arcee-ai/trinity-large-preview:free)
- [arcee-ai/trinity-mini:free](https://openrouter.ai/arcee-ai/trinity-mini:free)
- [cognitivecomputations/dolphin-mistral-24b-venice-edition:free](https://openrouter.ai/cognitivecomputations/dolphin-mistral-24b-venice-edition:free)
- [deepseek/deepseek-r1-0528:free](https://openrouter.ai/deepseek/deepseek-r1-0528:free)
- [google/gemma-3n-e2b-it:free](https://openrouter.ai/google/gemma-3n-e2b-it:free)
- [google/gemma-3n-e4b-it:free](https://openrouter.ai/google/gemma-3n-e4b-it:free)
- [liquid/lfm-2.5-1.2b-instruct:free](https://openrouter.ai/liquid/lfm-2.5-1.2b-instruct:free)
- [liquid/lfm-2.5-1.2b-thinking:free](https://openrouter.ai/liquid/lfm-2.5-1.2b-thinking:free)
- [moonshotai/kimi-k2:free](https://openrouter.ai/moonshotai/kimi-k2:free)
- [nvidia/nemotron-3-nano-30b-a3b:free](https://openrouter.ai/nvidia/nemotron-3-nano-30b-a3b:free)
- [nvidia/nemotron-nano-12b-v2-vl:free](https://openrouter.ai/nvidia/nemotron-nano-12b-v2-vl:free)
- [nvidia/nemotron-nano-9b-v2:free](https://openrouter.ai/nvidia/nemotron-nano-9b-v2:free)
- [openai/gpt-oss-120b:free](https://openrouter.ai/openai/gpt-oss-120b:free)
- [openai/gpt-oss-20b:free](https://openrouter.ai/openai/gpt-oss-20b:free)
- [qwen/qwen3-4b:free](https://openrouter.ai/qwen/qwen3-4b:free)
- [qwen/qwen3-coder:free](https://openrouter.ai/qwen/qwen3-coder:free)
- [qwen/qwen3-next-80b-a3b-instruct:free](https://openrouter.ai/qwen/qwen3-next-80b-a3b-instruct:free)
- [tngtech/deepseek-r1t-chimera:free](https://openrouter.ai/tngtech/deepseek-r1t-chimera:free)
- [tngtech/deepseek-r1t2-chimera:free](https://openrouter.ai/tngtech/deepseek-r1t2-chimera:free)
- [tngtech/tng-r1t-chimera:free](https://openrouter.ai/tngtech/tng-r1t-chimera:free)
- [upstage/solar-pro-3:free](https://openrouter.ai/upstage/solar-pro-3:free)
- [z-ai/glm-4.5-air:free](https://openrouter.ai/z-ai/glm-4.5-air:free)

### [Google AI Studio](https://aistudio.google.com)

Data is used for training when used outside of the UK/CH/EEA/EU.

Model NameModel Limits
Gemini 3 Flash250,000 tokens/minute
20 requests/day
5 requests/minute
Gemini 2.5 Flash250,000 tokens/minute
20 requests/day
5 requests/minute
Gemini 2.5 Flash-Lite250,000 tokens/minute
20 requests/day
10 requests/minute
Gemma 3 27B Instruct15,000 tokens/minute
14,400 requests/day
30 requests/minute
Gemma 3 12B Instruct15,000 tokens/minute
14,400 requests/day
30 requests/minute
Gemma 3 4B Instruct15,000 tokens/minute
14,400 requests/day
30 requests/minute
Gemma 3 1B Instruct15,000 tokens/minute
14,400 requests/day
30 requests/minute

### [NVIDIA NIM](https://build.nvidia.com/explore/discover)

Phone number verification required.
Models tend to be context window limited.

**Limits:** 40 requests/minute

- [Various open models](https://build.nvidia.com/models)

### [Mistral (La Plateforme)](https://console.mistral.ai/)

* Free tier (Experiment plan) requires opting into data training
* Requires phone number verification.

**Limits (per-model):** 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month

- [Open and Proprietary Mistral models](https://docs.mistral.ai/getting-started/models/models_overview/)

### [Mistral (Codestral)](https://codestral.mistral.ai/)

* Currently free to use
* Monthly subscription based
* Requires phone number verification

**Limits:** 30 requests/minute, 2,000 requests/day

- Codestral

### [HuggingFace Inference Providers](https://huggingface.co/docs/inference-providers/en/index)

HuggingFace Serverless Inference limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.

**Limits:** [$0.10/month in credits](https://huggingface.co/docs/inference-providers/en/pricing)

- Various open models across supported providers

### [Vercel AI Gateway](https://vercel.com/docs/ai-gateway)

Routes to various supported providers.

**Limits:** [$5/month](https://vercel.com/docs/ai-gateway/pricing)

### [Cerebras](https://cloud.cerebras.ai/)

Model NameModel Limits
gpt-oss-120b30 requests/minute
60,000 tokens/minute
900 requests/hour
1,000,000 tokens/hour
14,400 requests/day
1,000,000 tokens/day
Qwen 3 235B A22B Instruct30 requests/minute
60,000 tokens/minute
900 requests/hour
1,000,000 tokens/hour
14,400 requests/day
1,000,000 tokens/day
Llama 3.3 70B30 requests/minute
64,000 tokens/minute
900 requests/hour
1,000,000 tokens/hour
14,400 requests/day
1,000,000 tokens/day
Qwen 3 32B30 requests/minute
64,000 tokens/minute
900 requests/hour
1,000,000 tokens/hour
14,400 requests/day
1,000,000 tokens/day
Llama 3.1 8B30 requests/minute
60,000 tokens/minute
900 requests/hour
1,000,000 tokens/hour
14,400 requests/day
1,000,000 tokens/day
Z.ai GLM-4.610 requests/minute
60,000 tokens/minute
100 requests/hour
100,000 tokens/hour
100 requests/day
1,000,000 tokens/day

### [Groq](https://console.groq.com)

Model NameModel Limits
Allam 2 7B7,000 requests/day
6,000 tokens/minute
Llama 3.1 8B14,400 requests/day
6,000 tokens/minute
Llama 3.3 70B1,000 requests/day
12,000 tokens/minute
Llama 4 Maverick 17B 128E Instruct1,000 requests/day
6,000 tokens/minute
Llama 4 Scout Instruct1,000 requests/day
30,000 tokens/minute
Whisper Large v37,200 audio-seconds/minute
2,000 requests/day
Whisper Large v3 Turbo7,200 audio-seconds/minute
2,000 requests/day
canopylabs/orpheus-arabic-saudi
canopylabs/orpheus-v1-english
groq/compound250 requests/day
70,000 tokens/minute
groq/compound-mini250 requests/day
70,000 tokens/minute
meta-llama/llama-guard-4-12b14,400 requests/day
15,000 tokens/minute
meta-llama/llama-prompt-guard-2-22m
meta-llama/llama-prompt-guard-2-86m
moonshotai/kimi-k2-instruct1,000 requests/day
10,000 tokens/minute
moonshotai/kimi-k2-instruct-09051,000 requests/day
10,000 tokens/minute
openai/gpt-oss-120b1,000 requests/day
8,000 tokens/minute
openai/gpt-oss-20b1,000 requests/day
8,000 tokens/minute
openai/gpt-oss-safeguard-20b1,000 requests/day
8,000 tokens/minute
qwen/qwen3-32b1,000 requests/day
6,000 tokens/minute

### [Cohere](https://cohere.com)

**Limits:**

[20 requests/minute
1,000 requests/month](https://docs.cohere.com/docs/rate-limits)

Models share a common monthly quota.

- c4ai-aya-expanse-32b
- c4ai-aya-expanse-8b
- c4ai-aya-vision-32b
- c4ai-aya-vision-8b
- command-a-03-2025
- command-a-reasoning-08-2025
- command-a-translate-08-2025
- command-a-vision-07-2025
- command-r-08-2024
- command-r-plus-08-2024
- command-r7b-12-2024
- command-r7b-arabic-02-2025

### [GitHub Models](https://github.com/marketplace/models)

Extremely restrictive input/output token limits.

**Limits:** [Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)](https://docs.github.com/en/github-models/prototyping-with-ai-models#rate-limits)

- AI21 Jamba 1.5 Large
- Codestral 25.01
- Cohere Command A
- Cohere Command R 08-2024
- Cohere Command R+ 08-2024
- DeepSeek-R1
- DeepSeek-R1-0528
- DeepSeek-V3-0324
- Grok 3
- Grok 3 Mini
- Llama 4 Maverick 17B 128E Instruct FP8
- Llama 4 Scout 17B 16E Instruct
- Llama-3.2-11B-Vision-Instruct
- Llama-3.2-90B-Vision-Instruct
- Llama-3.3-70B-Instruct
- MAI-DS-R1
- Meta-Llama-3.1-405B-Instruct
- Meta-Llama-3.1-8B-Instruct
- Ministral 3B
- Mistral Medium 3 (25.05)
- Mistral Small 3.1
- OpenAI GPT-4.1
- OpenAI GPT-4.1-mini
- OpenAI GPT-4.1-nano
- OpenAI GPT-4o
- OpenAI GPT-4o mini
- OpenAI Text Embedding 3 (large)
- OpenAI Text Embedding 3 (small)
- OpenAI gpt-5
- OpenAI gpt-5-chat (preview)
- OpenAI gpt-5-mini
- OpenAI gpt-5-nano
- OpenAI o1
- OpenAI o1-mini
- OpenAI o1-preview
- OpenAI o3
- OpenAI o3-mini
- OpenAI o4-mini
- Phi-4
- Phi-4-mini-instruct
- Phi-4-mini-reasoning
- Phi-4-multimodal-instruct
- Phi-4-reasoning

### [Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai)

**Limits:** [10,000 neurons/day](https://developers.cloudflare.com/workers-ai/platform/pricing/#free-allocation)

- @cf/aisingapore/gemma-sea-lion-v4-27b-it
- @cf/ibm-granite/granite-4.0-h-micro
- @cf/openai/gpt-oss-120b
- @cf/openai/gpt-oss-20b
- @cf/qwen/qwen3-30b-a3b-fp8
- DeepSeek R1 Distill Qwen 32B
- Deepseek Coder 6.7B Base (AWQ)
- Deepseek Coder 6.7B Instruct (AWQ)
- Deepseek Math 7B Instruct
- Discolm German 7B v1 (AWQ)
- Falcom 7B Instruct
- Gemma 2B Instruct (LoRA)
- Gemma 3 12B Instruct
- Gemma 7B Instruct
- Gemma 7B Instruct (LoRA)
- Hermes 2 Pro Mistral 7B
- Llama 2 13B Chat (AWQ)
- Llama 2 7B Chat (FP16)
- Llama 2 7B Chat (INT8)
- Llama 2 7B Chat (LoRA)
- Llama 3 8B Instruct
- Llama 3 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (FP8)
- Llama 3.2 11B Vision Instruct
- Llama 3.2 1B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct (FP8)
- Llama 4 Scout Instruct
- Llama Guard 3 8B
- Mistral 7B Instruct v0.1
- Mistral 7B Instruct v0.1 (AWQ)
- Mistral 7B Instruct v0.2
- Mistral 7B Instruct v0.2 (LoRA)
- Mistral Small 3.1 24B Instruct
- Neural Chat 7B v3.1 (AWQ)
- OpenChat 3.5 0106
- OpenHermes 2.5 Mistral 7B (AWQ)
- Phi-2
- Qwen 1.5 0.5B Chat
- Qwen 1.5 1.8B Chat
- Qwen 1.5 14B Chat (AWQ)
- Qwen 1.5 7B Chat (AWQ)
- Qwen 2.5 Coder 32B Instruct
- Qwen QwQ 32B
- SQLCoder 7B 2
- Starling LM 7B Beta
- TinyLlama 1.1B Chat v1.0
- Una Cybertron 7B v2 (BF16)
- Zephyr 7B Beta (AWQ)

### [Google Cloud Vertex AI](https://console.cloud.google.com/vertex-ai/model-garden)

Very stringent payment verification for Google Cloud.

Model NameModel Limits
Llama 3.2 90B Vision Instruct30 requests/minute
Free during preview
Llama 3.1 70B Instruct60 requests/minute
Free during preview
Llama 3.1 8B Instruct60 requests/minute
Free during preview

## Providers with trial credits

### [Fireworks](https://fireworks.ai/)

**Credits:** $1

**Models:** [Various open models](https://fireworks.ai/models)

### [Baseten](https://app.baseten.co/)

**Credits:** $30

**Models:** [Any supported model - pay by compute time](https://www.baseten.co/library/)

### [Nebius](https://studio.nebius.com/)

**Credits:** $1

**Models:** [Various open models](https://studio.nebius.ai/models)

### [Novita](https://novita.ai/?ref=ytblmjc&utm_source=affiliate)

**Credits:** $0.5 for 1 year

**Models:** [Various open models](https://novita.ai/models)

### [AI21](https://studio.ai21.com/)

**Credits:** $10 for 3 months

**Models:** Jamba family of models

### [Upstage](https://console.upstage.ai/)

**Credits:** $10 for 3 months

**Models:** Solar Pro/Mini

### [NLP Cloud](https://nlpcloud.com/home)

**Credits:** $15

**Requirements:** Phone number verification

**Models:** Various open models

### [Alibaba Cloud (International) Model Studio](https://bailian.console.alibabacloud.com/)

**Credits:** 1 million tokens/model

**Models:** [Various open and proprietary Qwen models](https://www.alibabacloud.com/en/product/modelstudio)

### [Modal](https://modal.com)

**Credits:** $5/month upon sign up, $30/month with payment method added

**Models:** Any supported model - pay by compute time

### [Inference.net](https://inference.net)

**Credits:** $1, $25 on responding to email survey

**Models:** Various open models

### [Hyperbolic](https://app.hyperbolic.xyz/)

**Credits:** $1

**Models:**
- DeepSeek V3
- DeepSeek V3 0324
- Llama 3.1 405B Base
- Llama 3.1 405B Instruct
- Llama 3.1 70B Instruct
- Llama 3.1 8B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct
- Pixtral 12B (2409)
- Qwen QwQ 32B
- Qwen2.5 72B Instruct
- Qwen2.5 Coder 32B Instruct
- Qwen2.5 VL 72B Instruct
- Qwen2.5 VL 7B Instruct
- deepseek-ai/deepseek-r1-0528
- openai/gpt-oss-120b
- openai/gpt-oss-120b-turbo
- openai/gpt-oss-20b
- qwen/qwen3-235b-a22b
- qwen/qwen3-235b-a22b-instruct-2507
- qwen/qwen3-coder-480b-a35b-instruct
- qwen/qwen3-next-80b-a3b-instruct
- qwen/qwen3-next-80b-a3b-thinking

### [SambaNova Cloud](https://cloud.sambanova.ai/)

**Credits:** $5 for 3 months

**Models:**
- E5-Mistral-7B-Instruct
- Llama 3.1 8B
- Llama 3.3 70B
- Llama 3.3 70B
- Llama-4-Maverick-17B-128E-Instruct
- Qwen/Qwen3-235B
- Qwen/Qwen3-32B
- Whisper-Large-v3
- deepseek-ai/DeepSeek-R1-0528
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- deepseek-ai/DeepSeek-V3-0324
- deepseek-ai/DeepSeek-V3.1
- deepseek-ai/DeepSeek-V3.1-Terminus
- deepseek-ai/DeepSeek-V3.2
- openai/gpt-oss-120b
- tbd

### [Scaleway Generative APIs](https://console.scaleway.com/generative-api/models)

**Credits:** 1,000,000 free tokens

**Models:**
- BGE-Multilingual-Gemma2
- DeepSeek R1 Distill Llama 70B
- Gemma 3 27B Instruct
- Llama 3.1 8B Instruct
- Llama 3.3 70B Instruct
- Mistral Nemo 2407
- Pixtral 12B (2409)
- Whisper Large v3
- devstral-2-123b-instruct-2512
- gpt-oss-120b
- holo2-30b-a3b
- mistral-small-3.2-24b-instruct-2506
- qwen3-235b-a22b-instruct-2507
- qwen3-coder-30b-a3b-instruct
- qwen3-embedding-8b
- voxtral-small-24b-2507