https://github.com/cheahjs/free-llm-api-resources

A list of free LLM inference resources accessible via API.
ai claude gemini llama llm openai

Last synced: 6 months ago

Host: GitHub
URL: https://github.com/cheahjs/free-llm-api-resources
Owner: cheahjs
Created: 2024-07-04T20:10:17.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2026-01-28T00:34:16.000Z (6 months ago)
Last Synced: 2026-01-28T15:38:55.783Z (6 months ago)
Topics: ai, claude, gemini, llama, llm, openai
Language: Python
Homepage:
Size: 357 KB
Stars: 8,034
Watchers: 125
Forks: 781
Open Issues: 26
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

stargazer - cheahjs/free-llm-api-resources - A list of free LLM inference resources accessible via API. (⭐️27629) (Python)
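StarryDivineSky - cheahjs/free-llm-api-resources - llm-api-resources is a project that collects a large number of free large language model (LLM) inference resources accessible via API. It aims to give developers and researchers a convenient way to call inference services for many open-source LLMs directly, without deploying the models themselves. Its core function is maintaining a continuously updated list of API resources covering each model's endpoint, authentication method, call parameter conventions, and usage limits, for example API access to mainstream models such as Tongyi Qianwen, Llama, and Baichuan. Its distinguishing feature is that resources are categorized and annotated by dimensions such as model type, whether registration is required, and response speed, and it provides sample calling code (such as Python request examples) to help users get started quickly. It works on top of the API endpoints opened by model providers: users send text input via HTTP requests and receive the model's generated response. The project regularly updates resource links and parameter notes to keep the information accurate, and maintains model compatibility through open-source community collaboration. The tool suits scenarios that need to integrate LLM capabilities quickly but lack deployment resources, saves developers the time of building their own model serving, and acts as a bridge between model developers and application builders. (A01_Text generation_Text dialogue / Large language dialogue models and data)
AiTreasureBox - cheahjs/free-llm-api-resources - A list of free LLM inference resources accessible via API. (Repos)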
awesome-ChatGPT-repositories - free-llm-api-resources - A list of free LLM inference resources accessible via API. (Langchain)
awesome_gpt_super_prompting - cheahjs/free-llm-api-resources - A list of free LLM inference resources accessible via API. (🗂️ GPTs Lists / Hall Of Fame:)
awesome-github-projects - free-llm-api-resources - A list of free LLM inference resources accessible via API. ⭐28,130 `Python` 🔥 (🤖 AI & Machine Learning)
awesome-data-analysis - Free Llm Api Resources - Up-to-date list of free APIs for accessing large language models (LLMs). (🧠 AI Applications & Platforms / Resources)
awesome-local-llms - free-llm-api-resources
awesome-ai-200 - cheahjs/free-llm-api-resources

README

          
# Free LLM API resources

This lists various services that provide free access or credits towards API-based LLM usage.

> [!NOTE]
> Please don't abuse these services, else we might lose them.

> [!WARNING]
> This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).

- [Free Providers](#free-providers)

  - [OpenRouter](#openrouter)

  - [Google AI Studio](#google-ai-studio)

  - [NVIDIA NIM](#nvidia-nim)

  - [Mistral (La Plateforme)](#mistral-la-plateforme)

  - [Mistral (Codestral)](#mistral-codestral)

  - [HuggingFace Inference Providers](#huggingface-inference-providers)

  - [Vercel AI Gateway](#vercel-ai-gateway)

  - [Cerebras](#cerebras)

  - [Groq](#groq)

  - [Cohere](#cohere)

  - [GitHub Models](#github-models)

  - [Cloudflare Workers AI](#cloudflare-workers-ai)

  - [Google Cloud Vertex AI](#google-cloud-vertex-ai)

- [Providers with trial credits](#providers-with-trial-credits)

  - [Fireworks](#fireworks)

  - [Baseten](#baseten)

  - [Nebius](#nebius)

  - [Novita](#novita)

  - [AI21](#ai21)

  - [Upstage](#upstage)

  - [NLP Cloud](#nlp-cloud)

  - [Alibaba Cloud (International) Model Studio](#alibaba-cloud-international-model-studio)

  - [Modal](#modal)

  - [Inference.net](#inferencenet)

  - [Hyperbolic](#hyperbolic)

  - [SambaNova Cloud](#sambanova-cloud)

  - [Scaleway Generative APIs](#scaleway-generative-apis)

## Free Providers

### [OpenRouter](https://openrouter.ai)

**Limits:**

[20 requests/minute, 50 requests/day, up to 1,000 requests/day with a $10 lifetime top-up](https://openrouter.ai/docs/api-reference/limits)

Models share a common quota.

- [Gemma 3 12B Instruct](https://openrouter.ai/google/gemma-3-12b-it:free)

- [Gemma 3 27B Instruct](https://openrouter.ai/google/gemma-3-27b-it:free)

- [Gemma 3 4B Instruct](https://openrouter.ai/google/gemma-3-4b-it:free)

- [Hermes 3 Llama 3.1 405B](https://openrouter.ai/nousresearch/hermes-3-llama-3.1-405b:free)

- [Llama 3.1 405B Instruct](https://openrouter.ai/meta-llama/llama-3.1-405b-instruct:free)

- [Llama 3.2 3B Instruct](https://openrouter.ai/meta-llama/llama-3.2-3b-instruct:free)

- [Llama 3.3 70B Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct:free)

- [Mistral Small 3.1 24B Instruct](https://openrouter.ai/mistralai/mistral-small-3.1-24b-instruct:free)

- [Qwen 2.5 VL 7B Instruct](https://openrouter.ai/qwen/qwen-2.5-vl-7b-instruct:free)

- [allenai/molmo-2-8b:free](https://openrouter.ai/allenai/molmo-2-8b:free)

- [arcee-ai/trinity-large-preview:free](https://openrouter.ai/arcee-ai/trinity-large-preview:free)

- [arcee-ai/trinity-mini:free](https://openrouter.ai/arcee-ai/trinity-mini:free)

- [cognitivecomputations/dolphin-mistral-24b-venice-edition:free](https://openrouter.ai/cognitivecomputations/dolphin-mistral-24b-venice-edition:free)

- [deepseek/deepseek-r1-0528:free](https://openrouter.ai/deepseek/deepseek-r1-0528:free)

- [google/gemma-3n-e2b-it:free](https://openrouter.ai/google/gemma-3n-e2b-it:free)

- [google/gemma-3n-e4b-it:free](https://openrouter.ai/google/gemma-3n-e4b-it:free)

- [liquid/lfm-2.5-1.2b-instruct:free](https://openrouter.ai/liquid/lfm-2.5-1.2b-instruct:free)

- [liquid/lfm-2.5-1.2b-thinking:free](https://openrouter.ai/liquid/lfm-2.5-1.2b-thinking:free)

- [moonshotai/kimi-k2:free](https://openrouter.ai/moonshotai/kimi-k2:free)

- [nvidia/nemotron-3-nano-30b-a3b:free](https://openrouter.ai/nvidia/nemotron-3-nano-30b-a3b:free)

- [nvidia/nemotron-nano-12b-v2-vl:free](https://openrouter.ai/nvidia/nemotron-nano-12b-v2-vl:free)

- [nvidia/nemotron-nano-9b-v2:free](https://openrouter.ai/nvidia/nemotron-nano-9b-v2:free)

- [openai/gpt-oss-120b:free](https://openrouter.ai/openai/gpt-oss-120b:free)

- [openai/gpt-oss-20b:free](https://openrouter.ai/openai/gpt-oss-20b:free)

- [qwen/qwen3-4b:free](https://openrouter.ai/qwen/qwen3-4b:free)

- [qwen/qwen3-coder:free](https://openrouter.ai/qwen/qwen3-coder:free)

- [qwen/qwen3-next-80b-a3b-instruct:free](https://openrouter.ai/qwen/qwen3-next-80b-a3b-instruct:free)

- [tngtech/deepseek-r1t-chimera:free](https://openrouter.ai/tngtech/deepseek-r1t-chimera:free)

- [tngtech/deepseek-r1t2-chimera:free](https://openrouter.ai/tngtech/deepseek-r1t2-chimera:free)

- [tngtech/tng-r1t-chimera:free](https://openrouter.ai/tngtech/tng-r1t-chimera:free)

- [upstage/solar-pro-3:free](https://openrouter.ai/upstage/solar-pro-3:free)

- [z-ai/glm-4.5-air:free](https://openrouter.ai/z-ai/glm-4.5-air:free)
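Because the OpenRouter free models share one quota, a client that rotates between them still has to count every request against the same 20 requests/minute and 50 requests/day. A minimal client-side sketch of such a guard follows. The class name `SharedQuotaLimiter` and its rolling-window behaviour are assumptions made for illustration; the provider may reset its daily counter at a fixed time instead, so treat this as a conservative local check rather than a mirror of the server's accounting.

```python
import time
from collections import deque


class SharedQuotaLimiter:
    """Hypothetical client-side guard for a quota shared across all models."""

    def __init__(self, per_minute=20, per_day=50, clock=time.monotonic):
        # (window length in seconds, max requests allowed inside that window)
        self.windows = [(60.0, per_minute), (86_400.0, per_day)]
        self.clock = clock
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self):
        """Record a request and return True if every window still has room."""
        now = self.clock()
        longest = max(length for length, _ in self.windows)
        # Forget requests that no longer count against any window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        for length, limit in self.windows:
            recent = sum(1 for t in self.sent if now - t < length)
            if recent >= limit:
                return False
        self.sent.append(now)
        return True


limiter = SharedQuotaLimiter()
if limiter.try_acquire():
    pass  # send one request here, to whichever free model is in use
```

A single limiter instance is shared by all model calls on purpose: creating one limiter per model would undercount usage against the common quota.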

### [Google AI Studio](https://aistudio.google.com)

Data is used for training when used outside of the UK/CH/EEA/EU.

| Model Name | Model Limits |
| --- | --- |
| Gemini 3 Flash | 250,000 tokens/minute<br>20 requests/day<br>5 requests/minute |
| Gemini 2.5 Flash | 250,000 tokens/minute<br>20 requests/day<br>5 requests/minute |
| Gemini 2.5 Flash-Lite | 250,000 tokens/minute<br>20 requests/day<br>10 requests/minute |
| Gemma 3 27B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
| Gemma 3 12B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
| Gemma 3 4B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
| Gemma 3 1B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |

### [NVIDIA NIM](https://build.nvidia.com/explore/discover)

Phone number verification required.

Models tend to have limited context windows.

**Limits:** 40 requests/minute

- [Various open models](https://build.nvidia.com/models)

### [Mistral (La Plateforme)](https://console.mistral.ai/)

* Free tier (Experiment plan) requires opting into data training

* Requires phone number verification.

**Limits (per-model):** 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month

- [Open and Proprietary Mistral models](https://docs.mistral.ai/getting-started/models/models_overview/)

### [Mistral (Codestral)](https://codestral.mistral.ai/)

* Currently free to use

* Monthly subscription based

* Requires phone number verification

**Limits:** 30 requests/minute, 2,000 requests/day

- Codestral

### [HuggingFace Inference Providers](https://huggingface.co/docs/inference-providers/en/index)

HuggingFace Serverless Inference is limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.

**Limits:** [$0.10/month in credits](https://huggingface.co/docs/inference-providers/en/pricing)

- Various open models across supported providers

### [Vercel AI Gateway](https://vercel.com/docs/ai-gateway)

Routes to various supported providers.

**Limits:** [$5/month](https://vercel.com/docs/ai-gateway/pricing)

### [Cerebras](https://cloud.cerebras.ai/)

| Model Name | Model Limits |
| --- | --- |
| gpt-oss-120b | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Qwen 3 235B A22B Instruct | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Llama 3.3 70B | 30 requests/minute<br>64,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Qwen 3 32B | 30 requests/minute<br>64,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Llama 3.1 8B | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Z.ai GLM-4.6 | 10 requests/minute<br>60,000 tokens/minute<br>100 requests/hour<br>100,000 tokens/hour<br>100 requests/day<br>1,000,000 tokens/day |
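When a model has per-minute, per-hour, and per-day limits at once, the shortest window is rarely the one that matters for a long-running job. The sketch below takes the gpt-oss-120b figures from the Cerebras limits above and works out the highest steady rate that stays under every window. The helper names are illustrative, and the calculation assumes usage is spread evenly over the day.

```python
# Free-tier limits for gpt-oss-120b, copied from the Cerebras limits above.
LIMITS = {
    "requests": {"minute": 30, "hour": 900, "day": 14_400},
    "tokens": {"minute": 60_000, "hour": 1_000_000, "day": 1_000_000},
}
MINUTES = {"minute": 1, "hour": 60, "day": 1_440}


def sustained_per_minute(kind):
    """Highest steady per-minute rate that never trips any window of this kind."""
    return min(cap / MINUTES[window] for window, cap in LIMITS[kind].items())


def binding_window(kind):
    """Name of the window that sets that steady rate."""
    return min(LIMITS[kind], key=lambda w: LIMITS[kind][w] / MINUTES[w])


print(sustained_per_minute("requests"), binding_window("requests"))  # 10.0 day
print(round(sustained_per_minute("tokens"), 1), binding_window("tokens"))  # 694.4 day
```

In both cases the daily cap binds: an evenly paced client gets about 10 requests and roughly 694 tokens per minute, far below the 30 requests and 60,000 tokens the per-minute limits would suggest. Put differently, a client running at the full 60,000 tokens/minute would use up the 1,000,000-token daily allowance in under 17 minutes.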

### [Groq](https://console.groq.com)

| Model Name | Model Limits |
| --- | --- |
| Allam 2 7B | 7,000 requests/day<br>6,000 tokens/minute |
| Llama 3.1 8B | 14,400 requests/day<br>6,000 tokens/minute |
| Llama 3.3 70B | 1,000 requests/day<br>12,000 tokens/minute |
| Llama 4 Maverick 17B 128E Instruct | 1,000 requests/day<br>6,000 tokens/minute |
| Llama 4 Scout Instruct | 1,000 requests/day<br>30,000 tokens/minute |
| Whisper Large v3 | 7,200 audio-seconds/minute<br>2,000 requests/day |
| Whisper Large v3 Turbo | 7,200 audio-seconds/minute<br>2,000 requests/day |
| canopylabs/orpheus-arabic-saudi | |
| canopylabs/orpheus-v1-english | |
| groq/compound | 250 requests/day<br>70,000 tokens/minute |
| groq/compound-mini | 250 requests/day<br>70,000 tokens/minute |
| meta-llama/llama-guard-4-12b | 14,400 requests/day<br>15,000 tokens/minute |
| meta-llama/llama-prompt-guard-2-22m | |
| meta-llama/llama-prompt-guard-2-86m | |
| moonshotai/kimi-k2-instruct | 1,000 requests/day<br>10,000 tokens/minute |
| moonshotai/kimi-k2-instruct-0905 | 1,000 requests/day<br>10,000 tokens/minute |
| openai/gpt-oss-120b | 1,000 requests/day<br>8,000 tokens/minute |
| openai/gpt-oss-20b | 1,000 requests/day<br>8,000 tokens/minute |
| openai/gpt-oss-safeguard-20b | 1,000 requests/day<br>8,000 tokens/minute |
| qwen/qwen3-32b | 1,000 requests/day<br>6,000 tokens/minute |
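Per-minute token limits like the ones above are easy to exceed in a short burst, so a client needs some way to back off and retry. This list does not say how each provider signals an exhausted limit; the sketch below assumes the common HTTP convention of a 429 response with an optional `Retry-After` value, which the caller translates into the hypothetical `RateLimited` exception. Both `RateLimited` and `call_with_backoff` are names invented for this example, not part of any provider SDK.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a caller raises when a provider rejects a request."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, jitter=random.random):
    """Call send() until it succeeds, waiting longer after each rejection."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential schedule: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # honour the server's hint
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay * (1 + 0.1 * jitter()))
```

Here `send` is any zero-argument callable that performs one request and raises `RateLimited` on rejection. Capping the number of attempts keeps a client from hammering a free service that has already said no.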

### [Cohere](https://cohere.com)

**Limits:**

[20 requests/minute, 1,000 requests/month](https://docs.cohere.com/docs/rate-limits)

Models share a common monthly quota.

- c4ai-aya-expanse-32b

- c4ai-aya-expanse-8b

- c4ai-aya-vision-32b

- c4ai-aya-vision-8b

- command-a-03-2025

- command-a-reasoning-08-2025

- command-a-translate-08-2025

- command-a-vision-07-2025

- command-r-08-2024

- command-r-plus-08-2024

- command-r7b-12-2024

- command-r7b-arabic-02-2025

### [GitHub Models](https://github.com/marketplace/models)

Extremely restrictive input/output token limits.

**Limits:** [Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)](https://docs.github.com/en/github-models/prototyping-with-ai-models#rate-limits)

- AI21 Jamba 1.5 Large

- Codestral 25.01

- Cohere Command A

- Cohere Command R 08-2024

- Cohere Command R+ 08-2024

- DeepSeek-R1

- DeepSeek-R1-0528

- DeepSeek-V3-0324

- Grok 3

- Grok 3 Mini

- Llama 4 Maverick 17B 128E Instruct FP8

- Llama 4 Scout 17B 16E Instruct

- Llama-3.2-11B-Vision-Instruct

- Llama-3.2-90B-Vision-Instruct

- Llama-3.3-70B-Instruct

- MAI-DS-R1

- Meta-Llama-3.1-405B-Instruct

- Meta-Llama-3.1-8B-Instruct

- Ministral 3B

- Mistral Medium 3 (25.05)

- Mistral Small 3.1

- OpenAI GPT-4.1

- OpenAI GPT-4.1-mini

- OpenAI GPT-4.1-nano

- OpenAI GPT-4o

- OpenAI GPT-4o mini

- OpenAI Text Embedding 3 (large)

- OpenAI Text Embedding 3 (small)

- OpenAI gpt-5

- OpenAI gpt-5-chat (preview)

- OpenAI gpt-5-mini

- OpenAI gpt-5-nano

- OpenAI o1

- OpenAI o1-mini

- OpenAI o1-preview

- OpenAI o3

- OpenAI o3-mini

- OpenAI o4-mini

- Phi-4

- Phi-4-mini-instruct

- Phi-4-mini-reasoning

- Phi-4-multimodal-instruct

- Phi-4-reasoning

### [Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai)

**Limits:** [10,000 neurons/day](https://developers.cloudflare.com/workers-ai/platform/pricing/#free-allocation)

- @cf/aisingapore/gemma-sea-lion-v4-27b-it

- @cf/ibm-granite/granite-4.0-h-micro

- @cf/openai/gpt-oss-120b

- @cf/openai/gpt-oss-20b

- @cf/qwen/qwen3-30b-a3b-fp8

- DeepSeek R1 Distill Qwen 32B

- Deepseek Coder 6.7B Base (AWQ)

- Deepseek Coder 6.7B Instruct (AWQ)

- Deepseek Math 7B Instruct

- Discolm German 7B v1 (AWQ)

- Falcon 7B Instruct

- Gemma 2B Instruct (LoRA)

- Gemma 3 12B Instruct

- Gemma 7B Instruct

- Gemma 7B Instruct (LoRA)

- Hermes 2 Pro Mistral 7B

- Llama 2 13B Chat (AWQ)

- Llama 2 7B Chat (FP16)

- Llama 2 7B Chat (INT8)

- Llama 2 7B Chat (LoRA)

- Llama 3 8B Instruct

- Llama 3 8B Instruct (AWQ)

- Llama 3.1 8B Instruct (AWQ)

- Llama 3.1 8B Instruct (FP8)

- Llama 3.2 11B Vision Instruct

- Llama 3.2 1B Instruct

- Llama 3.2 3B Instruct

- Llama 3.3 70B Instruct (FP8)

- Llama 4 Scout Instruct

- Llama Guard 3 8B

- Mistral 7B Instruct v0.1

- Mistral 7B Instruct v0.1 (AWQ)

- Mistral 7B Instruct v0.2

- Mistral 7B Instruct v0.2 (LoRA)

- Mistral Small 3.1 24B Instruct

- Neural Chat 7B v3.1 (AWQ)

- OpenChat 3.5 0106

- OpenHermes 2.5 Mistral 7B (AWQ)

- Phi-2

- Qwen 1.5 0.5B Chat

- Qwen 1.5 1.8B Chat

- Qwen 1.5 14B Chat (AWQ)

- Qwen 1.5 7B Chat (AWQ)

- Qwen 2.5 Coder 32B Instruct

- Qwen QwQ 32B

- SQLCoder 7B 2

- Starling LM 7B Beta

- TinyLlama 1.1B Chat v1.0

- Una Cybertron 7B v2 (BF16)

- Zephyr 7B Beta (AWQ)

### [Google Cloud Vertex AI](https://console.cloud.google.com/vertex-ai/model-garden)

Google Cloud requires very stringent payment verification.

| Model Name | Model Limits |
| --- | --- |
| Llama 3.2 90B Vision Instruct | 30 requests/minute<br>Free during preview |
| Llama 3.1 70B Instruct | 60 requests/minute<br>Free during preview |
| Llama 3.1 8B Instruct | 60 requests/minute<br>Free during preview |

## Providers with trial credits

### [Fireworks](https://fireworks.ai/)

**Credits:** $1

**Models:** [Various open models](https://fireworks.ai/models)

### [Baseten](https://app.baseten.co/)

**Credits:** $30

**Models:** [Any supported model - pay by compute time](https://www.baseten.co/library/)

### [Nebius](https://studio.nebius.com/)

**Credits:** $1

**Models:** [Various open models](https://studio.nebius.ai/models)

### [Novita](https://novita.ai/?ref=ytblmjc&utm_source=affiliate)

**Credits:** $0.5 for 1 year

**Models:** [Various open models](https://novita.ai/models)

### [AI21](https://studio.ai21.com/)

**Credits:** $10 for 3 months

**Models:** Jamba family of models

### [Upstage](https://console.upstage.ai/)

**Credits:** $10 for 3 months

**Models:** Solar Pro/Mini

### [NLP Cloud](https://nlpcloud.com/home)

**Credits:** $15

**Requirements:** Phone number verification

**Models:** Various open models

### [Alibaba Cloud (International) Model Studio](https://bailian.console.alibabacloud.com/)

**Credits:** 1 million tokens/model

**Models:** [Various open and proprietary Qwen models](https://www.alibabacloud.com/en/product/modelstudio)

### [Modal](https://modal.com)

**Credits:** $5/month upon sign up, $30/month with payment method added

**Models:** Any supported model - pay by compute time

### [Inference.net](https://inference.net)

**Credits:** $1, $25 on responding to email survey

**Models:** Various open models

### [Hyperbolic](https://app.hyperbolic.xyz/)

**Credits:** $1

**Models:**

- DeepSeek V3

- DeepSeek V3 0324

- Llama 3.1 405B Base

- Llama 3.1 405B Instruct

- Llama 3.1 70B Instruct

- Llama 3.1 8B Instruct

- Llama 3.2 3B Instruct

- Llama 3.3 70B Instruct

- Pixtral 12B (2409)

- Qwen QwQ 32B

- Qwen2.5 72B Instruct

- Qwen2.5 Coder 32B Instruct

- Qwen2.5 VL 72B Instruct

- Qwen2.5 VL 7B Instruct

- deepseek-ai/deepseek-r1-0528

- openai/gpt-oss-120b

- openai/gpt-oss-120b-turbo

- openai/gpt-oss-20b

- qwen/qwen3-235b-a22b

- qwen/qwen3-235b-a22b-instruct-2507

- qwen/qwen3-coder-480b-a35b-instruct

- qwen/qwen3-next-80b-a3b-instruct

- qwen/qwen3-next-80b-a3b-thinking

### [SambaNova Cloud](https://cloud.sambanova.ai/)

**Credits:** $5 for 3 months

**Models:**

- E5-Mistral-7B-Instruct

- Llama 3.1 8B

- Llama 3.3 70B

- Llama-4-Maverick-17B-128E-Instruct

- Qwen/Qwen3-235B

- Qwen/Qwen3-32B

- Whisper-Large-v3

- deepseek-ai/DeepSeek-R1-0528

- deepseek-ai/DeepSeek-R1-Distill-Llama-70B

- deepseek-ai/DeepSeek-V3-0324

- deepseek-ai/DeepSeek-V3.1

- deepseek-ai/DeepSeek-V3.1-Terminus

- deepseek-ai/DeepSeek-V3.2

- openai/gpt-oss-120b

- tbd

### [Scaleway Generative APIs](https://console.scaleway.com/generative-api/models)

**Credits:** 1,000,000 free tokens

**Models:**

- BGE-Multilingual-Gemma2

- DeepSeek R1 Distill Llama 70B

- Gemma 3 27B Instruct

- Llama 3.1 8B Instruct

- Llama 3.3 70B Instruct

- Mistral Nemo 2407

- Pixtral 12B (2409)

- Whisper Large v3

- devstral-2-123b-instruct-2512

- gpt-oss-120b

- holo2-30b-a3b

- mistral-small-3.2-24b-instruct-2506

- qwen3-235b-a22b-instruct-2507

- qwen3-coder-30b-a3b-instruct

- qwen3-embedding-8b

- voxtral-small-24b-2507
