https://github.com/piyook/llm-mock
A local mock LLM server for early front end development work
- Host: GitHub
- URL: https://github.com/piyook/llm-mock
- Owner: piyook
- License: mit
- Created: 2024-05-22T12:38:28.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-12-30T11:34:18.000Z (5 months ago)
- Last Synced: 2026-01-01T18:32:41.777Z (5 months ago)
- Topics: chatgpt, gemini, langchain, llm, local-development-environment, mock-server
- Language: TypeScript
- Homepage:
- Size: 841 KB
- Stars: 4
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLMock — Local Mock LLM API
A lightweight local server that simulates LLM APIs for development and testing. Build and test AI-powered applications without API costs or an internet connection.
---
## Table of Contents
- [Why LLMock?](#why-llmock)
- [Quick Start](#quick-start)
- [Installation Options](#installation-options)
- [Configuration](#configuration)
- [Features](#features)
- [Integration Guide](#integration-guide)
- [Supporting Different LLM Providers](#supporting-different-llm-providers)
- [Docker Support](#docker-support)
- [Troubleshooting](#troubleshooting)
- [License](#license)
---
## Why LLMock?
- **Free and fast** — no API costs, instant responses for rapid prototyping
- **Consistent testing** — predictable, repeatable responses for testing UI logic
- **Offline capable** — works without internet connectivity
- **Full visibility** — complete request logging and a live dashboard
- **Realistic simulation** — configurable delays, SSE streaming, and mock embeddings
- **OpenAI-compatible** — works with ChatGPT, Grok, Llama, DeepSeek, Gemini, and any OpenAI-style API
Built on [Fastify](https://www.fastify.io/) for high performance and reliability.
---
## Quick Start
**Prerequisites:** Node.js 20+
The fastest way to get started is with the scaffolding tool, which creates a complete project with configuration files and example templates:
```bash
npm create llmock@latest my-project
cd my-project
npm install
npm run llmock:start
```
Open `http://localhost:8001` to see the live dashboard.
That's it. To run against a specific model preset:
```bash
npm run llmock:chatgpt # OpenAI ChatGPT-style (default)
npm run llmock:gemini # Google Gemini format
npm run llmock:streaming # OpenAI-style with SSE streaming
npm run llmock:embeddings # Optimised for embeddings/RAG testing
```
### Scaffolded project layout
```
my-project/
├── package.json
├── .llmockrc.json
├── README.md
├── requests/
│ ├── openai-chat.json
│ └── gemini-chat.json
└── responses/
├── openai-chat-response.json
└── gemini-chat-response.json
```
---
## Installation Options
### Option 1: Scaffolded project (recommended)
```bash
npm create llmock@latest my-project
```
Generates a ready-to-use project with configuration, templates, and Docker support.
### Option 2: Add to an existing project
```bash
npm install llmock
# or globally
npm install -g llmock
```
Then use the CLI directly:
```bash
llmock start # default: ChatGPT model, port 8001
llmock start --model=gemini
llmock start --port=3000 --stream=true
llmock stop
llmock config # show current settings
llmock help
```
All CLI flags support both `--key=value` and `--key value` formats and override `.llmockrc.json` at runtime.
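To make the two flag formats and the override order concrete, here is a minimal sketch of how such parsing could work. It is not LLMock's actual parser: `parseFlags` and `applyOverrides` are hypothetical names, and the real CLI may coerce types or treat bare switches differently.

```javascript
// Hypothetical helper, not part of LLMock: accepts --key=value and --key value.
function parseFlags(argv) {
  const flags = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith('--')) continue; // skip positional args such as "start"
    const body = arg.slice(2);
    const eq = body.indexOf('=');
    if (eq !== -1) {
      // --key=value form
      flags[body.slice(0, eq)] = body.slice(eq + 1);
    } else if (i + 1 < argv.length && !argv[i + 1].startsWith('--')) {
      // --key value form: consume the next argument as the value
      flags[body] = argv[i + 1];
      i++;
    } else {
      // bare switch such as --foreground (assumed to mean "true")
      flags[body] = 'true';
    }
  }
  return flags;
}

// Hypothetical merge step: values from flags win over values from the config file.
function applyOverrides(fileSettings, flags) {
  return { ...fileSettings, ...flags };
}
```

Note that every value in this sketch stays a string (`'3000'`, not `3000`); a real parser would need to convert ports and booleans before use.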
### Foreground mode
For Docker containers or when you want the server to stay attached to your terminal:
```bash
llmock start --foreground
```
The `--foreground` flag keeps the server process attached and forwards all output to your console. This is essential for Docker containers and useful for debugging. Without this flag, the server runs as a detached background process.
**Windows users:** In normal mode, the server may create a minimized terminal window. This is expected behavior for background processes on Windows. Use `llmock start --foreground` if you want to keep the server visible in your terminal.
### Option 3: Docker
```bash
npm create llmock@latest my-project
cd my-project
npm run docker:start
```
See [Docker Support](#docker-support) for full details.
---
## Configuration
### Configuration file (`.llmockrc.json`)
All settings live in `.llmockrc.json` in your project root. CLI flags always override these values.
```json
{
"defaultModel": "chatgpt",
"models": {
"chatgpt": {
"name": "openai",
"model": "gpt-4o",
"endpoint": "chatgpt/chat/completions",
"responseType": "lorem",
"maxLoremParas": 8,
"validateRequests": true,
"logRequests": true,
"debug": false,
"stream": false,
"responseDelay": {
"min": 3000,
"max": 5000
},
"embeddings": {
"enabled": true,
"dimensions": 128
}
}
},
"server": {
"port": 8001,
"host": "0.0.0.0"
}
}
```
**Configuration reference:**
| Option | Description |
|---|---|
| `name` | LLM provider name (used for template loading) |
| `model` | Model identifier (e.g. `gpt-4o`, `gemini-pro`) |
| `endpoint` | API endpoint path |
| `responseType` | `"lorem"` (random text) or `"stored"` (predefined responses) |
| `maxLoremParas` | Max sentences in lorem ipsum responses |
| `validateRequests` | Validate incoming requests against templates |
| `logRequests` | Save requests to the log file |
| `debug` | Enable verbose console logging |
| `stream` | Return SSE streaming responses |
| `responseDelay.min/max` | Response delay range in milliseconds |
| `embeddings.enabled` | Enable the `/v1/embeddings` endpoint |
| `embeddings.dimensions` | Embedding vector size |
### Adding custom models
Extend the `models` object with any additional preset, then start with `--model=`:
```json
{
"models": {
"my-model": {
"name": "openai",
"model": "gpt-3.5-turbo",
"endpoint": "api/v1/chat/completions",
"responseType": "stored",
"validateRequests": true,
"logRequests": false,
"debug": true,
"stream": false,
"responseDelay": { "min": 1000, "max": 2000 },
"embeddings": { "enabled": false, "dimensions": 64 }
}
}
}
```
```bash
llmock start --model=my-model
```
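As a rough illustration of how a preset name might map to a URL, the sketch below looks up a preset in a parsed `.llmockrc.json` object and builds the chat endpoint address. `resolvePreset` is a hypothetical helper, and the fallback to `defaultModel` and to port 8001 is an assumption based on the defaults described in this README, not on the package source.

```javascript
// Hypothetical helper: pick a model preset and derive its local URL.
function resolvePreset(config, requestedModel) {
  // Fall back to defaultModel when no --model value was given (assumed behaviour).
  const key = requestedModel ?? config.defaultModel;
  const preset = config.models?.[key];
  if (!preset) {
    throw new Error(`Unknown model preset: ${key}`);
  }
  const port = config.server?.port ?? 8001;
  return { key, preset, url: `http://localhost:${port}/${preset.endpoint}` };
}
```

For the `my-model` preset above, with no `server` section in the config, this would yield `http://localhost:8001/api/v1/chat/completions`.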
### Response types
**Lorem ipsum** — generates random placeholder text, good for testing variable-length content in the UI:
```json
{ "responseType": "lorem", "maxLoremParas": 8 }
```
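**Stored responses** — returns predefined answers from `src/data/data.json`, useful for domain-specific testing against a known set of answers. Set `"responseType": "stored"` in the model preset and list the answers under `responses` in the data file. The server randomly selects one entry from this list on each request: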
```json
{
"responseType": "stored"
}
```
```json
{
"responses": [
"This is a custom response for testing.",
"Another predefined response for consistency."
]
}
```
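The random selection described above can be pictured with a few lines of code. This is only a sketch under the assumption of a uniform pick from the `responses` array; `pickStoredResponse` is a hypothetical name, and the injectable `random` parameter exists purely so the choice can be pinned in tests.

```javascript
// Hypothetical sketch: choose one stored response uniformly at random.
function pickStoredResponse(data, random = Math.random) {
  const { responses } = data;
  if (!Array.isArray(responses) || responses.length === 0) {
    throw new Error('data.responses must be a non-empty array');
  }
  // random() is expected to return a number in [0, 1), like Math.random.
  const index = Math.floor(random() * responses.length);
  return responses[index];
}
```

Because the pick is random, UI tests that need one exact answer are easiest to write with a single entry in `responses`.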
### Streaming responses
Enable OpenAI-style Server-Sent Events (SSE) streaming in your config or via CLI:
```json
{ "stream": true }
```
```bash
llmock start --stream=true
```
When enabled, the endpoint returns a chunked SSE stream. The first few chunks arrive immediately (mimicking real LLM behaviour), with subsequent chunks following the configured delay.
```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-124","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]
```
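On the client side, a stream in this shape can be reassembled by reading each `data:` line until the `[DONE]` marker. The function below is a minimal sketch that works on text already received; `collectStreamText` is a hypothetical name, and it assumes each event fits on a single `data:` line as in the sample above.

```javascript
// Hypothetical sketch: join the delta.content pieces of an SSE transcript.
function collectStreamText(raw) {
  const pieces = [];
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // ignore blank lines and other fields
    const payload = line.slice(5).trim();
    if (payload === '[DONE]') break; // end-of-stream marker
    const chunk = JSON.parse(payload);
    const text = chunk.choices?.[0]?.delta?.content;
    // The first chunk only carries the role, so it contributes no text.
    if (typeof text === 'string') pieces.push(text);
  }
  return pieces.join('');
}
```

In a real front end the same logic would run incrementally on the response body as chunks arrive, which is the behaviour the mock is meant to exercise.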
### Response delay simulation
Simulate realistic API latency to test loading states, timeout handling, and UX:
```json
{
"responseDelay": { "min": 800, "max": 2500 }
}
```
Set both values to `0` for instant responses. The server picks a random value in the range for each request.
| Profile | min | max |
|---|---|---|
| Instant (development) | 0 | 0 |
| Fast | 100 | 300 |
| Realistic production | 800 | 2500 |
| Slow / timeout testing | 3000 | 8000 |
| Fixed delay | 1000 | 1000 |
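For intuition, picking a delay from the configured range can be sketched as follows. `pickDelay` is a hypothetical helper, and treating both ends of the range as inclusive is an assumption; the server's exact rounding is not documented here.

```javascript
// Hypothetical sketch: pick a whole number of milliseconds in [min, max].
function pickDelay({ min, max }, random = Math.random) {
  if (min < 0 || max < min) {
    throw new RangeError('responseDelay requires 0 <= min <= max');
  }
  // random() returns a value in [0, 1), so the result never exceeds max.
  return Math.floor(min + random() * (max - min + 1));
}
```

With `min` equal to `max` the result is always that fixed value, and with both set to `0` the delay is zero, matching the "Fixed delay" and "Instant" rows in the table.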
### Custom API paths
Set the endpoint to match any provider's path structure:
```json
{ "endpoint": "chatgpt/chat/completions" }
// → http://localhost:8001/chatgpt/chat/completions
{ "endpoint": "models/gemini-pro:generateContent" }
// → http://localhost:8001/models/gemini-pro:generateContent
```
### Environment variables
For CI/CD pipelines, set these to switch between mock and production:
```bash
TEST_MODE=true
TEST_BASE_URL=http://localhost:8001/chatgpt
TEST_EMBEDDING_URL=http://localhost:8001/v1/embeddings
```
Setting `TEST_MODE=false` switches back to real LLM services.
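One way an application could read these variables in a single place is sketched below. `resolveEndpoints` is a hypothetical helper in your own application code, not something LLMock provides; returning `undefined` outside test mode is an assumption meant to let the SDK fall back to its own default URL.

```javascript
// Hypothetical sketch: decide which URLs the app should call.
function resolveEndpoints(env) {
  // Only the exact string "true" enables the mock; anything else means production.
  const testMode = env.TEST_MODE === 'true';
  return {
    testMode,
    chatBaseUrl: testMode ? env.TEST_BASE_URL : undefined,
    embeddingsUrl: testMode ? env.TEST_EMBEDDING_URL : undefined,
  };
}
```

Calling `resolveEndpoints(process.env)` once at startup keeps the mock/production switch out of the rest of the codebase.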
---
## Features
### Dashboard
Once running, open `http://localhost:8001` for the live dashboard:

| URL | Purpose |
|---|---|
| `http://localhost:8001` | Main dashboard |
| `http://localhost:8001/logs` | Request log history |
| `http://localhost:8001/ping` | Health check |
The dashboard shows server status, current configuration, available endpoints, and recent request logs. It refreshes automatically every 2 seconds.
### Available endpoints
| Endpoint | Description |
|---|---|
| Configurable (default: `/chatgpt/chat/completions`) | Chat completions |
| `/v1/embeddings` | OpenAI-compatible mock embeddings |
### Request validation
Validate incoming requests against templates to confirm API compatibility:
1. Add a template to the `request-templates/` folder
2. Enable validation: `"validateRequests": true`
Invalid requests return a detailed error describing the mismatch.
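To give a feel for template-based validation, here is a small sketch that compares a request body with a template whose leaf values name the expected type (for example `"string"`). The function name `validateAgainstTemplate`, the error wording, and the rule that extra fields are allowed are all assumptions for illustration; LLMock's real validator may behave differently.

```javascript
// Hypothetical sketch: return a list of mismatches between a template and a value.
function validateAgainstTemplate(template, value, path = 'body') {
  if (typeof template === 'string') {
    // Leaf: the template string names the expected typeof result.
    return typeof value === template
      ? []
      : [`${path}: expected ${template}, got ${typeof value}`];
  }
  if (Array.isArray(template)) {
    if (!Array.isArray(value)) return [`${path}: expected array`];
    if (template.length === 0) return [];
    // The first template element describes every item of the array.
    return value.flatMap((item, i) =>
      validateAgainstTemplate(template[0], item, `${path}[${i}]`));
  }
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return [`${path}: expected object`];
  }
  // Only keys named in the template are checked; extra request fields pass.
  return Object.keys(template).flatMap((key) =>
    validateAgainstTemplate(template[key], value[key], `${path}.${key}`));
}
```

An empty array means the request matches; otherwise each entry points at the field that differs, which is roughly the kind of detail a validation error needs to be useful.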
### Request logging
Enable with `"logRequests": true` and view at `http://localhost:8001/logs`, or find the log file at:
| Platform | Log location |
|---|---|
| Windows | `C:\Users\{name}\AppData\Local\llmock-nodejs\Log\` |
| macOS | `~/Library/Logs/llmock-nodejs/` |
| Linux | `~/.local/share/llmock-nodejs/log/` |
### Debug mode
Enable verbose console output to see incoming request details, validation results, response generation steps, and timing:
```bash
llmock start --debug=true
```
---
## Integration Guide
### Chat completions
#### Standard (non-streaming)
```bash
curl http://localhost:8001/chatgpt/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello"}],
"temperature": 1,
"stream": false
}'
```
```json
{
"id": "chatcmpl-6sf37lXn5paUcuf8UaurpMIKRMsTe",
"object": "chat.completion",
"created": 1678485525,
"model": "gpt-3.5-turbo-0301",
"choices": [{"message": {"role": "assistant", "content": "Generated response"}}]
}
```
#### Streaming
Set `"stream": true` in your config (or start with `llmock start --stream=true`), then call the same endpoint:
```bash
curl -N http://localhost:8001/chatgpt/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true
}'
```
### Embeddings API
The mock server provides an OpenAI-compatible embeddings endpoint at `/v1/embeddings`:
```bash
curl http://localhost:8001/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "Your text string goes here"
}'
```
Pass an array of strings for multiple embeddings in one call:
```bash
curl http://localhost:8001/v1/embeddings -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-ada-002", "input": ["First text", "Second text"]}'
```
**Response format:**
```json
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.1234, -0.5678, 0.9012]
}
],
"model": "text-embedding-3-small",
"usage": { "prompt_tokens": 6, "total_tokens": 6 }
}
```
Key characteristics of mock embeddings: deterministic (same input always returns the same vector), configurable dimensions, model-name-sensitive, and OpenAI-compatible in shape. Note that vectors are pseudo-random — they have the correct shape for testing but are not real semantic embeddings.
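The "deterministic but pseudo-random" property can be illustrated by seeding a small random number generator from a hash of the model name and input text. The sketch below uses an FNV-1a-style hash over UTF-16 code units and the mulberry32 generator; this is one possible technique chosen for illustration, not a description of how LLMock actually computes its vectors, and `mockEmbedding` is a hypothetical name.

```javascript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// mulberry32: a tiny seeded generator returning numbers in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical sketch: same (model, input) pair always yields the same vector.
function mockEmbedding(input, model, dimensions = 128) {
  const next = mulberry32(fnv1a(`${model}\u0000${input}`));
  // Scale each value from [0, 1) to [-1, 1).
  return Array.from({ length: dimensions }, () => next() * 2 - 1);
}
```

Vectors produced this way are stable across runs, which is enough to test storage and retrieval plumbing, but similar texts will not end up close together as they would with real embeddings.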
### Using with LangChain
Point your `ChatOpenAI` client at the mock server when `TEST_MODE` is enabled:
```javascript
import { ChatOpenAI } from '@langchain/openai';
const chatModel = new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-3.5-turbo',
configuration:
process.env.TEST_MODE === 'true'
? { baseURL: process.env.TEST_BASE_URL } // http://localhost:8001/chatgpt
: {},
});
```
For embeddings, use LangChain's built-in fake embeddings or call the mock endpoint directly:
```javascript
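import { OpenAIEmbeddings } from '@langchain/openai';

// Minimal stand-in exposing the two methods used on an embeddings object.
class MockEmbeddingsAPI {
  // Embed several texts by calling the mock endpoint once per text.
  async embedDocuments(texts) {
    return Promise.all(texts.map(text => this.embedQuery(text)));
  }
  // Embed a single text via the mock /v1/embeddings endpoint.
  async embedQuery(text) {
    const response = await fetch(process.env.TEST_EMBEDDING_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: text, model: 'text-embedding-ada-002' }),
    });
    if (!response.ok) {
      throw new Error(`Mock embeddings request failed with status ${response.status}`);
    }
    const data = await response.json();
    return data.data[0].embedding;
  }
}
// Use the mock in test mode and the real OpenAI embeddings otherwise.
const embeddings =
  process.env.TEST_MODE === 'true'
    ? new MockEmbeddingsAPI()
    : new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_API_KEY });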
```
---
## Supporting Different LLM Providers
LLMock supports any provider that uses the OpenAI chat completion format: ChatGPT, Grok, Llama, DeepSeek, Mistral, Claude, Gemini, and more. For providers with different request/response shapes, create custom templates.
### Template locations
The framework checks two locations, in priority order:
1. `./request-templates/` and `./response-templates/` in your project root
2. `src/request-templates/` and `src/response-templates/` in the package source
Project-level templates take priority, so you can add custom templates without modifying the package.
### Creating a custom provider template
**Step 1 — Request template** (`request-templates/_req.json`):
```json
{
"model": "string",
"messages": [
{ "role": "string", "content": "string" }
]
}
```
**Step 2 — Response template** (`response-templates/_res.json`):
Use `DYNAMIC_CONTENT_HERE` as the placeholder for generated content:
```json
{
"id": "chatcmpl-123",
"object": "chat.completion",
"choices": [
{
"message": {
"role": "assistant",
"content": "DYNAMIC_CONTENT_HERE"
},
"finish_reason": "stop"
}
]
}
```
**Step 3 — Model preset** (`.llmockrc.json`):
The `name` field must match your template filename prefix:
```json
{
"models": {
"mymodel": {
"name": "mymodel",
"model": "my-custom-model-v1",
"endpoint": "api/v1/chat/completions",
"responseType": "lorem",
"maxLoremParas": 8,
"validateRequests": true,
"stream": false,
"responseDelay": { "min": 1000, "max": 2000 },
"embeddings": { "enabled": true, "dimensions": 128 }
}
}
}
```
**Step 4 — Start and test:**
```bash
llmock start --model=mymodel
curl http://localhost:8001/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "my-custom-model-v1", "messages": [{"role": "user", "content": "Hello"}]}'
```
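The role of the `DYNAMIC_CONTENT_HERE` placeholder from Step 2 can be pictured as a walk over the parsed template that swaps the marker for generated text. This is a sketch of the idea only: `fillTemplate` is a hypothetical name, and it assumes the placeholder occupies a whole string value, as in the template above.

```javascript
// Hypothetical sketch: replace the placeholder wherever it appears as a value.
function fillTemplate(template, content) {
  if (template === 'DYNAMIC_CONTENT_HERE') return content;
  if (Array.isArray(template)) {
    return template.map((item) => fillTemplate(item, content));
  }
  if (template !== null && typeof template === 'object') {
    return Object.fromEntries(
      Object.entries(template).map(([key, value]) => [key, fillTemplate(value, content)]),
    );
  }
  // Numbers, booleans, null, and other strings are copied unchanged.
  return template;
}
```

Working on the parsed object rather than on the raw JSON text means generated content containing quotes or newlines is escaped correctly when the response is serialised.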
---
## Docker Support
Docker support is included when you use the scaffolding tool. It is useful for CI/CD pipelines and for keeping team environments consistent.
### Available scripts
| Script | Description |
|---|---|
| `npm run docker:start` | Start the container in detached mode |
| `npm run docker:stop` | Stop the container and remove volumes |
| `npm run docker:rebuild` | Rebuild and restart the container |
| `npm run docker:restart` | Stop and start the container |
### Configuration
The Docker container uses the same `.llmockrc.json` as the local setup, mounted as a read-only volume. Update settings and restart to apply changes — no rebuild required:
```bash
vim .llmockrc.json
npm run docker:restart
```
### How Docker works
The Docker container starts LLMock with the `--foreground` flag so that the server stays attached as the container's main process. Without the flag, the server would detach into the background, the main process would exit, and the container would stop and restart continuously. The container setup includes:
- **Dockerfile**: Multi-stage Node.js build with security best practices
- **docker-compose.yml**: Port 8001 exposed, config file mounted, health checks
- **docker-start script**: Runs `llmock start --foreground` to keep the server attached
### Manual Docker commands
```bash
docker compose up -d --force-recreate # build and start
docker compose logs -f # view logs
docker compose down --volumes # stop and clean up
docker compose down --volumes && docker compose up -d --force-recreate --build # rebuild
```
> **Note:** LLMock is intended for local development and testing only.
---
## Troubleshooting
**Server not responding**
Confirm the server is running and the port matches `.llmockrc.json`. Open `http://localhost:8001` — if it's unreachable, the server may not have started.

**Port already in use**
Change the port in `.llmockrc.json` or pass it as a flag:
```bash
llmock start --port=8002
```
**Request validation failures**
- Confirm your request template matches the provider's API format
- Check the request shape at `http://localhost:8001/logs`
- Verify the `name` field in your model config matches the template filename prefix
**Response delays not applied**
Ensure both `responseDelay.min` and `responseDelay.max` are set and greater than `0`, then restart the server.
---
## License
MIT — see [LICENSE](LICENSE) for details.