vLLM Emulator
https://github.com/ruivieira/white-rabbit
- Host: GitHub
- URL: https://github.com/ruivieira/white-rabbit
- Owner: ruivieira
- License: apache-2.0
- Created: 2025-08-08T22:35:15.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-08-16T23:44:03.000Z (about 2 months ago)
- Last Synced: 2025-08-17T01:12:33.554Z (about 2 months ago)
- Topics: deno, emulator, openai, typescript, vllm
- Language: TypeScript
- Homepage:
- Size: 342 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# white-rabbit
[CI](https://github.com/ruivieira/white-rabbit/actions/workflows/ci.yml)
[JSR](https://jsr.io/@rui/white-rabbit)
[Quay.io](https://quay.io/repository/ruimvieira/white-rabbit)
[pre-commit.ci](https://results.pre-commit.ci/latest/github/ruivieira/white-rabbit/main)
Deno vLLM emulator providing mock OpenAI-compatible API endpoints for testing and development.
## Purpose
White Rabbit is designed to **test integration with vLLM APIs** without requiring a real LLM
deployment. The responses are typically gibberish since no actual language model is served - this is
intentional for testing API compatibility, request/response formats, and integration workflows.

Perfect for:
- Testing vLLM API integration code
- Development environments where you need vLLM-compatible endpoints
- CI/CD pipelines that need mock LLM services
- Load testing API clients without GPU resources

## Installation
### From JSR
```typescript
import { genParagraph } from "jsr:@rui/white-rabbit";
import type { ChatCompletionsRequest, EmbeddingRequest } from "jsr:@rui/white-rabbit/api";

// Generate mock text
const mockText = genParagraph(5);
```

### Using specific modules
```typescript
// Import API types
import type {
ChatCompletionsRequest,
CompletionsRequest,
EmbeddingRequest,
} from "jsr:@rui/white-rabbit/api";// Import text generation utilities
import { genParagraph } from "jsr:@rui/white-rabbit/text-generation";
```
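The exported types can be used to build requests against a running emulator. A minimal sketch, assuming the default host and port (`localhost:8000`) and that `ChatCompletionsRequest` mirrors the OpenAI chat completions fields used in the usage examples further below:

```typescript
// Illustrative sketch: send a typed request to a locally running emulator.
// Assumes the default host/port and that ChatCompletionsRequest mirrors the
// OpenAI chat completions fields (model, messages, max_tokens).
import type { ChatCompletionsRequest } from "jsr:@rui/white-rabbit/api";

const request: ChatCompletionsRequest = {
  model: "test-model",
  messages: [{ role: "user", content: "What is the opposite of down?" }],
  max_tokens: 50,
};

const response = await fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(request),
});

console.log(await response.json());
```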
## Run locally

```bash
# Clone the repository and enter it
git clone https://github.com/ruivieira/white-rabbit.git
cd white-rabbit

# Deno 1.41+ recommended
deno task dev
# or
deno task start

# Run with custom model name
WR_MODEL="my-custom-model" deno task start
```

## Configuration
### Environment Variables
**Model Configuration:**
- `WR_MODEL` - Override the model name returned in API responses. If not set, defaults to
`Qwen/Qwen2.5-1.5B-Instruct`.
- `WR_HOST` - Set the host address to bind the server to. If not set, defaults to `localhost`.
- `WR_PORT` - Set the port number for the server to listen on. If not set, defaults to `8000`.

**Logging Configuration:**
- `WR_LOG_LEVEL` - Set logging level: `DEBUG`, `INFO`, `WARNING`, or `ERROR` (default: `INFO`)
- `WR_LOG_PREFIX` - Customise log message prefix (default: `🐰`)
- `WR_LOG_COLORS` - Enable/disable coloured log output: `true` or `false` (default: `true`)

**Log Levels:**
- `DEBUG` - Includes detailed HTTP request logging with headers and body payloads
- `INFO` - Standard logging without detailed request information
- `WARNING` - Only warnings and errors
- `ERROR` - Only error messages

**Examples:**
```bash
# Set model name to "granite-3.1-8b"
export WR_MODEL="granite-3.1-8b"
deno task start

# Or inline
WR_MODEL="granite-3.1-8b" deno task start

# Configure host and port
export WR_HOST="0.0.0.0"
export WR_PORT="8080"
deno task start

# Or inline
WR_HOST="0.0.0.0" WR_PORT="8080" deno task start

# Configure logging
WR_LOG_LEVEL=DEBUG WR_LOG_PREFIX="MY_SERVER" deno task start

# Disable coloured logs (useful for log files)
WR_LOG_COLORS=false deno task start
```
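For reference, the variables and defaults documented above could be read in a Deno script like the sketch below. This is illustrative only, not the emulator's actual startup code:

```typescript
// Illustrative only: how the documented variables and defaults could be read
// in Deno. The emulator's actual startup code may differ.
const model = Deno.env.get("WR_MODEL") ?? "Qwen/Qwen2.5-1.5B-Instruct";
const host = Deno.env.get("WR_HOST") ?? "localhost";
const port = Number(Deno.env.get("WR_PORT") ?? "8000");
const logLevel = Deno.env.get("WR_LOG_LEVEL") ?? "INFO";

console.log(`model=${model}, binding to ${host}:${port}, log level ${logLevel}`);
```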
## Direct File Dataset Configuration

You can configure White Rabbit to use a custom dataset by setting the following environment
variables:

- `WR_HF_DATASET`: A direct URL to a CSV or text file (e.g.,
`https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv`)
- `WR_HF_COLUMN`: The column name within the dataset to use for text generation

**Important**: For Hugging Face datasets, use the `/resolve/` endpoint instead of `/blob/` to get
the raw file content:

- ❌ `https://huggingface.co/datasets/name/repo/blob/main/file.csv` (HTML page)
- ✅ `https://huggingface.co/datasets/name/repo/resolve/main/file.csv` (raw file)

### Examples
**Using direct file URL (Recommended):**
```bash
export WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv"
export WR_HF_COLUMN="text"
deno task start
```

**Using Docker with direct file URL:**
```bash
docker run -p 8000:8000 \
-e WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv" \
-e WR_HF_COLUMN="text" \
white-rabbit:latest
```

**Using Docker with custom host and port:**
```bash
docker run -p 8080:8080 \
-e WR_HOST="0.0.0.0" \
-e WR_PORT="8080" \
white-rabbit:latest
```

### Example - Toxigen Dataset
The toxigen dataset contains the following columns:
- `text`: The input text prompt (use this for text generation)
- `generation`: Generated text response
- `generation_method`: Method used for generation
- `group`: Group classification
- `prompt_label`: Label for the prompt
- `roberta_prediction`: RoBERTa model prediction

For text generation, use `WR_HF_COLUMN="text"`.
## Supported Endpoints
### Text Generation
- `POST /generate` - Generate text using Markov chains
- `GET /health` - Health check endpoint

## Using Custom Hugging Face Datasets
White Rabbit supports loading custom datasets directly from Hugging Face using the `/resolve/`
endpoint. This allows you to train the Markov chain on any CSV dataset hosted on Hugging Face.

### How It Works
1. **Dataset Source**: The system fetches CSV files directly from Hugging Face using the `/resolve/`
endpoint
2. **Column Selection**: You specify which column contains the text data for training
3. **Automatic Parsing**: The system automatically parses the CSV and extracts the specified column
4. **Markov Training**: The extracted text is used to train the Markov chain for text generation
5. **Lazy Loading**: The dataset is loaded only when first needed, then cached in memory for
subsequent requests

**Performance Note**: The first text generation request may experience a delay while the dataset
is downloaded and processed. However, once loaded, the dataset is cached in memory, so all subsequent
inference requests will be fast with no additional delays.
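As an illustration of this flow (not the project's actual loader), the sketch below fetches a CSV from a `/resolve/` URL, extracts one column, and caches the result lazily; the naive comma split stands in for real CSV parsing:

```typescript
// Illustrative sketch of the described flow (not the project's actual loader):
// fetch a CSV via the /resolve/ endpoint, extract one column, cache in memory.
let cachedColumn: string[] | null = null;

async function loadColumn(datasetUrl: string, column: string): Promise<string[]> {
  if (cachedColumn) return cachedColumn; // lazy: only fetched on first use

  const response = await fetch(datasetUrl);
  const lines = (await response.text()).split("\n").filter((l) => l.length > 0);

  // Naive comma split for illustration; a real CSV parser handles quoting.
  const header = lines[0].split(",");
  const index = header.indexOf(column);
  if (index === -1) throw new Error(`Column "${column}" not found`);

  cachedColumn = lines.slice(1).map((line) => line.split(",")[index]);
  return cachedColumn;
}

// Example, using the environment variables documented above:
const texts = await loadColumn(
  Deno.env.get("WR_HF_DATASET")!,
  Deno.env.get("WR_HF_COLUMN") ?? "text",
);
console.log(`${texts.length} rows available for Markov training`);
```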
### Example: Toxigen Dataset for Toxic Model Detection

The [Toxigen dataset](https://huggingface.co/datasets/toxigen/toxigen-data) is particularly useful
for testing and evaluating toxic content detection models. This dataset contains:

- **Purpose**: Designed to test how well language models can detect and avoid generating toxic
content
- **Content**: Contains prompts that are designed to elicit toxic responses from language models
- **Use Case**: Perfect for testing whether your text generation system can avoid producing harmful
content

#### Setting Up Toxigen Dataset
```bash
# Set the dataset URL (use /resolve/ for raw file access)
export WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv"

# Specify the column containing the text prompts
export WR_HF_COLUMN="text"

# Start the server
deno task start
```

#### Dataset Structure
The toxigen dataset contains these columns:
- `text`: The input text prompt (use this for text generation)
- `generation`: Generated text response
- `generation_method`: Method used for generation
- `group`: Group classification
- `prompt_label`: Label for the prompt
- `roberta_prediction`: RoBERTa model prediction

#### Testing Toxic Content Detection
With the toxigen dataset loaded, you can:
1. **Generate Text**: Use the `/generate` endpoint to create text based on the dataset
2. **Evaluate Safety**: Check if the generated text maintains appropriate content standards
3. **Model Testing**: Test how well your system handles potentially problematic prompts
4. **Content Filtering**: Implement additional safety measures based on the generated content

#### Example API Call
```bash
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Write a story about",
"max_tokens": 100
}'
```

### Other Dataset Examples
You can use any CSV dataset hosted on Hugging Face. Here are some other popular options:
- **Creative Writing**:
`https://huggingface.co/datasets/writing-prompts/resolve/main/writing-prompts.csv`
- **Conversation Data**:
`https://huggingface.co/datasets/conversation-ai/resolve/main/conversation.csv`
- **Custom Datasets**: Upload your own CSV files to Hugging Face and use the `/resolve/` endpoint

### Best Practices
1. **Use `/resolve/` endpoint**: Always use `/resolve/` instead of `/blob/` for raw file access
2. **Column validation**: Ensure the specified column exists and contains appropriate text data
3. **Content review**: Review generated content, especially when using datasets with sensitive
content
4. **Testing**: Test your system thoroughly before deploying with custom datasets

### Chat Completions
- `POST /v1/chat/completions` - Generate chat completions
### Completions (Legacy)
- `POST /v1/completions` - Generate text completions
### Embeddings
- `POST /v1/embeddings` - Generate text embeddings
### Models
- `GET /v1/models` - List available models
### Tokenization
- `POST /tokenize` - Tokenise text into token IDs
- `POST /detokenize` - Convert token IDs back to text

### Server Information
- `GET /version` - Return vLLM version information
- `GET /stats` - Return server statistics and metrics

## Usage Examples
### Chat Completions
```bash
curl --request POST \
--url http://localhost:8000/v1/chat/completions \
--header 'Content-Type: application/json' \
--data '{
"model": "test-model",
"messages": [
{
"role": "user",
"content": "What is the opposite of down?"
}
],
"temperature": 0,
"logprobs": true,
"max_tokens": 500
}'
```
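Because the endpoints follow the OpenAI schema, an existing OpenAI client can be pointed at the emulator. A minimal sketch using the `openai` npm package, assuming the default host and port; the placeholder API key only satisfies the client constructor:

```typescript
// Sketch: point the standard OpenAI client at the emulator (default host/port
// assumed). The placeholder API key only satisfies the client constructor.
import OpenAI from "npm:openai";

const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "not-needed",
});

const completion = await client.chat.completions.create({
  model: "test-model",
  messages: [{ role: "user", content: "What is the opposite of down?" }],
  max_tokens: 50,
});

// The response follows the OpenAI schema; note that the reported model name is
// controlled by WR_MODEL on the server, not by the request (see the note below).
console.log(completion.model);
console.log(completion.choices[0].message.content);
```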
### Text Completions

```bash
curl --request POST \
--url http://localhost:8000/v1/completions \
--header 'Content-Type: application/json' \
--data '{
"model": "test-model",
"prompt": "Once upon a time",
"max_tokens": 100,
"n": 1
}'
```

### Embeddings
```bash
curl --request POST \
--url http://localhost:8000/v1/embeddings \
--header 'Content-Type: application/json' \
--data '{
"model": "test-embedding-model",
"input": "The quick brown fox jumps over the lazy dog",
"encoding_format": "float"
}'
```

### Multiple Text Embeddings
```bash
curl --request POST \
--url http://localhost:8000/v1/embeddings \
--header 'Content-Type: application/json' \
--data '{
"model": "test-embedding-model",
"input": [
"First text to embed",
"Second text to embed",
"Third text to embed"
],
"encoding_format": "float",
"dimensions": 768
}'
```
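Since generated embeddings are unit vectors (see Features below), a simple smoke test is to check the L2 norm of a returned embedding. A minimal sketch, assuming the default host/port and the OpenAI-style `data[0].embedding` response shape:

```typescript
// Sketch: request an embedding and verify it is (approximately) a unit vector.
// Assumes the default host/port and an OpenAI-style response shape.
const response = await fetch("http://localhost:8000/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "test-embedding-model",
    input: "The quick brown fox jumps over the lazy dog",
    encoding_format: "float",
  }),
});

const { data } = await response.json();
const embedding: number[] = data[0].embedding;

const norm = Math.sqrt(embedding.reduce((sum, x) => sum + x * x, 0));
console.log(`dimensions: ${embedding.length}, L2 norm: ${norm.toFixed(4)}`); // expect ~1.0
```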
### Models

```bash
curl --request GET \
--url http://localhost:8000/v1/models
```

### Tokenization
```bash
curl --request POST \
--url http://localhost:8000/tokenize \
--header 'Content-Type: application/json' \
--data '{
"model": "test-model",
"text": "Hello, world!",
"add_special_tokens": true
}'
```

### Detokenization
```bash
curl --request POST \
--url http://localhost:8000/detokenize \
--header 'Content-Type: application/json' \
--data '{
"model": "test-model",
"tokens": [1, 15496, 11, 1917, 0, 2]
}'
```
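The two endpoints can also be exercised as a round trip from code. A minimal sketch, assuming the default host/port; the exact response field names (e.g. a `tokens` array) are an assumption, so the raw JSON is printed as well:

```typescript
// Sketch: tokenize a string, then feed the result straight back to /detokenize.
// Response field names are not asserted here; the raw JSON is printed instead.
const base = "http://localhost:8000";

const tokenizeRes = await fetch(`${base}/tokenize`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "test-model", text: "Hello, world!", add_special_tokens: true }),
});
const tokenized = await tokenizeRes.json();
console.log("tokenize:", tokenized);

// Assumes the tokenize response contains a `tokens` array of token IDs.
const detokenizeRes = await fetch(`${base}/detokenize`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "test-model", tokens: tokenized.tokens }),
});
console.log("detokenize:", await detokenizeRes.json());
```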
### Version Information

```bash
# Basic version info
curl --request GET \
--url http://localhost:8000/version

# Detailed version info with build details
curl --request GET \
--url 'http://localhost:8000/version?details=true'
```

### Server Statistics
```bash
curl --request GET \
--url http://localhost:8000/stats
```

## Features
### Core Functionality
- **Mock Data Generation**: Generates realistic-looking mock responses with random text and
embeddings
- **Markov completions**: Uses a small QA dataset and a weighted Markov chain to produce more
topic-relevant answers for `/v1/completions` and `/v1/chat/completions`. Supports custom Hugging
Face datasets with configurable column extraction and format handling.
- **OpenAI API Compatibility**: Follows OpenAI API specifications for request/response formats
- **Multiple Input Support**: Supports single strings, arrays of strings, and token ID arrays
- **Configurable Parameters**: Supports parameters like `max_tokens`, `n`, `logprobs`, `dimensions`,
etc.
- **Normalised Embeddings**: Generated embeddings are unit vectors (normalised to length 1)
- **Token Usage Tracking**: Returns realistic token usage statistics
- **Model Management**: Lists available models with metadata
- **Tokenization Support**: Mock tokenization and detokenization with consistent token IDs
- **Server Monitoring**: Provides version information and real-time server statistics

### Logging and Monitoring
- **vLLM-Compatible Logging**: Professional logging system that matches vLLM's output format
- **Periodic Statistics**: Automatic throughput reporting every 10 seconds (like vLLM)
- Prompt tokens per second
- Generation tokens per second
- Running and total request counts
- Server uptime tracking
- **Request Tracing**: Debug-level logging of all incoming requests and processing steps
- **HTTP Request Logging**: When `WR_LOG_LEVEL=DEBUG` is set, logs all HTTP requests with method, path,
headers, and body payloads
- **Configurable Log Levels**: DEBUG, INFO, WARNING, ERROR with environment variable control
- **Coloured Output**: Colour-coded log messages for easy reading (configurable)
- **Graceful Shutdown**: Proper signal handling and resource cleanup

### HTTP Request Logging
When `WR_LOG_LEVEL=DEBUG` is set, White Rabbit provides comprehensive HTTP request logging that
includes:

- **Request Method**: HTTP method (GET, POST, etc.)
- **Request Path**: Full URL path
- **Request Headers**: All HTTP headers with values
- **Request Body**: Complete request body payload for POST requests
- **Structured Format**: Clear markers to identify request log boundaries

This is particularly useful for:
- **Debugging**: Troubleshooting API integration issues
- **Development**: Understanding exactly what clients are sending
- **Testing**: Verifying request payloads during development
- **Monitoring**: Tracking API usage patterns

**Example DEBUG level output:**
```
🐰:server DEBUG 08-13 17:45:49 [server.ts:386] === HTTP Request Log ===
🐰:server DEBUG 08-13 17:45:49 [server.ts:387] Method: POST
🐰:server DEBUG 08-13 17:45:49 [server.ts:388] Path: /v1/chat/completions
🐰:server DEBUG 08-13 17:45:49 [server.ts:389] Headers: {
"content-type": "application/json",
"user-agent": "curl/7.68.0"
}
🐰:server DEBUG 08-13 17:45:49 [server.ts:397] Body: {"model":"test","messages":[{"role":"user","content":"Hello"}]}
🐰:server DEBUG 08-13 17:45:49 [server.ts:404] === End Request Log ===
```

**Note**: Request body logging is only performed for POST requests. GET requests will log method,
path, and headers but not body content.

Any string is accepted for the `model` argument across all endpoints. However, the actual model name
returned in responses is determined by the `WR_MODEL` environment variable (or the default
`Qwen/Qwen2.5-1.5B-Instruct` if not set), regardless of what the client requests.

## Docker
### Build and Run
```bash
# Build the Docker image
docker build -t white-rabbit .

# Run the container
docker run -p 8000:8000 white-rabbit

# Run with custom port
docker run -p 9000:8000 white-rabbit

# Run with custom model name
docker run -p 8000:8000 -e WR_MODEL="granite-3.1-8b" white-rabbit

# Run with custom host and port
docker run -p 8080:8080 \
-e WR_HOST="0.0.0.0" \
-e WR_PORT="8080" \
white-rabbit

# Run with direct file dataset
docker run -p 8000:8000 \
-e WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv" \
-e WR_HF_COLUMN="prompt" \
white-rabbit
```

### Docker Features
- **Multi-stage build**: Uses UBI9 as builder base for security and compliance
- **Compiled binary**: Compiles Deno application to a single executable binary
- **Minimal runtime**: Final image uses UBI9 minimal for reduced attack surface
- **Non-root user**: Runs as dedicated `whiterabbit` user for security
- **Health check**: Built-in health check endpoint monitoring
- **Optimised layers**: Efficient Docker layer caching for faster rebuilds