https://github.com/lynnlangit/llm-determinism-app

Simple React app which illustrates ideas from "Defeating Nondeterminism in LLM Inference" by Thinking Machines

determinism llm-inference llms

Host: GitHub
URL: https://github.com/lynnlangit/llm-determinism-app
Owner: lynnlangit
Created: 2025-09-22T18:40:46.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-09-22T19:07:20.000Z (9 months ago)
Last Synced: 2025-10-05T05:40:40.043Z (8 months ago)
Topics: determinism, llm-inference, llms
Language: JavaScript
Homepage: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
Size: 1.72 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# LLM Determinism Explorer

*Screenshots: Deterministic Mode, Non-Deterministic Output, Non-Deterministic Mode*

An interactive React application that demonstrates and explains nondeterminism in Large Language Model (LLM) inference, based on the research ["Defeating Nondeterminism in LLM Inference"](https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/) by Thinking Machines Lab.

## Features

This single-page application explores why LLMs such as [Google's Gemma](https://deepmind.google/models/gemma/) produce different outputs even at temperature=0, and provides practical solutions for achieving determinism.

### Interactive Demonstrations

- **Overview**: Qwen-235B experiment results showing 80 unique outputs from 1000 runs
- **Float Calculator**: Interactive floating-point non-associativity demonstration
- **Batch Simulation**: How different batch sizes trigger different kernels
- **Atomic Operations**: Race conditions in GPU parallel operations
- **Kernel Solutions**: Code examples for deterministic implementations
- **Live Demo**: Gemma-2B inference simulation with copy-to-clipboard code
- **Performance Analysis**: Throughput comparisons and trade-offs
- **Implementation Guide**: Step-by-step setup instructions
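
The idea behind the Float Calculator can be reproduced in a few lines. This is a minimal sketch, not code taken from the app: `sumInOrder` is a hypothetical helper, and the values are the standard textbook example of non-associative floating-point addition.

```javascript
// Floating-point addition is not associative: the grouping changes the rounding.
const leftToRight = (0.1 + 0.2) + 0.3;
const rightToLeft = 0.1 + (0.2 + 0.3);

console.log(leftToRight);                 // 0.6000000000000001
console.log(rightToLeft);                 // 0.6
console.log(leftToRight === rightToLeft); // false

// Hypothetical helper: add the same numbers in whatever order they are given.
function sumInOrder(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

console.log(sumInOrder([0.1, 0.2, 0.3]) === sumInOrder([0.3, 0.2, 0.1])); // false
```

The same three numbers give two different results depending only on the order of addition, which is the effect the rest of the app builds on.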

### Key Features

- **Deterministic Mode Toggle**: Affects all simulations throughout the app
- **Dark Gradient Theme**: Professional blue gradient background
- **Interactive Elements**: Calculators, dropdowns, simulations with realistic delays
- **Code Examples**: Copy-to-clipboard functionality for implementation
- **Responsive Design**: Works on desktop and mobile devices

## Quick Start

1. **Clone or download** this repository
2. **Start a local HTTP server** in the project directory:
```bash
python3 -m http.server 8000
```
3. **Open your browser** and go to `http://localhost:8000`
4. **Explore the tabs** to learn about LLM nondeterminism

Note: A local server is required because browsers block loading the JSX component file when `index.html` is opened directly from disk (`file://`), which shows up as a CORS error.

## File Structure

```
llm-determinism-app/
├── index.html              # Main HTML file with React setup
├── LLMDeterminismApp.jsx   # Complete React application
└── README.md               # This file
```

## Browser Requirements

- Modern browser with ES6+ support
- JavaScript enabled
- Network access to load React and Babel from a CDN (or download the scripts for offline use)

## Usage

### Navigation
- Use the **tab navigation** to explore different aspects of LLM nondeterminism
- Toggle **Deterministic Mode** to see how it affects all simulations
- Click **copy buttons** to get implementation code examples
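
One way a toggle like this could drive a simulation is by swapping the random source. The sketch below is an assumption about the general approach, not the app's actual code: `makeRandomSource` and `simulateRun` are hypothetical names, and `mulberry32` is a small, widely used seeded generator.

```javascript
// mulberry32: a small 32-bit seeded PRNG. The same seed replays the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical: choose the random source based on the Deterministic Mode toggle.
function makeRandomSource(deterministic, seed = 42) {
  return deterministic ? mulberry32(seed) : Math.random;
}

// Hypothetical simulation step: pick one of several canned outputs.
function simulateRun(outputs, random) {
  return outputs[Math.floor(random() * outputs.length)];
}

const cannedOutputs = ["output A", "output B", "output C"];
const firstRun = simulateRun(cannedOutputs, makeRandomSource(true));
const secondRun = simulateRun(cannedOutputs, makeRandomSource(true));
console.log(firstRun === secondRun); // true: same seed, same choice
```

With the toggle off, `Math.random` is used and repeated runs may pick different outputs.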

### Interactive Elements
- **Float Calculator**: Adjust values to see floating-point precision issues
- **Batch Selector**: Change batch sizes to see kernel variations
- **Kernel Dropdown**: Explore different operation types and solutions
- **Demo Runner**: Simulate Gemma-2B inference with different configurations
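
The Batch Selector idea can be approximated with a toy split reduction. This is a sketch under a simplifying assumption: a different batch size is modeled as a different number of chunks in the reduction. `chunkedSum` is a hypothetical helper and does not reproduce real GPU kernel selection.

```javascript
// Toy model of a split reduction: sum each chunk, then combine the partial sums.
function chunkedSum(values, numChunks) {
  const chunkSize = Math.ceil(values.length / numChunks);
  let total = 0;
  for (let start = 0; start < values.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, values.length);
    let partial = 0;
    for (let i = start; i < end; i++) {
      partial += values[i];
    }
    total += partial;
  }
  return total;
}

// Values chosen so that rounding is easy to follow; the exact sum is 2.
const batchValues = [1e16, 1, -1e16, 1];

console.log(chunkedSum(batchValues, 1)); // 1
console.log(chunkedSum(batchValues, 2)); // 0
```

The inputs are identical in both calls. Only the way the reduction is split changes, and that alone changes the result.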

## Educational Content

### The Problem
- LLMs produce different outputs even at temperature=0
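- The referenced experiment produced 80 unique outputs from 1,000 runs of the same prompt
- Three root causes covered in the app: floating-point non-associativity, batch-size-dependent kernels, and concurrent (atomic) accumulation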
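
The concurrency cause can be illustrated without a GPU by treating each ordering of the same additions as one possible arrival order on a shared accumulator. This is a sketch under that assumption; `accumulate` and `permutations` are hypothetical helpers.

```javascript
// Add values in the order given, standing in for the order in which
// concurrent additions happen to land on a shared accumulator.
function accumulate(order) {
  let acc = 0;
  for (const v of order) acc += v;
  return acc;
}

// Generate every ordering of a small array.
function permutations(items) {
  if (items.length <= 1) return [items.slice()];
  const result = [];
  items.forEach((item, i) => {
    const rest = [...items.slice(0, i), ...items.slice(i + 1)];
    for (const tail of permutations(rest)) result.push([item, ...tail]);
  });
  return result;
}

const contributions = [1e16, 1, -1e16, 1]; // exact sum is 2
const uniqueSums = new Set(permutations(contributions).map(accumulate));
console.log([...uniqueSums].sort((a, b) => a - b)); // [ 0, 1, 2 ]

// Fixing the reduction order makes the result reproducible (though not exact).
const fixedOrder = [...contributions].sort((a, b) => a - b);
console.log(accumulate(fixedOrder)); // 0
```

Note that the fixed order gives a reproducible answer, not the mathematically exact one. Determinism and accuracy are separate properties.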

### The Solutions
- Environment configuration for deterministic algorithms
- Model loading with appropriate data types
- Generation parameters for consistent results
- Performance trade-offs and when to use deterministic mode

## Technical Implementation

- **Single-file React component** with inline styles
- **No external CSS dependencies**
- **Uses React from CDN** with Babel for JSX transformation
- **Responsive grid layouts**
- **Color-coded status indicators**
- **Realistic simulation delays**
- **Clipboard integration**

## Performance Notes

The app demonstrates that deterministic mode typically results in:
- **30-40% throughput reduction**
- **Higher memory usage** (float32 vs float16)
- **Limited optimization** opportunities
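
The memory and throughput points above come down to simple arithmetic. The sketch below is illustrative only: the parameter count and the baseline throughput are assumed round numbers, not measurements, and `weightMemoryGiB` and `reducedThroughput` are hypothetical helpers.

```javascript
// Bytes needed to store one parameter in each data type.
const BYTES_PER_PARAM = { float16: 2, float32: 4 };

// Memory needed for the weights alone, in GiB.
function weightMemoryGiB(paramCount, dtype) {
  return (paramCount * BYTES_PER_PARAM[dtype]) / 2 ** 30;
}

// Throughput after applying a fractional reduction (0.3 means 30% slower).
function reducedThroughput(baselineTokensPerSec, reductionFraction) {
  return baselineTokensPerSec * (1 - reductionFraction);
}

const assumedParamCount = 2e9; // assumption: a round "2B" parameter count
console.log(weightMemoryGiB(assumedParamCount, "float16").toFixed(2)); // 3.73
console.log(weightMemoryGiB(assumedParamCount, "float32").toFixed(2)); // 7.45

const assumedBaseline = 100; // assumption: tokens per second, not a measurement
console.log(reducedThroughput(assumedBaseline, 0.3).toFixed(0)); // 70
console.log(reducedThroughput(assumedBaseline, 0.4).toFixed(0)); // 60
```

Moving from float16 to float32 doubles the weight memory regardless of the parameter count chosen.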

## Research Reference

Based on "Defeating Nondeterminism in LLM Inference" research demonstrating:
- Qwen-235B nondeterminism at temperature=0
- Completions that match for the first 102 tokens and first diverge at token 103
- Practical solutions for reproducible inference
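
Both measurements in this list, the number of unique outputs and the position of the first divergence, are easy to compute once completions are available as token arrays. This is a minimal sketch with made-up token IDs; `countUnique` and `firstDivergence` are hypothetical helpers, not part of the app or the referenced research code.

```javascript
// Count distinct completions, where each completion is an array of token IDs.
function countUnique(completions) {
  return new Set(completions.map((tokens) => JSON.stringify(tokens))).size;
}

// Zero-based index of the first position where two completions differ,
// or -1 if they are identical.
function firstDivergence(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : shared;
}

// Made-up token IDs for illustration only.
const runA = [5, 9, 2, 7];
const runB = [5, 9, 4, 1];
const runC = [5, 9, 2, 7];

console.log(countUnique([runA, runB, runC])); // 2
console.log(firstDivergence(runA, runB));     // 2
console.log(firstDivergence(runA, runC));     // -1
```

A divergence "at token 103" corresponds to a zero-based index of 102 from `firstDivergence`.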

## Development

To modify the application:
1. **Start the development server**: `python3 -m http.server 8000`
2. **Edit `LLMDeterminismApp.jsx`** (uses React hooks, no ES6 imports)
3. **Refresh your browser** at `http://localhost:8000`
4. Your changes appear on reload; there is no build step

The component uses React hooks for state management and includes hover effects, animations, and interactive simulations.

### Technical Notes
- The JSX file has been modified to work with Babel standalone in the browser
- React imports have been replaced with global `React.useState` calls
- Component is exported to `window.LLMDeterminismApp` for browser compatibility
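
The pattern described in these notes can be sketched as follows. This is not the app's source: the component body is a placeholder, `defineApp` is a hypothetical wrapper, and React is passed in as a parameter only so the sketch can run outside a browser. In the page itself, React would be the global loaded from the CDN.

```javascript
// No import statements: hooks are reached through the React object itself.
function defineApp(React, target) {
  function LLMDeterminismApp() {
    const [deterministic, setDeterministic] = React.useState(false);
    return React.createElement(
      "button",
      { onClick: () => setDeterministic(!deterministic) },
      deterministic ? "Deterministic Mode" : "Non-Deterministic Mode"
    );
  }
  // Attach the component to a global object so index.html can find it.
  target.LLMDeterminismApp = LLMDeterminismApp;
  return LLMDeterminismApp;
}

// In the browser, assuming React is already loaded as a global:
// defineApp(window.React, window);
```

A JSX version would look the same apart from the `React.createElement` call, which Babel standalone generates from the JSX markup.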

## License

This educational demonstration is provided as-is for learning purposes.
