https://github.com/muhtasham/char-prefix-conditioning
A minimal, efficient implementation of character prefix conditioning for code completion.
code-generation
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/muhtasham/char-prefix-conditioning
- Owner: Muhtasham
- Created: 2025-07-17T23:18:09.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-18T19:48:53.000Z (11 months ago)
- Last Synced: 2025-07-20T09:01:27.957Z (11 months ago)
- Topics: code-generation
- Language: Python
- Homepage: https://www.cursor.so/blog/cpc
- Size: 35.2 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.MD
# Character Prefix Conditioning
A minimal, efficient implementation of character prefix conditioning (CPC) for code completion, inspired by the [Cursor blog](https://cursor.com/blog/cpc).
## Overview
When using a language model for code completion, we typically want the model to produce a completion that begins with what the user has typed. However, modern language models operate on sequences of tokens, not characters, so naively tokenizing the user's input and sending it to the model can produce poor completions when the user's cursor does not lie on a token boundary: the partially typed word is split into tokens that the model rarely sees in that position.
**CPC** is an algorithm for sampling a sequence of tokens conditioned on a character prefix, ensuring completions always start with the user's typed prefix—even if it doesn't align with token boundaries.
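To make the token-boundary problem concrete, here is a small sketch using a hypothetical toy vocabulary and a greedy longest-match tokenizer (`tokenize` and `VOCAB` are illustrative names; real tokenizers such as GPT-2's byte-level BPE work differently, but show the same effect). It compares how the typed text and the finished line are tokenized.

```python
# Hypothetical toy vocabulary; real tokenizers (e.g. GPT-2's byte-level BPE) differ.
VOCAB = {"import", " num", " numpy", " as", " np", " ",
         "i", "m", "p", "o", "r", "t", "n", "u", "y", "a", "s"}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization (a simplification of BPE)."""
    tokens, i = [], 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i.
        match = max((t for t in vocab if text.startswith(t, i)), key=len, default=None)
        if match is None:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
        tokens.append(match)
        i += len(match)
    return tokens


typed = tokenize("import num")             # what the user has typed so far
intended = tokenize("import numpy as np")  # how the finished line is tokenized
print(typed, intended)
```

The typed text becomes `["import", " num"]`, while the finished line becomes `["import", " numpy", " as", " np"]`. Conditioning the model on the token ` num` asks it to continue from a token that, in this toy vocabulary, the intended line never contains.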
## Mathematical Foundation
### Problem Statement
We want to sample a sequence of tokens $s = t_1, t_2, \ldots, t_n$ from a distribution specified by an autoregressive model $p(s)$ given by:
$$p(s) = p(t_1, t_2, \ldots, t_n) = \prod_{k=1}^{n} p(t_k \mid t_1, \ldots, t_{k-1})$$
subject to the constraint that $s$ starts with a character prefix $P$, i.e., $P$ is a prefix of $\text{repr}(t_1) + \text{repr}(t_2) + \cdots + \text{repr}(t_n)$, where $+$ means string concatenation and $\text{repr}$ maps a token to the characters it represents.
We define $q(s) = p(s \mid s \text{ starts with } P)$. It's sufficient to find a way to sample autoregressively from $q(s)$, that is, to sample from $q(t_k \mid t_1, \ldots, t_{k-1})$ for each $k$.
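The definition of $q$ can be checked by brute force on a tiny example. The sketch below assumes a hypothetical toy model over four tokens that always emits exactly two tokens, with $p(t_2 \mid t_1)$ uniform; all names are illustrative. It computes $q(t_1)$ exactly by enumerating every sequence and, for comparison, the distribution obtained by only masking $p(t_1)$ with a prefix-consistency check and renormalizing.

```python
from itertools import product

# Hypothetical toy model: 4-token vocabulary, sequences of exactly 2 tokens.
VOCAB = ["a", "ab", "b", "c"]
P_FIRST = {"a": 0.4, "ab": 0.2, "b": 0.3, "c": 0.1}  # p(t_1)


def p_second(t1, t2):
    """p(t_2 | t_1): uniform over the vocabulary in this toy."""
    return 1.0 / len(VOCAB)


def exact_q_first(prefix):
    """q(t_1) = p(t_1 | repr(t_1) + repr(t_2) starts with prefix), by enumeration."""
    mass = {t: 0.0 for t in VOCAB}
    for t1, t2 in product(VOCAB, repeat=2):
        if (t1 + t2).startswith(prefix):
            mass[t1] += P_FIRST[t1] * p_second(t1, t2)
    total = sum(mass.values())
    return {t: m / total for t, m in mass.items()}


def locally_masked_first(prefix):
    """Mask p(t_1) by consistency with the prefix and renormalize (no look-ahead)."""
    kept = {t: p for t, p in P_FIRST.items()
            if t.startswith(prefix) or prefix.startswith(t)}
    total = sum(kept.values())
    return {t: kept.get(t, 0.0) / total for t in VOCAB}


print(exact_q_first("ab"))
print(locally_masked_first("ab"))
```

With $P = \texttt{"ab"}$, the exact conditional gives $q(\texttt{a}) = 1/3$ and $q(\texttt{ab}) = 2/3$, because `a` only satisfies the constraint when it is followed by `b`. Local masking gives $2/3$ and $1/3$ instead, which shows that $q(t_k \mid t_1, \ldots, t_{k-1})$ depends on how likely the remaining tokens are to complete $P$.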
### Algorithm
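For each step $k$, we need to sample from $q(t_k \mid t_1, \ldots, t_{k-1})$. Exactly, this distribution weights each token by $p(t_k \mid t_1, \ldots, t_{k-1})$ times the probability that the remaining tokens can still complete $P$. The procedure below approximates it by masking and renormalizing the model's next-token distribution, without looking ahead: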
1. **Get model predictions**: Compute $p(t_k \mid t_1, \ldots, t_{k-1})$ from the language model for all possible tokens $t_k$
2. **Apply constraint mask**: For each candidate token $t_k$, check whether appending it keeps the generated text consistent with the character prefix $P$. Writing $c_k = \text{repr}(t_1) + \cdots + \text{repr}(t_k)$, define a binary mask $M(t_k)$ where:
   - $M(t_k) = 1$ if $c_k$ starts with $P$ or $P$ starts with $c_k$ (the two strings agree on their overlapping characters, so tokens that only partially cover $P$ remain valid)
   - $M(t_k) = 0$ otherwise
3. **Renormalize probabilities**: Compute the constrained distribution:
$$q(t_k) = \frac{p(t_k) \cdot M(t_k)}{\sum_{t'} p(t') \cdot M(t')}$$
4. **Sample from constrained distribution**: Sample $t_k \sim q(t_k)$
5. **Stop constraining once $P$ is covered**: When the generated text starts with $P$, every continuation satisfies the constraint, so the mask is dropped and later tokens are sampled directly from $p$
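Steps 2 to 4 can be sketched for a single step as follows. This is a minimal, unvectorized illustration, not the repository's code: `constrained_step` and `consistent_with_prefix` are hypothetical names, and `probs` stands in for the model output of step 1. The mask uses the consistency test (one string is a prefix of the other), so tokens that only partially cover $P$ stay valid.

```python
import random


def consistent_with_prefix(text, prefix):
    """M = 1 when text and prefix agree on their overlap (one is a prefix of the other)."""
    return text.startswith(prefix) or prefix.startswith(text)


def constrained_step(probs, token_strings, generated_text, prefix, rng):
    """One masked-and-renormalized sampling step.

    probs[i] is p(t_k = i | t_1..t_{k-1}) from step 1; token_strings[i] is repr(i).
    Returns the sampled token id and the renormalized distribution.
    """
    # Step 2: binary mask over the whole vocabulary.
    mask = [1 if consistent_with_prefix(generated_text + s, prefix) else 0
            for s in token_strings]
    # Step 3: zero out masked tokens and renormalize.
    weights = [p * m for p, m in zip(probs, mask)]
    total = sum(weights)
    if total == 0:
        raise ValueError("no token is consistent with the character prefix")
    q = [w / total for w in weights]
    # Step 4: sample t_k ~ q.
    token_id = rng.choices(range(len(q)), weights=q, k=1)[0]
    return token_id, q


token_strings = ["import", " num", " numpy", " os"]
probs = [0.1, 0.2, 0.4, 0.3]
token_id, q = constrained_step(probs, token_strings, "import", "import num", random.Random(0))
print(token_strings[token_id], q)
```

After `"import"` with $P = \texttt{"import num"}$, only ` num` and ` numpy` survive the mask, and their probabilities are rescaled from 0.2 and 0.4 to 1/3 and 2/3.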
### Key Insights
- **Efficiency**: The algorithm requires only one forward pass through the language model per generated token, minimizing model calls.
- **Vectorization**: The constraint check over all possible next tokens is vectorized across the vocabulary, so the O(|V|) work per step remains efficient.
- **Early termination**: Constraint checking stops once the generated text covers $P$; generation then continues as ordinary unconstrained sampling.
- **Fallback strategies**: For edge cases where no valid tokens have sufficient probability, the algorithm can fall back to the most probable valid token or even violate the constraint with a retry mechanism.
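The points above can be combined into a complete generation loop. The sketch below runs against a hypothetical toy "model" (`TOY_LM`, a lookup table from the text generated so far to next-token probabilities) instead of a real language model; `cpc_generate`, `EOS`, and the `min_valid_mass` threshold are illustrative names, not the repository's API. It makes exactly one model call per generated token, drops the mask once the prefix is covered, and falls back to the most probable valid token when the valid tokens carry very little probability mass.

```python
import random

EOS = "<eos>"  # hypothetical end-of-sequence marker (contributes no characters)

# Hypothetical toy "language model": text so far -> p(next token).
TOY_LM = {
    "": {"import": 0.9, EOS: 0.1},
    "import": {" num": 0.2, " numpy": 0.5, " os": 0.3},
    "import num": {"py": 0.9, EOS: 0.1},
    "import numpy": {" as": 0.8, EOS: 0.2},
    "import numpy as": {" np": 1.0},
}


def next_token_probs(text):
    """Stand-in for one forward pass; unknown contexts end the sequence."""
    return TOY_LM.get(text, {EOS: 1.0})


def consistent(text, prefix):
    """True if text and prefix agree on their overlapping characters."""
    return text.startswith(prefix) or prefix.startswith(text)


def cpc_generate(prefix, max_new_tokens=10, min_valid_mass=1e-3, seed=0):
    rng = random.Random(seed)
    text, model_calls = "", 0
    for _ in range(max_new_tokens):
        probs = next_token_probs(text)  # exactly one model call per token
        model_calls += 1
        covered = text.startswith(prefix)
        if covered:
            weights = probs  # constraint satisfied: sample unconstrained
        else:
            # Keep non-EOS tokens whose text stays consistent with the prefix.
            weights = {t: p for t, p in probs.items()
                       if t != EOS and consistent(text + t, prefix)}
            if not weights:
                raise ValueError(f"no token can extend {text!r} toward {prefix!r}")
        if not covered and sum(weights.values()) < min_valid_mass:
            # Fallback: take the most probable valid token instead of sampling.
            token = max(weights, key=weights.get)
        else:
            tokens = list(weights)
            token = rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]
        if token == EOS:
            break
        text += token
    return text, model_calls


print(cpc_generate("import num"))
```

With `prefix="import num"`, the only possible outputs are `"import num"`, `"import numpy"`, and `"import numpy as np"`: ` os` is masked out after `"import"`, and EOS cannot be chosen before the prefix is covered. A prefix that no token can start, such as `"from x"`, raises `ValueError` in this sketch rather than retrying.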
### Complexity Analysis
- **Per step**: O(|V|) constraint checking (vectorized), **1 model call**
- **Total**: O(n · |V|) constraint checks and O(n) model calls for generating n tokens
- **Memory**: O(|V|) for storing token representations and masks
- **Optimizations**: KV caching reduces repeated computations, and dropping the mask once $P$ is covered removes constraint checks from all later steps
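As a rough worked instance of these bounds, assume GPT-2's vocabulary of 50,257 tokens (the `"gpt2"` model from the usage example) and `max_new_tokens=15`:

$$n = 15 \text{ model calls (at most)}, \qquad n \cdot |V| = 15 \times 50{,}257 = 753{,}855 \text{ token-consistency checks (at most)}$$

Because the mask is dropped once $P$ is covered, the checks actually performed amount to $|V|$ times the number of steps taken before $P$ is covered, which is typically far fewer than $n$ when $P$ extends the prompt by only a few characters.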
## Setup
**Install dependencies**:
```sh
uv sync
```
**Run the main script**:
```sh
uv run main.py
```
## Usage
```python
from main import ModelManager, character_prefix_sample
# Initialize and load model
model_manager = ModelManager("gpt2")
model_manager.load_model()
# Generate with character prefix constraint
result = character_prefix_sample(
    model_manager=model_manager,
    prompt_text="import",
    character_prefix="import num",
    max_new_tokens=15,
)
print(result)  # e.g. "import numpy as np" (sampled, so the output can vary)
```
## Examples
The implementation includes test cases covering several scenarios. Each example reads prompt → character prefix → example completion:
- **Simple prefix matching**: `"import"` → `"import num"` → `"import numpy as np"`
- **Mid-token completion**: `"The model's behav"` → `"The model's behavi"` → `"The model's behavior"`
- **F-string completion**: `'print(f"The result is {re'` → `'print(f"The result is {res'` → `'print(f"The result is {result}"'`
- **Empty prompt generation**: `""` → `"Once upon a ti"` → `"Once upon a time"`
- **JSON completion**: `'{"data": {"user'` → `'{"data": {"username": "test'` → `'{"data": {"username": "test"}}'`
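All five examples share an invariant that can be checked mechanically: the character prefix extends the prompt, and the completion starts with the character prefix. The sketch below encodes the listed triples and checks that invariant; `EXAMPLES` and `check_example` are hypothetical names, not part of the repository's test suite.

```python
# (prompt, character_prefix, completion) triples copied from the examples above.
EXAMPLES = [
    ("import", "import num", "import numpy as np"),
    ("The model's behav", "The model's behavi", "The model's behavior"),
    ('print(f"The result is {re', 'print(f"The result is {res', 'print(f"The result is {result}"'),
    ("", "Once upon a ti", "Once upon a time"),
    ('{"data": {"user', '{"data": {"username": "test', '{"data": {"username": "test"}}'),
]


def check_example(prompt, prefix, completion):
    """The prefix must extend the prompt, and the completion must start with the prefix."""
    return prefix.startswith(prompt) and completion.startswith(prefix)


for prompt, prefix, completion in EXAMPLES:
    print(f"{prompt!r} -> {prefix!r} -> {completion!r}: {check_example(prompt, prefix, completion)}")
```

A check like this only validates the constraint, not completion quality: `"import numpy as np"` and `"import numbers"` would both pass for the prefix `"import num"`.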