llmstep: [L]LM proofstep suggestions in Lean 4.
- Host: GitHub
- URL: https://github.com/wellecks/llmstep
- Owner: wellecks
- License: mit
- Created: 2023-07-08T23:50:23.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-11-11T01:30:11.000Z (about 1 year ago)
- Last Synced: 2024-08-10T14:14:45.641Z (6 months ago)
- Topics: lean, lean4, llm, theorem-proving
- Language: Python
- Homepage:
- Size: 1.56 MB
- Stars: 104
- Watchers: 3
- Forks: 14
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# `llmstep`: [L]LM proofstep suggestions in Lean
*News*
- [11.2023] Experimental [*Llemma*](https://arxiv.org/abs/2310.10631) suggestions that leverage file context
- [10.2023] New paper describing version 1.0.0 of `llmstep`: [[paper](https://arxiv.org/abs/2310.18457)]
- [10.2023] Support for [Reprover](#reprover)
- [9.2023] Support for free GPU servers via [Google Colab](#google-colab)

---

`llmstep` is a Lean 4 tactic for suggesting proof steps using a language model.

Calling `llmstep "prefix"` gives suggestions that start with `prefix`:
```lean
example (f : ℕ → ℕ) : Monotone f → ∀ n, f n ≤ f (n + 1) := by
  intro h n
  llmstep "exact"

==> Lean Infoview
  Try This:
    * exact h (Nat.le_succ _)
    * exact h (Nat.le_succ n)
    * exact h (Nat.le_add_right _ _)
```

Clicking a suggestion places it in the proof:
```lean
example (f : ℕ → ℕ) : Monotone f → ∀ n, f n ≤ f (n + 1) := by
  intro h n
  exact h (Nat.le_succ _)
```

`llmstep` checks the language model suggestions in Lean, and highlights those that close the proof.
## Quick start
First, [install Lean 4 in VS Code](https://leanprover.github.io/lean4/doc/quickstart.html) and the Python requirements (`pip install -r requirements.txt`).
Then [start a server](#servers):
```bash
python python/server.py
```

Open `LLMstep/Examples.lean` in VS Code and try out `llmstep`.
## Use `llmstep` in a project
1. Add `llmstep` in `lakefile.lean`:
```lean
require llmstep from git
  "https://github.com/wellecks/llmstep"
```
Then run `lake update`.

2. Import `llmstep` in a Lean file:
```lean
import LLMstep
```

3. Start a server based on your runtime environment. For instance:
```bash
python python/server.py
```
Please see the [recommended servers below](#servers).

## Servers
The `llmstep` tactic communicates with a server that you can run in your own environment (e.g., CPU, GPU, Google Colab).

The table below shows the recommended language model and server scripts.
To start a server, use `python {script}`, e.g. `python python/server_vllm.py`:

| Environment | Script | Default Model | Context | Speed | miniF2F-test |
| -------- | ------- | ------- | ------- | ------- | ------- |
| CPU | `python/server_encdec.py` | [LeanDojo ByT5 300m](https://huggingface.co/kaiyuy/leandojo-lean4-tacgen-byt5-small) | State | 3.16s | 22.1\% |
| Colab GPU | See [Colab setup](#google-colab) | [llmstep Pythia 2.8b](https://huggingface.co/wellecks/llmstep-mathlib4-pythia2.8b) | State | 1.68s | 27.9\% |
| CUDA GPU | `python/server_vllm.py` | [llmstep Pythia 2.8b](https://huggingface.co/wellecks/llmstep-mathlib4-pythia2.8b) | State | **0.25s** | **27.9\%** |
| CUDA GPU* | `python/server_llemma.py` | [Llemma 7b](https://huggingface.co/EleutherAI/llemma_7b) | State, **current file** 🔥 | N/A | N/A |

Please refer to [our paper](https://arxiv.org/abs/2310.18457) for further information on the benchmarks.
`llmstep` aims to be a model-agnostic tool. We welcome contributions of new models.
\* File context support (e.g. with [Llemma](https://arxiv.org/abs/2310.10631)) is currently experimental.
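
To make the exchange between the tactic and a server more concrete, here is a minimal Python sketch of a suggestion server with a stub in place of the language model, plus a small client. The JSON field names (`tactic_state`, `prefix`, `suggestions`), the `stub_model` function, and its canned tactics are assumptions made for illustration. They are not taken from the `llmstep` server scripts, whose actual request format may differ.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


def stub_model(tactic_state: str, prefix: str) -> List[str]:
    """Stand-in for a language model: canned tactics filtered by `prefix`."""
    canned = ["exact h (Nat.le_succ _)", "simp", "exact h (Nat.le_succ n)"]
    return [tactic for tactic in canned if tactic.startswith(prefix)]


def handle_request(raw: bytes) -> bytes:
    """Decode a JSON request, query the (stub) model, encode a JSON reply."""
    payload = json.loads(raw.decode("utf-8"))
    suggestions = stub_model(payload["tactic_state"], payload["prefix"])
    return json.dumps({"suggestions": suggestions}).encode("utf-8")


class SuggestionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = handle_request(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def request_suggestions(host, port, tactic_state, prefix):
    """Client side: what a tactic-like caller would send and receive."""
    body = json.dumps({"tactic_state": tactic_state, "prefix": prefix})
    conn = HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", "/", body=body.encode("utf-8"),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())["suggestions"]
    finally:
        conn.close()


def demo():
    # Port 0 asks the OS for any free local port.
    server = HTTPServer(("127.0.0.1", 0), SuggestionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return request_suggestions("127.0.0.1", server.server_port,
                                   "⊢ f n ≤ f (n + 1)", "exact")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the stub server on a free local port, sends one request with the prefix `exact`, and returns the two canned tactics that begin with that prefix. Keeping `handle_request` separate from the HTTP handler means the stub can be swapped for a real model call without touching the transport code.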
## Implementation
`llmstep` has three parts:
1. a [Lean tactic](./LLMstep/LLMstep.lean)
2. a [language model](https://huggingface.co/wellecks/llmstep-mathlib4-pythia2.8b)
3. a [Python server](./python/server.py)

The Lean tactic sends a request to the server. \
The server calls the language model and returns the generated suggestions. \
The suggestions are displayed by the tactic in VS Code.

## Google Colab
To use Google Colab's free GPU to run a server, follow these instructions:
1. Open and run this notebook to start a server: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wellecks/llmstep/blob/master/python/colab/llmstep_colab_server.ipynb)
2. In your local environment, set the environment variable `LLMSTEP_HOST` to the URL printed by the notebook (for example, `https://04fa-34-125-110-83.ngrok.io/`).
3. In your local environment, set the environment variable `LLMSTEP_SERVER=COLAB`.
4. Use `llmstep`.
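
Steps 2 and 3 are easy to get subtly wrong (a missing scheme, a misspelled value), so a small pre-flight check can help. The following is a sketch only: `check_llmstep_env` is a hypothetical helper, not part of `llmstep`, and it inspects the environment of the Python process that runs it, which is not necessarily the environment the Lean server sees.

```python
import os
from urllib.parse import urlparse


def check_llmstep_env(env=None):
    """Return a list of problems; an empty list means the settings look usable."""
    env = os.environ if env is None else env
    problems = []

    # LLMSTEP_HOST should be a full URL, such as the one the notebook prints.
    host = env.get("LLMSTEP_HOST", "")
    parsed = urlparse(host)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"LLMSTEP_HOST should be a full URL, got {host!r}")

    # LLMSTEP_SERVER must be exactly COLAB for the Colab setup above.
    if env.get("LLMSTEP_SERVER") != "COLAB":
        problems.append("LLMSTEP_SERVER should be set to COLAB")

    return problems
```

Passing a dictionary instead of relying on `os.environ` makes it possible to test candidate values before committing them to your editor settings.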
#### VS Code steps (2) and (3)
To set environment variables in VS Code, go to:
- Settings (`Command` + `,` on Mac)
- Extensions -> Lean 4
- Add the environment variables to `Server Env`.
- Then restart the Lean Server (`Command` + `t`, then type `> Lean 4: Restart Server`).
## Language model
By default, `llmstep` uses a Pythia 2.8b language model fine-tuned on [LeanDojo Benchmark 4](https://zenodo.org/record/8040110):
- [`llmstep` model on Huggingface](https://huggingface.co/wellecks/llmstep-mathlib4-pythia2.8b)

The [python/train](python/train) directory shows how the model was fine-tuned.
#### Reprover
You can use the non-retrieval version of [Reprover](https://github.com/lean-dojo/ReProver), which we refer to as [LeanDojo ByT5 300m](https://huggingface.co/kaiyuy/leandojo-lean4-tacgen-byt5-small):

```bash
python python/server_encdec.py
```
By default, this runs the `leandojo-lean4-tacgen-byt5-small` model.\
This model is particularly useful on CPU due to its small parameter count.

#### Using a different model
Swap in other decoder-only language models with the `--hf-model` argument:
```bash
python python/server.py --hf-model some/other-model-7B
```
Use `--hf-model` with `python/server_encdec.py` for encoder-decoder models.

Use `--hf-model` with `python/server_llemma.py` for prompted base models (e.g. CodeLlama).
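
For readers adding their own server script, a flag like `--hf-model` could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: `build_parser` and `resolve_model` are hypothetical names, and the real server scripts may parse their arguments differently. Only the flag name and the default model identifier come from this README.

```python
import argparse

# Default model named in this README.
DEFAULT_MODEL = "wellecks/llmstep-mathlib4-pythia2.8b"


def build_parser():
    """Build a parser exposing a single --hf-model option (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Sketch of a suggestion server launcher")
    parser.add_argument(
        "--hf-model",
        type=str,
        default=DEFAULT_MODEL,
        help="Hugging Face model identifier to load",
    )
    return parser


def resolve_model(argv=None):
    """Return the model identifier chosen on the command line, or the default."""
    args = build_parser().parse_args(argv)
    # argparse maps the flag --hf-model to the attribute hf_model.
    return args.hf_model
```

With no arguments, `resolve_model([])` falls back to the default model; `resolve_model(["--hf-model", "some/other-model-7B"])` returns the override.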
#### Fine-tuning a model
The scripts in [python/train](python/train) show how to fine-tune a model.

## Additional Notes
#### Acknowledgements
* The `llmstep` tactic is inspired by [`gpt-f`](https://github.com/jesse-michael-han/lean-gptf).
* Fine-tuning data for the Pythia-2.8b model is from [LeanDojo](https://leandojo.org/).
* The fine-tuning code is based on the script from [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca).
* The tactic implementation adopts ideas and code from Mathlib4's `Polyrith` and `Std.Tactic.TryThis`.
* Thank you to Mario Carneiro and Scott Morrison for reviewing the tactic implementation.

#### History
`llmstep` was initially created for an IJCAI-2023 tutorial on neural theorem proving.\
It aims to be a model-agnostic platform for integrating language models and Lean.

#### Citation
Please cite:
```
@article{welleck2023llmstep,
title={LLMSTEP: LLM proofstep suggestions in Lean},
author={Sean Welleck and Rahul Saha},
journal={arXiv preprint arXiv:2310.18457},
year={2023}
}
```