{"id":13993990,"url":"https://github.com/lean-dojo/ReProver","last_synced_at":"2025-07-22T18:32:53.760Z","repository":{"id":176552730,"uuid":"615015099","full_name":"lean-dojo/ReProver","owner":"lean-dojo","description":"Retrieval-Augmented Theorem Provers for Lean","archived":false,"fork":false,"pushed_at":"2024-04-12T04:21:47.000Z","size":1725,"stargazers_count":158,"open_issues_count":2,"forks_count":27,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-04-13T01:22:49.235Z","etag":null,"topics":["lean","machine-learning","theorem-proving"],"latest_commit_sha":null,"homepage":"https://leandojo.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lean-dojo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-03-16T19:23:13.000Z","updated_at":"2024-04-14T16:32:39.627Z","dependencies_parsed_at":"2023-12-26T00:29:22.283Z","dependency_job_id":"ebaa77af-e437-42a0-8f00-c3168cd4f750","html_url":"https://github.com/lean-dojo/ReProver","commit_stats":null,"previous_names":["lean-dojo/reprover"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lean-dojo/ReProver","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lean-dojo%2FReProver","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lean-dojo%2FReProver/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lean-dojo%2FReProver/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lean-dojo%2FReProver/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lean-dojo","download_url":"https://codeload.github.com/lean-dojo/ReProver/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lean-dojo%2FReProver/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266552509,"owners_count":23947174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lean","machine-learning","theorem-proving"],"created_at":"2024-08-09T14:02:39.480Z","updated_at":"2025-07-22T18:32:50.326Z","avatar_url":"https://github.com/lean-dojo.png","language":"Python","funding_links":[],"categories":["Python","Models"],"sub_categories":["Whole proof generation"],"readme":"# Retrieval-Augmented Prover (ReProver)\n\n![Model](images/ReProver.jpg)\n\nCode for the paper:  \n\n[LeanDojo: Theorem Proving with Retrieval-Augmented Language Models](https://leandojo.org/)      \nNeurIPS (Datasets and Benchmarks Track), 2023, Oral presentation  \n[Kaiyu Yang](https://yangky11.github.io/), [Aidan Swope](https://aidanswope.com/about), [Alex Gu](https://minimario.github.io/), [Rahul Chalamala](https://rchalamala.github.io/),  \n[Peiyang Song](https://peiyang-song.github.io/), [Shixing Yu](https://billysx.github.io/), [Saad Godil](https://www.linkedin.com/in/saad-godil-9728353/), [Ryan Prenger](https://www.linkedin.com/in/ryan-prenger-18797ba1/), [Anima Anandkumar](http://tensorlab.cms.caltech.edu/users/anima/)\n\n```bibtex\n@inproceedings{yang2023leandojo,\n  title={{LeanDojo}: Theorem Proving with Retrieval-Augmented Language Models},\n  author={Yang, Kaiyu and Swope, Aidan and Gu, Alex and Chalamala, Rahul and Song, Peiyang and Yu, Shixing and Godil, Saad and Prenger, Ryan and Anandkumar, Anima},\n  booktitle={Neural Information Processing Systems (NeurIPS)},\n  year={2023}\n}\n```\n\nLean 3 is deprecated. The `main` branch only supports Lean 4. You may use the [`legacy`](https://github.com/lean-dojo/ReProver/tree/legacy) branch if you want to work with Lean 3.\n\n[![GitHub license](https://img.shields.io/github/license/MineDojo/MineDojo)](https://github.com/MineDojo/MineDojo/blob/main/LICENSE) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n\n## Quick Links\n\n  - [LeanDojo Website](https://leandojo.org/)\n  - [Using Trained Models on Hugging Face](#using-trained-models-on-hugging-face)\n  - [Using the Model Directly in Lean](#using-the-model-directly-in-lean)\n  - [Requirements](#requirements)\n  - [Premise Selection](#premise-selection)\n  - [Theorem Proving](#theorem-proving)\n  - [Questions and Bugs](#questions-and-bugs)\n\n\n## Using Trained Models on Hugging Face\n\n| Model name | Model architecture | Input | Output |\n| ---------- | ------------------ | ----- | ------ |\n| [kaiyuy/leandojo-lean4-tacgen-byt5-small](https://huggingface.co/kaiyuy/leandojo-lean4-tacgen-byt5-small) | ByT5 (encoder-decoder) | Proof state | Tactic |\n| [kaiyuy/leandojo-lean4-retriever-byt5-small](https://huggingface.co/kaiyuy/leandojo-lean4-retriever-byt5-small) | ByT5 (encoder-only) | Proof state | Embedding |\n| [kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small](https://huggingface.co/kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small) | ByT5 (encoder-decoder) | Retrieved premises + proof state | Tactic |\n\nOur trained models are available on HuggingFace Hub. With minimum dependencies (only [PyTorch](https://pytorch.org/) and [HuggingFace Transformers](https://huggingface.co/docs/transformers/index)), you can use our models to perform inference, finetune them on your own data, or plug them into your customized theorem proving pipeline. Below are some examples.\n\n\n### Tactic Generator\n\nOur tactic generator is a [ByT5](https://huggingface.co/docs/transformers/model_doc/byt5) model finetuned to generate tactics given a proof state.\n```python\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"kaiyuy/leandojo-lean4-tacgen-byt5-small\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"kaiyuy/leandojo-lean4-tacgen-byt5-small\")\n\nstate = \"n : ℕ\\n⊢ gcd n n = n\"\ntokenized_state = tokenizer(state, return_tensors=\"pt\")\n\n# Generate a single tactic.\ntactic_ids = model.generate(tokenized_state.input_ids, max_length=1024)\ntactic = tokenizer.decode(tactic_ids[0], skip_special_tokens=True)\nprint(tactic, end=\"\\n\\n\")\n\n# Generate multiple tactics via beam search.\ntactic_candidates_ids = model.generate(\n    tokenized_state.input_ids,\n    max_length=1024,\n    num_beams=4,\n    length_penalty=0.0,\n    do_sample=False,\n    num_return_sequences=4,\n    early_stopping=False,\n)\ntactic_candidates = tokenizer.batch_decode(\n    tactic_candidates_ids, skip_special_tokens=True\n)\nfor tac in tactic_candidates:\n    print(tac)\n```\n\nThe expected output is shown below. `\u003ca\u003e` and `\u003c/a\u003e` are markers of premises in generated tactics. You should remove them when using the tactics.\n```lean\nrw [gcd_comm, gcd_rec n n]\n\nsimp [gcd]\napply Nat.dvd_antisymm\ninduction' n with n n_ih\ninduction' n with n hn\n```\n\n\n### Premise Retriever\n\nAt the core of our premise retriever is a ByT5 encoder that embeds states and premises into vectors. You can \nuse the vectors to perform retrieval by maximizing cosine similarity.\n```python\nimport torch\nfrom typing import Union, List\nfrom transformers import AutoTokenizer, AutoModelForTextEncoding\n\ntokenizer = AutoTokenizer.from_pretrained(\"kaiyuy/leandojo-lean4-retriever-byt5-small\")\nmodel = AutoModelForTextEncoding.from_pretrained(\"kaiyuy/leandojo-lean4-retriever-byt5-small\")\n\nstate = \"n : ℕ\\n⊢ gcd n n = n\"\npremises = [\n  \"\u003ca\u003evsub_eq_zero_iff_eq\u003c/a\u003e @[simp] lemma vsub_eq_zero_iff_eq {p1 p2 : P} : p1 -ᵥ p2 = (0 : G) ↔ p1 = p2\",\n  \"\u003ca\u003eis_scalar_tower.coe_to_alg_hom'\u003c/a\u003e @[simp] lemma coe_to_alg_hom' : (to_alg_hom R S A : S → A) = algebra_map S A\",\n  \"\u003ca\u003epolynomial.X_sub_C_ne_zero\u003c/a\u003e theorem X_sub_C_ne_zero (r : R) : X - C r ≠ 0\",\n  \"\u003ca\u003eforall_true_iff\u003c/a\u003e theorem forall_true_iff : (α → true) ↔ true\",\n  \"def \u003ca\u003eNat.gcd\u003c/a\u003e : Nat → Nat → Nat\\n| 0        y := y\\n| (succ x) y := have y % succ x \u003c succ x, from mod_lt _ $ succ_pos _,\\n                gcd (y % succ x) (succ x)\",\n  \"@[simp] theorem \u003ca\u003eNat.gcd_zero_left\u003c/a\u003e (x : Nat) : gcd 0 x = x\",\n  \"@[simp] theorem \u003ca\u003eNat.gcd_succ\u003c/a\u003e (x y : Nat) : gcd (succ x) y = gcd (y % succ x) (succ x)\",\n  \"@[simp] theorem \u003ca\u003eNat.mod_self\u003c/a\u003e (n : Nat) : n % n = 0\",\n]  # A corpus of premises to retrieve from.\n\n@torch.no_grad()\ndef encode(s: Union[str, List[str]]) -\u003e torch.Tensor:\n    \"\"\"Encode texts into feature vectors.\"\"\"\n    if isinstance(s, str):\n        s = [s]\n        should_squeeze = True\n    else:\n        should_squeeze = False\n    tokenized_s = tokenizer(s, return_tensors=\"pt\", padding=True)\n    hidden_state = model(tokenized_s.input_ids).last_hidden_state\n    lens = tokenized_s.attention_mask.sum(dim=1)\n    features = (hidden_state * tokenized_s.attention_mask.unsqueeze(2)).sum(dim=1) / lens.unsqueeze(1)\n    if should_squeeze:\n      features = features.squeeze()\n    return features\n\n@torch.no_grad()\ndef retrieve(state: str, premises: List[str], k: int) -\u003e List[str]:\n    \"\"\"Retrieve the top-k premises given a state.\"\"\"\n    state_emb = encode(state)\n    premise_embs = encode(premises)\n    scores = (state_emb @ premise_embs.T)\n    topk = scores.topk(k).indices.tolist()\n    return [premises[i] for i in topk]\n\nfor p in retrieve(state, premises, k=4):\n    print(p, end=\"\\n\\n\")\n```\n\nExpected output:\n```lean\ndef \u003ca\u003eNat.gcd\u003c/a\u003e : Nat → Nat → Nat\n| 0        y := y\n| (succ x) y := have y % succ x \u003c succ x, from mod_lt _ $ succ_pos _,\n                gcd (y % succ x) (succ x)\n\n@[simp] theorem \u003ca\u003eNat.gcd_zero_left\u003c/a\u003e (x : Nat) : gcd 0 x = x\n\n@[simp] theorem \u003ca\u003eNat.gcd_succ\u003c/a\u003e (x y : Nat) : gcd (succ x) y = gcd (y % succ x) (succ x)\n\n@[simp] theorem \u003ca\u003eNat.mod_self\u003c/a\u003e (n : Nat) : n % n = 0\n```\n\n\n### Retrieval-Augmented Tactic Generator\n\nReProver's tactic generator takes as input the concatenation of retrieved premises and the state.\n```python\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small\")\n\nstate = \"n : ℕ\\n⊢ gcd n n = n\"\nretrieved_premises = [\n  \"def \u003ca\u003eNat.gcd\u003c/a\u003e : Nat → Nat → Nat\\n| 0        y := y\\n| (succ x) y := have y % succ x \u003c succ x, from mod_lt _ $ succ_pos _,\\n                gcd (y % succ x) (succ x)\",\n  \"@[simp] theorem \u003ca\u003eNat.mod_self\u003c/a\u003e (n : Nat) : n % n = 0\",\n]\ninput = \"\\n\\n\".join(retrieved_premises + [state])\nprint(\"------ INPUT ------\\n\", input)\ntokenized_input = tokenizer(input, return_tensors=\"pt\", max_length=2300, truncation=True)\n\n# Generate a single tactic.\ntactic_ids = model.generate(tokenized_input.input_ids, max_length=1024)\ntactic = tokenizer.decode(tactic_ids[0], skip_special_tokens=True)\nprint(\"\\n------ OUTPUT ------\")\nprint(tactic, end=\"\\n\\n\")\n\n# Generate multiple tactics via beam search.\ntactic_candidates_ids = model.generate(\n    tokenized_input.input_ids,\n    max_length=1024,\n    num_beams=4,\n    length_penalty=0.0,\n    do_sample=False,\n    num_return_sequences=4,\n    early_stopping=False,\n)\ntactic_candidates = tokenizer.batch_decode(\n    tactic_candidates_ids, skip_special_tokens=True\n)\nfor tac in tactic_candidates:\n    print(tac)\n```\n\nExpected output:\n```\n------ INPUT ------\n def \u003ca\u003eNat.gcd\u003c/a\u003e : Nat → Nat → Nat\n| 0        y := y\n| (succ x) y := have y % succ x \u003c succ x, from mod_lt _ $ succ_pos _,\n                gcd (y % succ x) (succ x)\n\n@[simp] theorem \u003ca\u003eNat.mod_self\u003c/a\u003e (n : Nat) : n % n = 0\n\nn : ℕ\n⊢ gcd n n = n\n\n------ OUTPUT ------\nrw [gcd_def, ← gcd_def, ← gcd_def, ← gcd_def]\n\nsimp [gcd]\nrw [gcd]\nrw [gcd_def]\nrw [← Nat.mod_self n, ← Nat.mod_self n]\n```\n\n**The rest of this document describes our system for training and evaluating LLM-based provers.**\n\n\n## Using the Model Directly in Lean\n\nCheck out [Lean Copilot](https://github.com/lean-dojo/LeanCopilot) if you want to run ReProver's tactic generator directly in Lean's VSCode workflow. \n\n\n## Requirements\n\n1. Download and install [Miniconda Python 3](https://docs.conda.io/en/latest/miniconda.html) (Anaconda should also work).\n2. Create the conda environment and install Python dependencies:\n```bash\nconda create --yes --name ReProver python=3.11 ipython\nconda activate ReProver\npip install torch  # Depending on your CUDA version; see https://pytorch.org/.\npip install tqdm loguru deepspeed \"pytorch-lightning[extra]\" transformers wandb openai rank_bm25 lean-dojo vllm\n```\n3. Prepend the repo's root to the `PYTHONPATH` environment variable.\n4. Make sure `wget` and `tar` are available. Then, run `python scripts/download_data.py` to download [LeanDojo Benchmark 4](https://zenodo.org/doi/10.5281/zenodo.8040109). They will be saved to `./data`.\n5. Satisfy the requirements of [LeanDojo](https://github.com/lean-dojo/LeanDojo#requirements).\n6. Use [LeanDojo](https://github.com/lean-dojo/LeanDojo) to trace all repos in the datasets: `python scripts/trace_repos.py`. This step may take some time. Please refer to [LeanDojo's documentation](https://leandojo.readthedocs.io/en/latest/) if you encounter any issues.\n7. Run `wandb login` to log in Weights \u0026 Biases.\n\n\n\n## Premise Selection\n\nWe use [Lightning CLI](https://pytorch-lightning.readthedocs.io/en/1.6.5/common/lightning_cli.html) to create [retrieval/main.py](retrieval/main.py) for training, validation, and testing the premise retrieval. It takes command line arguments as well as YAML config files. Please run `python retrieval/main.py --help` or refer to the documentation of [Lightning CLI](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.cli.LightningCLI.html) for details. \n\nThe config files for our experiments are in [./retrieval/confs](./retrieval/confs). We train all models on a single NVIDIA A100 GPU with 80GB memory. When using GPUs with smaller memory, you can change `batch_size`, `accumulate_grad_batches`, and `num_negatives`. However, it may impact the performance due to in-batch negatives in DPR.\n\n\n### Training the Premise Retriever\n\nRun `python retrieval/main.py fit --help` to see how to use the training script. For example:\n```bash\nmkdir logs  # Create the directory for training logs.\npython retrieval/main.py fit --config retrieval/confs/cli_lean4_random.yaml --trainer.logger.name train_retriever_random --trainer.logger.save_dir logs/train_retriever_random         # Train the retriever on the `random` split of LeanDojo Benchmark 4.\npython retrieval/main.py fit --config retrieval/confs/cli_lean4_novel_premises.yaml --trainer.logger.name train_retriever_novel_premises --trainer.logger.save_dir logs/train_retriever_novel_premises  # Train the retriever on the `novel_premises` split of LeanDojo Benchmark 4.\n```\nHyperparameters and model checkpoints are saved in `./logs/train_retriever_*`, and you can monitor the training process on Weights \u0026 Biases.\n\n\n### Retrieving Premises for All Proof States\n\nAfter the models are trained, run the following commands to retrieve premises for all proof states in the dataset.\n```bash\npython retrieval/main.py predict --config retrieval/confs/cli_lean4_random.yaml --ckpt_path $PATH_TO_RETRIEVER_CHECKPOINT --trainer.logger.name predict_retriever_random --trainer.logger.save_dir logs/predict_retriever_random \npython retrieval/main.py predict --config retrieval/confs/cli_lean4_novel_premises.yaml --ckpt_path $PATH_TO_RETRIEVER_CHECKPOINT --trainer.logger.name predict_retriever_novel_premises --trainer.logger.save_dir logs/predict_retriever_novel_premises\n```\nRetrieved premises are saved to `./logs/predict_retriever_*/predictions.pickle`. Note that `PATH_TO_RETRIEVER_CHECKPOINT` is the DeepSpeed model checkpoint produced in the previous step. If you want to use a Hugging Face checkpoint instead, a workaround would be to run the training for 1 step with zero learning rate.\n\n\n### Evaluating the Retrieved Premises\n\nAfter predictions are saved, evaluate them using metrics such as R@1, R@10, and MRR.\n```bash\npython retrieval/evaluate.py --data-path data/leandojo_benchmark_4/random --preds-file logs/predict_retriever_random/predictions.pickle\npython retrieval/evaluate.py --data-path data/leandojo_benchmark_4/novel_premises --preds-file logs/predict_retriever_novel_premises/predictions.pickle\n```\n\n\n## Theorem Proving\n\n\n### Training the Tactic Generator\n\nSimilar to premise selection, you can run `python generation/main.py --help` and  `python generation/main.py fit --help` to check the command line options.\n\nTo train tactic generators without retrieval:\n```bash\npython generation/main.py fit --config generation/confs/cli_lean4_random.yaml --trainer.logger.name train_generator_random --trainer.logger.save_dir logs/train_generator_random            # LeanDojo Benchmark 4, `random` split\npython generation/main.py fit --config generation/confs/cli_lean4_novel_premises.yaml --trainer.logger.name train_generator_novel_premises --trainer.logger.save_dir logs/train_generator_novel_premises    # LeanDojo Benchmark 4, `novel_premises` split\n```\nHyperparameters and model checkpoints are saved in `./logs/train_generator_*`, and you can monitor the training process on Weights \u0026 Biases.\n\nTo train models augmented by retrieval, we need to provide a retriever checkpoint and its predictions on all proof states in the dataset:\n```bash \npython generation/main.py fit --config generation/confs/cli_lean4_random.yaml --model.ret_ckpt_path $PATH_TO_RETRIEVER_CHECKPOINT --data.preds_path logs/predict_retriever_random/predictions.pickle --trainer.logger.name train_reprover_random --trainer.logger.save_dir logs/train_reprover_random\npython generation/main.py fit --config generation/confs/cli_lean4_novel_premises.yaml --model.ret_ckpt_path $PATH_TO_RETRIEVER_CHECKPOINT --data.preds_path logs/predict_retriever_novel_premises/predictions.pickle --trainer.logger.name train_reprover_novel_premises --trainer.logger.save_dir logs/train_reprover_novel_premises\n```\n\n\n### Theorem Proving Evaluation on LeanDojo Benchmark\n\nAfter the tactic generator is trained, we combine it with best-first search to prove theorems by interacting with Lean. The evaluation script takes Hugging Face model checkpoints (either local or remote) as input. For remote models, you can simply use their names, e.g., [kaiyuy/leandojo-lean4-tacgen-byt5-small](https://huggingface.co/kaiyuy/leandojo-lean4-tacgen-byt5-small). For locally trained models, you first need to convert them from PyTorch Ligthning checkpoints to Hugging Face checkpoints:\n```bash\npython scripts/convert_checkpoint.py generator --src $PATH_TO_GENERATOR_CHECKPOINT --dst ./leandojo-lean4-tacgen-byt5-small\npython scripts/convert_checkpoint.py retriever --src $PATH_TO_RETRIEVER_CHECKPOINT --dst ./leandojo-lean4-retriever-byt5-small\npython scripts/convert_checkpoint.py generator --src $PATH_TO_REPROVER_CHECKPOINT --dst ./leandojo-lean4-retriever-tacgen-byt5-small\n```\n, where `PATH_TO_GENERATOR_CHECKPOINT` and `PATH_TO_RETRIEVER_CHECKPOINT` are PyTorch Ligthning checkpoints produced by the training script.\n\n\nTo evaluate the model without retrieval, run (using the `random` data split as example):\n```bash\npython prover/evaluate.py --data-path data/leandojo_benchmark_4/random/ --gen_ckpt_path ./leandojo-lean4-tacgen-byt5-small --split test --num-workers 5 --num-gpus 1\n```\nYou may tweak `--num-workers` and `--num-gpus` to fit your hardware.\n\n\nFor the model with retrieval, first use the retriever to index the corpus (pre-computing the embeddings of all premises):\n```bash\npython retrieval/index.py --ckpt_path ./leandojo-lean4-retriever-byt5-small --corpus-path data/leandojo_benchmark_4/corpus.jsonl --output-path $PATH_TO_INDEXED_CORPUS\n```\nIt saves the indexed corpus as a pickle file to `PATH_TO_INDEXED_CORPUS`.\n\nThen, run:\n```bash\npython prover/evaluate.py --data-path data/leandojo_benchmark_4/random/  --gen_ckpt_path ./leandojo-lean4-retriever-tacgen-byt5-small --ret_ckpt_path ./leandojo-lean4-retriever-byt5-small --indexed-corpus-path $PATH_TO_INDEXED_CORPUS --split test --num-workers 5 --num-gpus 1\n```\n\n\n## Questions and Bugs\n\n* For general questions and discussions, please use [GitHub Discussions](https://github.com/lean-dojo/ReProver/discussions).  \n* To report a potential bug, please open an issue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flean-dojo%2FReProver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flean-dojo%2FReProver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flean-dojo%2FReProver/lists"}