{"id":37073815,"url":"https://github.com/evalplus/repoqa","last_synced_at":"2026-01-14T08:39:56.285Z","repository":{"id":236247103,"uuid":"759290181","full_name":"evalplus/repoqa","owner":"evalplus","description":"RepoQA: Evaluating Long-Context Code Understanding","archived":false,"fork":false,"pushed_at":"2024-11-01T17:59:43.000Z","size":5638,"stargazers_count":119,"open_issues_count":4,"forks_count":7,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-10-26T04:28:12.136Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://evalplus.github.io/repoqa.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evalplus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-18T07:04:19.000Z","updated_at":"2025-10-17T03:07:37.000Z","dependencies_parsed_at":"2024-05-05T02:22:20.734Z","dependency_job_id":"216e336e-b0f6-45a3-9357-4a0458527f1a","html_url":"https://github.com/evalplus/repoqa","commit_stats":null,"previous_names":["evalplus/repoqa"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/evalplus/repoqa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalplus%2Frepoqa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalplus%2Frepoqa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalplus%2Frepoqa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalplus%2Frepoqa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/evalplus","download_url":"https://codeload.github.com/evalplus/repoqa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evalplus%2Frepoqa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28414676,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:38:59.149Z","status":"ssl_error","status_checked_at":"2026-01-14T08:38:43.588Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-14T08:39:55.684Z","updated_at":"2026-01-14T08:39:56.272Z","avatar_url":"https://github.com/evalplus.png","language":"Python","readme":"# RepoQA: Evaluating Long-Context Code Understanding\n\n[![](https://img.shields.io/badge/arXiv-2406.06025-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2406.06025)\n[![](https://img.shields.io/pypi/v/repoqa?style=for-the-badge\u0026labelColor=black)](https://pypi.org/project/repoqa/)\n\n🏠 Homepage: https://evalplus.github.io/repoqa.html\n\n## 🚀 Installation\n\n```bash\n# without vLLM (can run openai, anthropic, and huggingface backends)\npip install --upgrade repoqa\n# To enable vLLM\npip install --upgrade \"repoqa[vllm]\"\n```\n\n\u003cdetails\u003e\u003csummary\u003e⏬ Install nightly version \u003ci\u003e:: click to expand 
::\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\n```bash\npip install --upgrade \"git+https://github.com/evalplus/repoqa.git\"                 # without vLLM\npip install --upgrade \"repoqa[vllm] @ git+https://github.com/evalplus/repoqa@main\" # with vLLM\n```\n\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e⏬ Using RepoQA as a local repo? \u003ci\u003e:: click to expand ::\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\n```bash\ngit clone https://github.com/evalplus/repoqa.git\ncd repoqa\nexport PYTHONPATH=$PYTHONPATH:$(pwd)\npip install -r requirements.txt\n```\n\n\u003c/div\u003e\n\u003c/details\u003e\n\n## 🏁 Search Needle Function (SNF)\n\nSearch Needle Function is the first and foundational RepoQA task; it exercises LLMs' ability in **long-context code understanding and retrieval**.\nIts real-life counterpart is precise code search from a natural-language function description.\n\n\u003cdetails\u003e\u003csummary\u003e🔎 More dataset details \u003ci\u003e:: click to expand ::\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\n\u003e [!Note]\n\u003e\n\u003e SNF includes 500 tests (5 programming languages x 10 repos x 10 needle functions) where an LLM is given:\n\u003e\n\u003e 1. A large code context sorted by file dependency\n\u003e 2. An NL description of the needle function that avoids revealing keywords such as function names\n\u003e 3. 
An instruction to retrieve the described function\n\u003e\n\u003e The evaluator passes a test if the retrieved function is syntactically closer to the ground truth than any\n\u003e other function (functions are parsed by `tree-sitter`) and the similarity exceeds a user-defined threshold (0.8 by default).\n\n\u003c/div\u003e\n\u003c/details\u003e\n\nYou can run the SNF evaluation using various backends:\n\n### OpenAI Compatible Servers\n\n```bash\nrepoqa.search_needle_function --model \"gpt-4-turbo\" --backend openai\n# 💡 If you use an OpenAI-API-compatible server such as a vLLM server:\n# repoqa.search_needle_function --base-url \"http://localhost:[PORT]/v1\" \\\n#                               --model \"Qwen/CodeQwen1.5-7B-Chat\" --backend openai\n```\n\n### Anthropic Compatible Servers\n\n```bash\nrepoqa.search_needle_function --model \"claude-3-haiku-20240307\" --backend anthropic\n```\n\n### vLLM\n\n```bash\nrepoqa.search_needle_function --model \"Qwen/CodeQwen1.5-7B-Chat\" --backend vllm\n```\n\n\u003cdetails\u003e\u003csummary\u003e🔎 Context extension for small-ctx models \u003ci\u003e:: click to expand ::\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\n\u003e There are two ways to unlock a model's context at inference time:\n\u003e\n\u003e 1. **Direct Extension**: Edit `max_position_embeddings` in the model's `config.json` (e.g., `hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json`) to something like `22528`.\n\u003e 2. **[Dynamic RoPE Scaling](https://blog.eleuther.ai/yarn/#dynamic-scaling)**:\n\u003e    To extend `Meta-Llama-3-8B-Instruct` from 8k to 32k (4x), edit the `config.json`:\n\u003e\n\u003e `\"rope_scaling\": {\"type\": \"dynamic\", \"factor\": 4.0}`\n\u003e\n\u003e Note: This works for vLLM `\u003c0.4.3` and HuggingFace transformers. 
RepoQA will automatically configure dynamic RoPE for vLLM `\u003e= 0.4.3`\n\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003e [!Note]\n\u003e\n\u003e Reference evaluation time:\n\u003e\n\u003e - Llama3-8B-Instruct: 45 minutes on 2xA6000 (PCIe NVLink)\n\u003e - Llama3-70B-Instruct: 100 minutes on 4xA100 (PCIe NVLink)\n\n### HuggingFace transformers\n\n```bash\nrepoqa.search_needle_function --model \"Qwen/CodeQwen1.5-7B-Chat\" --backend hf --trust-remote-code\n```\n\n\u003e [!Tip]\n\u003e\n\u003e Installing [flash-attn](https://github.com/Dao-AILab/flash-attention) and\n\u003e additionally setting `--attn-implementation \"flash_attention_2\"` can significantly\n\u003e lower the memory requirement.\n\n\u003cdetails\u003e\u003csummary\u003e🔨 Having trouble installing `flash-attn`? \u003ci\u003e:: click to expand ::\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\n\u003e If you have trouble with `pip install flash-attn --no-build-isolation`,\n\u003e you can try directly using [pre-built wheels](https://github.com/Dao-AILab/flash-attention/releases):\n\u003e\n\u003e ```shell\n\u003e export FLASH_ATTN_VER=2.5.8 # check latest version at https://github.com/Dao-AILab/flash-attention/releases\n\u003e export CUDA_VER=\"cu122\"     # check available ones at https://github.com/Dao-AILab/flash-attention/releases\n\u003e export TORCH_VER=$(python -c \"import torch; print('.'.join(torch.__version__.split('.')[:2]))\")\n\u003e export PY_VER=$(python -c \"import platform; print(''.join(platform.python_version().split('.')[:2]))\")\n\u003e export OS_ARCH=$(python -c \"import platform; print(f'{platform.system().lower()}_{platform.machine()}')\")\n\u003e\n\u003e export WHEEL=flash_attn-${FLASH_ATTN_VER}+${CUDA_VER}torch${TORCH_VER}cxx11abiFALSE-cp${PY_VER}-cp${PY_VER}-${OS_ARCH}.whl\n\u003e wget https://github.com/Dao-AILab/flash-attention/releases/download/v${FLASH_ATTN_VER}/${WHEEL}\n\u003e pip install ${WHEEL}\n\u003e ```\n\n\u003c/div\u003e\n\u003c/details\u003e\n\n### Google 
Generative AI API (Gemini)\n\n```bash\nrepoqa.search_needle_function --model \"gemini-1.5-pro-latest\" --backend google\n```\n\n### CLI Usage\n\n- **Input**:\n  - `--model`: Hugging Face model ID, such as `ise-uiuc/Magicoder-S-DS-6.7B`\n  - `--backend`: `vllm` (default), `openai`, `anthropic`, `hf`, or `google`\n  - `--base-url`: OpenAI API base URL\n  - `--code-context-size` (default: 16384): #tokens (by the DeepSeekCoder tokenizer) of repository context\n  - `--caching` (default: True): accelerate subsequent runs by caching preprocessing; `--nocaching` to disable\n  - `--max-new-tokens` (default: 1024): Maximum #new tokens to generate\n  - `--system-message` (default: None): system message (note it's not supported by some models)\n  - `--tensor-parallel-size`: #GPUs for tensor parallelism (only for vLLM)\n  - `--languages` (default: None): List of languages to evaluate (None means all)\n  - `--result-dir` (default: \"results\"): Directory to save the model outputs and evaluation results\n  - `--clean-ctx-comments` (default: \"none\"): Clean context comments with padding (\"positional_padding\") or no padding (\"no_padding\")\n  - `--eval-ignore-comments` (default: False): During evaluation, ignore ground-truth and model comments\n  - `--trust-remote-code` (default: False): allow remote code (for HuggingFace transformers and vLLM)\n  - `--attn-implementation` (default: None): Use \"flash_attention_2\" if the HuggingFace backend hits OOM\n- **Output**:\n  - `results/ntoken_{code-context-size}/{model}.jsonl`: Model-generated outputs\n  - `results/ntoken_{code-context-size}/{model}-SCORE.json`: Evaluation results\n\n### Compute Scores\n\nBy default, the `repoqa.search_needle_function` command evaluates model outputs and computes scores after text generation.\nHowever, you can also compute scores separately using the following command:\n\n```shell\nrepoqa.compute_score --model-output-path={model-output}.jsonl\n```\n\n\u003e [!Tip]\n\u003e\n\u003e - **Input**: Path to the model-generated outputs.\n\u003e - 
**Output**: The evaluation scores will be stored in `{model-output}-SCORES.json`\n\n## 📚 Read More\n\n- [RepoQA Homepage](https://evalplus.github.io/repoqa.html)\n- [RepoQA Dataset Curation](docs/curate_dataset.md)\n- [RepoQA Development Notes](docs/dev_note.md)\n\n## Citation\n\n```bibtex\n@article{repoqa,\n  title = {RepoQA: Evaluating Long Context Code Understanding},\n  author = {Liu, Jiawei and Tian, Jia Le and Daita, Vijay and Wei, Yuxiang and Ding, Yifeng and Wang, Yuhan Katherine and Yang, Jun and Zhang, Lingming},\n  year = {2024},\n  journal = {arXiv preprint arXiv:2406.06025},\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevalplus%2Frepoqa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevalplus%2Frepoqa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevalplus%2Frepoqa/lists"}