{"id":15144335,"url":"https://github.com/aorwall/moatless-tools","last_synced_at":"2025-05-16T05:07:19.765Z","repository":{"id":189531999,"uuid":"680849474","full_name":"aorwall/moatless-tools","owner":"aorwall","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-30T03:51:48.000Z","size":113575,"stargazers_count":399,"open_issues_count":16,"forks_count":36,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-30T04:32:34.956Z","etag":null,"topics":["ai","code-gen","code-search","gpt-4","python","tree-sitter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aorwall.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-20T15:55:05.000Z","updated_at":"2025-04-29T02:01:36.000Z","dependencies_parsed_at":"2023-08-20T17:12:34.164Z","dependency_job_id":"870de9d9-ab85-4b3a-89c2-5ff613a251b5","html_url":"https://github.com/aorwall/moatless-tools","commit_stats":{"total_commits":59,"total_committers":4,"mean_commits":14.75,"dds":"0.47457627118644063","last_synced_commit":"a50e3ef9da4e73e916e71294649f42038b9df47b"},"previous_names":["aorwall/code-blocks","aorwall/ghostcoder"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aorwall%2Fmoatless-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aorwall%2Fmoatless-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aorwall%2Fmoatless-tools/r
eleases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aorwall%2Fmoatless-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aorwall","download_url":"https://codeload.github.com/aorwall/moatless-tools/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471060,"owners_count":22076585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","code-gen","code-search","gpt-4","python","tree-sitter"],"created_at":"2024-09-26T10:40:38.191Z","updated_at":"2025-05-16T05:07:14.757Z","avatar_url":"https://github.com/aorwall.png","language":"Python","readme":"# Moatless Tools\nMoatless Tools is a hobby project where I experiment with some ideas I have about how LLMs can be used to edit code in large existing codebases. I believe that rather than relying on an agent to reason its way to a solution, it is crucial to build good tools to insert the right context into the prompt and handle the response.\n\n_For the implementation used in the paper [SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement](https://arxiv.org/abs/2410.20285), please see [moatless-tree-search](https://github.com/aorwall/moatless-tree-search)._\n\n## SWE-Bench\nI use the [SWE-bench benchmark](https://www.swebench.com/) as a way to verify my ideas. 
\n\n* [Claude 3.5 Sonnet v20241022 evaluation results](https://experiments.moatless.ai/evaluations/20250113_claude_3_5_sonnet_20241022_temp_0_0_iter_20_fmt_tool_call_hist_messages_lite) - 39% solve rate, 2.7 resolved instances per dollar\n* [Deepseek V3](https://experiments.moatless.ai/evaluations/20250111_deepseek_chat_v3_temp_0_0_iter_20_fmt_react_hist_react) - 30.7% solve rate, 24 resolved instances per dollar\n\n# Try it out\n\n## Environment Setup\n\nYou can install Moatless Tools either from PyPI or from source:\n\n### Install from PyPI\n\n```bash\n# Install base package only\npip install moatless\n\n# Install with streamlit visualization tools\npip install \"moatless[streamlit]\"\n\n# Install with API server\npip install \"moatless[api]\"\n\n# Install everything (including dev dependencies)\npip install \"moatless[all]\"\n```\n\n### Install from source\n\nClone the repository and install using Poetry:\n\n```bash\n# Clone the repository\ngit clone https://github.com/aorwall/moatless-tools.git\ncd moatless-tools\n\n# Using Poetry:\n\n# Install base package only\npoetry install\n\n# Install with streamlit visualization tools\npoetry install --with streamlit\n\n# Install with API server\npoetry install --with api\n\n# Alternative: Install all optional components at once\npoetry install --all-extras\n```\n\n## Environment Variables\n\nBefore running the evaluation, you'll need:\n1. At least one LLM provider API key (e.g., OpenAI, Anthropic, etc.)\n2. A Voyage AI API key from [voyageai.com](https://voyageai.com) to use the pre-embedded vector stores for SWE-Bench instances.\n3. (Optional) Access to a testbed environment - see [moatless-testbeds](https://github.com/aorwall/moatless-testbeds) for setup instructions\n\nYou can configure these settings by either:\n\n1. 
Creating a `.env` file in the project root (copy from `.env.example`):\n\n```bash\n# Using Poetry:\ncp .env.example .env\n# Edit .env with your values\n\n# Using pip:\ncurl -O https://raw.githubusercontent.com/aorwall/moatless-tools/main/.env.example\nmv .env.example .env\n# Edit .env with your values\n```\n\n2. Exporting the variables directly:\n   \n```bash\n# Directory for storing vector index store files  \nexport INDEX_STORE_DIR=\"/tmp/index_store\"    \n\n# Directory for storing cloned repositories \nexport REPO_DIR=\"/tmp/repos\"\n\n# Required: At least one LLM provider API key\nexport OPENAI_API_KEY=\"\u003cyour-key\u003e\"\nexport ANTHROPIC_API_KEY=\"\u003cyour-key\u003e\"\n\n# ...or Base URL for custom LLM API service (optional)\nexport CUSTOM_LLM_API_BASE=\"\u003cyour-base-url\u003e\"\nexport CUSTOM_LLM_API_KEY=\"\u003cyour-key\u003e\"\n\n# Required: API Key for Voyage Embeddings\nexport VOYAGE_API_KEY=\"\u003cyour-key\u003e\"\n\n# Optional: Configuration for testbed environment (https://github.com/aorwall/moatless-testbeds)\nexport TESTBED_API_KEY=\"\u003cyour-key\u003e\"\nexport TESTBED_BASE_URL=\"\u003cyour-base-url\u003e\"\n```\n\n## Verified Models\n\n\u003e **Note**: The current version of litellm lacks support for computer use tools required by Claude 3.5 Sonnet. You need to use a specific dependency:\n\u003e ```toml\n\u003e litellm = { git = \"https://github.com/aorwall/litellm.git\", branch = \"anthropic-computer-use\" }\n\u003e ```\n\nDefault model configurations are provided for verified models. Note that other models may work but have not been extensively tested. 
\nVerified models are models that have been tested and found to work with the [Verified Mini subset](https://huggingface.co/datasets/MariusHobbhahn/swe-bench-verified-mini) of the SWE-Bench dataset.\n\nWhen specifying just the `--model` argument, the following configurations are used:\n\n| Model | Response Format | Message History | Thoughts in Action | Verified Mini |\n|-------|----------------|-----------------|-------------------|---------------|\n| claude-3-5-sonnet-20241022 | tool_call | messages | no | [46%](https://experiments.moatless.ai/evaluations/20250119_claude_3_5_sonnet_20241022_0_0_n_20_fmt_tool_call_verified_mini) | \n| claude-3-5-haiku-20241022 | tool_call | messages | no | [28%](https://experiments.moatless.ai/evaluations/20250118_claude_3_5_haiku_20241022_0_0_n_20_fmt_tool_call_verified_mini) |\n| gpt-4o-2024-11-20 | tool_call | messages | yes | [32%](https://experiments.moatless.ai/evaluations/20250119_azure_gpt_4o_0_0_n_20_fmt_tool_call_thoughts-in-action_1_verified_mini) |\n| gpt-4o-mini-2024-07-18 | tool_call | messages | yes | [16%](https://experiments.moatless.ai/evaluations/20250118_gpt_4o_mini_2024_07_18_0_0_n_20_fmt_tool_call_thoughts-in-action_6_verified_mini) |\n| o1-mini-2024-09-12 | react | react | no (disabled thoughts) | [28%](https://experiments.moatless.ai/evaluations/20250114_o1_mini_2024_09_12_0_0_n_20_fmt_react_hist_react_verified_mini) |\n| deepseek/deepseek-chat | react | react | no | [36%](https://experiments.moatless.ai/evaluations/20250118_deepseek_deepseek_chat_0_0_n_20_fmt_react_verified_mini) |\n| deepseek/deepseek-reasoner | react | react | no (disabled thoughts) | [50%](https://experiments.moatless.ai/evaluations/20250120_deepseek_deepseek_reasoner_None_n_20_fmt_react_verified_mini) |\n| gemini/gemini-2.0-flash-exp | react | react | no | [38%](https://experiments.moatless.ai/evaluations/20250119_gemini_gemini_2.0_flash_exp_0_0_n_20_fmt_react_verified_mini) |\n| openrouter/meta-llama/llama-3.1-70b-instruct | react | 
react | no | - |\n| openrouter/meta-llama/llama-3.1-405b-instruct | react | react | no | [28%](https://experiments.moatless.ai/evaluations/20250119_openai_meta_llama_Meta_Llama_3.1_405B_Instruct_FP8_0_0_n_20_fmt_react_verified_mini) |\n| openrouter/qwen/qwen-2.5-coder-32b-instruct | react | react | no | [32%](https://experiments.moatless.ai/evaluations/20250119_openai_Qwen_Qwen2.5_Coder_32B_Instruct_0_0_n_20_fmt_react_verified_mini) |\n\n## Verify Setup\n\nBefore running the full evaluation, you can verify your setup using the integration test script:\n\n```bash\n# Run a single model test\npython -m moatless.validation.validate_simple_code_flow --model claude-3-5-sonnet-20241022\n```\n\nThe script will run the model against a sample SWE-Bench instance.\n\nResults are saved in `test_results/integration_test_\u003ctimestamp\u003e/`.\n\n\n## Run evaluation\n\nThe evaluation script supports various configuration options through command line arguments:\n\n```bash\npython -m moatless.benchmark.run_evaluation [OPTIONS]\n```\n\nRequired arguments:\n- `--model MODEL`: Model to use for evaluation; either a verified model from the table above or any custom model identifier (e.g., 'claude-3-5-sonnet-20241022', 'gpt-4o')\n\nOptional arguments:\n- Model settings:\n  - `--api-key KEY`: API key for the model\n  - `--base-url URL`: Base URL for the model API\n  - `--response-format FORMAT`: Response format ('tool_call' or 'react'). Defaults to 'tool_call' for custom models\n  - `--message-history TYPE`: Message history type ('messages', 'summary', 'react', 'messages_compact', 'instruct'). Defaults to 'messages' for custom models\n  - `--thoughts-in-action`: Enable thoughts in action\n  - `--temperature FLOAT`: Temperature for model sampling. Defaults to 0.0\n\n- Dataset settings:\n  - `--split SPLIT`: Dataset split to use. 
Defaults to 'lite'\n  - `--instance-ids ID [ID ...]`: Specific instance IDs to evaluate\n\n- Loop settings:\n  - `--max-iterations INT`: Maximum number of iterations\n  - `--max-cost FLOAT`: Maximum cost in dollars\n\n- Runner settings:\n  - `--num-workers INT`: Number of parallel workers. Defaults to 10\n  - `--evaluation-name NAME`: Custom name for the evaluation run\n  - `--rerun-errors`: Rerun instances that previously errored\n\nAvailable dataset splits that can be specified with the `--split` argument:\n\n| Split Name | Description | Instance Count |\n|------------|-------------|----------------|\n| lite | All instances from the lite dataset | 300 | \n| verified | All instances from the verified dataset | 500 | \n| verified_mini | [MariusHobbhahn/swe-bench-verified-mini](https://huggingface.co/datasets/MariusHobbhahn/swe-bench-verified-mini), a subset of SWE-Bench Verified  | 50 |\n| lite_and_verified_solvable | Instances that exist in both lite and verified datasets and have at least one solved submission to SWE-Bench | 84 |\n\nExample usage:\n```bash\n# Run evaluation with Claude 3.5 Sonnet using the ReACT format\npython -m moatless.benchmark.run_evaluation \\\n  --model claude-3-5-sonnet-20241022 \\\n  --response-format react \\\n  --message-history react \\\n  --num-workers 10\n\n# Run specific instances with GPT-4\npython -m moatless.benchmark.run_evaluation \\\n  --model gpt-4o-2024-11-20 \\\n  --instance-ids \"django__django-16527\"\n```\n\n# Running the UI and API\n\nThe project includes a web UI for visualizing saved trajectory files, built with SvelteKit. 
The UI is packaged with the Python package and will be served by the API server.\n\nFirst, make sure you have the required components installed:\n```bash\npip install \"moatless[api]\"\n```\n\n### Start the API Server\n```bash\nmoatless-api\n```\n\nThis will start the FastAPI server on http://localhost:8000 and serve the UI at the same address.\n\n### Development Mode\n\nIf you want to develop the UI, you can run it in development mode:\n\n```bash\n# From the ui directory\ncd ui\npnpm install\npnpm run dev\n```\nThe UI development server will be available at http://localhost:5173.\n\n# Code Examples\n\nBasic setup using the `AgenticLoop` to solve a SWE-Bench instance.\n\n## Example 1: Using Claude 3.5 Sonnet\n```python\nimport os\n\nfrom moatless.benchmark.swebench import create_repository\nfrom moatless.benchmark.utils import get_moatless_instance\nfrom moatless.agent.code_agent import CodingAgent\nfrom moatless.index import CodeIndex\nfrom moatless.loop import AgenticLoop\nfrom moatless.file_context import FileContext\nfrom moatless.completion.base import LLMResponseFormat\nfrom moatless.schema import MessageHistoryType\n\nindex_store_dir = os.getenv(\"INDEX_STORE_DIR\", \"/tmp/index_store\")\nrepo_base_dir = os.getenv(\"REPO_DIR\", \"/tmp/repos\")\npersist_path = \"trajectory.json\"\n\ninstance = get_moatless_instance(\"django__django-16379\")\nrepository = create_repository(instance)\ncode_index = CodeIndex.from_index_name(\n    instance[\"instance_id\"], \n    index_store_dir=index_store_dir, \n    file_repo=repository\n)\nfile_context = FileContext(repo=repository)\n\n# Create agent using Claude 3.5 Sonnet with explicit config\nagent = CodingAgent.create(\n    repository=repository, # Repository instance with codebase\n    code_index=code_index, # Code index for semantic search\n    \n    model=\"claude-3-5-sonnet-20241022\",\n    temperature=0.0, \n    max_tokens=4000,\n    few_shot_examples=False, # We don't need few-shot examples for this model\n    \n    
response_format=LLMResponseFormat.TOOLS,\n    message_history_type=MessageHistoryType.MESSAGES, # We must show the full message history to make use of Claude's prompt cache\n)\n\nloop = AgenticLoop.create(\n    message=instance[\"problem_statement\"],\n    agent=agent,\n    file_context=file_context,\n    repository=repository,\n    persist_path=persist_path,\n    max_iterations=50,\n    max_cost=2.0\n)\n\nfinal_node = loop.run()\nif final_node:\n    print(final_node.observation.message)\n```\n\n## Example 2: Using Deepseek V3\n```python\nimport os\n\nfrom moatless.benchmark.swebench import create_repository\nfrom moatless.benchmark.utils import get_moatless_instance\nfrom moatless.agent.code_agent import CodingAgent\nfrom moatless.index import CodeIndex\nfrom moatless.loop import AgenticLoop\nfrom moatless.file_context import FileContext\nfrom moatless.completion.base import LLMResponseFormat\nfrom moatless.schema import MessageHistoryType\n\nindex_store_dir = os.getenv(\"INDEX_STORE_DIR\", \"/tmp/index_store\")\nrepo_base_dir = os.getenv(\"REPO_DIR\", \"/tmp/repos\")\npersist_path = \"trajectory.json\"\n\ninstance = get_moatless_instance(\"django__django-16379\")\nrepository = create_repository(instance)\ncode_index = CodeIndex.from_index_name(\n    instance[\"instance_id\"], \n    index_store_dir=index_store_dir, \n    file_repo=repository\n)\nfile_context = FileContext(repo=repository)\n\n# Create agent using Deepseek Chat with explicit config\nagent = CodingAgent.create(\n    repository=repository,\n    code_index=code_index,\n    \n    model=\"deepseek/deepseek-chat\",\n    temperature=0.0,\n    max_tokens=4000,\n    few_shot_examples=True,\n    \n    response_format=LLMResponseFormat.REACT,\n    message_history_type=MessageHistoryType.REACT\n)\n\nloop = AgenticLoop.create(\n    message=instance[\"problem_statement\"],\n    agent=agent,\n    file_context=file_context,\n    repository=repository,\n    persist_path=persist_path,\n    max_iterations=50,\n    
max_cost=2.0\n)\n\nfinal_node = loop.run()\nif final_node:\n    print(final_node.observation.message)\n```\n\n## CodingAgent Parameters\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `model` | str | Required | Model identifier from supported models table (e.g., \"claude-3-5-sonnet-20241022\") |\n| `repository` | Repository | Required | Repository instance containing the codebase |\n| `code_index` | CodeIndex | None | Code index for semantic search functionality |\n| `runtime` | RuntimeEnvironment | None | Environment for running tests |\n| `message_history_type` | MessageHistoryType | From config | How to format the message history in the prompt ('messages', 'react', etc.) |\n| `thoughts_in_action` | bool | From config | Whether to include thoughts in action responses, used when the LLM can't provide the reasoning in the message content |\n| `disable_thoughts` | bool | From config | Whether to completely disable thought generation, used for reasoning models like o1 and Deepseek R1 |\n| `few_shot_examples` | bool | From config | Whether to use few-shot examples in prompts |\n| `temperature` | float | From config | Temperature for model sampling (0.0 = deterministic) |\n| `max_tokens` | int | From config | Maximum tokens per model completion |\n\nThe default values for optional parameters are taken from the model's configuration in `model_config.py`. See the Verified Models table above for model-specific defaults.\n","funding_links":[],"categories":["Python","Evaluation"],"sub_categories":["Agentic Coding Frameworks (Scaffolding)"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faorwall%2Fmoatless-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faorwall%2Fmoatless-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faorwall%2Fmoatless-tools/lists"}