{"id":48590467,"url":"https://github.com/RoyCoding8/ReasonForge-MCP-Server","last_synced_at":"2026-04-08T19:03:20.541Z","repository":{"id":341694822,"uuid":"1171107147","full_name":"RoyCoding8/ReasonForge-MCP-Server","owner":"RoyCoding8","description":"A deterministic tool-calling framework for small LLMs. Integrates SymPy and sandboxed Python to achieve better accuracy/latency without massive parameter counts.","archived":false,"fork":false,"pushed_at":"2026-03-02T22:26:52.000Z","size":268,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-03T01:21:58.276Z","etag":null,"topics":["llms","mcp","mcp-server"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RoyCoding8.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-02T22:03:48.000Z","updated_at":"2026-03-02T22:28:43.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/RoyCoding8/ReasonForge-MCP-Server","commit_stats":null,"previous_names":["roycoding8/reasonforge-mcp-server"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/RoyCoding8/ReasonForge-MCP-Server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2FReasonForge-MCP-Server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2FReaso
nForge-MCP-Server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2FReasonForge-MCP-Server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2FReasonForge-MCP-Server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RoyCoding8","download_url":"https://codeload.github.com/RoyCoding8/ReasonForge-MCP-Server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2FReasonForge-MCP-Server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31569400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llms","mcp","mcp-server"],"created_at":"2026-04-08T19:02:56.743Z","updated_at":"2026-04-08T19:03:20.533Z","avatar_url":"https://github.com/RoyCoding8.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ReasonForge\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RoyCoding8/MCP/blob/main/ReasonForge_Colab.ipynb)\n\n**Deterministic math tools for small language 
models.**\n\nReasonForge gives small LLMs (8B–32B) access to a verified SymPy computation backend via tool calling.\nInstead of relying on the model to compute, all math is delegated to deterministic tools — the model only reasons about *what* to compute and *how* to present results.\n\n## Architecture\n\n```\nUser Question → LLM (Qwen3) → Tool Calls → SymPy Backend → Verified Results → LLM → Final Answer\n```\n\n**Multi-Turn Agentic Loop:**\n1. **Reason:** The model uses `\u003cthink\u003e` tags to analyze the problem and decide on a strategy.\n2. **Execute:** The model delegates computation to a deterministic tool (SymPy or Python sandbox).\n3. **Iterate:** The model observes the verified tool output and either concludes the answer or calls another tool until solved (up to `MAX_ROUNDS`).\n\n## Tools\n\n| Tool | Operations | Backend |\n|------|-----------|---------| \n| `math_tool` | compute, solve, simplify, factor, expand, gcd, lcm, prime_factors, divisors, mod_inverse, nsolve, crt + SymPy builtins (totient, fibonacci, isprime...) 
| SymPy |\n| `calculus_tool` | differentiate, integrate, limit, series, summation, partial_fraction, trigsimp, ode_solve, laplace | SymPy |\n| `matrix_tool` | determinant, inverse, eigenvalues, eigenvectors, rank, rref, transpose, multiply, add, trace, nullspace, columnspace, charpoly, norm, adjugate, solve (Ax=b) | SymPy |\n| `statistics_tool` | describe, mean, median, mode, std, variance, correlation, regression, percentile, zscore, skewness, kurtosis, geometric_mean, harmonic_mean | Python stdlib |\n| `code_tool` | run, check, ast_inspect — sandboxed Python code execution, syntax checking, and structure analysis | subprocess |\n\n## Project Structure\n\n```\nMCP/\n├── core.py                    # Shared LLM request logic, expert definitions, tool schemas\n├── experts/\n│   ├── math/\n│   │   ├── server.py          # MCP server entry point (math tools)\n│   │   └── tools/\n│   │       ├── preprocess.py  # Expression parser (^ → **, implicit multiplication)\n│   │       ├── algebra.py     # algebra + number theory\n│   │       ├── calculus.py    # derivatives, integrals, ODEs\n│   │       ├── matrix.py      # linear algebra\n│   │       └── statistics.py  # descriptive \u0026 inferential stats\n│   └── code/\n│       ├── server.py          # MCP server entry point (code execution)\n│       └── tools/\n│           └── code.py        # Sandboxed Python runner \u0026 syntax checker\n├── tests/\n│   ├── sanity.py              # Tool unit tests (16 checks)\n│   ├── math_benchmark.py      # A/B math benchmark (MATH-500 dataset)\n│   ├── code_benchmark.py      # A/B code benchmark (HumanEval)\n│   └── results/               # Local benchmark outputs \n├── ui/\n│   ├── app.py                 # Gradio chat interface with intermediate thinking steps\n│   └── style.css              # Custom UI styles (dark mode, thinking blocks)\n├── ReasonForge_Colab.ipynb    # One-click Colab deployment notebook\n├── pyproject.toml\n├── requirements.txt\n├── run_tests.bat              # 
Local tests launcher (Windows)\n└── run_ui.bat                 # Local UI launcher (Windows)\n```\n\n## Quick Start (Local)\n\n```bash\n# Requires: Ollama running with a supported model (qwen3:8b, qwen3:32b, etc.)\nuv sync\nuv run python -m ui.app\n# Open at http://localhost:7861\n```\n\n### Endpoint Defaults (Basic Robustness)\n\n- Outbound model endpoints default to localhost-only.\n- Allow remote endpoints explicitly with `RF_ALLOW_REMOTE_ENDPOINTS=1`.\n- Extend allowed hosts with `RF_ENDPOINT_ALLOWLIST`.\n\nExamples:\n\n```bash\nexport RF_ENDPOINT_ALLOWLIST=\"localhost,127.0.0.1,::1,api.mycompany.com\"\nexport RF_ALLOW_REMOTE_ENDPOINTS=1\n```\n\n### Code Tool Docker Option (Basic Sandbox)\n\n`code_tool` supports optional Docker isolation with safe fallback:\n\n- `RF_CODE_TOOL_ISOLATION=auto` (default): use Docker if available, else process mode\n- `RF_CODE_TOOL_ISOLATION=docker`: prefer Docker, fallback to process if unavailable\n- `RF_CODE_TOOL_ISOLATION=process`: force process mode\n\nOptional image override:\n\n```bash\nexport RF_CODE_TOOL_DOCKER_IMAGE=python:3.11-alpine\n```\n\n## Colab Deployment (GPU)\n\nOpen `ReasonForge_Colab.ipynb` in Google Colab Pro with an A100 GPU.\nIt clones this repo, installs Ollama + `qwen3:32b`, and launches the UI with a public Gradio link.\n\n## Benchmarking\n\n```bash\n# Math benchmark — MATH-500 (requires Ollama running)\nuv run python -m tests.math_benchmark --model llama3.2:3b --n 10\nuv run python -m tests.math_benchmark --model qwen3:32b --n 50 --think\n\n# Code benchmark — HumanEval (requires Ollama running)\nuv run python -m tests.code_benchmark --model qwen3:8b --n 20\nuv run python -m tests.code_benchmark --model qwen3:32b --n 164 --think\n```\n\n## Running Sanity Tests\n\n```bash\nuv run python -m tests.sanity\n```\n\n## Running All Unit Tests\n\n```bash\nuv run python -m tests.test_all\n```\n\n## Running Release Gate\n\n```bash\nuv run python -m tests.release_gate\n```\n\n## Benchmark Results\n\n### MATH-500 
(`qwen3:8b`, 50 problems)\n\n| Metric | Baseline | ReasonForge |\n|---|---|---|\n| **Correct** | 43/50 | **45/50** |\n| **Uniform Accuracy** | 86.0% | **90.0%** (▲ +4.0%) |\n| **Weighted Score**  | 144/176 | **154/176** |\n| **Weighted Accuracy** | 81.8% | **87.5%** (▲ +5.7%) |\n\n- **Delegation:** 40.0% (20/50) of tasks used tools\n- **Avg Rounds:** 1.5 \n- **Avg Time:** Baseline 46.3s vs ReasonForge 31.0s (Δ -15.2s)\n\n#### By Difficulty\n```text\nLevel 1      5/5   100%  ████████████████████\nLevel 2      7/7   100%  ████████████████████\nLevel 3      8/9   89%   █████████████████\nLevel 4     14/15  93%   ██████████████████\nLevel 5     11/14  79%   ███████████████  (+14%)\n```\n\n#### By Category\n```text\nAlgebra                   10/12  83%   ████████████████\nCounting \u0026 Probability     4/4   100%  ████████████████████\nGeometry                   4/4   100%  ████████████████████\nIntermediate Algebra      11/13  85%   ████████████████  (+8%)\nNumber Theory              2/2   100%  ████████████████████\nPrealgebra                 7/7   100%  ████████████████████\nPrecalculus                7/8   88%   █████████████████  (+12%)\n```\n\n### HumanEval (Code: `qwen3:8b`, 160 problems)\n\n| Metric | Baseline | ReasonForge |\n|---|---|---|\n| **Pass@1** | 4/160 | **102/160** |\n| **Accuracy** | 2.5% | **63.7%** (▲ +61.2%) |\n\n- **Delegation:** 31.2% (50/160) of tasks used tools\n- **Avg Rounds:** 1.5 \n- **Avg Time:** Baseline 23.9s vs ReasonForge **24.8s** (Δ +0.9s)\n- **Wins vs Losses:** ReasonForge successfully solved 100 problems that the Baseline failed on, while only losing 2.\n\n### Key Takeaways\n\nTesting the **8-billion parameter** `qwen3` model reveals exactly why deterministic tool-delegation is crucial for smaller models:\n\n1. 
**Math (MATH-500):** The same model already scored well without tools (86.0% uniform, 81.8% weighted accuracy). Access to the SymPy backend **cut the average time** from `46.3s` to `31.0s` and raised weighted accuracy by 5.7 percentage points, from 81.8% to 87.5%.\n2. **Code (HumanEval):** Without sandboxed execution tools, the 8B model passed only `4/160` (2.5%) of the problems. With the ReasonForge Python runtime tools, the same model could write, run, and iteratively revise its code, reaching **102/160 (63.7%)**, a gain of **61.2 percentage points** with no fine-tuning.\n\n## Tech Stack\n\n- **LLM Backend:** [Ollama](https://ollama.com) (local) or any OpenAI-compatible API\n- **Math Engine:** [SymPy](https://sympy.org) — symbolic computation\n- **Math Grading:** [math-verify](https://github.com/huggingface/Math-Verify) — deterministic LaTeX parser (Linux/Colab)\n- **Code Grading:** Self-contained HumanEval harness (inspired by [openai/human-eval](https://github.com/openai/human-eval))\n- **UI:** [Gradio](https://gradio.app) — chat interface with LaTeX rendering\n- **Protocol:** [MCP](https://modelcontextprotocol.io) (Model Context Protocol) compatible\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRoyCoding8%2FReasonForge-MCP-Server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRoyCoding8%2FReasonForge-MCP-Server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRoyCoding8%2FReasonForge-MCP-Server/lists"}