{"id":51169200,"url":"https://github.com/amgustav/kineforge","last_synced_at":"2026-06-26T23:00:27.606Z","repository":{"id":367345903,"uuid":"1280292113","full_name":"amgustav/kineForge","owner":"amgustav","description":"RL-first robot policy testbed: train, stress-test, evaluate, and replay MuJoCo robot policies.","archived":false,"fork":false,"pushed_at":"2026-06-25T16:00:36.000Z","size":16,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-25T16:22:12.266Z","etag":null,"topics":["embodied-ai","gymnasium","mujoco","ppo","reinforcement-learning","robot-learning","robotics","robotics-simulation","simulation","stable-baselines3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amgustav.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-25T12:52:41.000Z","updated_at":"2026-06-25T16:06:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/amgustav/kineForge","commit_stats":null,"previous_names":["amgustav/kineforge"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/amgustav/kineForge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amgustav%2FkineForge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amgustav%2FkineForge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amgustav%2FkineForge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amgustav%2FkineForge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amgustav","download_url":"https://codeload.github.com/amgustav/kineForge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amgustav%2FkineForge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34835779,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embodied-ai","gymnasium","mujoco","ppo","reinforcement-learning","robot-learning","robotics","robotics-simulation","simulation","stable-baselines3"],"created_at":"2026-06-26T23:00:26.860Z","updated_at":"2026-06-26T23:00:27.601Z","avatar_url":"https://github.com/amgustav.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# kineForge\n\n**Train, stress-test, and evaluate robot policies in simulation.**\n\nkineForge is an open-source RL-first embodied AI testbed. It trains a small MuJoCo robot arm with PPO, evaluates it under configurable failure modes, and writes reproducible local run artifacts.\n\nThe repo is intentionally small: one robot, one task, one reward config, one training loop, one eval gate, JSON reports, and matplotlib PNG replay/diagnostic plots.\n\n```text\nrobot → task → reward → train → failures → eval → scorecard → replay\n```\n\n---\n\n## Quickstart\n\n```bash\ngit clone https://github.com/amgustav/kineForge.git\ncd kineForge\n\npython3 -m venv .venv\nsource .venv/bin/activate\n\npip install -e .\npython -m pytest tests/test_env_smoke.py -q\n```\n\nTrain a smoke-test policy:\n\n```bash\npython train.py --task tabletop_reach --robot arm_v0 --timesteps 1000 --seed 1\n```\n\nTrain the recommended local learning run:\n\n```bash\npython train.py --task tabletop_reach --robot arm_v0 --timesteps 25000 --seed 1\n```\n\nEvaluate the normal no-failure gate:\n\n```bash\npython eval.py --policy runs/latest/policy.zip --seed 1\n```\n\nStress-test it with failure modes:\n\n```bash\npython eval.py --policy runs/latest/policy.zip --failures moved_target,noisy_observation,weak_actuator --seed 1\n```\n\nRun the default eval matrix. This evaluates the same policy in `baseline`, `target_shift`, `low_friction`, `high_friction`, `observation_noise`, `action_noise`, and `combined_hard` scenarios:\n\n```bash\npython eval_matrix.py --policy runs/latest/policy.zip --seed 1\n```\n\nRun a custom named eval matrix:\n\n```bash\npython eval_matrix.py --policy runs/latest/policy.zip --scenario baseline= --scenario target_shift=moved_target --scenario observation_noise=noisy_observation --scenario action_noise=action_noise --scenario combined_hard=moved_target,noisy_observation,action_noise,weak_actuator --seed 1\n```\n\nCompare two eval matrix summaries:\n\n```bash\npython compare_eval.py --before runs/eval-matrix-YYYYMMDD-HHMMSS/matrix_summary.json --after runs/eval-matrix-YYYYMMDD-HHMMSS/matrix_summary.json --output runs/matrix_comparison.json\n```\n\nCurrent status: kineForge has the MuJoCo tabletop reach environment, PPO training, YAML configs, deterministic eval, JSON scorecards, trajectory PNGs, timestamped runs, explicit seeds, metadata, config snapshots, eval matrices, replay indexes, and matrix summary comparison.\n\nOutputs:\n\n```text\nruns/\n  train-YYYYMMDD-HHMMSS/\n    policy.zip\n    train_metadata.json\n    config_snapshot.yaml\n  eval-YYYYMMDD-HHMMSS/\n    policy.zip\n    scorecard.json\n    eval_metadata.json\n    config_snapshot.yaml\n    trajectory.png\n    distance_over_time.png\n    episode_rewards.png\n  eval-matrix-YYYYMMDD-HHMMSS/\n    policy.zip\n    train_metadata.json        # present when the evaluated policy came from a kineForge train run\n    matrix_summary.json\n    replay_index.json\n    report.html\n    summary.csv\n    scenarios/\n      baseline/\n        scorecard.json\n        eval_metadata.json\n        config_snapshot.yaml\n        trajectory.png\n        distance_over_time.png\n        episode_rewards.png\n      combined_hard/\n        scorecard.json\n        eval_metadata.json\n        config_snapshot.yaml\n        trajectory.png\n        distance_over_time.png\n        episode_rewards.png\n  latest/\n    policy.zip\n    scorecard.json\n    train_metadata.json        # present when the evaluated policy came from a kineForge train run\n    eval_metadata.json\n    config_snapshot.yaml\n    trajectory.png\n    distance_over_time.png\n    episode_rewards.png\n```\n\n---\n\n## What it does\n\n|                     |                                                              |\n| ------------------- | ------------------------------------------------------------ |\n| **Robot**           | 2-DoF Reacher-style MuJoCo arm                               |\n| **Task**            | tabletop reach-to-target                                     |\n| **Training**        | PPO from scratch via Stable-Baselines3                       |\n| **Environment API** | Gymnasium                                                    |\n| **Configs**         | YAML robot, task, reward, randomization, and failure configs |\n| **Failure modes**   | moved target, noisy observation, action noise, weak actuator; low/high friction are documented matrix placeholders |\n| **Eval output**     | JSON scorecards, matrix summaries, replay indexes, and eval metadata |\n| **Replay output**   | matplotlib trajectory, distance, and reward PNGs             |\n| **Tests**           | smoke tests for env behavior, configs, scorecards, matrices, and plots |\n\nA short `1000` timestep run is a smoke test. It proves the pipeline works; the `25000` timestep command is the recommended local run expected to pass normal no-failure eval.\n\n---\n\n## Scorecard\n\nEvaluation writes a machine-readable scorecard with:\n\n* `summary`: success rate, mean final distance, timeout rate, mean episode reward, and collision rate.\n* `gate.status`: `PASS` only when all gate criteria pass.\n* `gate.thresholds`: min success rate, max mean final distance, and max timeout rate.\n* `per_episode`: seed, success, timeout, final distance, episode reward, step count, target/final positions, and active failures for each episode.\n* `failure_modes`: sorted failure modes active during evaluation.\n* `artifacts`: paths to the policy snapshot, scorecard, metadata, config snapshot, and PNG plots.\n* `collision_rate_explanation`: v0 does not implement collision detection, so `collision_rate` is fixed at `0.0` and is not a safety metric.\n\nThe eval gate separates two different things:\n\n* did the code run?\n* did the policy actually satisfy the task threshold?\n\nA failed gate after short training means the policy did not pass yet, not that the repo is broken.\n\n---\n\n## Eval matrix\n\n`eval_matrix.py` runs one policy snapshot across multiple named scenarios. Each scenario uses the same robot, task, reward config, seed, and episode count, with only the scenario failure set changing.\n\nScenario syntax is `name=failure_a,failure_b`. Use `name=` for a no-failure scenario. If no scenarios are provided, the default matrix runs:\n\n* `baseline=` — no injected failures.\n* `target_shift=moved_target` — moved target offset.\n* `low_friction=low_friction` — documented limitation; not physically modeled in v0.\n* `high_friction=high_friction` — documented limitation; not physically modeled in v0.\n* `observation_noise=noisy_observation` — Gaussian observation noise.\n* `action_noise=action_noise` — Gaussian action perturbation.\n* `combined_hard=moved_target,noisy_observation,action_noise,weak_actuator` — combined compatible stressors. It intentionally excludes friction placeholders until friction is physically modeled.\n\nEach matrix run writes:\n\n* one scenario directory per scenario under `scenarios/\u003cname\u003e/`;\n* one `scorecard.json` per scenario;\n* one aggregate `matrix_summary.json`;\n* one `replay_index.json` mapping scenario names to replay PNG artifacts that were written;\n* one static `report.html` for local inspection;\n* one `summary.csv` for spreadsheet, CLI, or downstream analysis.\n\nOpen the static matrix report locally after a run:\n\n```bash\npython eval_matrix.py --policy runs/latest/policy.zip --seed 1\nopen runs/eval-matrix-YYYYMMDD-HHMMSS/report.html\n```\n\nThe matrix output directory shape is:\n\n```text\nruns/eval-matrix-YYYYMMDD-HHMMSS/\n  matrix_summary.json\n  replay_index.json\n  report.html\n  summary.csv\n  scenarios/\n    \u003cscenario\u003e/\n      scorecard.json\n```\n\n`report.html` is a dependency-free local summary for quick inspection. `summary.csv` has one row per scenario and can be opened in a spreadsheet, inspected with command-line tools, or imported into downstream analysis.\n\n`compare_eval.py` compares two `matrix_summary.json` files and reports aggregate and per-scenario metric deltas.\n\n---\n\n\n## How it works\n\n`TabletopReachEnv` wraps a simple MuJoCo arm as a Gymnasium environment.\n\nObservations include joint state, end-effector position, target position, and the relative vector to the target. Actions are continuous joint controls.\n\nTraining is handled by `train.py` using Stable-Baselines3 PPO.\n\nEvaluation is handled by `eval.py`, which snapshots the policy, can inject configured failures, then writes a scorecard, metadata, config snapshot, trajectory plot, distance plot, and episode reward plot. Eval matrices are handled by `eval_matrix.py`; summary comparison is handled by `compare_eval.py`.\n\nReward terms are loaded from YAML and include distance-to-target, progress shaping, success bonus, control penalty, and timeout penalty.\n\n---\n\n## Repo layout\n\n```text\nconfigs/\n  failures/basic_failures.yaml\n  rewards/reach_v0.yaml\n  robots/arm_v0.yaml\n  tasks/tabletop_reach.yaml\n\nkineforge/\n  envs/tabletop_reach_env.py\n  config.py\n  evals.py\n  eval_artifacts.py\n  randomization.py\n  matrix.py\n  replay.py\n  reports.py\n  rewards.py\n\ntests/\n  test_env_smoke.py\n\ntrain.py\neval.py\neval_matrix.py\ncompare_eval.py\n```\n\n---\n\n## Current limitations\n\n* one simple robot arm\n* one tabletop reaching task\n* basic failure injection\n* no collision detection\n* no real robot deployment\n* PNG plots only\n* no web, cloud, database, or backend\n\n---\n\n## Roadmap\n\nNext experiment-quality steps:\n\n* config sweep runner\n* configurable matrix presets\n* stricter eval gates\n* real collision/contact metrics\n* optional video replay\n\n---\n\n## Credits\n\nBuilt with MuJoCo, Gymnasium, Stable-Baselines3, NumPy, PyYAML, matplotlib, and pytest.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famgustav%2Fkineforge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famgustav%2Fkineforge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famgustav%2Fkineforge/lists"}