{"id":20670680,"url":"https://github.com/amyangxyz/mikapo","last_synced_at":"2026-05-03T04:01:00.059Z","repository":{"id":258469539,"uuid":"859071345","full_name":"AmyangXYZ/MiKaPo","owner":"AmyangXYZ","description":"Real-time MMD motion capture on Web","archived":false,"fork":false,"pushed_at":"2025-01-26T23:51:26.000Z","size":198193,"stargazers_count":415,"open_issues_count":1,"forks_count":40,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-05-09T03:39:47.682Z","etag":null,"topics":["ai","mediapipe","mmd","motion-capture","pose","react","web"],"latest_commit_sha":null,"homepage":"https://mikapo.amyang.dev","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AmyangXYZ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-18T03:10:41.000Z","updated_at":"2025-05-08T20:49:58.000Z","dependencies_parsed_at":"2024-11-16T20:35:03.508Z","dependency_job_id":null,"html_url":"https://github.com/AmyangXYZ/MiKaPo","commit_stats":null,"previous_names":["amyangxyz/mikapo"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmyangXYZ%2FMiKaPo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmyangXYZ%2FMiKaPo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmyangXYZ%2FMiKaPo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmyangXYZ%2FMiKaPo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AmyangXY
Z","download_url":"https://codeload.github.com/AmyangXYZ/MiKaPo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471061,"owners_count":22076585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","mediapipe","mmd","motion-capture","pose","react","web"],"created_at":"2024-11-16T20:22:39.595Z","updated_at":"2026-05-03T04:01:00.051Z","avatar_url":"https://github.com/AmyangXYZ.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MiKaPo: Real-time MMD Motion Capture\n\nA web-based tool that drives MikuMikuDance (MMD) models — **full body, both hands, and face** — from a webcam, video, or photo in real time. One shot, no offline preprocessing, no multi-pass.\n\n## Overview\n\n[MiKaPo](https://mikapo.vercel.app) covers all three motion modalities in one pipeline:\n\n- **Body and hands** are driven by MMD **bone rotations** — 3D landmarks from MediaPipe are mapped to per-bone quaternions in each bone's parent-local frame.\n- **Face is driven by MMD morphs**, not bone retargeting — face blendshapes from MediaPipe are converted directly into MMD morph weights (`まばたき`, `あ`, `ワ`, `ウィンク`, `ウィンク右`), which is how MMD models are natively rigged for facial expression. Eye direction is the one face channel that does drive bones (`左目` / `右目`).\n\nThe hard part isn't detection — it's the transformation. 
MediaPipe and MMD use different coordinate systems, every MMD model has its own rest-pose reference directions, and the bone hierarchy means each rotation has to be computed in its parent chain's local space.\n\n**MiKaPo 2.0** is a complete rewrite of the solver:\n\n- Hierarchical bone solver with per-frame parent-chain transforms\n- Auto-calibration from each loaded model's rest pose — no hardcoded reference vectors\n- One-Euro filter for jitter reduction without lag\n- Swing-twist quaternion decomposition for clean forearm rotation\n- Migrated from Vite → Next.js\n- Renderer migrated from [babylon-mmd](https://github.com/noname0310/babylon-mmd) to my custom WebGPU MMD renderer [Reze Engine](https://github.com/AmyangXYZ/reze-engine)\n\n![](./screenshots/1.png)\n![](./screenshots/2.png)\n![](./screenshots/3.png)\n![](./screenshots/3.webp)\n![](./screenshots/4.webp)\n\nDemo model: 深空之眼 - 裁暗之锋·塞尔凯特\n\n## Features\n\n- **Holistic capture** — body pose, both hands (21 points each), and face all run through one MediaPipe HolisticLandmarker pass\n- **Body \u0026 hands → MMD bones** — 33-point pose drives upper/lower body, arms, legs, and per-finger phalanges; forearm twist via swing-twist decomposition\n- **Face → MMD morphs** — face blendshapes convert directly to native MMD morph weights (`まばたき`, `あ`, `ワ`, `ウィンク`, `ウィンク右`); eye gaze drives `左目` / `右目` bones\n- **Per-model calibration** — reference directions derived from each loaded MMD's rest pose at load time, so swapping models works without a config file\n- **Three input modes** — webcam (live), uploaded video, single image\n- **Custom model upload** — drop a PMX folder to swap the default avatar\n- **VMD export** — record live capture to a standard MMD `.vmd` motion file (30fps)\n- **WebGPU rendering** via [Reze Engine](https://github.com/AmyangXYZ/reze-engine)\n\n## Stack\n\n- **Detection** — [MediaPipe HolisticLandmarker](https://ai.google.dev/edge/mediapipe/solutions/vision/holistic_landmarker)\n- **Renderer** 
— [Reze Engine](https://github.com/AmyangXYZ/reze-engine) (custom WebGPU MMD)\n- **Framework** — [Next.js 15](https://nextjs.org/)\n- **UI** — Tailwind v4 + shadcn/ui\n\n## Run locally\n\n```bash\nnpm install\nnpm run dev\n```\n\nThen open [http://localhost:4000](http://localhost:4000).\n\n## How the solver works\n\nMediaPipe gives world-space 3D landmark positions per frame. MMD bones rotate in their parent's local frame, with each model defining its own rest orientation. The solver bridges these:\n\n1. **Calibrate (once, on model load)** — read each bone's rest-pose world position from the loaded MMD model. Because every bone rotation is identity at rest, the world-space `parent → child` direction equals the parent-local reference direction.\n2. **Solve (per frame, per bone)** — compose the parent chain into a single quaternion, invert to get world-to-parent-local, transform the runtime landmarks into that frame, then rotate the calibrated reference onto the live direction.\n3. **Smooth** — pass each output quaternion through a [One-Euro filter](https://gery.casiez.net/1euro/) to suppress jitter while adding little lag.\n\n```typescript\nfunction solveBone(name: string, parentChain: string[], landmarks): Quaternion {\n  // Compose parent rotations and invert to get world → parent-local\n  const parentQ = parentChain.reduce((acc, p) =\u003e acc.multiply(boneStates[p].rotation), Quaternion.Identity())\n  const worldToLocal = Matrix.FromQuaternion(parentQ).invert()\n\n  // Transform landmarks into parent-local space\n  const head = Vector3.TransformCoordinates(landmarks.head, worldToLocal)\n  const tail = Vector3.TransformCoordinates(landmarks.tail, worldToLocal)\n  const direction = tail.subtract(head).normalize()\n\n  // Rotate the rest-pose reference onto the runtime direction\n  const reference = calibratedRefs[name] ?? 
DEFAULT_REFS[name]\n  return Quaternion.FromUnitVectorsToRef(reference, direction, new Quaternion())\n}\n```\n\n### Notable cases\n\n- **Forearm twist** (`左手捩` / `右手捩`) — uses swing-twist decomposition along the elbow's forearm axis. A naive Euler-based approach bleeds wrist roll into pitch/yaw and is prone to gimbal lock.\n- **Lower body bend** (`下半身`) — 3-axis Gram-Schmidt basis from hip line + spine direction so the pelvis tilts forward when leaning, instead of staying vertical and kinking the spine at the waist.\n- **Head** (`頭`) — single rotation matrix from a Gram-Schmidt basis (ear axis + ear→eye direction) decomposed to a quaternion, instead of composing two `FromUnitVectors` calls, which compounds error.\n- **Ankle** (`左足首` / `右足首`) — calibrated from the `足首 → つま先` bone direction; runtime uses `ankle → foot_index` landmarks (not heel) so the rest and runtime measurement frames line up.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famyangxyz%2Fmikapo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famyangxyz%2Fmikapo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famyangxyz%2Fmikapo/lists"}