{"id":50507601,"url":"https://github.com/thought2code/video-driven-skill","last_synced_at":"2026-06-02T17:30:51.566Z","repository":{"id":354785862,"uuid":"1223474130","full_name":"thought2code/video-driven-skill","owner":"thought2code","description":"video driven skill","archived":false,"fork":false,"pushed_at":"2026-05-20T02:15:14.000Z","size":5697,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T05:50:43.814Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thought2code.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-28T11:05:33.000Z","updated_at":"2026-05-20T02:15:19.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/thought2code/video-driven-skill","commit_stats":null,"previous_names":["ingorewho/video-driven-skill","thought2code/video-driven-skill"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thought2code/video-driven-skill","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thought2code%2Fvideo-driven-skill","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thought2code%2Fvideo-driven-skill/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thought2code%2Fvideo-driven-skill/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thought2code%2Fvideo-driven-skill/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thought2code","download_url":"https://codeload.github.com/thought2code/video-driven-skill/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thought2code%2Fvideo-driven-skill/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33833277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-02T17:30:50.165Z","updated_at":"2026-06-02T17:30:51.558Z","avatar_url":"https://github.com/thought2code.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eEnglish\u003c/strong\u003e · \u003ca href=\"README.zh-CN.md\"\u003e简体中文\u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eVideo Driven Skill\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eAutomate from how you actually work.\u003c/strong\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  Turn screen recordings into skills you can run, edit, and reuse.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca 
href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e · \u003ca href=\"#features\"\u003eFeatures\u003c/a\u003e · \u003ca href=\"#architecture\"\u003eArchitecture\u003c/a\u003e · \u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Java-17-orange?logo=openjdk\u0026logoColor=white\" alt=\"Java 17\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Spring_Boot-4.1-6DB33F?logo=springboot\u0026logoColor=white\" alt=\"Spring Boot 4.1\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/React-19-61DAFB?logo=react\u0026logoColor=white\" alt=\"React 19\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Vite-8-646CFF?logo=vite\u0026logoColor=white\" alt=\"Vite 8\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Tailwind_CSS-4-38B2AC?logo=tailwindcss\u0026logoColor=white\" alt=\"Tailwind CSS 4\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/SQLite-3-003B57?logo=sqlite\u0026logoColor=white\" alt=\"SQLite\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/FFmpeg-007808?logo=ffmpeg\u0026logoColor=white\" alt=\"FFmpeg\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg\" alt=\"License\"\u003e\n\u003c/p\u003e\n\n---\n\n## Overview\n\nVideo Driven Skill is an open-source automation studio that transforms **screen recordings** into **runnable, editable skill packages**. 
Upload a video, extract key frames, annotate intent, let a multimodal AI model draft the skill — then refine, run, version, archive, and export it.\n\nThe project is designed for teams and individuals who want automation to start from **how work is actually performed**, not from a blank script editor.\n\n\u003e **Workflow:** Record the process → Pick the frames that matter → Annotate intent → Generate a skill → Review \u0026 run → Export \u0026 deploy\n\n---\n\n## Features\n\n- **Video-to-Skill Pipeline** — Upload an operation recording and automatically convert it into a structured skill package with `SKILL.md`, `package.json`, scripts, and variables.\n- **Smart Frame Extraction** — Auto-extract key frames via FFmpeg, or manually capture the moments that matter.\n- **Visual Annotation** — Mark up frames with arrows, notes, and corrections to tell the AI exactly what to do.\n- **Multimodal AI Generation** — Leverages any OpenAI-compatible vision model to generate browser, Android, iOS, or desktop automation code.\n- **In-Browser Code Editor** — Review, edit, and refine generated code with syntax highlighting and variable management.\n- **Incremental Regeneration** — Regenerate the full skill or just a selected code range, with diff review between versions.\n- **Local Skill Runner** — Run skills directly with streamed logs and optional screenshots.\n- **Skill Repository** — Browse, search, import, export (ZIP), and drag-to-reorder your skill collection.\n- **Knowledge Base** — Attach reference images, documents, and notes to each skill for richer context.\n- **Archive System** — Preserve videos, frames, and requirements for building future skills from past material.\n\n---\n\n## Quick Start\n\nInstall [Docker](https://docs.docker.com/get-docker/) first, then choose the path that matches your goal.\n\n### Option 1: Run pre-built images\n\nUse this if you just want to run the app. 
The install script downloads the release Compose file, creates `.env`, pulls the pre-built images, and starts the stack.\n\n#### macOS / Linux\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/thought2code/video-driven-skill/main/scripts/install.sh | bash\n```\n\n#### Windows\n\n```powershell\nirm https://raw.githubusercontent.com/thought2code/video-driven-skill/main/scripts/install.ps1 | iex\n```\n\nDefault install location:\n\n- macOS / Linux: `~/video-driven-skill`\n- Windows: `%USERPROFILE%\\video-driven-skill`\n\nOpen `http://localhost` after the script finishes (Docker uses standard ports 80 / 443).\n\nTo use AI generation, set your API key in the generated `.env` file:\n\n```env\nAI_API_KEY=your-key-here\nAI_BASE_URL=your-base-url\nAI_MODEL=your-model\n```\n\nCommon install options: `--tag v1.0.0`, `--dir \u003cpath\u003e`, `--no-open`. Local dev with `npm run dev` uses port 3000.\n\n### Option 2: Build from source\n\nUse this for development, unreleased `main`, or local builds. It requires Docker and Git.\n\n```bash\ngit clone https://github.com/thought2code/video-driven-skill.git\ncd video-driven-skill\n```\n\n#### macOS / Linux\n\n```bash\nchmod +x scripts/run-in-docker.sh\n./scripts/run-in-docker.sh\n```\n\n#### Windows\n\n```bat\n.\\scripts\\run-in-docker.cmd\n```\n\nOn first run, `.env` is created from `.env.example`; set `AI_API_KEY` before using AI features:\n\n```env\nAI_API_KEY=your-key-here\nAI_BASE_URL=your-base-url\nAI_MODEL=your-model\n```\n\nFor faster base-image pulls in China, add `--cn`. To skip opening the browser, add `--no-open`.\n\n### Public HTTPS (Let's Encrypt)\n\nThe frontend runs **Caddy** as a reverse proxy. Set a public hostname in `.env` and Caddy will obtain and renew **Let's Encrypt** certificates automatically. With no domain configured, the stack serves **HTTP only** at `http://localhost`.\n\n**Prerequisites**\n\n1. A server with a public IP and Docker installed.\n2. An **A record** for your hostname (e.g. 
`vds.example.com`) pointing to that IP.\n3. Firewall / security group allowing **80** and **443** (TCP; optional **443/UDP** for HTTP/3).\n\n**Configuration** (see `.env.example`):\n\n```env\nVDS_DOMAIN=vds.example.com\nACME_EMAIL=you@example.com\n```\n\n- `VDS_DOMAIN`: hostname only (no `https://` or path).\n- `ACME_EMAIL`: optional, for Let's Encrypt expiry notices.\n\n**Start**\n\n```bash\ndocker compose up -d --build\n```\n\nOn first start with `VDS_DOMAIN` set, allow time for ACME validation (often 30 seconds to a few minutes), then open `https://vds.example.com`. HTTP redirects to HTTPS.\n\nCertificates persist in Docker volumes `caddy-data` and `caddy-config`.\n\n**Troubleshooting**\n\n- Certificate not issued: verify DNS (`dig vds.example.com`) and that ports 80/443 are reachable from the internet.\n- Logs: `docker compose logs -f frontend`\n\n---\n\n## Typical Workflow\n\n1. **Upload** — Upload an operation recording (e.g., a screen capture of a workflow).\n2. **Extract Frames** — Auto-extract key frames or manually capture the moments that matter.\n3. **Annotate** — Mark up frames with arrows, notes, and corrections.\n4. **Describe Intent** — Tell the AI what you want, e.g., \"Collect item names from this page and export them.\"\n5. **Generate** — Let the multimodal model produce a complete skill package.\n6. **Review \u0026 Edit** — Inspect generated code, adjust variables, and refine the output.\n7. **Run** — Execute the skill locally with streamed log output.\n8. **Iterate** — Regenerate the full skill or just a selected section, with diff comparison.\n9. 
**Export \u0026 Deploy** — Package as a ZIP or deploy to your local skill directory.\n\n---\n\n## Architecture\n\n```text\nvideo-driven-skill/\n├── backend/                 # Spring Boot — API, video processing, AI, skill runner\n├── frontend/                # React + Vite — studio UI\n├── docker-compose.yml           # Docker deployment (build from source)\n├── docker-compose.release.yml   # GHCR images (no clone)\n├── docker-compose.cn.yml        # Optional mirror overlay (local build)\n├── ARCHITECTURE.md              # Architecture (English)\n├── ARCHITECTURE.zh-CN.md        # Architecture (Chinese)\n├── scripts/\n│   ├── install.sh / install.ps1     # Install from GHCR (no clone)\n│   ├── run-in-docker.cmd / .sh      # Build \u0026 run from source\n│   └── kill-midscene.sh         # Optional cleanup helper\n```\n\n### Backend (Spring Boot / Java 17)\n\n| Module                       | Responsibility                                                   |\n|------------------------------|------------------------------------------------------------------|\n| `controller/`                | REST API \u0026 WebSocket entry points                                |\n| `service/VideoService`       | Video upload, FFmpeg frame extraction, streaming                 |\n| `service/AIService`          | Prompt construction \u0026 multimodal API calls                       |\n| `service/SkillService`       | Skill CRUD, import/export, versioning                            |\n| `service/SkillRunnerService` | Workspace setup, dependency injection, execution, log collection |\n| `service/KnowledgeService`   | Per-skill reference files \u0026 manifest                             |\n| `model/` \u0026 `repository/`     | SQLite-backed domain entities                                    |\n\nRuntime data lives under `~/video-driven-skill/` by default (override with `VIDEO_DRIVEN_SKILL_HOME`; on Windows, the same folder name under your user profile):\n\n- `uploads/` — uploaded videos 
\u0026 extracted frames\n- `skills/` — generated skill source files\n- `archives/` — reusable video/frame/requirement resources\n- `video-driven-skill.db` — SQLite database\n\nWith **Docker Compose**, the same layout is stored at `/data` inside the backend container (Compose volume `app-data`), not under `~/video-driven-skill/`. Inspect the host path with `docker volume inspect video-driven-skill_app-data`.\n\n### Frontend (React + Vite + Tailwind CSS)\n\n| Component                                        | Responsibility                        |\n|--------------------------------------------------|---------------------------------------|\n| `HomePage`                                       | Upload, import, and recent resources  |\n| `PlaygroundPage`                                 | Frame annotation \u0026 skill workspace    |\n| `FrameTimeline` / `FrameAnnotator` / `FrameList` | Visual evidence collection            |\n| `AIProcessor`                                    | Generation control \u0026 streamed status  |\n| `SkillList`                                      | Skill repository with drag-to-reorder |\n| `SkillEditor` / `SkillExport` / `SkillRunner`    | Review, export \u0026 execution            |\n| `RegeneratePanel` / `CodeComparisonView`         | Iteration workflow                    |\n| `KnowledgeBasePanel`                             | Extra context per skill               |\n\n### Skill Package Structure\n\n```text\nSKILL.md              # Skill intent, instructions, and variable docs\npackage.json          # Metadata\nvariables.json        # User-editable runtime inputs\nscripts/main.js       # Executable entrypoint\nknowledge/            # Optional reference files\n```\n\nFor a deeper walkthrough, see [ARCHITECTURE.md](ARCHITECTURE.md).\n\n---\n\n## API Overview\n\n| Method | Path                                  | Purpose                        |\n|--------|---------------------------------------|--------------------------------|\n| `POST` | 
`/api/videos/upload`                  | Upload a video                 |\n| `POST` | `/api/videos/{id}/frames/auto`        | Auto-extract frames            |\n| `POST` | `/api/videos/{id}/frames/manual`      | Manual frame capture           |\n| `GET`  | `/api/videos/{id}/stream`             | Stream uploaded video          |\n| `GET`  | `/api/skills`                         | List all skills                |\n| `PUT`  | `/api/skills/order`                   | Persist skill ordering         |\n| `POST` | `/api/skills/generate`                | Generate a skill               |\n| `GET`  | `/api/skills/{id}`                    | Read a skill                   |\n| `PUT`  | `/api/skills/{id}/files`              | Update skill files             |\n| `GET`  | `/api/skills/{id}/export`             | Export skill as ZIP            |\n| `POST` | `/api/skills/{id}/regenerate`         | Generate candidate revision    |\n| `POST` | `/api/skills/{id}/partial-regenerate` | Regenerate selected code range |\n| `POST` | `/api/skills/{id}/accept`             | Accept candidate revision      |\n| `GET`  | `/api/skills/{id}/versions`           | List skill versions            |\n| `POST` | `/api/skills/{id}/deploy`             | Deploy skill locally           |\n\n---\n\n## Security \u0026 Privacy\n\nThis repository is prepared for open-source use:\n\n- No API keys or credentials are committed.\n- Local databases, uploads, archives, generated skills, logs, and build outputs are git-ignored.\n- Runtime configuration comes from environment variables or local `.env` files.\n- **Do not** upload private recordings, credentials, customer data, or production screenshots to any public instance.\n\nIf you discover a security issue, please report it responsibly. See [SECURITY.md](SECURITY.md).\n\n---\n\n## License\n\nThis project is licensed under the **MIT License**. 
See [LICENSE](LICENSE) for details.\n\n---\n\n\u003cp align=\"center\"\u003e\n  Built with care by the \u003cstrong\u003eVideo Driven Skill\u003c/strong\u003e team.\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthought2code%2Fvideo-driven-skill","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthought2code%2Fvideo-driven-skill","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthought2code%2Fvideo-driven-skill/lists"}