{"id":50925141,"url":"https://github.com/henry1786580051-lang/subforge","last_synced_at":"2026-06-16T22:01:24.955Z","repository":{"id":360797488,"uuid":"1251751403","full_name":"henry1786580051-lang/SubForge","owner":"henry1786580051-lang","description":"AI-powered video subtitle tool: speech transcription, subtitle optimization, and smart translation in one place.","archived":false,"fork":false,"pushed_at":"2026-06-07T02:51:39.000Z","size":21264,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-07T04:14:42.245Z","etag":null,"topics":["ai","bilingual-subtitles","fastapi","llm","media-processing","nextjs","python","speech-recognition","speech-to-text","subtitle","subtitle-generator","transcription","translation","tts","video-processing","video-subtitles","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/henry1786580051-lang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-27T22:04:53.000Z","updated_at":"2026-06-07T02:51:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/henry1786580051-lang/SubForge","commit_stats":null,"previous_names":["henry1786580051-lang/subforge"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/henry1786580051-lang/SubForge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henry1786580051-lang%2FSubForge","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/henry1786580051-lang%2FSubForge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henry1786580051-lang%2FSubForge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henry1786580051-lang%2FSubForge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/henry1786580051-lang","download_url":"https://codeload.github.com/henry1786580051-lang/SubForge/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henry1786580051-lang%2FSubForge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34425024,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-16T02:00:06.860Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","bilingual-subtitles","fastapi","llm","media-processing","nextjs","python","speech-recognition","speech-to-text","subtitle","subtitle-generator","transcription","translation","tts","video-processing","video-subtitles","whisper"],"created_at":"2026-06-16T22:01:23.968Z","updated_at":"2026-06-16T22:01:24.950Z","avatar_url":"https://github.com/henry1786580051-lang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
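SubForge\n\n![License](https://img.shields.io/badge/license-GPL--3.0-blue.svg)\n![Python](https://img.shields.io/badge/Python-3.10--3.12-3776AB?logo=python\u0026logoColor=white)\n![Next.js](https://img.shields.io/badge/Next.js-16-000000?logo=next.js\u0026logoColor=white)\n![FastAPI](https://img.shields.io/badge/FastAPI-0.115+-009688?logo=fastapi\u0026logoColor=white)\n![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows%20%7C%20Linux-lightgrey)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"furnace_app_icon_v2.svg\" alt=\"SubForge Logo\" width=\"120\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshot.png\" alt=\"SubForge Screenshot\" width=\"800\" /\u003e\n\u003c/p\u003e\n\nSubForge is an AI-powered video subtitle tool that covers transcription, sentence segmentation, optimization, translation, subtitle styling, and video compositing. It can be used as a desktop or web application, or integrated into your own workflow through the CLI and Python modules.\n\n## What It Can Do\n\n| Capability | Description |\n| --- | --- |\n| Speech to text | Uses the WhisperX workflow by default: MLX Whisper transcription + forced-alignment word-level timestamps + conservative VAD verification |\n| Smart segmentation | Uses an LLM to re-segment subtitles by meaning, avoiding mechanical splits and overlong subtitles |\n| Subtitle optimization | Automatically fixes typos, restores punctuation, and removes redundant filler words |\n| Smart translation | Supports context-aware translation, reflective translation, and free translation engines |\n| Bilingual subtitles | Exports to SRT, VTT, ASS, TXT, JSON, and other formats |\n| Speech synthesis | Supports subtitle dubbing and related video compositing workflows |\n| Web interface | Drag-and-drop upload, real-time progress, online editing, and request log viewing |\n\n## Core Technology and Measured Results\n\nSubForge aims beyond simply “turning speech into text”: the goal is subtitles that are ready to publish, read, and translate. The current main pipeline is built around local transcription on Apple Silicon, WhisperX alignment, and context-aware translation:\n\n### Transcription Optimization\n\nAudio is preprocessed before transcription, and MLX Whisper then performs accelerated transcription on Apple Silicon. Before alignment, numbers, units, and symbols in the text are converted to an alignable spoken form, after which WhisperX forced alignment generates word-level timestamps. TEN-VAD conservatively verifies only suspicious boundaries, so that fixing a few errors does not damage subtitles that are already correct:\n\n```text\nRaw audio\n  -\u003e DeepFilterNet3 optional denoising\n  -\u003e MLX Whisper transcription\n  -\u003e Spoken-form normalization of numbers/units/symbols\n  -\u003e WhisperX forced alignment\n  -\u003e TEN-VAD conservative timestamp verification (Silero VAD fallback)\n  -\u003e Word-level timestamps\n  -\u003e Smart segmentation and translation\n```\n\n| Technology | Role |\n| --- | --- |\n| DeepFilterNet3 | Reduces background noise in settings such as car interiors, outdoors, and cafes, bringing the voice forward |\n| MLX Whisper | Local Whisper inference optimized for Apple Silicon; uses a local MLX model by default |\n| WhisperX forced alignment | Uses a separate alignment model to map the transcribed text onto word-level timestamps |\n| Pre-alignment spoken-form normalization | Expands numbers and units such as `350`, `mph`, and `kg` into alignable spoken tokens, then restores the original text for display after alignment |\n| TEN-VAD | 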
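Default voice activity detector, used to verify suspicious sentence starts and ends without globally overriding correct WhisperX alignments |\n| Silero VAD | Fallback when TEN-VAD is unavailable or fails at runtime, ensuring cross-platform availability |\n| Content Integrity Score | Monitors content completeness using the ratio of speech duration, guarding against missed transcription caused by overly strict parameters |\n| Whisper.cpp compatibility channel | Keeps whisper.cpp as a backup local engine, suited to users who already have ggml models |\n\n### Reliability and Desktop App\n\n- Transcription, segmentation, and translation tasks push progress and intermediate results in real time over WebSocket.\n- Uploaded files, thumbnails, and exported results use separate paths and range-request handling, so that files with the same name or concurrent tasks do not overwrite one another.\n- LLM request logs are bound per request, so concurrent translations do not mismatch prompts and responses.\n- The macOS desktop package bundles FFmpeg, the DeepFilterNet3 runtime, and TEN-VAD; Whisper and forced-alignment models are managed by the user on the settings page, so large models are not bundled redundantly.\n\n### Smart Translation\n\nLLM translation processes whole passages in context rather than translating mechanically sentence by sentence. For videos that demand higher-quality phrasing, you can enable reflective translation mode:\n\n```text\nInitial translation -\u003e reflect on machine-translation artifacts and context issues -\u003e rewrite into a more natural translation\n```\n\nExample:\n\n| Stage | Content |\n| --- | --- |\n| Initial translation | 今天我们驾驶的是全新2026款雷克萨斯ES 350h。 |\n| Reflection | “今天我们驾驶的是” follows English word order, and “全新2026款” reads like a press release. |\n| Rewrite | 今天来试试2026新款雷克萨斯ES 350h。 |\n\n### Token Consumption\n\nThe following are measured results for a 21-minute English test-drive video, using the Xiaomi MiMo v2.5-pro model; the pipeline includes smart segmentation, correction and optimization, and smart translation:\n\n| Stage | Calls | Prompt | Completion | Of which Reasoning | Total | Time |\n| --- | ---: | ---: | ---: | ---: | ---: | ---: |\n| Smart segmentation | 12 | 17,709 | 51,876 | 44,322 | 69,585 | 13.8min |\n| Correction and optimization | 21 | 20,660 | 77,424 | 70,844 | 98,084 | 23.2min |\n| Smart translation | 20 | 13,103 | 16,745 | 10,923 | 29,848 | 5.6min |\n| Total | 53 | 51,472 | 146,045 | 126,089 | 197,517 | 42.5min |\n\nMiMo v2.5-pro is a reasoning model, so reasoning tokens make up most of the Completion count. Actual output is about 20K tokens, yielding 389 bilingual subtitle segments in the end. Using a free translation engine such as Bing or Google lets you skip the LLM translation stage; using a local model through LM Studio, Ollama, or similar can further reduce API costs.\n\n## Quick Start\n\n### Run the Web Version\n\n```bash\ngit clone https://github.com/henry1786580051-lang/SubForge.git\ncd SubForge\n\nuv sync\nPYTHONPATH=backend .venv/bin/uvicorn app.main:app --port 8000\n```\n\nStart the frontend in a separate terminal:\n\n```bash\ncd frontend\nnpm install\nnpm run dev\n```\n\nOpen \u003chttp://localhost:3000\u003e to start using it.\n\n### Use the CLI\n\n```bash\nuv run subforge --help\nuv run subforge doctor\n```\n\nCommon commands include:\n\n```bash\nuv run subforge transcribe input.mp4\nuv run subforge subtitle input.srt\nuv run subforge dub input.srt\n```\n\n### Launch the Desktop Version\n\n```bash\nuv run subforge-gui\n```\n\n## Recommended Configuration\n\n### LLM\n\nSmart segmentation, optimization, and translation use an OpenAI-compatible interface. On the settings page you can configure the API Base, API 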
Key, and model name.\n\n| Provider | Example model |\n| --- | --- |\n| Xiaomi MiMo | `mimo-v2.5-pro` |\n| DeepSeek | `deepseek-chat` |\n| OpenAI | `gpt-4o` / `gpt-4o-mini` |\n| Tongyi Qianwen (Qwen) | `qwen-plus` |\n| Local model | OpenAI-compatible services such as LM Studio / Ollama |\n\n### ASR\n\n| Engine | Best suited for |\n| --- | --- |\n| WhisperX + MLX Whisper | Recommended default; local acceleration on Apple Silicon, combined with forced alignment to generate word-level timestamps |\n| WhisperX Alignment | Forced-alignment models, managed by category, used for word-level alignment in English and other languages |\n| Whisper.cpp | Backup local transcription channel, suited to users who already have ggml models |\n| Whisper API | Cloud transcription with simple configuration |\n\n## Examples\n\nThe project includes a set of sample subtitles from a Lexus test-drive video, showing the difference between the raw ASR subtitles and the output after SubForge processing:\n\n- [Raw ASR output](examples/lexus_original.srt)\n- [Output after segmentation and translation](examples/lexus_processed.srt)\n\nThe processed subtitles are shorter and more natural, while preserving the full meaning as far as possible:\n\n```text\n官方数据显示，它高速巡航的油耗大概在百公里4.9升左右。\n市区大概百公里5.4升，综合下来5.1升左右。\n这油耗表现相当惊人。\n综合马力有244匹。\n```\n\n## Project Structure\n\n```text\nSubForge/\n├── frontend/        # Next.js web interface\n├── backend/         # FastAPI service\n├── subforge/        # Python core library, CLI, desktop app\n├── docs/            # VitePress documentation\n├── resource/        # Fonts, icons, translations, style resources\n├── tests/           # Automated tests\n└── examples/        # Sample subtitles\n```\n\n## Development\n\n```bash\nuv sync --group dev\nuv run pytest\nuv run ruff check .\n```\n\nFrontend:\n\n```bash\ncd frontend\nnpm install\nnpm run lint\nnpm run dev\n```\n\nDocumentation:\n\n```bash\ncd docs\nnpm install\nnpm run docs:dev\n```\n\n## Documentation and Links\n\n- Documentation site: \u003chttps://henry1786580051-lang.github.io/SubForge/\u003e\n- Issue tracker: \u003chttps://github.com/henry1786580051-lang/SubForge/issues\u003e\n- Contributing guide: [docs/dev/contributing.md](docs/dev/contributing.md)\n\n## License\n\nThis project is released under the [GPL-3.0 License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenry1786580051-lang%2Fsubforge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhenry1786580051-lang%2Fsubforge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenry1786580051-lang%2Fsubforge/lists"}