{"id":24211646,"url":"https://github.com/zjrwtx/sft-data-builder","last_synced_at":"2026-03-07T18:04:30.938Z","repository":{"id":261375574,"uuid":"883837434","full_name":"zjrwtx/SFT-data-builder","owner":"zjrwtx","description":"利用免费的大模型api来结合你的私域数据来生成sft训练数据（妥妥白嫖）支持llamafactory等工具的训练数据格式synthetic data","archived":false,"fork":false,"pushed_at":"2024-11-24T06:26:19.000Z","size":514,"stargazers_count":161,"open_issues_count":4,"forks_count":17,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-25T05:05:04.650Z","etag":null,"topics":["agents","alpaca","cot","datagene","gpt40","llm","mllm","multiagents","o1","python","react","sharegpt","slm","synthetic-data","tailwindcss","visionlanguagemodel"],"latest_commit_sha":null,"homepage":"https://sft-data-builder.vercel.app","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjrwtx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-11-05T16:55:07.000Z","updated_at":"2025-05-24T12:20:33.000Z","dependencies_parsed_at":"2024-11-06T08:31:00.055Z","dependency_job_id":"ab64a26e-0b37-4d6e-8782-226a13898851","html_url":"https://github.com/zjrwtx/SFT-data-builder","commit_stats":null,"previous_names":["zjrwtx/sft-data-builder"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjrwtx/SFT-data-builder","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjrwtx%2FSFT-data-builder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjrwtx%2FSFT-data-builder/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjrwtx%2FSFT-data-builder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjrwtx%2FSFT-data-builder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjrwtx","download_url":"https://codeload.github.com/zjrwtx/SFT-data-builder/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjrwtx%2FSFT-data-builder/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30225465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T17:00:40.062Z","status":"ssl_error","status_checked_at":"2026-03-07T17:00:39.026Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","alpaca","cot","datagene","gpt40","llm","mllm","multiagents","o1","python","react","sharegpt","slm","synthetic-data","tailwindcss","visionlanguagemodel"],"created_at":"2025-01-14T02:36:10.062Z","updated_at":"2026-03-07T18:04:30.918Z","avatar_url":"https://github.com/zjrwtx.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🤖 捕获月球 LLM Synthetic Data Platform - WeChat Official Account: 正经人王同学\n### The 捕获月球 LLM synthetic data platform aims to provide a low-cost multimodal synthetic data solution that anyone can use, supporting pre-training, fine-tuning, GPT-o1-style (CoT), function 
calling, and other LLM training scenarios. Contributors and partners are welcome!\n\nLive demo: https://sft-data-builder.vercel.app\nDemo video: [https://www.bilibili.com/video/BV1dvDQYBEew/?spm_id_from=333.999.0.0](https://www.bilibili.com/video/BV19qD6YqEJ2/?spm_id_from=333.999.0.0)\n\n![License](https://img.shields.io/badge/license-MIT-blue.svg)\n![Version](https://img.shields.io/badge/version-1.0.0-green.svg)\n![React](https://img.shields.io/badge/React-18.x-61dafb.svg)\n![image](https://github.com/user-attachments/assets/ffd1f820-dd6f-4d11-8411-0c12d6ba76ce)\n\n\u003cimg width=\"861\" alt=\"8d5400bce0635b5e236cba05e923c44\" src=\"https://github.com/user-attachments/assets/531dee7e-949f-4fac-b646-06ab518f5612\"\u003e\n\u003cimg width=\"861\" alt=\"0972de00f8afa29489cba138ecac6ac\" src=\"https://github.com/user-attachments/assets/90202679-5a01-48a2-b80d-e6f1a06f910e\"\u003e\n\n\u003cimg width=\"954\" alt=\"d5445bacd9f03810e326039f9653267\" src=\"https://github.com/user-attachments/assets/8e49cc36-b5aa-419f-a748-141b94a27161\"\u003e\n\u003cimg width=\"954\" alt=\"4570df76058f5bd3e996b4f6bdba9db\" src=\"https://github.com/user-attachments/assets/a353a4ed-77e3-4c63-9948-5f6caabab764\"\u003e\n\n\n\u003cimg width=\"861\" alt=\"a03d915893cfcec4a2ff76e8cf93fbb\" src=\"https://github.com/user-attachments/assets/1fa622c1-3539-41fe-91a9-8d903ae013a8\"\u003e\n\u003cimg width=\"861\" alt=\"cfb9e2c681df09534217d12fc79c1c3\" src=\"https://github.com/user-attachments/assets/7e235e69-2dd1-4ee3-b9c6-7e7a6e1fc317\"\u003e\n![image](https://github.com/user-attachments/assets/c8c2ddf0-f3c6-4baf-9b81-ea21e7422ae9)\n\n\n\u003cimg width=\"861\" alt=\"1fb4e0bc5e6c94936a07184aec76ed6\" src=\"https://github.com/user-attachments/assets/3a152963-d32f-4101-90a4-74e9b20ea1ea\"\u003e\n\u003cimg width=\"861\" alt=\"63303795320f7f0f2410b405a367704\" src=\"https://github.com/user-attachments/assets/eba5efed-39af-42ac-9288-c7a36c1c8377\"\u003e\n\n\u003cimg width=\"861\" alt=\"2bfe538bbe133542a2235bfd4b90df9\" 
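src=\"https://github.com/user-attachments/assets/9862e5ed-61f3-472a-9802-9368d8d757e6\"\u003e\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/images/demo.gif\" alt=\"Demo\" width=\"800\"\u003e\n\u003c/p\u003e\n\n## ✨ Features\n\n- 🎯 **One-click training data generation**: Works with many local or cloud models that expose an OpenAI-compatible API (including free-to-call models such as GLM-4-Flash), turns plain text into high-quality AI training data in seconds, and can generate training data directly from linked content such as WeChat official account articles\n- 📝 **GPT-o1-style CoT data synthesis, with synchronized upload to Hugging Face**\n- 📤 **ShareGPT-format fine-tuning data synthesis for vision language models**\n- 🔄 **Batch generation**: Generate multiple training samples from different angles in one run, and automatically generate data in bulk from a batch of article URLs\n- 📝 **Flexible editing**: All generated data can be edited and adjusted at any time\n- 💾 **Local storage**: All data is saved locally automatically\n- 🔌 **Upload synthetic data to Hugging Face**: After filling in the access token, repository, and related details, upload the synthesized data to Hugging Face in one click to store it or share it with others\n- 📤 **Simple export**: Export a standard-format JSON file in one click\n- 🎨 **Clean interface**: A simple, intuitive UI that is easy to operate\n- 🔌 **Multi-model support**: Supports multiple mainstream AI models, plus custom models\n- 📚 **Multi-format support**: Supports PDF, Word, TXT, and other file formats\n- 📚 **Training format conversion**: Converts between the Alpaca and OpenAI training formats, including batch file conversion\n\n## 📅 Changelog\n### v1.1.4 (2024-11-24)\n- ✨ Added ShareGPT-format fine-tuning data synthesis for vision language models\n\n### v1.1.3 (2024-11-22)\n- ✨ Added GPT-o1-style CoT data synthesis, with synchronized upload to Hugging Face\n\n### v1.1.2 (2024-11-20)\n- ✨ Added upload of synthetic data to Hugging Face: after filling in the access token, repository, and related details, upload the synthesized data in one click to store it or share it with others\n\n### v1.1.1 (2024-11-12)\n- ✨ Added conversion between the Alpaca and OpenAI training formats, including batch file conversion\n\n### v1.1.0 (2024-11-09)\n- ✨ Added automatic bulk data generation from a batch of article URLs\n- 🔧 Improved data generation speed\n- 🐛 Improved the interface\n\n## 🚀 Quick Start\n\n### Install dependencies\n```bash\nnpm install\n```\n### Start the project\n```bash\nnpm run start\n```\n\n## 📖 Usage Guide\n\n1. **Configure the API**\n   - Click the \"打开配置\" (Open Settings) button\n   - Set the API URL and key\n   - Select or customize the AI model\n   - Set the number of samples to generate per run\n\n2. **Provide input**\n   - Upload a file (PDF, DOCX, or TXT)\n   - Or enter the text directly\n\n3. **Generate data**\n   - Click the \"生成AI响应\" (Generate AI Response) button\n   - Switch between the generated results\n   - Edit the generated content as needed\n\n4. 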
**Manage data**\n   - Add to the data list\n   - Preview all generated data\n   - Delete unwanted data\n   - Export as a JSON file\n\n## 🎯 Training Data Format\n```json\n{\n  \"instruction\": \"User instruction\",\n  \"input\": \"User input (optional)\",\n  \"output\": \"AI response\",\n  \"system\": \"System prompt (optional)\",\n  \"history\": [\n    [\"Previous question 1\", \"Previous answer 1\"],\n    [\"Previous question 2\", \"Previous answer 2\"]\n  ]\n}\n```\n\n## 🛠️ Tech Stack\n\n- ⚛️ React 18\n- 🎨 TailwindCSS\n- 📄 PDF.js\n- 📝 Mammoth.js\n- 💾 LocalStorage API\n\n## 📋 To-Do\n\n- [ ] Support more file formats\n- [ ] Add data validation\n- [ ] Batch import\n- [ ] Data tagging system\n- [ ] Export to more formats\n\n## 🤝 Contributing\n\n1. Fork this repository\n2. Create a feature branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n## 📜 License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## 👨‍💻 Author\n\n正经人王同学\n\n- WeChat official account: 正经人王同学\n- WeChat: whatisallineed\n- GitHub: [https://github.com/zjrwtx](https://github.com/zjrwtx)\n- Email: [3038880699@qq.com](mailto:3038880699@qq.com)\n\n## 🌟 Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=zjrwtx/SFT-data-builder\u0026type=Date)](https://star-history.com/#zjrwtx/SFT-data-builder\u0026Date)\n\n\n## 🙏 Acknowledgements\nSpecial thanks to the following open-source projects and contributors:\n- [LaiWei魏来](https://github.com/waltonfuture) - for algorithm guidance and other support\n- Reference for GPT-o1-style (CoT) data synthesis - https://github.com/HKAIR-Lab/HK-O1aw\n- Everyone who provided feedback and suggestions\n\n---\n\nIf this project helps you, please give it a ⭐️!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjrwtx%2Fsft-data-builder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjrwtx%2Fsft-data-builder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjrwtx%2Fsft-data-builder/lists"}