{"id":43047546,"url":"https://github.com/modelengine-group/datamate","last_synced_at":"2026-03-06T11:10:53.207Z","repository":{"id":320034147,"uuid":"1036006880","full_name":"ModelEngine-Group/DataMate","owner":"ModelEngine-Group","description":"DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG retrieval. ","archived":false,"fork":false,"pushed_at":"2026-01-28T10:37:51.000Z","size":8055,"stargazers_count":324,"open_issues_count":20,"forks_count":35,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-01-28T17:21:16.666Z","etag":null,"topics":["data-evaluation","data-pipeline","data-synthesis","rag"],"latest_commit_sha":null,"homepage":"https://github.com/ModelEngine-Group/DataMate","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ModelEngine-Group.png","metadata":{"files":{"readme":"README-zh.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-11T12:24:04.000Z","updated_at":"2026-01-28T10:37:55.000Z","dependencies_parsed_at":"2025-10-21T16:38:59.243Z","dependency_job_id":null,"html_url":"https://github.com/ModelEngine-Group/DataMate","commit_stats":null,"previous_names":["modelengine-group/data-platform"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ModelEngine-Group/DataMate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelEngine-Group%2FDataMate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mod
elEngine-Group%2FDataMate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelEngine-Group%2FDataMate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelEngine-Group%2FDataMate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ModelEngine-Group","download_url":"https://codeload.github.com/ModelEngine-Group/DataMate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelEngine-Group%2FDataMate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28937821,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T08:53:31.997Z","status":"ssl_error","status_checked_at":"2026-01-31T08:51:38.521Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-evaluation","data-pipeline","data-synthesis","rag"],"created_at":"2026-01-31T10:03:21.491Z","updated_at":"2026-02-13T13:16:04.684Z","avatar_url":"https://github.com/ModelEngine-Group.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataMate One-Stop Data Work Platform\n\n\u003cdiv align=\"center\"\u003e\n\n[![Backend 
CI](https://github.com/ModelEngine-Group/DataMate/actions/workflows/docker-image-backend.yml/badge.svg)](https://github.com/ModelEngine-Group/DataMate/actions/workflows/docker-image-backend.yml)\n[![Frontend CI](https://github.com/ModelEngine-Group/DataMate/actions/workflows/docker-image-frontend.yml/badge.svg)](https://github.com/ModelEngine-Group/DataMate/actions/workflows/docker-image-frontend.yml)\n![GitHub Stars](https://img.shields.io/github/stars/ModelEngine-Group/DataMate)\n![GitHub Forks](https://img.shields.io/github/forks/ModelEngine-Group/DataMate)\n![GitHub Issues](https://img.shields.io/github/issues/ModelEngine-Group/DataMate)\n![GitHub License](https://img.shields.io/github/license/ModelEngine-Group/datamate-docs)\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelEngine-Group/DataMate)\n\n**DataMate is an enterprise-level data processing platform for model fine-tuning and RAG retrieval. Its core functions include data collection, data management, an operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, and knowledge generation.**\n\n[简体中文](./README-zh.md) | [English](./README.md)\n\nIf you like this project, please give us a Star⭐️!\n\n\u003c/div\u003e\n\n## 🌟 Core Features\n\n- **Core modules**: data collection, data management, operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, knowledge generation\n- **Visual orchestration**: drag-and-drop design of data processing workflows\n- **Operator ecosystem**: a rich set of built-in operators plus support for custom operators\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Git (for fetching the source code)\n- Make (for building and installing)\n- Docker (for building images and deploying services)\n- Docker-Compose (for deploying services with Docker)\n- Kubernetes (for deploying services with k8s)\n- Helm (for deploying services with k8s)\n\n### Clone the Code\n\n```bash\ngit clone git@github.com:ModelEngine-Group/DataMate.git\ncd DataMate\n```\n\n### Deploy the Basic Services\n\n```bash\nmake install\n```\n\nThis project can be deployed with either docker-compose or Helm. After running the command, enter the number of the deployment method you want. The command prints the following prompt:\n```shell\nChoose a deployment method:\n1. Docker/Docker-Compose\n2. 
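Kubernetes/Helm\nEnter choice:\n```\n\nIf your machine does not have make, you can deploy with the following command instead:\n```bash\nREGISTRY=ghcr.io/modelengine-group/ docker compose -f deployment/docker/datamate/docker-compose.yml --profile milvus up -d\n```\n\nOnce the containers are running, open http://localhost:30000 in a browser to view the front-end interface.\n\nTo list all available Make targets, options, and help information, run:\n\n```bash\nmake help\n```\n\nIf you are in an offline environment, you can download all dependent images with the following command:\n```bash\nmake download\n```\n\n### Deploy Label Studio as the Annotation Tool\n```bash\nmake install-label-studio\n```\n\n### Build and Deploy Mineru for Enhanced PDF Processing\n```bash\nmake build-mineru\nmake install-mineru\n```\n\n### Deploy the DeerFlow Service\n```bash\nmake install-deer-flow\n```\n\n### Local Development Deployment\nAfter modifying the code locally, run the following commands to build the images and deploy with the local images:\n```bash\nmake build\nmake install dev=true\n```\n\n### Uninstall the Services\n```bash\nmake uninstall\n```\n\nWhen you run `make uninstall`, the uninstall process asks only once whether to delete the volumes (data), and that choice applies to all components. Components are uninstalled in the order milvus -\u003e label-studio -\u003e datamate, which ensures that every service using the datamate network has stopped before the network is removed.\n\n## 🤝 Contributing\n\nThank you for your interest in this project! We warmly welcome community contributions. Submitting bug reports, suggesting features, and contributing code all help make the project better.\n\n• 📮 [GitHub Issues](../../issues): submit bugs or feature suggestions.\n\n• 🔧 [GitHub Pull Requests](../../pulls): contribute code improvements.\n\n## 📄 License\n\nDataMate is open source under the [MIT](LICENSE) license. You are free to use, modify, and distribute the project's code, provided you comply with the license terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelengine-group%2Fdatamate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodelengine-group%2Fdatamate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelengine-group%2Fdatamate/lists"}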