{"id":28729796,"url":"https://github.com/didi/dlflow","last_synced_at":"2025-06-15T17:11:04.786Z","repository":{"id":86305743,"uuid":"250220777","full_name":"didi/dlflow","owner":"didi","description":"DLFlow is a deep learning framework.","archived":false,"fork":false,"pushed_at":"2020-08-17T16:28:33.000Z","size":1276,"stargazers_count":156,"open_issues_count":4,"forks_count":33,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-04-14T13:44:07.606Z","etag":null,"topics":["deeplearning","deeplearning-framework","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/didi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-03-26T09:57:55.000Z","updated_at":"2024-02-29T04:54:15.000Z","dependencies_parsed_at":"2023-03-13T09:38:30.727Z","dependency_job_id":null,"html_url":"https://github.com/didi/dlflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/didi/dlflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/didi%2Fdlflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/didi%2Fdlflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/didi%2Fdlflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/didi%2Fdlflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/didi","download_url":"https://codeload.github.com/didi/dlflow/tar.gz/refs/heads/master","sbom_url":"https
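://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/didi%2Fdlflow/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260016055,"owners_count":22946321,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deeplearning","deeplearning-framework","tensorflow"],"created_at":"2025-06-15T17:11:04.065Z","updated_at":"2025-06-15T17:11:04.778Z","avatar_url":"https://github.com/didi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n[![contributions](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)\n[![license](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![python](https://img.shields.io/badge/python-%3E%3D3.6-blue.svg)]()\n\n\n# DLFlow - A Deep Learning WorkFlow\n\n\n## DLFlow Overview\nDLFlow is a deep learning pipeline that combines Spark's large-scale feature processing with TensorFlow's model-building capabilities. With DLFlow you can quickly process raw features, train models, and run large-scale distributed prediction, which makes it well suited to offline production jobs. Users only need to focus on model development and do not have to handle raw feature processing, pipeline construction, or production deployment.\n\n\n## Features\n**Configuration-driven:** DLFlow is driven by configuration. Editing the configuration lets you quickly swap features, model hyperparameters, task flows, and more, which greatly improves productivity.\n\n**Modular structure:** Tasks and models exist as plugins, which makes them easy to use and develop. Users can easily register custom tasks and models with the framework.\n\n**Self-organizing tasks:** The built-in Workflow framework resolves task dependencies automatically from each task's output tags, so building a deep learning pipeline is straightforward.\n\n**Best practices:** DLFlow incorporates the best practices of the DiDi user profiling team for offline deep learning jobs and handles many of the problems that arise in offline production. It combines TensorFlow and Spark in a way that suits offline deep learning tasks.\n\n## Quick Start\n\n### Environment Setup\nFirst, make sure Hadoop and Spark are installed and configured in your environment and that the basic environment variables are set.\n\n- **TensorFlow access to HDFS**\n\nTo let TensorFlow access HDFS, make sure the following environment variables are in effect:\n```bash\n# Make sure libjvm.so is on LD_LIBRARY_PATH\nexport LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${JAVA_HOME}/jre/lib/amd64/server\n\n# Make sure the Hadoop jars are on CLASSPATH\nexport 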
CLASSPATH=${CLASSPATH}:$(hadoop classpath --glob)\n```\nFor more on TensorFlow access to HDFS, see [TensorFlow on Hadoop](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md).\n\n- **Reading and writing TFRecords with Spark**\n```bash\n# Clone the tensorflow/ecosystem project\ngit clone https://github.com/tensorflow/ecosystem.git\n\ncd ecosystem/spark/spark-tensorflow-connector/\n\n# Build spark-tensorflow-connector\nmvn versions:set -DnewVersion=1.14.0\nmvn clean install\n```\nThe build produces `target/spark-tensorflow-connector_2.11-1.14.0.jar`. Make sure this jar is added to `spark.jars` afterwards.\nFor more on reading and writing TFRecords with Spark, see [spark-tensorflow-connector](https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector).\n\n\n\n### Installation\n\nInstall with pip:\n```\npip install dlflow\n```\n\nInstall from source:\n```\ngit clone https://github.com/didi/dlflow.git\ncd dlflow\npython setup.py install\n```\n\n### Usage\n- **Configuration files**\n\nFor run configurations, refer to the examples in the `conf` directory.\nFor configuration details, see the [Configuration Guide](docs/tutorials/zh/CONFIGURATION.md).\n\n- **Run as a module**\n\n```bash\npython -m dlflow.main --config \u003cCONFIGURATION FILE\u003e.conf\n```\n\n- **Run as a script**\n\nMake sure the `bin` directory of your Python environment is on the `PATH` environment variable:\n```bash\nexport PATH=$PATH:/usr/local/python/bin\n```\nThen run the following command:\n```\ndlflow --config \u003cCONFIGURATION FILE\u003e.conf\n```\n\nFor more detailed usage, see the [Usage Guide](docs/tutorials/zh/USAGE.md).\n\n### Predefined Tasks\n| Predefined Task | Description |\n| :---: | :--- |\n| **Merge** | Feature merging task; see [Feature Merging](dmflow/README.md) |\n| **Encode** | Parses raw features, then encodes and preprocesses them into features that can be fed directly to the model |\n| **Train** | Model training task |\n| **Evaluate** | Model evaluation task |\n| **Predict** | Model prediction task; runs distributed prediction on Spark and can handle large-scale data |\n\n\n## Manual Contents\n- [Usage Guide](docs/tutorials/zh/USAGE.md)\n- [Configuration Guide](docs/tutorials/zh/CONFIGURATION.md)\n- [Fmap Guide](docs/tutorials/zh/FMAP.md)\n- [Model Development](docs/tutorials/zh/MODEL_DEV.md)\n- [Task Development](docs/tutorials/zh/TASK_DEV.md)\n- [Release Notes](docs/tutorials/zh/RELEASE_NOTES.md)\n\n\n## Technical Design\n**DLFlow overall architecture**\n\n![Overall architecture](https://gitee.com/wubalabadubdub/dlflow/raw/master/architecture.png)\n\n**DLFlow 
pipeline**\n\n![Pipeline](https://gitee.com/wubalabadubdub/dlflow/raw/master/pipeline.png)\n\n\n## Contributing\nYou are welcome to use this project and contribute to it. For details, see the [Contribution Guide](CONTRIBUTING.md).\n\n\n## License\nDLFlow is distributed and used under the Apache-2.0 license. For more information, see [LICENSE](LICENSE).\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdidi%2Fdlflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdidi%2Fdlflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdidi%2Fdlflow/lists"}