{"id":15348776,"url":"https://github.com/zuston/raytf","last_synced_at":"2026-03-03T03:38:34.175Z","repository":{"id":43466438,"uuid":"375887181","full_name":"zuston/raytf","owner":"zuston","description":"Distributed Deep Learning Framework on Ray, including tensorflow/pytorch/mxnet","archived":false,"fork":false,"pushed_at":"2022-03-01T12:45:43.000Z","size":48,"stargazers_count":5,"open_issues_count":6,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-15T04:13:36.383Z","etag":null,"topics":["distributed-tensorflow","ray","ray-tensorflow","ray-tf","tensorflow","tensorflow-estimator","tensorflow-estimator-api","tensorflow-on-ray","tensorflow2"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zuston.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-11T02:54:24.000Z","updated_at":"2022-12-12T16:14:20.000Z","dependencies_parsed_at":"2022-08-28T13:52:52.523Z","dependency_job_id":null,"html_url":"https://github.com/zuston/raytf","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zuston%2Fraytf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zuston%2Fraytf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zuston%2Fraytf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zuston%2Fraytf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zuston","download_url":"https://codeload.github.com/zuston/ray
tf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249003958,"owners_count":21196793,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-tensorflow","ray","ray-tensorflow","ray-tf","tensorflow","tensorflow-estimator","tensorflow-estimator-api","tensorflow-on-ray","tensorflow2"],"created_at":"2024-10-01T11:52:08.211Z","updated_at":"2026-03-03T03:38:29.123Z","avatar_url":"https://github.com/zuston.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Distributed Deep Learning Framework on Ray\n--------------------------------------------------\n\nThe raytf framework provides a simple interface for distributed training on Ray,\ntargeting TensorFlow, PyTorch, and MXNet. Currently only TensorFlow is supported;\nthe others will be added later.\n\nQuick Start\n~~~~~~~~~~~\n\nOnly tested with Python 3.6.\n\n1. Install the latest Ray release: ``pip install ray``\n2. Install the latest raytf release: ``pip install raytf``\n3. Clone this project: ``git clone https://github.com/zuston/raytf.git``\n4. Enter the example folder and run the Python script, as in the following commands.\n\n.. code:: bash\n\n        cd raytf\n        cd example\n        python mnist.py\n\n\nHow to Use\n~~~~~~~~~~~\n\n.. code:: python\n\n        from raytf.raytf_driver import Driver\n\n        # When running on a single local machine, uncomment the next two lines:\n        # import ray\n        # ray.init()\n\n        # Resources requested for each role in the TensorFlow cluster.\n        tf_cluster = Driver.build(\n            resources={\n                'ps': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 2},\n                'worker': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 6},\n                'chief': {'cores': 2, 'memory': 2, 'gpu': 2, 'instances': 1}\n            },\n            event_log='/tmp/opal/4',\n            resources_allocation_timeout=10\n        )\n        # `process` is your own training function, defined elsewhere.\n        tf_cluster.start(model_process=process, args=None)\n\nThis training code attaches to an existing on-prem Ray cluster. For\ndebugging, call ``ray.init()`` to start a local Ray cluster instead.\n\nWhen you specify ``event_log`` in the builder, a sidecar TensorBoard is\nstarted on one worker.\n\nGang scheduling is supported. raytf also lets you configure a timeout for\nwaiting on resources, as shown in the code above: the\n``resources_allocation_timeout`` option is given in seconds.\n\nHow to build and deploy\n~~~~~~~~~~~~~~~~~~~~~~~\n\nRequirement: ``python -m pip install twine``\n\n1. ``python setup.py bdist_wheel --universal``\n2. ``python -m pip install xxxxxx.whl``\n3. ``twine upload dist/*``\n\nTips\n~~~~\n\n1. To avoid Python module import problems on an on-prem Ray cluster,\n   this project requires Ray 1.5 or later; see this\n   RFC (https://github.com/ray-project/ray/issues/14019).\n2. This project has only been tested with TensorFlow Estimator training.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzuston%2Fraytf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzuston%2Fraytf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzuston%2Fraytf/lists"}