Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cosmic-heart/parallel-distributed-ml-workspace
Documentation for setting up a Parallel & Distributed ML Workspace across your systems so that they work seamlessly without errors; a package to easily set up an environment on a group of systems.
deep-learning distributed machine-learning nfs parallel pytorch pytorch-lightning ray setup
Last synced: 17 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/cosmic-heart/parallel-distributed-ml-workspace
- Owner: cosmic-heart
- License: mit
- Created: 2023-04-05T20:58:40.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-04-15T02:28:14.000Z (almost 2 years ago)
- Last Synced: 2025-01-19T11:30:03.456Z (22 days ago)
- Topics: deep-learning, distributed, machine-learning, nfs, parallel, pytorch, pytorch-lightning, ray, setup
- Language: Python
- Homepage: https://parallel-distributed-ml-workspace.readthedocs.io/en/latest/
- Size: 37.1 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Welcome to Distributed-ML-Workspace
## Ray Distributed Made Easy

Ray is an open source framework for building distributed and parallel applications. It was developed by the RISELab at UC Berkeley and is now maintained by the Ray team at Anyscale.
With Ray, you can easily scale your Python applications to run on a cluster of machines, using a simple and intuitive API. Ray provides a number of useful features, including distributed computing, parallel processing, and fault tolerance.
[Ray Distributed Documentation](https://docs.ray.io/en/latest/index.html)
**para-dist-workspace** (Distributed ML) is a documented workspace for Deep Learning, Climate Modelling, Molecular Dynamics, and other parallel & distributed applications that lays out a procedure for setting up and running the Ray Distributed Framework.

Check out the [usage](usage) section for further information, including how to [install](usage#installation) the project.
!!! note

    This documentation is under active development.
## Features
- **Distributed computing:** Ray allows you to easily distribute your computations across a cluster of machines, enabling you to scale your applications to handle large datasets and complex workloads.
- **Parallel processing:** Ray makes it easy to parallelize your code, so that you can take advantage of multiple CPUs or GPUs to speed up your computations.
- **Fault tolerance:** Ray provides fault tolerance out of the box, so that your applications can continue running even if one or more machines in your cluster fail.
- **Easy to use:** Ray's API is designed to be simple and intuitive, so that you can focus on writing your application logic instead of worrying about the details of distributed computing.
## Getting started
To get started with the distributed environment, follow these steps:
**1.** Read our installation documentation and run the scripts to set up your environment.
**2.** View our examples and run the scripts to get started with the Ray Distributed Framework.
**3.** If you get stuck on an error, revisit the documentation and re-run the scripts.
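For orientation, a multi-node Ray cluster is typically started with commands like these (a sketch, assuming Ray is installed on every machine; the head-node IP is illustrative):

```shell
# On the head node: start Ray and expose the default cluster port.
ray start --head --port=6379

# On each worker node: join the cluster
# (replace 192.168.1.10 with your head node's address).
ray start --address=192.168.1.10:6379

# On any node, when finished:
ray stop
```

Once the workers have joined, a Python script run on the head node with `ray.init(address="auto")` can use the whole cluster.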
!!! warning

    The scripts are written for a specific environment, so you may need to update them to match your own environment.
!!! danger

    Most of the problems you may face while running the Ray Distributed Framework are caused by the firewall, so you may have to configure or disable the firewall to run Ray across multiple nodes.

## Distributed Environment
- **Ubuntu 20.04 LTS**
- **Python 3.10.8**
- **Ray 2.3.0**
- **Ray Lightning 0.3.0**
- **Pytorch 1.13.1**
- **Nvidia GeForce Driver 515.63**
- **Cuda Version 11.7**
- **Pytorch Lightning 1.6.0**

!!! note

    By default, CUDA is installed along with PyTorch: since the `PyTorch 2.0` release, every version installs its own bundled CUDA runtime for PyTorch, so you need to install the CUDA toolkit manually if you want to do CUDA programming for other purposes.
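To check which driver and CUDA versions your system actually has (a diagnostic sketch; tool availability varies with how the driver and toolkit were installed):

```shell
# Driver version, plus the maximum CUDA version the driver supports.
nvidia-smi

# Version of a separately installed CUDA toolkit compiler, if any.
nvcc --version

# CUDA runtime version bundled with your PyTorch install.
python -c "import torch; print(torch.version.cuda)"
```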
More information about the CUDA versions can be found [here](https://pytorch.org/get-started/previous-versions/).

## Conclusion
Ray is a powerful framework for building distributed and parallel applications, and it provides a *simple* and *intuitive* API. With the additional documentation here, you can easily set up and run the Ray Distributed Framework. We hope you find our Distributed ML Workspace useful and that it helps you build your own distributed applications.