Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/uber/fiber
Distributed Computing for AI Made Simple
- Host: GitHub
- URL: https://github.com/uber/fiber
- Owner: uber
- License: apache-2.0
- Archived: true
- Created: 2020-01-07T18:16:24.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-03-19T22:55:22.000Z (over 1 year ago)
- Last Synced: 2024-09-21T15:49:53.592Z (about 2 months ago)
- Topics: distributed-computing, machine-learning, multiprocessing, python, sandbox
- Language: Python
- Homepage: https://uber.github.io/fiber/
- Size: 8.35 MB
- Stars: 1,040
- Watchers: 19
- Forks: 108
- Open Issues: 26
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- StarryDivineSky - uber/fiber
- awesome-python-machine-learning-resources - GitHub - 68% open · ⏱️ 15.03.2021 (Distributed machine learning)
- awesome-production-machine-learning - Fiber - Distributed computing library for modern computer clusters from Uber. (Computation Load Distribution)
README
[**Project Home**](https://uber.github.io/fiber/)
[**Blog**](https://eng.uber.com/fiberdistributed/)
[**Documents**](https://uber.github.io/fiber/getting-started/)
[**Paper**](https://arxiv.org/abs/2003.11164)
[**Media Coverage**](https://venturebeat.com/2020/03/26/uber-details-fiber-a-framework-for-distributed-ai-model-training/)

Join the Fiber users email list: [fiber-users](https://groups.google.com/forum/#!forum/fiber-users)
# Fiber
### Distributed Computing for AI Made Simple
*This project is experimental and the APIs are not considered stable.*
Fiber is a Python distributed computing library for modern computer clusters.
* It is easy to use. Fiber lets you write programs that run across a computer cluster without having to deal with the details of the cluster itself.
* It is easy to learn. Fiber provides the same API as Python's standard [multiprocessing](https://docs.python.org/3.6/library/multiprocessing.html) library. If you know how to use multiprocessing, you can program a computer cluster with Fiber.
* It is fast. Fiber's communication backbone is built on top of [Nanomsg](https://nanomsg.org/), a high-performance asynchronous messaging library, for fast and reliable communication.
* It doesn't need deployment. You run it the same way you run a normal application on a computer cluster, and Fiber handles the rest for you.
* It is reliable. Fiber has built-in error handling for pools of workers, so users can focus on writing application code instead of dealing with crashed workers.

Originally, Fiber was developed to power large-scale parallel scientific computation projects like [POET](https://eng.uber.com/poet-open-ended-deep-learning/), and it has been used to power similar projects within Uber.
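Because Fiber is described as mirroring the `multiprocessing` API, one way to keep code portable is to write it against a pool class that is passed in as a parameter. The sketch below is an illustration, not part of Fiber: `square`, `sum_of_squares`, and the `pool_cls` parameter are hypothetical names, and the standard library's thread-backed `multiprocessing.dummy.Pool` serves as a local stand-in. Assuming `fiber.Pool` accepts the same `processes` argument and offers the same `map`, `close`, and `join` methods, it could be passed in instead.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool with the Pool API


def square(x):
    """Work item executed by a pool worker."""
    return x * x


def sum_of_squares(values, pool_cls=ThreadPool, processes=4):
    """Map `square` over `values` using any multiprocessing-style pool class."""
    pool = pool_cls(processes=processes)
    try:
        return sum(pool.map(square, values))
    finally:
        # Release the workers whether or not the map succeeded.
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(sum_of_squares(range(10)))  # prints 285
```

Keeping the pool class as an argument means the same function can be exercised locally with threads before it is pointed at a cluster-backed pool.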
## Installation
```
pip install fiber
```

Check [here](https://uber.github.io/fiber/installation/) for details.
## Quick Start
### Hello Fiber
To use Fiber, simply import it in your code. It works very similarly to multiprocessing.

```python
import fiber

if __name__ == '__main__':
    fiber.Process(target=print, args=('Hello, Fiber!',)).start()
```

Note that `if __name__ == '__main__':` is necessary because Fiber uses the *spawn* method to start new processes. Check [here](https://stackoverflow.com/questions/50781216/in-python-multiprocessing-process-do-we-have-to-use-name-main) for details.
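The same requirement exists in the standard library whenever the *spawn* start method is selected: the child starts a fresh interpreter and re-imports the main module, so any unguarded process creation would be reached again inside the child. The sketch below uses `multiprocessing` rather than Fiber to show the pattern; `hello_via_spawn` is a hypothetical helper name.

```python
import multiprocessing as mp


def hello_via_spawn():
    """Start one child with the 'spawn' start method and return its exit code."""
    ctx = mp.get_context("spawn")  # each child is a fresh interpreter
    child = ctx.Process(target=print, args=("Hello from a spawned child!",))
    child.start()
    child.join()
    return child.exitcode


# Without this guard, the child's re-import of the main module would reach the
# call below again, and multiprocessing would raise a RuntimeError while the
# child is still bootstrapping.
if __name__ == "__main__":
    print("child exit code:", hello_via_spawn())
```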
Let's take a look at a more complex example:
### Estimating Pi
```python
import fiber
import random


@fiber.meta(cpu=1)
def inside(p):
    # Sample a random point in the unit square; the argument `p` is unused.
    x, y = random.random(), random.random()
    return x * x + y * y < 1


def main():
    NUM_SAMPLES = int(1e6)
    pool = fiber.Pool(processes=4)
    count = sum(pool.map(inside, range(0, NUM_SAMPLES)))
    print("Pi is roughly {}".format(4.0 * count / NUM_SAMPLES))


if __name__ == '__main__':
    main()
```

Fiber implements most of multiprocessing's API, including `Process`, `SimpleQueue`, `Pool`, `Pipe`, and `Manager`, and it adds its own extensions to that API to make it easy to compose large-scale distributed applications. For the detailed API guide, check [here](https://uber.github.io/fiber/process/).
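The Pi estimate above rests on a simple ratio: a point drawn uniformly from the unit square lands inside the quarter circle of radius 1 with probability π/4, so four times the observed fraction of hits approximates π. A serial sketch of the same computation, without Fiber, makes this easy to check locally. The function name `estimate_pi` and the seeded `random.Random` instance are choices made here for reproducibility and are not part of the original example.

```python
import math
import random


def estimate_pi(num_samples, seed=0):
    """Serial Monte Carlo estimate of pi from `num_samples` random points."""
    rng = random.Random(seed)  # private, seeded generator for repeatable runs
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:  # the point falls inside the quarter circle
            hits += 1
    return 4.0 * hits / num_samples


if __name__ == "__main__":
    estimate = estimate_pi(100_000)
    print("estimate:", estimate, "abs error:", abs(estimate - math.pi))
```

The error of such an estimate shrinks roughly with the square root of the sample count, which is why the pooled version spreads a large number of samples across workers.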
### Running on a Kubernetes cluster
Fiber also has native support for computer clusters. To run the above example on Kubernetes, Fiber provides a convenient command line tool to manage the workflow.
Assuming you have a working Docker environment locally, have finished configuring [Google Cloud SDK](https://cloud.google.com/sdk/docs/quickstarts), and have both `gcloud` and `kubectl` available locally, you can start by writing a Dockerfile that describes the running environment. An example Dockerfile looks like this:
```dockerfile
# example.docker
FROM python:3.6-buster
ADD examples/pi_estimation.py /root/pi_estimation.py
RUN pip install fiber
```
**Build an image and launch your job**

```
fiber run -a python3 /root/pi_estimation.py
```

This command looks for a local Dockerfile, builds a Docker image, and pushes it to your Google Container Registry. It then launches the main job, which contains your code, and runs the command `python3 /root/pi_estimation.py` inside it. Once the main job is running, it starts 4 more jobs on the cluster, each of which is a Pool worker.
## Supported platforms
* Operating system: Linux
* Python: 3.6+
* Supported cluster management systems:
    * Kubernetes (tested with Google Kubernetes Engine on Google Cloud)

We are interested in supporting other cluster management systems like [Slurm](https://slurm.schedmd.com/). If you want to contribute to this, please let us know.
Check [here](https://uber.github.io/fiber/platforms/) for details.
## Documentation
The documentation, including method/API references, can be found [here](https://uber.github.io/fiber/getting-started/).
## Testing
Install test dependencies. You'll also need to make sure [docker](https://docs.docker.com/install/) is available on the testing machine.
```bash
$ pip install -e .[test]
```

Run tests:
```bash
$ make test
```

## Contributing
Please read our [code of conduct](CODE_OF_CONDUCT.md) before you contribute! You can find details for submitting pull requests in the [CONTRIBUTING.md](CONTRIBUTING.md) file. Issue [template](https://help.github.com/articles/about-issue-and-pull-request-templates/).

## Versioning
We document versions and changes in our changelog - see the [CHANGELOG.md](CHANGELOG.md) file for details.

## License
This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.

## Cite Fiber
```
@misc{zhi2020fiber,
title={Fiber: A Platform for Efficient Development and Distributed Training for Reinforcement Learning and Population-Based Methods},
author={Jiale Zhi and Rui Wang and Jeff Clune and Kenneth O. Stanley},
year={2020},
eprint={2003.11164},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```

## Acknowledgments
* Special thanks to Piero Molino for designing the logo for Fiber