https://github.com/facultyai/faculty-distributed

Distributed execution on Faculty platform
distributed-computing faculty-platform python

Host: GitHub
URL: https://github.com/facultyai/faculty-distributed
Owner: facultyai
Created: 2019-05-10T10:22:58.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2020-08-11T15:02:12.000Z (almost 6 years ago)
Last Synced: 2025-06-04T06:14:51.758Z (about 1 year ago)
Topics: distributed-computing, faculty-platform, python
Language: Jupyter Notebook
Homepage: https://faculty.ai/platform/
Size: 43 KB
Stars: 5
Watchers: 6
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE-2.0.txt

# faculty-distributed

A tool for running functions in parallel across multiple servers using Faculty Jobs. The functionality is exposed through the class:

```python
faculty_distributed.FacultyJobExecutor
```

Additional information can be found in the notebook in the `examples` directory.

## Installation

Either install using pip,

```bash
pip install faculty-distributed
```

or clone the repository into a directory of your choosing and install from the local copy:

```bash
git clone git@github.com:facultyai/faculty-distributed.git
cd faculty-distributed
pip install .
```

## Set up

### Create a job definition

Create a new job definition named `distributed_example`. In the `COMMAND` section, paste the following:

```bash
pip install faculty-distributed && faculty_distributed_job $path $worker_id
```

Then add a `PARAMETER` named `path` of type `text`, and ensure that the `Make field mandatory` box is checked. Add a second `PARAMETER` named `worker_id`, also of type `text`, with the `Make field mandatory` box checked.

Finally, under `SERVER SETTINGS`, add `faculty_distributed` to the `ENVIRONMENTS` section. Note that any libraries used in the function to be executed that are not installed automatically on Faculty servers need to be installed on the job server via a separate environment. 

Depending on the level of parallelisation required and how long each function takes to run, it may be better to use dedicated rather than shared instances. To do so, click on `Large and GPU servers` under `SERVER RESOURCES`, and select an appropriate server type from the dropdown menu.

Remember to click `SAVE` when you are finished.
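
The two job parameters hint at how work might be divided between subruns: `path` could point to files shared by all subruns, while `worker_id` could select the piece of work for one subrun. The sketch below illustrates that general pattern locally. It is an assumption about the design, not the library's implementation; the helpers `run_worker` and `local_map` and the file names `args_<id>.pkl` and `out_<id>.pkl` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def run_worker(func, path, worker_id):
    """Hypothetical worker step: load one argument set, call func, save the result."""
    run_dir = Path(path)
    with open(run_dir / f"args_{worker_id}.pkl", "rb") as f:
        args = pickle.load(f)
    with open(run_dir / f"out_{worker_id}.pkl", "wb") as f:
        pickle.dump(func(*args), f)


def local_map(func, args_list):
    """Hypothetical driver: write inputs, run one worker per input, collect outputs."""
    with tempfile.TemporaryDirectory() as path:
        run_dir = Path(path)
        for worker_id, args in enumerate(args_list):
            with open(run_dir / f"args_{worker_id}.pkl", "wb") as f:
                pickle.dump(args, f)
        # On the platform each worker would presumably be a separate subrun;
        # here they simply run one after another in the same process.
        for worker_id in range(len(args_list)):
            run_worker(func, path, worker_id)
        outputs = []
        for worker_id in range(len(args_list)):
            with open(run_dir / f"out_{worker_id}.pkl", "rb") as f:
                outputs.append(pickle.load(f))
    return outputs


def add(a, b):
    return a + b


results = local_map(add, [[1, 2], [3, 4]])
```

Under this pattern, every subrun receives the same `path` and a different `worker_id`, which would explain why both parameters are mandatory in the job definition.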

## Usage

Import the `faculty_distributed` module and look up the Faculty platform project ID and job ID. Here the job name is `distributed_example`.

```python
import os

import faculty_distributed

project_id = os.environ["FACULTY_PROJECT_ID"]
job_id = faculty_distributed.job_name_to_job_id("distributed_example")
```

Then define the function to be sent to the distributed workers and a list of arguments to be passed to it, one entry per worker.

```python
def foo(x, y):
    return 2 * x + y


args_list = [[1, 2], [2, 3], [3, 4]]
```

Finally, instantiate the class `FacultyJobExecutor`, passing the project and job IDs. Optional arguments are `clean`, a boolean [default = True] that determines whether temporary files created for the run are deleted immediately after the completion of the job, and `tmpdir_prefix`, a string [default = '/project/.faculty-distributed'] that defines the path to where the temporary directory is created. 

Call `map`, passing the function and the list of arguments, to execute the function. Once `map` has been called, a job starts with one subrun per entry in the argument list. The outputs of these subruns are returned as a list.

```python
fje = faculty_distributed.FacultyJobExecutor(project_id, job_id)
output = fje.map(foo, args_list)
```
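
Because every call to `map` starts a Faculty job, it can be worth checking the function locally first. The following is a minimal sketch, assuming that each inner list is unpacked into the function's positional parameters and that results come back in argument order. The `foo(x, y)` example suggests the former, but neither is stated explicitly here; the name `expected` is illustrative.

```python
def foo(x, y):
    return 2 * x + y


args_list = [[1, 2], [2, 3], [3, 4]]

# Assumption: each inner list is unpacked as positional arguments, i.e. foo(1, 2), foo(2, 3), ...
expected = [foo(*args) for args in args_list]
print(expected)  # [4, 7, 10]
```

If those assumptions hold, `output` from the distributed run should match `expected`.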

## Try the example

An example of executing a function with `faculty-distributed` is provided in the `examples/` directory. The notebook loads an example dataset, defines a function that trains a model, defines a list of arguments to be passed to the function in parallel, runs the jobs and then collects the results. The notebook waits for the jobs to complete before executing the remaining cells.
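
For a parameter sweep of this kind, the argument list can be built programmatically rather than written out by hand. A minimal sketch, assuming a hypothetical training function that takes a learning rate and a maximum tree depth (these parameter names and values are illustrative and not taken from the notebook):

```python
from itertools import product

# Hypothetical hyperparameter values to sweep over.
learning_rates = [0.01, 0.1]
max_depths = [2, 4, 6]

# One inner list per combination, matching the [[x, y], ...] shape of args_list in the Usage section.
args_list = [list(combo) for combo in product(learning_rates, max_depths)]
print(len(args_list))  # 6
```

Each entry of `args_list` would then correspond to one subrun when passed to `map`.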
