Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/qqaatw/cs230-project
https://github.com/qqaatw/cs230-project
Last synced: 22 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/qqaatw/cs230-project
- Owner: qqaatw
- Created: 2024-01-26T21:55:44.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-10-21T22:01:58.000Z (25 days ago)
- Last Synced: 2024-10-22T18:05:48.435Z (24 days ago)
- Language: Python
- Size: 146 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# CS230 Project - Distributed Job Scheduling System in Machine Learning Clusters
## Setup
**For scheduler and worker**
1. Install conda virtual environment.
2. Run `conda env create --prefix cs230 -f configuration.yaml` under `worker` directory.
3. Activate the environment.
4. Install common library with `pip install -e .` command under `common` directory.
5. Install `torch` and `torchvision` with `pip`.**For FTP server and RabbitMQ broker**
Deploy with docker from docker hub:
`rabbitmq:3.12-management`
`garethflowers/ftp-server`
**For users**
1. Install common library with `pip install -e .` command under `common` directory.
## `config.json`
The RabbitMQ broker, FTP server, and the GPU capacity of each worker should be set up in the `worker/config.json` file.
```json
"broker" : {
"broker_host": "18.119.97.104",
"broker_port": "5673",
"topics": {
"broker_scheduling_topic": "node_1_scheduling"
}
},
...
"ftp" : {
"ftp_host": "169.234.56.23",
"ftp_port": "21"
},
"workers": {
"1": {
"GPU": 8589934592
},
"2": {
"GPU": 2147483648
},
"3": {
"GPU": 0
}
},
...
```
## Launch**Scheduler**
There are three scheduling algorithm available:
```python scheduler.py [next-available, round-robin, priority-based]```
e.g.
```python scheduler.py next-available```
**Worker**
```python daemon.py [worker_id]```
e.g.
```python daemon.py 1```
## Workflow
Please look at `tester.py` to learn how a user should send a new task request to scheduler and retrieve the results.