https://github.com/shadensmith/deepspeed-test-worker
- Host: GitHub
- URL: https://github.com/shadensmith/deepspeed-test-worker
- Owner: ShadenSmith
- License: mit
- Created: 2020-08-28T17:57:29.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-09-29T17:40:04.000Z (over 5 years ago)
- Last Synced: 2025-04-01T19:51:32.543Z (about 1 year ago)
- Language: Shell
- Size: 4.88 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# deepspeed-test-worker
This is a simple script for spinning up new Azure Pipelines agents to run
DeepSpeed integration tests. This script installs prerequisites, registers the worker,
and begins listening for jobs.
## Prerequisites
* `$DEEPSPEED_PAT` must store a [personal access token (PAT)](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/v3-linux?view=azure-devops#permissions) configured for DeepSpeed's GPU testing pool.
* `sudo` privileges are required to install the agent's prerequisite software. You may be
prompted for a password once at the beginning of the script's execution.
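A preflight check along these lines can catch a missing token before the script starts. This is a minimal sketch and not part of `prep_test_node.sh`; the helper names `check_pat` and `check_sudo` are hypothetical.

```bash
# Hypothetical preflight helpers; not part of prep_test_node.sh.

# Succeeds only if DEEPSPEED_PAT is set to a non-empty value.
check_pat() {
    if [ -z "${DEEPSPEED_PAT:-}" ]; then
        echo "DEEPSPEED_PAT is not set" >&2
        return 1
    fi
}

# Validates (and caches) sudo credentials, prompting for a password if needed.
check_sudo() {
    sudo -v
}
```

With both helpers defined, `check_pat && check_sudo && ./prep_test_node.sh` would only start the script once the prerequisites hold.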
## Instructions
To spin up a worker, simply run:
```bash
DEEPSPEED_PAT=mytoken ./prep_test_node.sh
```
**Note:** the worker will stop once this script is killed. For continued execution, we
strongly recommend you run this script in a `tmux` or `screen` environment.
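One way to follow this recommendation with `tmux` is sketched below. The function name `start_worker_session` and the default session name `deepspeed-worker` are hypothetical. The token is passed through `env` because a `tmux` server that is already running does not necessarily pick up variables from the calling shell.

```bash
# Hypothetical helper: runs the worker script inside a named tmux session.
# Run it from the directory that contains prep_test_node.sh.
start_worker_session() {
    local session="${1:-deepspeed-worker}"  # assumed session name
    if [ -z "${DEEPSPEED_PAT:-}" ]; then
        echo "DEEPSPEED_PAT is not set" >&2
        return 1
    fi
    # The session starts attached, so a sudo password prompt stays visible.
    tmux new-session -s "$session" \
        env "DEEPSPEED_PAT=${DEEPSPEED_PAT}" ./prep_test_node.sh
}
```

After calling `start_worker_session`, detach with `Ctrl-b d` to leave the worker running, and reattach later with `tmux attach -t deepspeed-worker`.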
## Configuring
### Agent installation path
The testing agent is installed under `/tmp/deepspeed-testing/` by default. You can change the base directory
by setting the environment variable `$DEEPSPEED_TEST_BASE` when running the script:
```bash
DEEPSPEED_TEST_BASE=/my/fast/dir DEEPSPEED_PAT=mytoken ./prep_test_node.sh
```
**Note:** we recommend placing the testing agent's base directory on fast local storage.
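The fallback behavior described above can be illustrated with a default-value parameter expansion. This is a sketch of the likely logic, not code taken from `prep_test_node.sh`; the function name `resolve_test_base` is hypothetical.

```bash
# Hypothetical helper: prints the base directory the agent would use.
# Assumes the script falls back to /tmp/deepspeed-testing when
# DEEPSPEED_TEST_BASE is unset or empty.
resolve_test_base() {
    echo "${DEEPSPEED_TEST_BASE:-/tmp/deepspeed-testing}"
}
```

Calling `resolve_test_base` before launching the worker is a quick way to confirm which directory an override would select.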
### Data paths
DeepSpeed's model tests expect training data to be found under:
* `/data/Megatron-LM`
* `/data/BingBertSquad`

These locations are not currently configurable.
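Because the data paths are fixed, it can help to verify them before registering a worker. The sketch below assumes only that both directories must exist; the function name `check_data_dirs` is hypothetical and not part of this repository.

```bash
# Hypothetical helper: reports any expected data directory that is missing.
# With no arguments, it checks the two paths the model tests expect.
check_data_dirs() {
    local missing=0 dir
    if [ "$#" -eq 0 ]; then
        set -- /data/Megatron-LM /data/BingBertSquad
    fi
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing data directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_data_dirs` with no arguments returns a non-zero status and lists each missing path on standard error.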