https://github.com/jrieke/awstrainer
🛠️ Command line tool for machine learning on AWS
- Host: GitHub
- URL: https://github.com/jrieke/awstrainer
- Owner: jrieke
- License: mit
- Created: 2020-07-06T20:19:37.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-07-08T20:32:29.000Z (over 4 years ago)
- Last Synced: 2024-12-09T01:39:50.669Z (about 2 months ago)
- Topics: amazon-web-services, aws, command-line-tool, deep-learning, ec2, machine-learning, server, sync, training
- Language: Python
- Homepage:
- Size: 2.97 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
# awstrainer
🛠️ Command line tool for machine learning on AWS
awstrainer helps you run machine learning tasks (or any other long-running computations)
on AWS. With one simple command, it spins up an AWS instance (from your own account),
transfers your code & dataset, starts the training run, syncs all output files back to
your computer, and terminates the instance after training has finished. It really shines
when you need to quickly launch multiple, long-running jobs in parallel (e.g. for
hyperparameter optimization).

## Demo
![](docs/images/demo.gif)
## Installation
1. `pip install git+https://github.com/jrieke/awstrainer`
2. Install the AWS CLI from [here](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
and run `aws configure` to [connect your AWS account](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html) (alternatively, you can create a credentials file as
described [here](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration)).

## Usage
### Starting a training run
First, you need to create a launch template for your AWS instance. This specifies which
instance type should be used, how big the storage is, which packages should be
installed, etc. You can either follow the instructions [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html#create-launch-template) or create a launch
template [from an existing instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html#create-launch-template-from-instance).

Then, navigate into your project dir and run:

    awstrainer run --launch_template_id <your-launch-template-id> "/home/ubuntu/anaconda3/bin/python train.py"

This launches an AWS instance (based on your launch template), uploads the project dir
(excluding the subdirs `.git` and `out`), executes the given command via ssh, and
terminates the instance after training has finished. Here the command starts a
training script, but it can be any command. Note that you have to use absolute
paths because `$PATH` won't be available.

awstrainer expects your private key file from AWS to be stored as `aws-key.pem` in
the project dir. To use a different file, set the `--key_file` option. Depending on
which operating system your instance uses, you may also need to set the `--user`
option (default: `ubuntu`).

For a complete list of options, run `awstrainer run --help`.
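
If you want to start several runs at once (e.g. for hyperparameter optimization), one option is to call `awstrainer run` from a small Python script. The following is only a sketch and not part of awstrainer: the launch template id is a placeholder, `train.py` with its `--lr` argument is hypothetical, and the option order simply mirrors the command shown above.

```python
import shlex
import subprocess

# Placeholder values: replace with your own launch template id and interpreter path.
LAUNCH_TEMPLATE_ID = "lt-0123456789abcdef0"  # not a real id
PYTHON = "/home/ubuntu/anaconda3/bin/python"  # absolute path, since $PATH is not available


def build_run_command(learning_rate, key_file="aws-key.pem", user="ubuntu"):
    """Build the argument list for one `awstrainer run` invocation."""
    # `--lr` is an assumed argument of a hypothetical train.py.
    remote_command = f"{PYTHON} train.py --lr {learning_rate}"
    return [
        "awstrainer", "run",
        "--launch_template_id", LAUNCH_TEMPLATE_ID,
        "--key_file", key_file,
        "--user", user,
        remote_command,
    ]


def launch_all(learning_rates):
    """Start one awstrainer process per learning rate and wait for all of them."""
    processes = [subprocess.Popen(build_run_command(lr)) for lr in learning_rates]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Print the commands first so they can be checked before spending money.
    for lr in (0.1, 0.01, 0.001):
        print(shlex.join(build_run_command(lr)))
    # Uncomment to actually launch the instances (this incurs AWS costs):
    # launch_all([0.1, 0.01, 0.001])
```

Each learning rate gets its own instance here, so keep an eye on your AWS instance limits and costs before raising the number of parallel runs.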
### Syncing output back to your machine
awstrainer also allows you to sync any output files from the AWS instances back to your
local machine. For this to work, you need to write output files to a folder `out`.
Then, on your local machine, run:

    awstrainer sync --every 60

This pulls output files from all running AWS instances every 60 seconds and syncs them
to a local dir `aws-synced-out`. You can also run `awstrainer sync` without the
`--every` option for a one-time sync.

For a complete list of options, run `awstrainer sync --help`.
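
On the training side, this means your script should put everything you want to get back into `out`. A minimal sketch of what that could look like, assuming `out` is resolved relative to the directory the command runs in (the file names and metric values are made up for illustration):

```python
import json
from pathlib import Path

# awstrainer syncs files from a folder named `out`, so all results go there.
OUT_DIR = Path("out")


def save_metrics(epoch, metrics, out_dir=OUT_DIR):
    """Write the metrics of one epoch to <out_dir>/metrics_epoch_<n>.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"metrics_epoch_{epoch:03d}.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path


if __name__ == "__main__":
    for epoch in range(3):
        # Placeholder values standing in for a real training loop.
        save_metrics(epoch, {"epoch": epoch, "loss": 1.0 / (epoch + 1)})
```

Writing one small file per epoch (instead of a single file at the end) means that a periodic `awstrainer sync --every 60` already picks up intermediate results while the run is still going.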
## Known issues
If `awstrainer run` shows a "Connection refused" error, try increasing the
waiting time after instance launch via the `--wait_time` option (default: 20).
Sometimes, the instance doesn't allow a connection even though the AWS API reports it
as ready, which may lead to this error.
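
To get a feeling for how long your instances need before they accept connections, you can probe the ssh port yourself. The helper below is a generic sketch using only the Python standard library. It is not how awstrainer works internally, and `wait_for_port` is a name chosen for this example.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=120.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the ssh daemon is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "Connection refused" as well as connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("<public-ip-of-your-instance>")` right after launch shows roughly how many seconds pass until ssh is reachable, which can guide your choice of `--wait_time`.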