Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ottogroup/clash
Clash is a tool for running jobs on the Google Compute Engine :rocket:
https://github.com/ottogroup/clash
Last synced: 3 months ago
JSON representation
Clash is a tool for running jobs on the Google Compute Engine :rocket:
- Host: GitHub
- URL: https://github.com/ottogroup/clash
- Owner: ottogroup
- License: apache-2.0
- Archived: true
- Created: 2018-10-16T07:46:00.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-02-08T00:43:16.000Z (almost 2 years ago)
- Last Synced: 2024-06-15T16:38:35.512Z (5 months ago)
- Language: Python
- Homepage:
- Size: 411 KB
- Stars: 28
- Watchers: 4
- Forks: 3
- Open Issues: 4
-
Metadata Files:
- Readme: Readme.md
- License: LICENSE
Awesome Lists containing this project
- awesome-repositories - ottogroup/clash - Clash is a tool for running jobs on the Google Compute Engine :rocket: (Python)
README
# Clash
*Clash* is a simple Python library for running jobs on the [Google Compute Engine](https://cloud.google.com/compute/). Typical use cases are batch jobs which require very specific hardware configurations at runtime (e.g. multiple GPUs for model training). The library offers the following features:
* Automatic management of compute resources (allocation and deallocation)
* Definition of jobs via custom docker images
* Fine-grained cost-control by optionally using [preemptible VMs](https://cloud.google.com/preemptible-vms/)
* An easy-to-use Python API including operators for [Apache Airflow](https://airflow.apache.org/)## Why Clash?
There are several ways for running jobs on the *Google Cloud Platform* (GCP) where each one comes with its pros and cons. For example, [Google's ML engine](https://cloud.google.com/ml-engine/) is now able to run dockerized jobs as well and should definitely be considered before using Clash, because of its excellent integration in the GCP ecosystem. In fact, the development of Clash started at a point in time when the options for running jobs on GCP were very limited.
On the other hand, Clash might still help to drastically reduce costs by offering the option to use preemptible VMs. In practice, one can usually make jobs robust towards sudden preemptions, for example, by continuously storing checkpoints. Clash automatically attempts to restart preempted machines and, thus, jobs can load their latest checkpoint and just continue where they have been interrupted. This way, we are successfully saving up to 80% of our compute costs.
## Requirements
* Python >= 3.7
Because Clash uses the [Google Cloud SDK](https://github.com/googleapis/google-cloud-python), you first have to set up your local environment to access GCP. Please visit the [gcloud docs](https://cloud.google.com/sdk/gcloud/reference/auth/) for that matter. In addition, Clash requires the following IAM roles to run correctly:
* roles/pubsub.editor # Clash uses [PubSub](https://cloud.google.com/pubsub/docs/) to communicate with its VMs
* roles/compute.instanceAdmin.* # Clash needs to be able to create custom VMs on the projectIf Clash should create VMs that are configured to run as a service account, one must also grant the roles/iam.serviceAccountUser role.
Note that *Clash VMs also need to be able to delete themselves and delete / publish to PubSub topics* in order to work correctly.## Usage
```
$ pip install pyclash
``````Python
from pyclash.clash import JobConfigBuilder, JobJOB_CONFIG = (
JobConfigBuilder()
.project_id("my-gcp-project")
.image("google/cloud-sdk:latest")
.machine_type("n1-standard-1")
.subnetwork("default")
.preemptible(True)
.build()
)result = Job(job_config=JOB_CONFIG, name_prefix="myjob").run(
["echo", "hello world"], wait_for_result=True
)if result["status"] != 0:
raise ValueError(f"The command failed with status code {result['status']}")
```By default, Clash runs VMs with the [Compute Engine default service account](https://cloud.google.com/compute/docs/access/service-accounts). One can also use Clash in the [Cloud Composer](https://cloud.google.com/composer/). To deploy the operators, run
```Bash
COMPOSER_ENVIRONMENT="mycomposer-env" \
COMPOSER_LOCATION="europe-west1" ./run.sh deploy-airflow-plugin
```Note that the pyclash package must be available to the Composer in order to use the Clash operators (the [official documentation](https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies) describes how to install custom packages from PyPi). The following example shows how to run jobs using Clash's *ComputeEngineJobOperator*:
```Python
from airflow.operators import ComputeEngineJobOperator
from airflow import DAGJOB_CONFIG = (
JobConfigBuilder()
.project_id("my-gcp-project")
.image("google/cloud-sdk:latest")
.machine_type("n1-standard-32")
.subnetwork("default")
.build()
)with DAG(
"dag_id",
start_date=datetime(2018, 10, 1),
catchup=False,
) as dag:
task_run_script = ComputeEngineJobOperator(
cmd="echo hello",
job_config=JOB_CONFIG,
name_prefix="myjob",
task_id="run_script_task"
)
```## Contributing
The best way to start working on Clash is to first install all the dependencies and run the tests:
```Bash
pipenv install --dev
./run.sh unit-test
GCP_PROJECT_ID='your-project-id' ./run.sh integration-test
```