https://github.com/criteo/fromconfig-yarn
A fromconfig Launcher for Yarn
- Host: GitHub
- URL: https://github.com/criteo/fromconfig-yarn
- Owner: criteo
- License: apache-2.0
- Created: 2021-04-20T12:20:38.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-11-25T13:47:00.000Z (about 2 years ago)
- Last Synced: 2024-11-15T21:56:05.701Z (3 months ago)
- Topics: config-as-code, fromconfig, python3, yarn
- Language: Python
- Homepage:
- Size: 33.2 KB
- Stars: 0
- Watchers: 8
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# FromConfig Yarn
[![pypi](https://img.shields.io/pypi/v/fromconfig-yarn.svg)](https://pypi.python.org/pypi/fromconfig-yarn)
[![ci](https://github.com/criteo/fromconfig-yarn/workflows/Continuous%20integration/badge.svg)](https://github.com/criteo/fromconfig-yarn/actions?query=workflow%3A%22Continuous+integration%22)

A [fromconfig](https://github.com/criteo/fromconfig) `Launcher` for Yarn execution.
- [Install](#install)
- [Quickstart](#quickstart)
- [Usage Reference](#usage-reference)
- [Options](#options)

## Install

```bash
pip install fromconfig_yarn
```

Once installed, the launcher is available under the name `yarn`.
## Quickstart

Given the following module

```python
# foo.py
class Model:
    def __init__(self, learning_rate: float):
        self.learning_rate = learning_rate

    def train(self):
        print(f"Training model with learning_rate {self.learning_rate}")
```

and config files
```yaml
# config.yaml
model:
  _attr_: foo.Model
  learning_rate: "${params.learning_rate}"

# params.yaml
params:
  learning_rate: 0.001

# launcher.yaml
yarn:
  name: test-fromconfig

logging:
  level: 20

launcher:
  run: yarn
```

Run (assuming you are in a Hadoop environment)
```bash
fromconfig config.yaml params.yaml launcher.yaml - model - train
```

which prints
```
INFO:fromconfig.launcher.logger:- yarn.name: test-fromconfig
INFO:fromconfig.launcher.logger:- logging.level: 20
INFO:fromconfig.launcher.logger:- params.learning_rate: 0.001
INFO:fromconfig.launcher.logger:- model._attr_: foo.Model
INFO:fromconfig.launcher.logger:- model.learning_rate: 0.001
INFO skein.Driver: Driver started, listening on 12345
INFO:fromconfig_yarn.launcher:Uploading pex to viewfs://root/user/path/to/pex
INFO:cluster_pack.filesystem:Resolved base filesystem:
INFO:cluster_pack.uploader:Zipping and uploading your env to viewfs://root/user/path/to/pex
INFO skein.Driver: Uploading application resources to viewfs://root/user/...
INFO skein.Driver: Submitting application...
INFO impl.YarnClientImpl: Submitted application application_12345
INFO:fromconfig_yarn.launcher:TRACKING_URL: http://12.34.56/application_12345
```

You can also monkeypatch the relevant functions to "fake" the Hadoop environment with

```bash
python monkeypatch_fromconfig.py config.yaml params.yaml launcher.yaml - model - train
```

This example can be found in [`docs/examples/quickstart`](docs/examples/quickstart).
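The launcher log above lists the merged and resolved configuration as flattened dotted keys, for example `params.learning_rate: 0.001` and `model.learning_rate: 0.001`. The following stdlib-only sketch reproduces that merge, interpolate and flatten sequence for the quickstart files. It assumes the YAML files are already loaded into dicts and that later files override earlier ones; the helpers `merge`, `lookup`, `resolve` and `flatten` are hypothetical, and this is not fromconfig's actual parser.

```python
"""Hypothetical sketch: merge configs, resolve ${...} references, flatten keys."""
import re
from typing import Any, Dict

# Matches a string that is exactly one reference, e.g. "${params.learning_rate}".
REFERENCE = re.compile(r"^\$\{([^}]+)\}$")


def merge(base: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `update` into a copy of `base` (assumption: later wins)."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def lookup(config: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as "params.learning_rate"."""
    node: Any = config
    for part in dotted.split("."):
        node = node[part]
    return node


def resolve(node: Any, root: Dict[str, Any]) -> Any:
    """Replace whole-string ${...} references with the referenced value."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, str):
        match = REFERENCE.match(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


def flatten(node: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into {"a.b": value} pairs, like the launcher log."""
    flat: Dict[str, Any] = {}
    for key, value in node.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# The quickstart files, already parsed into Python dicts.
config = {"model": {"_attr_": "foo.Model", "learning_rate": "${params.learning_rate}"}}
params = {"params": {"learning_rate": 0.001}}
launcher = {"yarn": {"name": "test-fromconfig"}, "logging": {"level": 20}, "launcher": {"run": "yarn"}}

merged = merge(merge(config, params), launcher)
# The quickstart log does not list the `launcher` entry, so it is dropped here too.
logged = {
    key: value
    for key, value in flatten(resolve(merged, merged)).items()
    if not key.startswith("launcher.")
}
for key, value in logged.items():
    print(f"- {key}: {value}")
```

Running it prints the same five key/value pairs as the launcher log, in a different order. Only whole-string references such as `"${params.learning_rate}"` are handled; partial-string interpolation and reference cycles are out of scope for this sketch.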
## Usage Reference

To configure Yarn, add a `yarn` entry to your config.

### Options

You can set the following parameters.
- `env_vars`: A list of environment variables to forward to the container(s)
- `hadoop_file_systems`: The list of available filesystems
- `ignored_packages`: The list of packages to exclude from the environment
- `jvm_memory_in_gb`: The JVM memory, in GB (default: `8`)
- `memory`: The executor's memory (default: `32 GiB`)
- `num_cores`: The executor's number of cores (default: `8`)
- `package_path`: The HDFS location where the environment is saved
- `zip_file`: The path to an existing `pex` file, either local or on HDFS
- `name`: The application name
- `queue`: The Yarn queue to submit the application to
- `node_label`: The label of the Hadoop nodes on which the application is scheduled
- `pre_script_hook`: A script to execute before Python is invoked
- `extra_env_vars`: A mapping of extra environment variables to forward to the container(s)
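To make the shape of the `yarn` entry concrete, the sketch below models it as a Python dataclass using only the defaults stated in the options list. The class `YarnOptions`, its `from_config` and `container_env` helpers, the field types, and the choice to reject unknown keys are all assumptions for illustration, not fromconfig-yarn's actual implementation. `container_env` encodes one plausible reading of `env_vars` (names whose current values are forwarded) versus `extra_env_vars` (explicit name/value pairs).

```python
"""Hypothetical model of the `yarn` config entry (not fromconfig-yarn's own code)."""
import os
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Mapping, Optional


@dataclass
class YarnOptions:
    # Defaults are the ones documented in the options list; None means "not set".
    env_vars: List[str] = field(default_factory=list)
    hadoop_file_systems: List[str] = field(default_factory=list)
    ignored_packages: List[str] = field(default_factory=list)
    jvm_memory_in_gb: int = 8
    memory: str = "32 GiB"
    num_cores: int = 8
    package_path: Optional[str] = None
    zip_file: Optional[str] = None
    name: Optional[str] = None
    queue: Optional[str] = None
    node_label: Optional[str] = None
    pre_script_hook: Optional[str] = None
    extra_env_vars: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_config(cls, config: Mapping[str, Any]) -> "YarnOptions":
        """Build options from the top-level config, rejecting unknown keys."""
        entry = dict(config.get("yarn", {}))
        unknown = set(entry) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"Unknown yarn options: {sorted(unknown)}")
        return cls(**entry)

    def container_env(self, local_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
        """Assumed semantics: copy listed variables that are set locally,
        then add the explicit extras (extras win on conflicts)."""
        local_env = os.environ if local_env is None else local_env
        env = {name: local_env[name] for name in self.env_vars if name in local_env}
        env.update(self.extra_env_vars)
        return env


options = YarnOptions.from_config(
    {"yarn": {"name": "test-fromconfig", "env_vars": ["USER"], "extra_env_vars": {"MODE": "train"}}}
)
print(options.name, options.memory, options.num_cores)
print(options.container_env({"USER": "alice", "HOME": "/home/alice"}))
```

Rejecting unknown keys surfaces typos such as `mem` for `memory` before anything is submitted; this README does not say whether the real launcher performs such a check.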