# Google Cloud Platform Condor Pool Manager (GCPM)

HTCondor pool manager for Google Cloud Platform.

## Installation

### Package installation

GCPM can be installed with `pip`:

    $ pip install gcpm

### Service file installation

To install GCPM as a service, run:

    $ gcpm install

:warning: Service installation is valid only for systems managed by **systemd**.

If **logrotate** is installed, a log rotation definition for **/var/log/gcpm.log** is also installed.
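
The name of the installed service unit is not given here; assuming `gcpm install` registers a systemd unit called **gcpm**, the service can then be enabled and started with the standard systemd commands:

    $ sudo systemctl enable gcpm   # start at boot (unit name assumed)
    $ sudo systemctl start gcpm    # start the pool manager now
    $ systemctl status gcpm        # check that the service is running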

## Configuration file

### Configuration file path

The default configuration file is **~/.config/gcpm/gcpm.yml**.

For the service, the configuration file is **/etc/gcpm.yml**.

To change the configuration file, use the `--config` option:

    $ gcpm run --config /path/to/my/gcpm.yml

### Configuration file content


The configuration file is in YAML format.

Name|Description|Default Value|Mandatory|
:---|:----------|:------------|:--------|
config_dir|Directory for gcpm related files.|**~/.config/gcpm/** (user)<br>**/var/cache/gcpm** (service)|No|
oauth_file|Path to the OAuth information file for GCE/GCS usage.|**/oauth**|No|
service_account_file|Service account JSON file for GCE/GCS usage.<br>If not specified, an OAuth connection is tried.|-|No|
project|Google Cloud Platform project name.|-|Yes|
zone|Zone for Google Compute Engine.|-|Yes|
machines|Array of machine settings.<br>Each setting is an array of [core, mem, disk, idle, image] (see below).|[]|Yes|
machines:core|Number of cores of the machine type.|-|Yes|
machines:mem|Memory (MB) of the machine type.|-|Yes|
machines:swap|Swap memory (MB) of the machine type.|Same as mem|No|
machines:max|Limit on the number of instances of the machine type.|-|Yes|
machines:disk|Disk size (GB) of the machine type.|-|Yes|
machines:idle|Number of idle machines of the machine type.|-|Yes|
machines:image|Image of the machine type.|-|Yes|
machines:<others>|Any other options for instance creation can be defined.|-|No|
max_cores|Limit on the total number of cores of all instances.<br>If set to 0, no limit is applied.|0|No|
static_wns|Array of instance names of static worker nodes, which are added as condor worker nodes.|[]|No|
required_machines|Array of machines which should be running other than worker nodes.|[]|No|
required_machines:name|Name of the machine.|-|Yes|
required_machines:mem|Memory (MB) of the machine type.|-|Yes|
required_machines:swap|Swap memory (MB) of the machine type.|Same as mem|No|
required_machines:disk|Disk size (GB) of the machine type.|-|Yes|
required_machines:image|Image of the machine type.|-|Yes|
required_machines:<others>|Any other options for instance creation can be defined.|-|No|
primary_accounts|User accounts whose jobs must run on normal worker nodes. See below about primary accounts.|[]|No|
prefix|Prefix for machine names.|**gcp-wn**|No|
preemptible|1 for preemptible machines, 0 otherwise.|0|No|
off_timer|Seconds to wait after startup before sending condor_off.|0|No|
startup_cmd|Additional commands run at WN startup.|""|No|
shutdown_cmd|Additional commands run at WN shutdown.|""|No|
network_tag|Array of GCP network tags.|[]|No|
reuse|1 to reuse terminated instances; otherwise instances are deleted and re-created.|0|No|
interval|Interval in seconds for each loop.|10|No|
clean_time|Time to clean up residual instances stuck in starting/deleting status.|600|No|
head_info|If **head** is empty, head node information is automatically taken according to this option:<br>hostname: Hostname<br>ip: IP address<br>gcp: Hostname|**gcp**|No|
head|Head node hostname/IP address.|""|No|
port|HTCondor port.|9618|No|
domain|Domain of the head node.<br>Set empty to take it from the hostname.|""|No|
admin|HTCondor admin email address.|""|Yes|
owner|HTCondor owner name.|""|Yes|
wait_cmd|1 to wait for the result of GCE commands (create/start/stop/delete...).|0|No|
bucket|Bucket name for the pool_password file.|""|Yes|
storageClass|Storage class name of the bucket.|"REGIONAL"|No|
location|Storage location for the bucket.<br>If empty, it is determined from **zone**.|""|No|
log_file|Log file path. Empty to log to stdout.|""|No|
log_level|Log level (**debug**, **info**, **warning**, **error**, **critical**).|**info**|No|
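
For reference, a minimal configuration covering the mandatory options might look like the sketch below. The project, zone, image, bucket, and account values are placeholders for illustration only, not values taken from the project's documentation; only keys listed in the table above are used.

```yml
---
project: my-gcp-project        # placeholder GCP project name
zone: us-central1-b            # placeholder zone
machines:
  - core: 1                    # 1-core worker node type
    mem: 3750                  # memory in MB
    disk: 50                   # disk size in GB
    max: 10                    # at most 10 instances of this type
    idle: 1                    # keep 1 idle worker node ready
    image: gcpm-wn-image       # placeholder image name
max_cores: 20
admin: admin@example.com       # HTCondor admin email address
owner: example-owner           # HTCondor owner name
bucket: my-gcpm-bucket         # placeholder bucket for the pool_password file
```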

Note:

* Primary accounts

If primary accounts are set, jobs of **non-primary** accounts can run on test worker nodes.

If the maximum number of 1-core worker nodes is already running
and there are idle jobs from non-primary accounts,
a test worker node named **<prefix>-test-1core-XXXX** is launched,
and only non-primary account jobs can run on it.

This makes it possible to run such test jobs without waiting for normal jobs to finish.

Such test worker nodes can be launched as long as the total number of cores is smaller than `max_cores`.

To use this function effectively, set the total of `max` over the machine types to less than `max_cores`.

For example:

```yml
---
machines:
  - core: 1
    max: 10
  - core: 8
    max: 2
max_cores: 20
primary_accounts:
  - condor_primary
```

In this case, normal jobs can launch 10 1-core machines and 2 8-core machines,
and 16 cores are then in use.

Even if there are a lot of idle jobs from **condor_primary**,
1-core test jobs from other accounts can still run: 4 jobs at most.

## Puppet setup

* [rcmdnk/puppet-gcpm](https://github.com/rcmdnk/puppet-gcpm): a puppet module for GCPM.
* [rcmdnk/gcpm-puppet](https://github.com/rcmdnk/gcpm-puppet): a puppet example to create a head (manager) node and worker nodes with puppet.
* [rcmdnk/frontiersquid-puppet](https://github.com/rcmdnk/frontiersquid-puppet): a puppet example to create a frontier squid proxy server on GCP.