https://github.com/dcos/dcos-ansible
Ansible roles to manage the life cycle of a Mesosphere DC/OS cluster
- Host: GitHub
- URL: https://github.com/dcos/dcos-ansible
- Owner: dcos
- Created: 2018-12-05T10:31:00.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-09-23T23:24:28.000Z (over 3 years ago)
- Last Synced: 2024-04-09T22:30:07.190Z (about 1 year ago)
- Topics: dcos, dcos-testing-guild
- Language: Python
- Homepage: https://dcos.io
- Size: 608 KB
- Stars: 29
- Watchers: 11
- Forks: 55
- Open Issues: 15
# Ansible Roles: Mesosphere DC/OS
A set of Ansible Roles that manage a DC/OS cluster lifecycle on RedHat/CentOS Linux.
## Requirements
To make best use of these roles, your nodes should resemble the Mesosphere recommended way of setting up infrastructure. Depending on your setup, the roles expect to deploy to:
* One or more master nodes ('masters')
* One bootstrap node ('bootstraps')
* Zero or more agent nodes, used for public facing services ('agents_public')
* One or more agent nodes, not used for public facing services ('agents_private')

### An example inventory file is provided as shown here:
```ini
[bootstraps]
bootstrap1-dcos112s.example.com

[masters]
master1-dcos112s.example.com
master2-dcos112s.example.com
master3-dcos112s.example.com

[agents_private]
agent1-dcos112s.example.com
remoteagent1-dcos112s.example.com

[agents_public]
publicagent1-dcos112s.example.com

[agents:children]
agents_private
agents_public

[common:children]
bootstraps
masters
agents
agents_public
```

## Role Variables
The Mesosphere DC/OS Ansible roles make use of two sets of variables:
1. A set of per-node type `group_vars`
2. A multi-level dictionary called `dcos`, that should be available to all nodes

### Per group vars
```ini
[bootstraps:vars]
node_type=bootstrap

[masters:vars]
node_type=master
dcos_legacy_node_type_name=master

[agents_private:vars]
node_type=agent
dcos_legacy_node_type_name=slave

[agents_public:vars]
node_type=agent_public
dcos_legacy_node_type_name=slave_public
```

### Global vars
```yml
dcos:
  download: "https://downloads.dcos.io/dcos/stable/1.13.4/dcos_generate_config.sh"
  download_checksum: "sha256:a3d295de33ad55b10f5dc66c9594d9175a40f5aaec7734d664493968a9f751fd"
  version: "1.13.4"
  enterprise_dcos: false
  selinux_mode: enforcing

  config:
    cluster_name: "examplecluster"
    security: strict
    bootstrap_url: http://int-bootstrap1-examplecluster.example.com:8080
    exhibitor_storage_backend: static
    master_discovery: static
    master_list:
      - 172.31.42.1
```

#### Cluster wide variables
| Name | Required? | Description |
|:-----|:----------|:------------|
| download | REQUIRED | (https) URL to download the Mesosphere DC/OS installer from |
| download_checksum | no | Checksum to check the download against. It should start with the method being used. E.g. "sha256:" |
| version | REQUIRED | Version string that reflects the version that the installer (given by `download`) installs. Can be collected by running `dcos_generate_config.sh --version`. |
| version_to_upgrade_from | for upgrades | Version string of Mesosphere DC/OS the upgrade procedure expects to upgrade FROM. A per-version upgrade script will be generated on the bootstrap machine; each cluster node downloads the proper upgrade script for its currently running DC/OS version. |
| image_commit | no | Can be used to force same version / same config upgrades. Mostly useful for deploying/upgrading non-released versions, e.g. `1.12-dev`. This parameter takes precedence over `version`. |
| enterprise_dcos | REQUIRED | Specifies if the installer (given by `download`) installs an 'open' or 'enterprise' version of Mesosphere DC/OS. This is required as there are additional post-upgrade checks for enterprise-only components. |
| selinux_mode | REQUIRED | Indicates the SELinux mode of the cluster nodes' operating system. Mesosphere DC/OS supports running in `enforcing` mode starting with **1.12**. Older versions require `permissive`. |
| | | |
| config | yes | YAML structure that represents a valid Mesosphere DC/OS config.yml, see below. |

#### DC/OS config.yml parameters
Please see [the official Mesosphere DC/OS configuration reference](https://docs.mesosphere.com/1.13/installing/production/advanced-configuration/configuration-reference/) for a full list of possible parameters.
There are a few parameters that are used by these roles outside the DC/OS config.yml, specifically:

* `bootstrap_url`: Should point to http://*your bootstrap node*:8080. It is used internally and conveniently overwritten for the installer/upgrader to point to a version-specific sub-directory.
* `ip_detect_contents`: Contents of a user-supplied IP detection script. Overrides the built-in environment detection and the use of a generic AWS and/or on-premises script. [Official Mesosphere DC/OS ip-detect reference](https://docs.mesosphere.com/1.13/installing/production/deploying-dcos/installation/#create-an-ip-detection-script)
* `ip_detect_public_contents`: Contents of a user-supplied public IP detection script. Overrides the built-in environment detection and the use of a generic AWS and/or on-premises script. [Official Mesosphere DC/OS ip-detect reference](https://docs.mesosphere.com/1.13/installing/production/deploying-dcos/installation/#create-an-ip-detection-script)
* `fault_domain_detect_contents`: Contents of a user-supplied fault domain detection script. Overrides the built-in environment detection and the use of a generic AWS and/or on-premises script.

#### Ansible dictionary merge behavior caveat
Due to the nested structure of the `dcos` configuration, it might be required to set Ansible to ['merge' instead of 'replace'](https://docs.ansible.com/ansible/2.4/intro_configuration.html#hash-behaviour), when combining config from multiple places.
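To illustrate why this matters, here is a minimal sketch with two variable sources that both define `dcos`. The file names `group_vars/all/dcos.yml` and `group_vars/masters/dcos.yml` are hypothetical and chosen only for illustration; the values are taken from the global vars example in this README. The two definitions are shown as separate YAML documents, and the comments describe the result under each setting.

```yml
---
# group_vars/all/dcos.yml (hypothetical file name)
dcos:
  version: "1.13.4"
  enterprise_dcos: false
  config:
    cluster_name: "examplecluster"
---
# group_vars/masters/dcos.yml (hypothetical file name, higher precedence)
dcos:
  config:
    master_list:
      - 172.31.42.1

# With the default 'replace' behaviour, hosts in 'masters' see only:
#   dcos: {config: {master_list: [172.31.42.1]}}
# so 'version', 'enterprise_dcos' and 'config.cluster_name' are lost.
#
# With 'merge', the dictionaries are combined recursively:
#   dcos:
#     version: "1.13.4"
#     enterprise_dcos: false
#     config:
#       cluster_name: "examplecluster"
#       master_list: [172.31.42.1]
```

Keeping the whole `dcos` dictionary in a single place avoids the issue altogether, in which case the default behaviour is sufficient.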
##### Example
```ini
# ansible.cfg
[defaults]
hash_behaviour = merge
```

#### Safeguard during interactive use: `dcos_cluster_name_confirmed`
When invoking these roles interactively (for example from the operator's machine), the `DCOS.bootstrap` role will require a manual confirmation of the cluster to run against. This is a safeguarding mechanism to avoid unintentional upgrade or config changes. In non-interactive plays, a variable can be set to skip this step, e.g.:
```bash
ansible-playbook -e 'dcos_cluster_name_confirmed=True' dcos.yml
```

## Example playbook
Mesosphere DC/OS is a complex system, spanning multiple nodes to form a full multi-node cluster. There are some constraints in making a playbook use the provided roles:
1. Order of groups to run their respective roles on (e.g. bootstrap node first, then masters, then agents)
2. Concurrency for upgrades (e.g. `serial: 1` for master nodes)

The provided `dcos.yml` playbook can be used as-is for installing and upgrading Mesosphere DC/OS.
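
As a rough illustration of the two constraints above, a playbook along the following lines runs the groups in order and handles masters one at a time. This is a simplified sketch under stated assumptions, not the contents of the shipped `dcos.yml`: only `DCOS.bootstrap` is named in this README, while the role names `DCOS.master` and `DCOS.agent`, the use of `become`, and the agent batch size are assumptions made for the example.

```yml
---
# Sketch only: verify role names against the roles shipped in the repository.
- name: Prepare the bootstrap node first
  hosts: bootstraps
  become: true
  roles:
    - DCOS.bootstrap

- name: Install or upgrade masters, one node at a time
  hosts: masters
  become: true
  serial: 1
  roles:
    - DCOS.master  # assumed role name

- name: Install or upgrade private and public agents
  hosts: agents
  become: true
  serial: 10  # assumed batch size, tune to your cluster
  roles:
    - DCOS.agent  # assumed role name
```

The play order mirrors the first constraint (bootstrap, then masters, then agents), and `serial` limits how many nodes are changed at once during an upgrade.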
## Tested OS and Mesosphere DC/OS versions
* CentOS 7, RHEL 7
* DC/OS 1.12, both open as well as enterprise version

## License
[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
## Author Information
These roles were created by team SRE @ Mesosphere and others in 2018, based on multiple internal tools and non-public Ansible roles that have been developed internally over the years.