https://github.com/adobe-platform/mesos-systemd

Systemd scripts to run Mesos on CoreOS

# mesos-systemd

Adobe Platform scripts to bootstrap a CoreOS [`cluster`](https://github.com/adobe-platform/mesos-cluster) & run Mesos/Marathon/Zookeeper-Exhibitor.

Provides node-level services as [`Fleet Units`](https://coreos.com/using-coreos/clustering/) for every machine in the cluster.

Most services (logging, metrics, monitoring) run on all nodes; some run only on specific tiers, based on the metadata injected into Fleet.

The aim of this setup is to move instance provisioning steps to the CoreOS machine level, automated via fleetctl/systemctl. Almost all of our systemd units use docker to run our services. Consequently, we can use the vanilla CoreOS EC2 AMI (i.e., we don't bake AMIs at all). That said, this repo also contains methods for handling the sensitive data/secrets needed to configure various services (more below).

DISCLAIMER:
====

This repository may reference private repositories or scripts. Most should be replaceable with your own. Either way, proceed with caution: this project is highly experimental and certain nuances may not be well documented. If you want to use this repo, you may have to prune the code a bit and edit/delete certain files.

Concepts
====

The purpose of this repository is to house all setup scripts and systemd/fleetd units in a central location, separate from our infrastructure provisioning scripts (cloudformation).

All setup behavior is defined in the [`init`](https://github.com/adobe-platform/mesos-systemd/blob/master/init) script.

Assumptions:

- Your infrastructure has 3 tiers: `control`, `proxy`, `worker`
- ALL nodes run a `bootstrap.service`, whatever that may be.
- Some of the scripts require `/etc/environment` to contain certain variables (usually cloudformation parameters such as route53 entries)
- S3 buckets are set correctly and all required credential files (SSH keys, datadog & sumologic credentials) are properly provided to `init` & can be downloaded using [behance/docker-aws-s3-downloader](https://github.com/adobe-platform/docker-aws-s3-downloader)
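
The `/etc/environment` assumption can be checked up front before any setup script runs. The following is a minimal sketch only: `require_env_vars` is a hypothetical helper, not a function from this repo, and it only checks that a `NAME=` line exists in the file.

```shell
# Sketch only: fail early when an environment file lacks required variables.
# require_env_vars is a hypothetical helper name, not part of this repo.
require_env_vars() {
    local env_file="$1" name missing=0
    shift
    for name in "$@"; do
        # A variable counts as present if a line starts with NAME=
        if ! grep -q "^${name}=" "${env_file}"; then
            echo "missing ${name} in ${env_file}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For example, `require_env_vars /etc/environment CONTROL_TIER_S3SECURE_BUCKET` returns non-zero and names the variable if cloudformation did not write it.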

#### `init` bootstrap

Our `bootstrap.service` just clones this repo and runs the `init` script.

From there, `init` does the following:

1. ensures that any credentials/secure files are downloaded from S3 (to allow docker & git to pull private dependencies)
2. configures SSH to allow github.com access
3. copies `.dockercfg` into `/root` (TODO: refactor this process, as it is a hack)
4. runs ALL scripts in `v2/setup`
    - these scripts are always run with `sudo` (i.e., as root)
    - they set up things like motds, aliases, and drop-ins for various services
5. starts tier-specific template units that are specified by the running machine's IP (provided by CoreOS / cloudinit)
    - these are started via fleet, even though they are NOT global units and run on specific machines
    - the rationale is to give us granular control over certain units, such as mesos-slaves. It lets us control individual nodes or perform rolling actions (such as deploys) while retaining visibility into the cluster as a whole.
6. submits and starts generic fleet units
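
Step 4 can be pictured roughly as the loop below. This is a sketch under assumptions, not the actual `init` code: the `*.sh` naming, the sorted order, and the stop-on-first-failure policy are all guesses, and `sudo` is left out so the function can be tried without root.

```shell
# Sketch only: run every *.sh script in a setup directory, in glob order.
# The fail-fast policy and the *.sh naming are assumptions, not repo behavior.
run_setup_scripts() {
    local setup_dir="$1" script
    for script in "${setup_dir}"/*.sh; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "${script}" ] || continue
        echo "running ${script##*/}"
        # The real init runs these with sudo; plain sh keeps the sketch testable.
        sh "${script}" || { echo "failed: ${script##*/}" >&2; return 1; }
    done
}
```

Calling `run_setup_scripts v2/setup` from a checkout would then run each setup script in turn and stop at the first one that exits non-zero.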

Services
====

### Global Services (run on ALL nodes in ALL tiers)

#### Monitoring
- [Datadog](https://www.datadoghq.com/)
- [Sumologic](https://www.sumologic.com/)

#### Util/Automated Maintenance
- Docker Logrotate (based on [michaloo/logrotate](https://github.com/michaloo/logrotate))
- Docker Image/Container Cleanup

#### MISC
- SSHD mask
    - [bug in CoreOS](https://github.com/coreos/bugs/issues/966)
    - [proposed changes](https://github.com/coreos/init/pull/188)

### Control Tier Nodes:

- [Mesos](http://mesos.apache.org/) Master
- [Marathon](https://mesosphere.github.io/marathon/)
- [Exhibitor (for Zookeeper)](https://github.com/Netflix/exhibitor)
- [Flight Director](https://github.com/adobe-platform/flight-director) - private Marathon deployment wrapper/manager (stay tuned!)
- [HUD](https://github.com/adobe-platform/flight-director-hud) - private UI shim for flight-director (stay tuned!)

### Proxy Tier Nodes:
- [`CAPCOM`](https://github.com/adobe-platform/capcom) - private Container-Proxy Manager (stay tuned!)
- Heatshield Proxy (our version of nginx) or HAProxy

### Worker Tier Nodes:
- Mesos Slave
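
Placing a service on one tier only comes down to the metadata injected into Fleet. As a minimal sketch, assuming machines carry a hypothetical `role` metadata key (the key name, the `render_tier_unit` helper, and the unit contents are illustrative, not taken from this repo), a unit can be restricted with the `[X-Fleet]` options `Global` and `MachineMetadata`:

```shell
# Sketch only: print a fleet unit that runs on every machine of one tier.
# The metadata key "role" and the unit body are hypothetical.
render_tier_unit() {
    local tier="$1" description="$2" command="$3"
    cat <<EOF
[Unit]
Description=${description}

[Service]
ExecStart=${command}

[X-Fleet]
Global=true
MachineMetadata=role=${tier}
EOF
}
```

The output could be written to a file, e.g. `render_tier_unit worker "demo" "/usr/bin/sleep infinity" > demo.service`, and then handed to `fleetctl start demo.service`.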

Key/Secret Management & Configuration
====

All secrets & key management is a bit ad hoc. Most of the `setup` scripts, which house the logic for setting up the data for the fleet units to use, require a few things to download secrets & keys:

- the `$CONTROL_TIER_S3SECURE_BUCKET` environment variable, written into `/etc/environment` by cloudformation
- [behance/docker-aws-s3-downloader](https://github.com/behance/docker-aws-s3-downloader) container to download files
- IAM roles to access `$CONTROL_TIER_S3SECURE_BUCKET`

Secrets make it onto the nodes in the form of flat text files that live within `$CONTROL_TIER_S3SECURE_BUCKET`. The `setup` files **individually** know which file(s) they need to download & how to read, set or use the data for their corresponding units. So for example, the [datadog unit](https://github.com/adobe-platform/mesos-systemd/blob/master/v2/fleet/datadog.service#L21) requires an `etcd` key, `/ddapikey`. Knowing this, we have a [datadog setup script](https://github.com/adobe-platform/mesos-systemd/blob/master/v2/setup/datadog.sh) which downloads a `.datadog` file from `$CONTROL_TIER_S3SECURE_BUCKET`, expects it to be in a certain format, and sets the etcd key.
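
The last step of that flow (file on disk to etcd key) might look roughly like this. It is a sketch, not the contents of the real `datadog.sh`: it assumes the `.datadog` file has already been downloaded, that etcd is reachable through the v2 `etcdctl set` command, and it introduces a hypothetical `ETCDCTL` override so the logic can be tried without a cluster.

```shell
# Sketch only: read a one-line API key file and store it under /ddapikey.
# ETCDCTL is a hypothetical override, handy for dry runs without a cluster.
ETCDCTL="${ETCDCTL:-etcdctl}"

set_datadog_key() {
    local key_file="$1" key
    [ -r "${key_file}" ] || { echo "missing ${key_file}" >&2; return 1; }
    # The file is expected to hold just the key, so strip all whitespace.
    key="$(tr -d '[:space:]' < "${key_file}")"
    [ -n "${key}" ] || { echo "empty key in ${key_file}" >&2; return 1; }
    "${ETCDCTL}" set /ddapikey "${key}"
}
```

Rejecting an empty file up front avoids writing a blank `/ddapikey` that the datadog unit would then pick up.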

#### Files in S3

We are planning to deprecate the following in favor of other solutions (DynamoDB + KMS?).

##### Services, dotfiles, dotfile formats

| Service | File | Format |
| ------------- | ------------- | ------------- |
| Datadog | `.datadog` | Just the key. Nothing else. |
| Sumologic | `.sumologic` | `ID=YOURID`<br>`SECRET=YOURSECRET` |
| Flight Director | `.flight-director` | `/FD/GITHUB_CLIENT_ID (YOUR GITHUB APP ID)`<br>`/FD/GITHUB_CLIENT_SECRET (YOUR GITHUB APP SECRET)`<br>`/FD/GITHUB_ALLOWED_TEAMS org/team` |
| HUD | `.hud` | `/HUD/client-id (GITHUB_APP_ID can == value in .flight-director)`<br>`/HUD/client-secret (GITHUB_APP_SECRET can == value in .flight-director)` |
| Marathon | `.marathon` | `/marathon/username a-username`<br>`/marathon/password a-password` |
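
The path-style files (Flight Director, HUD, Marathon) pair an etcd key with a value on each line. One way a setup script could consume them is sketched below. This is an illustration only: `load_dotfile` and the `ETCDCTL` override are hypothetical names, the v2 `etcdctl set` command is assumed, and the real setup scripts may parse these files differently.

```shell
# Sketch only: turn "<etcd key> <value>" lines into etcdctl set calls.
# load_dotfile and the ETCDCTL override are hypothetical, not repo code.
ETCDCTL="${ETCDCTL:-etcdctl}"

load_dotfile() {
    local dotfile="$1" key value
    # read puts the first word in key and the rest of the line in value;
    # the extra test keeps a final line that has no trailing newline.
    while read -r key value || [ -n "${key}" ]; do
        # Skip blank lines and anything that is not an absolute etcd path.
        case "${key}" in
            /*) ;;
            *) continue ;;
        esac
        [ -n "${value}" ] || { echo "no value for ${key}" >&2; return 1; }
        # stdin is redirected so the command cannot swallow the file being read.
        "${ETCDCTL}" set "${key}" "${value}" < /dev/null
    done < "${dotfile}"
}
```

With a `.marathon` file as in the table, `load_dotfile .marathon` would issue one `set` per line, for `/marathon/username` and `/marathon/password`.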

##### MISC

- `.dockercfg` to download private containers
- `id_rsa` to clone any private repositories

Nothing special needs to be done for these two, as long as the cloudformation template sets the following in `/etc/environment`:

```
$SECURE_FILES=.dockercfg:id_rsa,0600,.ssh/id_rsa
```

The format of this environment variable just needs to conform to what [behance/docker-aws-s3-downloader](https://github.com/behance/docker-aws-s3-downloader) expects.