
#+TITLE: Blackfish: a CoreOS VM to build swarm clusters for Dev & Production
#+OPTIONS: toc:1
#+SETUPFILE: theme.setup

Note: you may prefer reading the [[https://blackfish.gitlab.io/blackfish/][README]] on the gitlab website.

* Description

Blackfish is a pre-built CoreOS VM that eases the bootstrapping of Swarm clusters.
It contains all the basic services required to ease the deployment of your apps and to manage basic production operations.
All the services are pre-configured for production, with HA and TLS communication enabled.

- a Consul cluster with a registrator
- an internal HA Docker registry
- an HAProxy load balancer with auto discovery
- centralized logs with an HA Graylog2 setup
- centralized monitoring with Telegraf + InfluxDB + Grafana

The project also comes with all the materials to boot a cluster on your local machine with a Vagrant setup, or on AWS EC2 with Terraform scripts.

We hope it will help you fill the gap between "Docker in Dev" and "Docker in Production".

#+CAPTION: Docker in Dev VS Prod
#+NAME: fig:docker-prod.jpg
#+ATTR_HTML: width="100px"
[[https://blackfish.gitlab.io/blackfish/images/docker-prod.jpg]]

You'll find other, maybe simpler, tutorials or GitHub projects to deploy Swarm on AWS, but if you don't want your cluster to be exposed on public-facing IPs, you'll have to get your hands dirty on a lot of other things.

Blackfish is built on top of the following components:
- [[http://packer.io/][Packer]] for building the boxes for various providers (VirtualBox, AWS, KVM, ...)
- [[https://www.terraform.io/][Terraform]] for provisioning the infrastructure on AWS
- [[http://vagrantup.com][Vagrant]] for running the Swarm cluster in VirtualBox

* Prerequisites

To use this project, you will need at least the following tools properly installed on your machine (you can check the installed versions as shown below):

- Docker 1.10
- Vagrant 1.8
- Virtualbox 5.0
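
A quick, hedged sanity check (exact version strings will vary; ~VBoxManage~ is VirtualBox's command-line tool):

#+BEGIN_SRC bash
# Check that the required tools are on the PATH and recent enough
docker --version      # expect >= 1.10
vagrant --version     # expect >= 1.8
VBoxManage --version  # expect >= 5.0
#+END_SRC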

* Quickstart

To quickly bootstrap a Swarm cluster with Vagrant on your local machine, configure the [[https://gitlab.com/blackfish/blackfish/raw/master/nodes.yml][nodes.yml]] file and type the following commands.

#+BEGIN_SRC bash
$ git clone https://gitlab.com/blackfish/blackfish.git
$ cd blackfish
$ vi nodes.yml
...
#+END_SRC

#+CAPTION: nodes.yml
#+NAME: fig:nodes.yml
#+BEGIN_SRC yaml
---
provider : "virtualbox"
admin-network : 192.168.101.0/24
box : yanndegat/blackfish-coreos
id : myid
stack : vagrant
dc : dc1
consul-joinip : 192.168.101.101
consul-key : hMJ0CFCrpFRQfVeSQBUUdQ==
registry-secret : A2jMOuYSS+3swzjAR6g1x3iu2vY/cROWYZrJLqYffxA=
ssh-key : ./vagrant.pub
private-ssh-key : ~/.vagrant.d/insecure_private_key
nodes:
  - ip : 192.168.101.101
    docker-registry : true
    memory : "3072"
    cpus : "2"
  - memory : "3072"
    cpus : "2"
    ip : 192.168.101.102
  - memory : "3072"
    cpus : "2"
    ip : 192.168.101.103
#+END_SRC

#+BEGIN_SRC bash
$ vagrant up
==> box: Loading metadata for box 'yanndegat/blackfish'
box: URL: https://atlas.hashicorp.com/yanndegat/blackfish
==> box: Adding box 'yanndegat/blackfish' (v0.1.0) for provider: virtualbox
box: Downloading: https://atlas.hashicorp.com/yanndegat/boxes/blackfish/versions/0.0.5/providers/virtualbox.box
box: Progress: 26% (Rate: 1981k/s, Estimated time remaining: 0:02:24)
...
$
#+END_SRC

TLS certificates have been generated in your ~$HOME/.blackfish/vagrant~ directory. You have to declare the CA Cert on your host, according to your system.

Once you have registered your [[tls][TLS certificates]] and set up your [[dns][DNS configuration]], you can go to https://consul-agent.service.vagrant:4443/ui
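
As a quick sanity check before opening the UI, you can verify that the name resolves and that the endpoint presents a certificate signed by the generated CA. A minimal sketch using standard tools; the host and port are taken from the URL above:

#+BEGIN_SRC bash
# Check DNS resolution and the TLS chain against the generated CA
dig +short consul-agent.service.vagrant
openssl s_client -connect consul-agent.service.vagrant:4443 \
  -CAfile ~/.blackfish/vagrant/ca.pem </dev/null | grep 'Verify return code'
#+END_SRC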

Now refer to the [[play][Play with Swarm]] section to go further.

* Getting Started on AWS

#+BEGIN_NOTES Warning
This section is obsolete. To use Blackfish on AWS, several Terraform scripts are available, but they are too complex.
Instead, we want to be able to use the same ~nodes.yml~ configuration files to bootstrap Swarm on AWS. To do so, we will, in the near future,
provide some kind of Go binary that reads ~nodes.yml~ and the Terraform templates, and outputs a complete Terraform directory.

Meanwhile, you can use the ~terraform/aws/vpc/~ templates or refer to the old scripts to bootstrap the VPC, and use ~vagrant --provider=aws~ to bootstrap the AWS nodes.
#+END_NOTES

Refer to the [[https://blackfish.gitlab.io/blackfish/terraform/aws/README.html][AWS README]] file.

Here's an example of what a nodes.yml for AWS could look like:

#+BEGIN_SRC yaml
---
provider : aws
admin-network : 10.233.1.0/24
stack : demo
dc : one
id : prod1
consul-joinip : 10.233.1.201
consul-key : dlnA+EWVSiQyfd0vkApmTUu4lDvMlmJcjMy+8dMEVkw=
registry-secret : rJ/9vXube9iujCjFiniJODQX60Q/XJytUJyOKQfPaLo=

journald-sink : journald.service.demo:42201
influxdb-url : http://influxdb.service.demo:48086

labels :
  - type=control

nodes:
  - ip : 10.233.1.201
    docker-registry : true
  - ip : 10.233.1.202
  - ip : 10.233.1.203

aws:
  region : eu-west-1
  access-key-id : XXXXXXXXXXXXXXX
  secret-access-key : YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
  availability-zone : eu-west-1a
  instance-type : m3.xlarge
  ebs-optimized : false
  keypair-name : my-keypair
  private-ssh-key : ~/.ssh/my-keypair.key
  subnet-id : subnet-d779afb3
  s3bucket : bucket-742587092752
  security-groups :
    - sg-2676b741

#+END_SRC

* <<play>>Play with your Swarm cluster

Now we can play with Swarm.

** Using the Swarm cluster

You can now use your Swarm cluster to run Docker containers as simply as you would run a container on your local Docker engine. All you have to do is
target the IP of one of your Swarm nodes.

#+BEGIN_SRC bash
$ export PATH=$PATH:$(pwd)/bin
$ blackfish vagrant run --rm -it alpine /bin/sh
/ # ...
#+END_SRC
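
Under the hood, the ~blackfish~ script in ~./bin~ points a Docker client at the Swarm endpoint with the generated TLS material. A roughly equivalent manual invocation could look like the sketch below; the port and the certificate file names expected by the Docker client are assumptions, so check the ~bin/blackfish~ script for the exact values:

#+BEGIN_SRC bash
# Hypothetical manual equivalent of "blackfish vagrant ..." -- the Swarm port (3376)
# and the certificate layout under ~/.blackfish/vagrant are assumptions.
export DOCKER_HOST=tcp://192.168.101.101:3376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=$HOME/.blackfish/vagrant
docker info
#+END_SRC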

** Using the Blackfish internal registry

The Blackfish internal registry, which is automatically started on the Swarm cluster, is registered under the name "registry.service.dev.vagrant:5000". You therefore have to tag and push Docker images with this name if you want the nodes to be able to download your images.

As the registry has a self-signed TLS certificate, you have to declare its CA cert on your Docker engine and on your system (again, according to your OS).

#+BEGIN_SRC bash
$ export PATH=$PATH:$(pwd)/bin
$ sudo mkdir -p /etc/docker/certs.d/registry.service.vagrant
$ sudo cp ~/.blackfish/vagrant/ca.pem /etc/docker/certs.d/registry.service.vagrant/
$ sudo systemctl restart docker
...
$ docker tag alpine registry.service.vagrant/alpine
$ docker push registry.service.vagrant/alpine
...
$ blackfish vagrant pull registry.service.vagrant/alpine
...
$ blackfish vagrant run --rm -it registry.service.vagrant/alpine /bin/sh
/ # ...
#+END_SRC

** Run the examples

Examples are available in the [[https://gitlab.com/blackfish/blackfish/raw/master/examples][examples]] directory. You can play with them to discover how to work with Docker Swarm.

* Blackfish Components

** Architecture guidelines

The Blackfish VM tries to follow the Immutable Infrastructure precepts:

- Every component of the system must be able to boot/reboot without having to be provisioned with configuration elements other than via cloud-init.
- Every component of the system must be able to discover its peers and join them.
- If a component can't boot properly, it must be considered dead. Don't try to fix it.
- To update the system, we must build new components and replace the existing ones with them.

** Blackfish is architected around the following components:

- a Consul cluster with full encryption enabled, which consists of a set of Consul agents running in "server" mode and additional nodes running in "agent" mode.
  The Consul cluster can be used (a DNS lookup sketch follows at the end of this section):
  - as a distributed key/value store
  - for service discovery
  - as a DNS server
  - as a backend for Swarm master election

- a Swarm cluster with full encryption enabled, which consists of a set of Swarm agents running in "server" mode and additional nodes running in "agent" mode.
  Every Swarm node also runs a Consul agent and a Registrator service to declare every running container in Consul.

- an HA private Docker registry with TLS encryption. It is registered under the DNS address registry.service.vagrant.
  HA is made possible by the use of a shared filesystem storage.
  On AWS, the registry's backend can be configured to target an S3 bucket.

- a load balancer service built on top of HAProxy and consul-template, with auto discovery

Some nodes can play both the "consul server" and "swarm server" roles, to avoid booting too many servers for small cluster setups.
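
As an illustration of the service-discovery and DNS roles described above, a registered service can be looked up through the DNS interface exposed by the nodes. A minimal sketch, assuming the Vagrant node IP from the quickstart configuration:

#+BEGIN_SRC bash
# Ask one of the nodes for the registry service registered in Consul
dig @192.168.101.101 registry.service.vagrant
# SRV records also expose the service port
dig @192.168.101.101 registry.service.vagrant SRV
#+END_SRC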

** The nodes.yml files

The philosophy behind the "nodes.yml" files is to write several of them, each defining a small cluster with specific infrastructure features,
and to make these clusters join together to form a complete and robust Swarm cluster: multi-AZ, multi-DC, with several types of storage, and so on.

You can select the nodes.yml file you want to target by simply setting the ~BLACKFISH_NODES_YML~ environment variable:

#+BEGIN_SRC bash
$ BLACKFISH_NODES_YML=nodes_mycluster_control.yml vagrant up --provider=aws
...
$ BLACKFISH_NODES_YML=nodes_mycluster_ebs-storage.yml vagrant up --provider=aws
...
$ BLACKFISH_NODES_YML=nodes_mycluster_ssd.yml vagrant up --provider=aws
...

#+END_SRC

* Considerations & Roadmap

** Volume driver Plugins

Flocker is available in Blackfish, but it has been disabled due to a critical bug when running with a Docker engine < 1.11.
Rex-Ray and Convoy suffer from the same bug. In addition, Flocker's control service is not HA-ready.

** CoreOS Alpha channel
Too many critical bugs are fixed in every Docker release, and the alpha channel is the one that tracks the latest Docker engine.
Once a good configuration of the different components has stabilized, we will move to more stable channels.

** Consul + Etcd on the same hosts ?

It sounds crazy. Yet it is recommended to use two separate Consul clusters to run a Swarm cluster: one for master election and service discovery, one for the Docker overlay network.
As we run on CoreOS, there's a feature we'd like to enable: the automatic upgrade of CoreOS nodes based on their channel. To avoid a whole cluster rebooting all its nodes at the same time, CoreOS can use etcd to coordinate the reboots.
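
For reference, etcd-coordinated reboots are a standard CoreOS (locksmith) feature; a minimal sketch of what enabling it looks like on a node, not something Blackfish currently ships:

#+BEGIN_SRC bash
# /etc/coreos/update.conf -- locksmith takes an etcd lock before rebooting a node
GROUP=alpha
REBOOT_STRATEGY=etcd-lock
#+END_SRC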

** Use docker-machine

We didn't use docker-machine, as it forces us into the "post-provisioning" way of doing things.

** Run Consul and Swarm services as rocket containers

There are some caveats to running the system services as Docker containers, even on CoreOS. The main problem is process supervision with systemd, as fully described in this [[https://lwn.net/Articles/676831/][article]].
In addition, we don't want system services to be visible and "killable" through a simple Docker remote command.

That's why every system component is run with rkt, the CoreOS container engine.

** Monitoring

Monitoring and log centralization are provided in a simple yet powerful manner:

Each node can be configured to report its metrics and ship its logs to a remote machine.
That way, you can bootstrap a cluster and then run monitoring tools such as Graylog2 and InfluxDB on it; the nodes will automatically start sending data to them.
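
For example, with the ~journald-sink~ and ~influxdb-url~ values from the AWS nodes.yml above, a quick way to check that the metrics endpoint is reachable is InfluxDB's standard ~/ping~ health endpoint. A hedged sketch; the URL comes from the example configuration, not from a running cluster:

#+BEGIN_SRC bash
# InfluxDB answers /ping with HTTP 204 when it is up
curl -i http://influxdb.service.demo:48086/ping
#+END_SRC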

** Running on GCE

TBDone

** Running on Azure

TBDone

** Running on premise

That's why we've chosen CoreOS: it comes with powerful tools such as Ignition and coreos-baremetal that allow us to boot our solution on on-premise infrastructures.

** How to do rolling upgrades of the infrastructure with Terraform?

Well, that remains to be defined.

* <<dns>>Configure DNS resolution

Before using Swarm, you have to declare the Blackfish VMs' internal DNS in your system. To do so, you have multiple options:

- add one of the hosts to your /etc/resolv.conf (quick but dirty)
- configure your network manager to add the hosts permanently:
  #+BEGIN_SRC bash
  echo 'server=/vagrant/192.168.101.101' | sudo tee -a /etc/NetworkManager/dnsmasq.d/blackfish-vagrant
  sudo systemctl restart NetworkManager.service
  #+END_SRC
- configure a local dnsmasq service which forwards DNS lookups to the Blackfish DNS service according to domain names.

For the latter solution, you can refer to the [[https://gitlab.com/blackfish/blackfish/raw/master/bin/dns][./bin/dns]] file which runs a dnsmasq forwarder service.

#+BEGIN_SRC bash
$ # check your /etc/resolv.conf file
$ cat /etc/resolv.conf
nameserver 127.0.0.1
...
$ # if needed, run the following command so it persists on your next boot (according to your OS)
$ sudo su
root $ echo "nameserver 127.0.0.1" > /etc/resolv.conf.head
root $ exit
$ ./bin/dns vagrant 192.168.101.101
$ dig registry.service.vagrant

$ ...
#+END_SRC

** On macOS

On macOS, registering another DNS server for a given zone is very easy:
#+BEGIN_SRC bash
echo "nameserver 192.168.101.101" | sudo tee /etc/resolver/vagrant
#+END_SRC

* <<tls>>Registering TLS Certificates

** Registering the CA Root on your OS
Depending on your OS and your distro, there are several ways to register the ~ca.pem~ file:

#+CAPTION: Example on ubuntu
#+NAME: Example on ubuntu
#+BEGIN_SRC bash
$ sudo cp ~/.blackfish/vagrant/ca.pem /usr/local/share/ca-certificates/blackfish-vagrant-ca.crt
$ sudo update-ca-certificates
#+END_SRC

#+CAPTION: Example on archlinux
#+NAME: Example on archlinux
#+BEGIN_SRC bash
$ sudo cp ~/.blackfish/vagrant/ca.pem /etc/ca-certificates/trust-source/blackfish-vagrant-ca.crt
$ sudo update-ca-trust
#+END_SRC

#+CAPTION: Example on macOS
#+NAME: Example on macOS
#+BEGIN_SRC bash
sudo security add-trusted-cert -d -r trustRoot -k "/Library/Keychains/System.keychain" ~/.blackfish/vagrant/ca.pem
#+END_SRC

If you still have issues pushing Docker images to the registry:

#+CAPTION: Example on ubuntu
#+NAME: Example on ubuntu
#+BEGIN_SRC bash
$ sudo mkdir -p /etc/docker/certs.d/registry.service.[stack]
$ sudo cp ~/.blackfish/vagrant/ca.pem /etc/docker/certs.d/registry.service.[stack]/
$ sudo systemctl restart docker
#+END_SRC

** Registering the certificates on your browser

There is currently a problem with the generated certificates under Chrome/Chromium browsers; we still have to spend time on it to understand why Chrome rejects them.
You can still use Firefox by following these steps:
- go to the ~Preferences/Advanced/Certificates/View Certificates/Authorities~ menu
- import the ~$HOME/.blackfish/vagrant/ca.pem~ file
- go to the ~Preferences/Advanced/Certificates/View Certificates/Your Certificates~ menu
- import the ~$HOME/.blackfish/vagrant/client.pfx~

Now you should be able to access the https://consul-agent.service.vagrant:4443/ui URL.