https://github.com/styxlab/kubernetes-mongodb-shard
Deploy a mongodb sharded cluster on kubernetes.
- Host: GitHub
- URL: https://github.com/styxlab/kubernetes-mongodb-shard
- Owner: styxlab
- License: mit
- Created: 2016-06-02T13:31:40.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2021-02-25T08:27:17.000Z (over 4 years ago)
- Last Synced: 2025-04-12T02:24:20.041Z (6 months ago)
- Topics: kubernetes-cluster, mongodb-shard, replicas, shard
- Language: Shell
- Homepage:
- Size: 27.3 KB
- Stars: 37
- Watchers: 5
- Forks: 15
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# kubernetes-mongodb-shard
Deploy a mongodb sharded cluster on kubernetes. This works on both small clusters with a minimum of 3 nodes and large clusters with 100+ nodes.

## Prerequisites
- A Kubernetes cluster with at least 3 schedulable nodes (see the check below)
- Kubernetes v1.2.3 or greater
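A quick way to verify the node requirement before you build anything (just a sanity check, not part of the repo's tooling):

```
# Count the nodes kubectl reports as Ready; the shard needs at least 3
# schedulable ones.
$ kubectl get nodes --no-headers | grep -cw Ready
```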
## Features

- Configurable number of shards, replicas, config servers and mongos
- Shard members and data replicas are distributed evenly on available nodes
- Storage is directly allocated on each node
- All mongo servers are combined into one kubernetes pod per node
- Services are set up which can be consumed upstream
- Official mongodb docker image is used without modifications

## Description
Setting up a mongodb shard on kubernetes is easy with this repo. `kubectl`
is used to determine the number of nodes in your cluster
and the provided shell script `src/generate.sh` creates one kubernetes `yaml`
file per node as well as the necessary `js` config scripts. Finally, the
shard is automatically created by executing the `yaml` files and applying the
config scripts.

Great care has been taken to distribute data across the cluster to
maximize data redundancy and high availability. In addition we
bind disk space with the kubernetes `hostPath` option in order
to maximize I/O throughput.

Replication is achieved by the built-in mongodb feature rather than kubernetes
itself. However, as kubernetes knows about the desired state of your shard, it
will try to restore all services automatically should one node go down.
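In shell terms, the overall flow looks roughly like this (an illustrative sketch; the authoritative logic lives in `src/generate.sh` and the `Makefile`):

```
# Illustrative sketch of the workflow; the authoritative logic lives
# in src/generate.sh and the Makefile.

# 1. Determine the number of nodes in the cluster.
NODES=$(kubectl get nodes --no-headers | wc -l)

# 2. Generate one yaml file per node plus the js config scripts
#    (written into the build folder).
sh src/generate.sh

# 3. Create all services and deployments.
for f in build/*.yaml; do
    kubectl apply -f "$f"
done
```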
## Usage

```
$ git clone https://github.com/styxlab/kubernetes-mongodb-shard.git
$ cd kubernetes-mongodb-shard
$ make build
```
All needed files can be found in the `build` folder. You should find one
`yaml` file for each node of your cluster and a couple of `js`
files that will configure the mongodb shard. Finally, you
need to execute these files on your kubernetes cluster:
```
$ make run
```
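The generated `js` scripts hold ordinary mongo shell commands: conceptually they initialize one replica set per shard and then register each replica set at a mongos. A sketch of their content, with hosts and ports taken from the `sh.status()` example in the Verify section below (`src/generate.sh` produces the real member lists):

```
# Conceptual content of the generated js config scripts; the real
# host lists and ids are produced by src/generate.sh.

# Initialize the replica set of one shard, e.g. rs01:
mongo --host mongodb-node01.default.svc.cluster.local --port 27020 --eval '
  rs.initiate({_id: "rs01", members: [
    {_id: 0, host: "mongodb-node01.default.svc.cluster.local:27020"},
    {_id: 1, host: "mongodb-node02.default.svc.cluster.local:27021"},
    {_id: 2, host: "mongodb-node03.default.svc.cluster.local:27019", arbiterOnly: true}
  ]})'

# Then register each replica set as a shard at a mongos:
mongo --host mongodb-node01.default.svc.cluster.local --port 27017 --eval '
  sh.addShard("rs01/mongodb-node01.default.svc.cluster.local:27020")'
```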
## Verify
After a minute or two (depending on how fast the docker images are fetched over your network) you should see that all deployments are up and running. For a 3 node shard, a typical output is shown below.
```
$ kubectl get deployments -l role="mongoshard"
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
mongodb-shard-node01   1         1         1            1           1d
mongodb-shard-node02   1         1         1            1           1d
mongodb-shard-node03   1         1         1            1           1d

$ kubectl get pods -l role="mongoshard"
NAME                                    READY     STATUS    RESTARTS   AGE
mongodb-shard-node01-1358154500-wyv5n   5/5       Running   0          1d
mongodb-shard-node02-1578289992-i49fw   5/5       Running   0          1d
mongodb-shard-node03-4184329044-vwref   5/5       Running   0          1d
```
You can now connect to one of the mongos and inspect the status of the shard:
```
$ kubectl exec -ti mongodb-shard-node01-1358154500-wyv5n -c mgs01-node01 -- mongo
MongoDB shell version: 3.2.6
connecting to: test
mongos>
```
Type `sh.status()` at the mongos prompt:
```
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("575abbcb568388677e5336ef")
  }
  shards:
    { "_id" : "rs01", "host" : "rs01/mongodb-node01.default.svc.cluster.local:27020,mongodb-node02.default.svc.cluster.local:27021" }
    { "_id" : "rs02", "host" : "rs02/mongodb-node01.default.svc.cluster.local:27021,mongodb-node03.default.svc.cluster.local:27020" }
    { "_id" : "rs03", "host" : "rs03/mongodb-node02.default.svc.cluster.local:27020,mongodb-node03.default.svc.cluster.local:27021" }
  active mongoses:
    "3.2.6" : 3
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      No recent migrations
  databases:
    { "_id" : "styxmail", "primary" : "rs01", "partitioned" : true }
```
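The same check can be scripted, e.g. when waiting for the shard to come up (pod names vary; take yours from the `kubectl get pods` output above):

```
# Run sh.status() non-interactively against the mongos container.
$ kubectl exec mongodb-shard-node01-1358154500-wyv5n -c mgs01-node01 -- \
    mongo --quiet --eval 'sh.status()'
```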
## Consume

The default configuration sets up one mongos service per node. Use one of them to connect
to your shard from any other application on your kubernetes cluster:
```
$ kubectl get svc -l role="mongoshard"
NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)                                             AGE
mongodb-node01   10.3.0.175   <none>        27019/TCP,27018/TCP,27017/TCP,27020/TCP,27021/TCP   1d
mongodb-node02   10.3.0.13    <none>        27019/TCP,27018/TCP,27017/TCP,27020/TCP,27021/TCP   1d
mongodb-node03   10.3.0.47    <none>        27019/TCP,27018/TCP,27017/TCP,27020/TCP,27021/TCP   1d
```
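For example, a mongo shell in any other pod of the same namespace can reach the shard through one of these services (the database name `styxmail` is the one from the `sh.status()` output above):

```
# Port 27017 on each service is a mongos, so clients see one
# logical database behind a single address:
$ mongo mongodb-node01.default.svc.cluster.local:27017/styxmail

# Drivers can list several mongos for failover, e.g.:
# mongodb://mongodb-node01.default.svc.cluster.local:27017,mongodb-node02.default.svc.cluster.local:27017/styxmail
```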
## Configuration Options

Configuration options are currently hard-coded in `src/generate.sh`. This will be enhanced later. The following options are available:
```
NODES: number of cluster nodes (default: all nodes on your cluster as determined by kubectl)
SHARDS: number of shards in your mongo database (default: number of cluster nodes)
MONGOS_PER_CLUSTER: you connect to your shard through mongos (default: one per node, minimum: 1)
CFG_PER_CLUSTER: config servers per cluster (default: 1 config server, configured as a replication set)
CFG_REPLICA: number of replicas per configuration cluster (default: number of nodes)
REPLICAS_PER_SHARD: each shard is configured as a replication set (default: 2)
```
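Until then, changing an option means editing the variables at the top of `src/generate.sh` and rebuilding. A sketch (the variable names above are taken from the list, but check the script for the exact spelling before relying on `sed`):

```
# Set 3 shards with 2 replicas each, then regenerate and deploy.
$ sed -i 's/^SHARDS=.*/SHARDS=3/' src/generate.sh
$ sed -i 's/^REPLICAS_PER_SHARD=.*/REPLICAS_PER_SHARD=2/' src/generate.sh
$ make build
$ make run
```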
## Ports

As each pod gets one IP address assigned, each service within a pod must have a distinct port.
As the mongos are the services by which you access your shard from other applications, the standard
mongodb port `27017` is given to them. Here is the list of port assignments:

| Service                         | Port    |
| ------------------------------- | --------|
| Mongos | 27017 |
| Config Server | 27018 |
| Arbiter (if present) | 27019 |
| Replication Server (Primary) | 27020 |
| Replication Server (Secondary)  | 27021   |

Usually you need not be concerned about the ports, as you will only access the shard through the
standard port `27017`.
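For a quick test from outside the cluster you can forward the mongos port to your workstation (a debugging convenience, not part of this repo; the pod name varies):

```
# Forward the standard mongodb port of one pod to localhost.
$ kubectl port-forward mongodb-shard-node01-1358154500-wyv5n 27017:27017 &
$ mongo --port 27017
```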
## Examples

A typical `yaml` file for one node is shown below:
```
apiVersion: v1
kind: Service
metadata:
  name: mongodb-node01
  labels:
    app: mongodb-node01
    role: mongoshard
    tier: backend
spec:
  selector:
    app: mongodb-shard-node01
    role: mongoshard
    tier: backend
  ports:
  - name: arb03-node01
    port: 27019
    protocol: TCP
  - name: cfg01-node01
    port: 27018
    protocol: TCP
  - name: mgs01-node01
    port: 27017
    protocol: TCP
  - name: rsp01-node01
    port: 27020
    protocol: TCP
  - name: rss02-node01
    port: 27021
    protocol: TCP
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mongodb-shard-node01
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mongodb-shard-node01
        role: mongoshard
        tier: backend
    spec:
      nodeSelector:
        kubernetes.io/hostname: 80.40.200.130
      containers:
      - name: arb03-node01
        image: mongo:3.2
        args:
        - "--storageEngine"
        - wiredTiger
        - "--replSet"
        - rs03
        - "--port"
        - "27019"
        - "--noprealloc"
        - "--smallfiles"
        ports:
        - name: arb03-node01
          containerPort: 27019
        volumeMounts:
        - name: db-rs03
          mountPath: /data/db
      - name: rss02-node01
        image: mongo:3.2
        args:
        - "--storageEngine"
        - wiredTiger
        - "--replSet"
        - rs02
        - "--port"
        - "27021"
        - "--noprealloc"
        - "--smallfiles"
        ports:
        - name: rss02-node01
          containerPort: 27021
        volumeMounts:
        - name: db-rs02
          mountPath: /data/db
      - name: rsp01-node01
        image: mongo:3.2
        args:
        - "--storageEngine"
        - wiredTiger
        - "--replSet"
        - rs01
        - "--port"
        - "27020"
        - "--noprealloc"
        - "--smallfiles"
        ports:
        - name: rsp01-node01
          containerPort: 27020
        volumeMounts:
        - name: db-rs01
          mountPath: /data/db
      - name: cfg01-node01
        image: mongo:3.2
        args:
        - "--storageEngine"
        - wiredTiger
        - "--configsvr"
        - "--replSet"
        - configReplSet01
        - "--port"
        - "27018"
        - "--noprealloc"
        - "--smallfiles"
        ports:
        - name: cfg01-node01
          containerPort: 27018
        volumeMounts:
        - name: db-cfg
          mountPath: /data/db
      - name: mgs01-node01
        image: mongo:3.2
        command:
        - "mongos"
        args:
        - "--configdb"
        - "configReplSet01/mongodb-node01.default.svc.cluster.local:27018,mongodb-node02.default.svc.cluster.local:27018,mongodb-node03.default.svc.cluster.local:27018"
        - "--port"
        - "27017"
        ports:
        - name: mgs01-node01
          containerPort: 27017
      volumes:
      - name: db-cfg
        hostPath:
          path: /enc/mongodb/db-cfg
      - name: db-rs01
        hostPath:
          path: /enc/mongodb/db-rs01
      - name: db-rs02
        hostPath:
          path: /enc/mongodb/db-rs02
      - name: db-rs03
        hostPath:
          path: /enc/mongodb/db-rs03
```
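Each generated file is self-contained (service plus deployment), so a single node can also be (re-)deployed on its own. The file name below is illustrative; `make build` writes the actual names into the `build` folder:

```
$ kubectl apply -f build/mongodb-shard-node01.yaml
```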
## Layouts

To understand how this script distributes the different mongodb servers and replication
sets on your cluster, a couple of examples are shown. First, take note of the notation:

| Abbreviation | Meaning                    |
| -------------| --------------------------|
| columns | nodes |
| rows | shards |
| - | no assignment |
| rsp | replica set (primary) |
| rss | replica set (secondary) |
| arb          | arbiter                    |

### 3 nodes, 3 shards, 2 shards per node, 1 arbiter
| | node 1 | node 2 | node 3 |
| ------- | ------ | ------- | ------ |
| shard 1 | rsp | rss | arb |
| shard 2 | rss | arb | rsp |
| shard 3 | arb    | rsp     | rss    |

As can be seen, the secondary of a particular shard is always on a different node than the primary,
so replication actually provides redundancy when a node fails. Also, each node contains the same
number of data stores, thus distributing disk usage evenly across the cluster.

### 5 nodes, 5 shards, 2 shards per node, 1 arbiter
| | node 1 | node 2 | node 3 | node 4 | node 5 |
| ------- | ------ | ------- | ------ | ------ | ------ |
| shard 1 | rsp | rss | arb | - | - |
| shard 2 | rss | arb | - | - | rsp |
| shard 3 | arb | - | - | rsp | rss |
| shard 4 | - | - | rsp | rss | arb |
| shard 5 | -      | rsp     | rss    | arb    | -      |

Note that the same properties are retained for a larger cluster with 5 nodes. So, you can achieve
real horizontal scaling of your mongodb database with this technique.
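Such layouts can be computed with simple modular arithmetic: the members of a shard are placed on consecutive nodes, offset by the shard index. The sketch below reproduces the two properties above (members of a shard never share a node; every node carries the same number of data stores). It is illustrative only; `src/generate.sh` may use different offsets:

```
#!/bin/sh
# Place member k of shard i on node ((i + k) mod NODES) + 1, so the
# primary, secondary and arbiter of a shard sit on three different
# nodes and every node hosts one member of each role.
NODES=5
SHARDS=5
for i in $(seq 0 $((SHARDS - 1))); do
    rsp=$(( (i + 0) % NODES + 1 ))   # primary
    rss=$(( (i + 1) % NODES + 1 ))   # secondary
    arb=$(( (i + 2) % NODES + 1 ))   # arbiter
    echo "shard $((i + 1)): rsp=node${rsp} rss=node${rss} arb=node${arb}"
done
```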
## Todos

- Gather config parameters
- EmptyDir option