https://github.com/uditgaurav/k8s-actions

This repository contains the code for a GitHub Action that deletes a pod, given the pod's name and namespace.

Host: GitHub
URL: https://github.com/uditgaurav/k8s-actions
Owner: uditgaurav
Created: 2020-04-17T13:53:31.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2021-11-30T10:35:27.000Z (about 3 years ago)
Last Synced: 2024-10-10T15:42:54.065Z (3 months ago)
Topics: actions, ansible, bash-script, dockerfile, github, kubectl, kubernetes, pod
Language: Shell
Homepage:
Size: 43.9 KB
Stars: 3
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# GitHub Action for Chaos Engineering in Kubernetes

This action runs chaos experiments against a Kubernetes environment. It bundles Litmus Chaos experiments that inject faults to find weaknesses in the system. For more details about chaos engineering in Kubernetes using Litmus, visit litmus-docs.

## Pre-requisites

Kubernetes 1.16 or later.

## Overview

Several chaos experiments can be performed using `github-chaos-actions`. Select the one you want to run; for more details about each experiment, visit the experiment docs section.

## Run a chaos experiment using this action

Follow these steps to run a chaos experiment using this action:

- **Deploy Application**: An application must be running for the chaos to act on. Create the application and pass its details through the action's ENV. The details include the application kind (deployment, statefulset, daemonset), the application label, and the namespace.

- **Install Litmus**: Before running any experiment, Litmus must be set up in the cluster. If Litmus is not already installed, `github-chaos-actions` can install it when you pass the ENV `INSTALL_LITMUS` with the value `true`. This brings up Litmus with all infra components running in the litmus namespace.

- **Select experiment**: From the list in the section below, select the experiment you want to perform on the application. Get the details of the experiment and how to run the action for that experiment.
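
As a minimal sketch of the first step, a workflow could check the application kind before handing it to the action. The function name `is_supported_app_kind` is hypothetical and is not part of the action; the accepted values are the three kinds listed above.

```bash
# Hypothetical helper, not part of github-chaos-actions.
# Returns 0 when $1 is one of the application kinds named in the steps above.
is_supported_app_kind() {
  case "$1" in
    deployment|statefulset|daemonset) return 0 ;;
    *) return 1 ;;
  esac
}
```

A step could call `is_supported_app_kind "$APP_KIND" || exit 1` to stop early on a typo instead of waiting for the experiment to fail.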

**The different experiments that can be performed using `github-chaos-actions` are:**

- **Pod Delete**: This chaos action causes random (forced/graceful) pod delete of application deployment replicas. It tests deployment sanity (high availability & uninterrupted service) and recovery workflows of the application pod. Check a sample usage of pod delete chaos action and for more details about the experiment please visit pod delete docs.

- **Container Kill**: This chaos action executes SIGKILL on the container of random replicas of application deployment. It tests the deployment sanity (high availability & uninterrupted service) and recovery workflows of an application. Check a sample usage of container kill chaos action and for more details about the experiment please visit container kill docs.

- **Node CPU Hog**: This chaos action causes CPU resource exhaustion on the Kubernetes node. The experiment aims to verify the resiliency of applications that operate under resource constraints, where replicas may sometimes be evicted because nodes turn unschedulable (Not Ready) due to a lack of CPU resources. Check a sample usage of node cpu hog chaos action and for more details about the experiment please visit node cpu hog docs.

- **Node Memory Hog**: This chaos action causes memory exhaustion on the Kubernetes node. The experiment aims to verify the resiliency of applications that operate under resource constraints, where replicas may sometimes be evicted because nodes turn unschedulable due to a lack of memory resources. Check a sample usage of node memory hog chaos action and for more details about the experiment please visit node memory hog docs.

- **Pod CPU Hog**: This chaos action causes CPU resource consumption on specified application containers by starting one or more md5sum calculation processes on the special file /dev/zero. It can test the application's resilience to potential slowness/unavailability of some replicas due to high CPU load. Check a sample usage of pod cpu hog chaos action and for more details about the experiment please visit pod cpu hog docs.

- **Pod Memory Hog**: This chaos action causes memory resource consumption on specified application containers by using the dd command to consume the memory of the application container for a certain duration. It can test the application's resilience to potential slowness/unavailability of some replicas due to high memory load. Check a sample usage of pod memory hog chaos action and for more details about the experiment please visit pod memory hog docs.

- **Disk Fill**: This chaos action causes disk stress by filling up the ephemeral storage of the pod using one of its containers. It forces the pod to be evicted if the pod exceeds its ephemeral storage limit. It tests the ephemeral storage limits to ensure those parameters are sufficient. Check a sample usage of disk fill chaos action and for more details about the experiment please visit disk fill docs.

- **Pod Network Corruption**: This chaos action injects packet corruption on the specified container by starting a traffic control (tc) process with netem rules to add egress packet corruption. Corruption is injected via the Pumba library with the command Pumba netem corruption, by passing the relevant network interface, packet-corruption-percentage, chaos duration, and regex filter for the container name. Check a sample usage of pod network corruption chaos action and for more details about the experiment please visit pod network corruption docs.

- **Pod Network Latency**: This chaos action causes flaky access to an application replica by injecting network delay using Pumba. It injects latency on the specified container by starting a traffic control (tc) process with netem rules to add egress delays. It can test the application's resilience to a lossy/flaky network. Check a sample usage of pod network latency chaos action and for more details about the experiment please visit pod network latency docs.

- **Pod Network Loss**: This chaos action injects chaos to disrupt network connectivity to Kubernetes pods. It causes loss of access to an application replica by injecting packet loss. The application pod should be healthy once chaos is stopped. Check a sample usage of pod network loss chaos action and for more details about the experiment please visit pod network loss docs.

- **Pod Network Duplication**: This chaos action injects network packet duplication to disrupt network connectivity to Kubernetes pods. The application pod should be healthy once chaos is stopped. Service requests should be served despite chaos. Check a sample usage of pod network duplication chaos action and for more details about the experiment please visit pod network duplication docs.

- **Pod Autoscaler**: This chaos action can be used for other scenarios as well, such as checking the node auto-scaling feature. For example, check whether the pods are successfully rescheduled within a specified period when the existing nodes are already running at the specified limits. Check a sample usage of pod autoscaler chaos action and for more details about the experiment please visit pod autoscaler docs.

- **Node IO Stress**: This chaos action injects IO stress on the Kubernetes node. The experiment aims to verify the resiliency of applications that share this disk resource for ephemeral or persistent storage purposes. The amount of disk stress can be specified either as a percentage of the total free space on the file system or in gigabytes (GB). When both are provided, the experiment uses the utilization percentage; when neither is provided, it uses the default value of 10%. Check a sample usage of node io stress chaos action and for more details about the experiment please visit node io stress docs.
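
The precedence rule described for Node IO Stress can be sketched as a small shell function. This is only an illustration of the stated rule (percentage wins when both are given, 10% when neither is given); the function name `select_io_stress` and its two positional parameters are assumptions and do not come from the experiment's code.

```bash
# Hypothetical helper illustrating the Node IO Stress precedence rule.
# $1 = utilization percentage (may be empty), $2 = utilization in GB (may be empty).
select_io_stress() {
  local percentage="${1:-}" gigabytes="${2:-}"
  if [ -n "$percentage" ]; then
    # The percentage is used whenever it is set, including when both values are given.
    echo "percentage=${percentage}"
  elif [ -n "$gigabytes" ]; then
    echo "gigabytes=${gigabytes}"
  else
    # Neither value was provided: fall back to the documented default of 10%.
    echo "percentage=10"
  fi
}
```

For example, `select_io_stress 30 5` selects the percentage, while `select_io_stress "" 5` selects the size in gigabytes.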

## Usage

A sample pod delete experiment workflow:

`.github/workflows/main.yml`

```yaml
name: chaos-pipeline
# Events can be modified as per requirements
on:
  workflow_dispatch:

jobs:
  chaos-action:
    runs-on: ubuntu-latest
    steps:
      # KUBE_CONFIG_DATA is a required env for litmuschaos/github-chaos-actions.
      - name: Setting up kubeconfig ENV for Github Chaos Action
        run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
        env:
          ACTIONS_ALLOW_UNSECURE_COMMANDS: true

      - name: Setup Litmus
        uses: litmuschaos/github-chaos-actions@master
        env:
          INSTALL_LITMUS: true

      - name: Running Litmus pod delete chaos experiment
        uses: litmuschaos/github-chaos-actions@master
        env:
          EXPERIMENT_NAME: pod-delete
          EXPERIMENT_IMAGE: litmuschaos/go-runner
          EXPERIMENT_IMAGE_TAG: latest
          JOB_CLEANUP_POLICY: delete
          APP_NS: default
          APP_LABEL: run=nginx
          APP_KIND: deployment
          IMAGE_PULL_POLICY: Always
          TOTAL_CHAOS_DURATION: 30
          CHAOS_INTERVAL: 10
          FORCE: false

      - name: Uninstall Litmus
        if: always()
        uses: litmuschaos/github-chaos-actions@master
        env:
          LITMUS_CLEANUP: true
```
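
The `::set-env` workflow command used in the sample is deprecated by GitHub, which is why the step has to opt in with `ACTIONS_ALLOW_UNSECURE_COMMANDS`. GitHub's documented replacement is to append `NAME=value` lines to the file named by `$GITHUB_ENV`. The sketch below shows that approach; the helper name `set_action_env` is hypothetical and is not something the action provides.

```bash
# Hypothetical helper: make a variable available to later steps of the same job.
# $1 = variable name, $2 = single-line value.
set_action_env() {
  local name="$1" value="$2"
  # The runner sets GITHUB_ENV to a file path; each NAME=value line appended to it
  # becomes an environment variable in subsequent steps. Multi-line values need
  # GitHub's delimiter syntax, which this sketch does not handle.
  printf '%s=%s\n' "$name" "$value" >> "$GITHUB_ENV"
}
```

In a `run:` step this would be called as `set_action_env KUBE_CONFIG_DATA "$(base64 -w 0 ~/.kube/config)"`, and the step would no longer need `ACTIONS_ALLOW_UNSECURE_COMMANDS`.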

#### For EKS Clusters

A sample pod delete experiment workflow for EKS Clusters:

`.github/workflows/main.yml`

```yaml
name: chaos-pipeline
# Events can be modified as per requirements
on:
  workflow_dispatch:

jobs:
  chaos-action:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      # Optionally kubeconfig can be passed from github secrets in base64 encoded form as mentioned above.
      - name: Writing kubeconfig for eks cluster
        run: |
          aws eks --region ${{ secrets.AWS_REGION }} update-kubeconfig --name

      - name: Setting up kubeconfig ENV for Github Chaos Action
        run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
        env:
          ACTIONS_ALLOW_UNSECURE_COMMANDS: true

      - name: Setup Litmus
        uses: litmuschaos/github-chaos-actions@master
        env:
          INSTALL_LITMUS: true

      - name: Uninstall Litmus
        if: always()
        uses: litmuschaos/github-chaos-actions@master
        env:
          LITMUS_CLEANUP: true
```

For details of the chaos action tunables for pod delete (the example above), see the experiment docs.

## Secrets

`KUBE_CONFIG_DATA` – **required**: A base64-encoded kubeconfig file with credentials for Kubernetes to access the cluster. You can get it by running the following command:

```bash
cat $HOME/.kube/config | base64
```
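
Some `base64` implementations wrap their output across several lines, which can be awkward to paste into a secret. The sketch below strips the line breaks in a way that should work with both GNU and BSD `base64`; the function name `encode_kubeconfig` is hypothetical.

```bash
# Hypothetical helper: print a kubeconfig file as a single line of base64.
# $1 = path to the kubeconfig file.
encode_kubeconfig() {
  # tr removes the line wraps that some base64 implementations insert.
  base64 < "$1" | tr -d '\n'
}
```

Running `encode_kubeconfig "$HOME/.kube/config"` prints the value to store in the `KUBE_CONFIG_DATA` secret.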

## Environment Variables

Some common environment variables used for running the `github-chaos-actions` are:

| Variables | Description | Specify In Chaos Action | Default Value |
| --- | --- | --- | --- |
| `EXPERIMENT_NAME` | Give the experiment name you want to run (check the list of experiments available under the experiment folder) | Mandatory | No default value |
| `APP_NS` | Provide the namespace of the application under chaos | Optional | Default value is `default` |
| `APP_LABEL` | Provide the application label of the application under chaos | Optional | Default value is `run=nginx` |
| `APP_KIND` | Provide the kind of application | Optional | Default value is `deployment` |
| `INSTALL_LITMUS` | Keep it `true` to install Litmus if Litmus is not already installed | Optional | Default value is not set to `true` |
| `LITMUS_CLEANUP` | Keep it `true` to uninstall Litmus after chaos | Optional | Default value is not set to `true` |
| `EXPERIMENT_IMAGE` | We can provide a custom image for running the chaos experiment | Optional | Default value is `litmuschaos/go-runner` |
| `EXPERIMENT_IMAGE_TAG` | We can set the image tag while using a custom image for the chaos experiment | Optional | Default value is `latest` |
| `IMAGE_PULL_POLICY` | We can set the image pull policy while using a custom image for running the chaos experiment | Optional | Default value is `Always` |
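
The defaults listed above map naturally onto shell parameter expansion. The following is a minimal sketch of how a script might apply them; it is not taken from the action's entrypoint, and the function name `apply_chaos_defaults` is hypothetical.

```bash
# Hypothetical helper: fill in the documented defaults for unset variables.
apply_chaos_defaults() {
  # EXPERIMENT_NAME is mandatory and has no default, so fail early without it.
  if [ -z "${EXPERIMENT_NAME:-}" ]; then
    echo "EXPERIMENT_NAME is required" >&2
    return 1
  fi
  # ${VAR:-value} keeps a value supplied by the workflow and otherwise uses the default.
  export APP_NS="${APP_NS:-default}"
  export APP_LABEL="${APP_LABEL:-run=nginx}"
  export APP_KIND="${APP_KIND:-deployment}"
  export EXPERIMENT_IMAGE="${EXPERIMENT_IMAGE:-litmuschaos/go-runner}"
  export EXPERIMENT_IMAGE_TAG="${EXPERIMENT_IMAGE_TAG:-latest}"
  export IMAGE_PULL_POLICY="${IMAGE_PULL_POLICY:-Always}"
}
```

`INSTALL_LITMUS` and `LITMUS_CLEANUP` are left alone here because the table gives them no default value other than being unset.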

#### For EKS Cluster

Set up AWS credentials using [GitHub secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets). The secrets are then passed to the action through ENVs.

```yaml
jobs:
  chaos-action:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
```

> Note: These secrets can either be set up at the job level or must be provided in every chaos-action step.
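
One way to confirm that the credentials actually reach a given step is to check for them before the chaos action runs. The sketch below assumes the credentials are exposed to the step as environment variables with the same names as the secrets above; the function name `missing_aws_env` is hypothetical.

```bash
# Hypothetical helper: print the names of AWS variables that are unset or empty.
# Prints an empty line when all three are present in the environment.
missing_aws_env() {
  local name missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
    # printenv only sees exported variables, which is what a later step would inherit.
    if [ -z "$(printenv "$name")" ]; then
      missing="${missing}${missing:+ }${name}"
    fi
  done
  echo "$missing"
}
```

A step could run `[ -z "$(missing_aws_env)" ] || exit 1` to fail fast when the secrets were defined on a different step rather than at the job level.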