https://github.com/cdapio/chaos-monkey

Chaos Monkey for CDAP

Host: GitHub
URL: https://github.com/cdapio/chaos-monkey
Owner: cdapio
License: other
Created: 2017-01-06T01:45:07.000Z (over 9 years ago)
Default Branch: develop
Last Pushed: 2024-09-13T12:03:54.000Z (almost 2 years ago)
Last Synced: 2024-09-14T01:46:46.576Z (almost 2 years ago)
Language: Java
Size: 334 KB
Stars: 0
Watchers: 39
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt


# Chaos Monkey

Chaos Monkey provides a convenient way to disrupt CDAP and Hadoop services on a cluster.

Disruptions can be scheduled, randomized, or issued on command. 


## Standalone Chaos Monkey 

To start the Chaos Monkey daemon and HTTP server, set the configurations in chaos-monkey-site.xml and run ChaosMonkeyMain.


### Configurations

**Disruptions setup** 


>By default, the following disruptions will be available to each service:
>* start
>* restart
>* stop
>* terminate
>* kill
>* rolling-restart
>
>Custom disruptions can be added by extending the Disruption class and then associating them with a service.
>A custom disruption is started by calling ClusterDisruptor.disrupt(serviceName, disruptionName, actionArguments),
>where disruptionName is set by the Disruption.getName() method.
>Each disruption receives a collection of RemoteProcess objects selected by the actionArguments, which it can use to
>execute commands via SSH. To add a custom disruption to a service:
>* {service}.disruptions - Class paths of custom disruptions, separated by commas
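
The name-based dispatch described above, where a disruption is looked up by the name its getName() method returns, can be sketched as follows. This is a self-contained illustration, not the Chaos Monkey API: `RemoteProcessSketch`, `DisruptionSketch`, `DisruptorSketch`, and `PauseDisruption` are hypothetical stand-ins, their method signatures are assumptions, and the command string is a placeholder.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RemoteProcess; the real class executes commands over SSH.
interface RemoteProcessSketch {
  void execute(String command);
}

// Hypothetical stand-in for the Disruption base class; signatures are assumptions.
abstract class DisruptionSketch {
  abstract String getName();

  abstract void disrupt(Collection<RemoteProcessSketch> processes);
}

// Example custom disruption that runs one placeholder command on every selected process.
class PauseDisruption extends DisruptionSketch {
  // Placeholder command, not taken from the project.
  static final String COMMAND = "echo pause-requested";

  @Override
  String getName() {
    return "pause";
  }

  @Override
  void disrupt(Collection<RemoteProcessSketch> processes) {
    for (RemoteProcessSketch process : processes) {
      process.execute(COMMAND);
    }
  }
}

// Hypothetical dispatcher: finds a registered disruption by name and runs it.
class DisruptorSketch {
  private final Map<String, DisruptionSketch> byName = new HashMap<>();

  void register(DisruptionSketch disruption) {
    byName.put(disruption.getName(), disruption);
  }

  void disrupt(String disruptionName, Collection<RemoteProcessSketch> processes) {
    DisruptionSketch disruption = byName.get(disruptionName);
    if (disruption == null) {
      throw new IllegalArgumentException("unknown disruption: " + disruptionName);
    }
    disruption.disrupt(processes);
  }
}
```

In the real tool the registration step would presumably be driven by the {service}.disruptions property rather than by an explicit register call.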

**Initialize a service for Chaos Monkey** 


>Any configured service can be interacted with through ClusterDisruptor or REST endpoints. To configure a service for
>Chaos Monkey, either provide custom disruptions or a pid file for the default disruptions:
>* {service}.pidFile - Path to the .pid file of the service


**Configurations for scheduled disruptions** 


>These additional properties can be set for a service to start a scheduled disruption:
>* {service}.interval - Number of seconds between each disruption
>* {service}.killProbability - Number between 0 and 1 representing the chance of a kill occurring each iteration
>* {service}.stopProbability - Number between 0 and 1 representing the chance of a stop occurring each iteration
>* {service}.restartProbability - Number between 0 and 1 representing the chance of a restart occurring each iteration
>* {service}.minNodesPerIteration - Minimum number of nodes affected each iteration
>* {service}.maxNodesPerIteration - Maximum number of nodes affected each iteration
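
The constraints on the properties above can be checked before starting the daemon. The sketch below is one possible check, not part of Chaos Monkey: `ScheduleConfigCheck` is a hypothetical helper, and its rules (every property is required, the interval is positive, the minimum does not exceed the maximum) are assumptions that go beyond what the list states.

```java
import java.util.Properties;

// Hypothetical helper; Chaos Monkey itself may validate differently or not at all.
class ScheduleConfigCheck {

  // Reads a required property or fails with a message naming the missing key.
  static String require(Properties conf, String key) {
    String value = conf.getProperty(key);
    if (value == null) {
      throw new IllegalArgumentException("missing property: " + key);
    }
    return value.trim();
  }

  // Checks the scheduled-disruption properties of one service.
  static void validate(Properties conf, String service) {
    int interval = Integer.parseInt(require(conf, service + ".interval"));
    if (interval <= 0) {
      throw new IllegalArgumentException(service + ".interval must be positive");
    }
    for (String action : new String[] {"kill", "stop", "restart"}) {
      String key = service + "." + action + "Probability";
      double probability = Double.parseDouble(require(conf, key));
      if (probability < 0.0 || probability > 1.0) {
        throw new IllegalArgumentException(key + " must be between 0 and 1");
      }
    }
    int min = Integer.parseInt(require(conf, service + ".minNodesPerIteration"));
    int max = Integer.parseInt(require(conf, service + ".maxNodesPerIteration"));
    if (min < 0 || max < min) {
      throw new IllegalArgumentException(
          "need 0 <= minNodesPerIteration <= maxNodesPerIteration for " + service);
    }
  }
}
```

For a hypothetical service named hbase-master, the keys would be hbase-master.interval, hbase-master.killProbability, and so on.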


**Cluster information collector** 


>By default, Chaos Monkey will retrieve cluster information from Coopr.
>To get cluster information from Coopr, the following configurations need to be set:
>* cluster.info.collector.coopr.clusterId
>* cluster.info.collector.coopr.tenantId
>* cluster.info.collector.coopr.server.uri
>
>To get cluster information from other sources, include a plugin that implements ClusterInfoCollector and set the
>following config:
>* cluster.info.collector.class - classpath of the implementation of ClusterInfoCollector
>
>Additional properties can be passed in to the ClusterInfoCollector implementation. Setting the property
>cluster.info.collector.{propertyName} in configurations will make {propertyName} available in the properties map
>passed in via the initialize method.
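
The prefix convention described above can be illustrated with a short sketch. `CollectorProperties` is a hypothetical helper, not a Chaos Monkey class, and it assumes that everything after the cluster.info.collector. prefix becomes the key in the properties map; the real implementation may filter or rename keys differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing how prefixed settings could end up in the collector's properties map.
class CollectorProperties {
  static final String PREFIX = "cluster.info.collector.";

  // Returns the entries whose keys start with PREFIX, with the prefix removed from each key.
  static Map<String, String> extract(Map<String, String> configuration) {
    Map<String, String> properties = new HashMap<>();
    for (Map.Entry<String, String> entry : configuration.entrySet()) {
      String key = entry.getKey();
      if (key.startsWith(PREFIX)) {
        properties.put(key.substring(PREFIX.length()), entry.getValue());
      }
    }
    return properties;
  }
}
```

Under this assumption, a setting such as cluster.info.collector.coopr.clusterId would reach the collector under the key coopr.clusterId, while unrelated settings such as {service}.interval would be left out.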

**SSH configurations** 


>* username - username of SSH profile (if different from system user)
>* keyPassphrase - passphrase for private key, if applicable
>* privateKey - path to private key (will check default locations unless specified)


## HTTP endpoints

The HTTP server listens on port 11020 and exposes the following endpoints:


>**POST /v1/services/{service}/{action}**
>
>{action} can be stop, kill, terminate, start, restart, or rolling-restart.
>
>By default, the action is performed on all nodes configured with the service. To specify the affected nodes, include
>one of the following request bodies:
>```
>{
>  nodes:[,...]
>}
>```
>```
>{
>  percentage:
>}
>```
>```
>{
>  count:
>}
>```
>In addition to the above request bodies, a rolling restart can also be configured with:
>```
>{
>  restartTime:
>  delay:
>}
>```
>**GET /v1/nodes/{ip}/status**
>
>Get the status of all configured services on the node with the given IP address.
>
>**GET /v1/status**
>
>Get the status of all configured services on every node of the cluster.
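
Calls to the endpoints above can be sketched with the JDK's built-in HTTP client (Java 11 or later). The port and paths come from this README; everything else is an assumption made for illustration: `DisruptionRequests` is a hypothetical helper, the host is taken to be localhost, the body is taken to be JSON with quoted keys, and hbase-master is an invented service name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds requests for the Chaos Monkey HTTP server.
class DisruptionRequests {
  // Port 11020 is documented above; the host is an assumption.
  static final String BASE = "http://localhost:11020";

  // Builds POST /v1/services/{service}/{action}. A null body targets all nodes of the service.
  static HttpRequest disrupt(String service, String action, String jsonBody) {
    HttpRequest.BodyPublisher body = jsonBody == null
        ? HttpRequest.BodyPublishers.noBody()
        : HttpRequest.BodyPublishers.ofString(jsonBody);
    return HttpRequest.newBuilder(URI.create(BASE + "/v1/services/" + service + "/" + action))
        .header("Content-Type", "application/json")
        .POST(body)
        .build();
  }

  // Builds GET /v1/status for the whole cluster.
  static HttpRequest status() {
    return HttpRequest.newBuilder(URI.create(BASE + "/v1/status")).GET().build();
  }
}
```

A request such as `DisruptionRequests.disrupt("hbase-master", "restart", "{\"count\": 2}")` could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The value type expected for count, percentage, and the other fields is not stated above, so the number used here is a guess.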
