https://github.com/cdapio/chaos-monkey
Chaos Monkey for CDAP
- Host: GitHub
- URL: https://github.com/cdapio/chaos-monkey
- Owner: cdapio
- License: other
- Created: 2017-01-06T01:45:07.000Z (over 9 years ago)
- Default Branch: develop
- Last Pushed: 2024-09-13T12:03:54.000Z (almost 2 years ago)
- Last Synced: 2024-09-14T01:46:46.576Z (almost 2 years ago)
- Language: Java
- Size: 334 KB
- Stars: 0
- Watchers: 39
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Chaos Monkey
Chaos Monkey provides a convenient way to disrupt CDAP and Hadoop services on a cluster.
Disruptions can be scheduled, randomized, or issued on command.
## Standalone Chaos Monkey
To start the Chaos Monkey daemon and HTTP server, set the configurations in chaos-monkey-site.xml and run ChaosMonkeyMain.
### Configurations
**Disruptions setup**
>By default, the following disruptions will be available to each service:
>* start
>* restart
>* stop
>* terminate
>* kill
>* rolling-restart
>
>Custom disruptions can be added by extending the Disruption class and then associating them with a service.
>A custom disruption is started by calling ClusterDisruptor.disrupt(serviceName, disruptionName, actionArguments),
>where disruptionName is set by the Disruption.getName() method.
>Disruptions receive a collection of RemoteProcess objects, selected according to the actionArguments, which can be used
>to execute commands over SSH. To add a custom disruption to a service:
>* {service}.disruptions - Class paths of custom disruptions, separated by commas
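As a rough illustration of the extension point described above, the sketch below defines a custom disruption named "suspend". The `RemoteProcess` and `Disruption` types here are simplified, hypothetical stand-ins written for this example (only `getName()` is named in this README); the real classes in the project may have different method names and signatures, and the `pidFile` argument key is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

class CustomDisruptionSketch {

  // Hypothetical stand-in for RemoteProcess: something that can run a
  // command on one node. The real class runs commands over SSH.
  interface RemoteProcess {
    void exec(String command);
  }

  // Hypothetical stand-in for the Disruption base class.
  abstract static class Disruption {
    abstract String getName();

    abstract void disrupt(Collection<RemoteProcess> processes,
                          Map<String, String> actionArguments);
  }

  // Custom disruption that suspends the service process with SIGSTOP.
  static class SuspendDisruption extends Disruption {
    @Override
    String getName() {
      return "suspend";
    }

    @Override
    void disrupt(Collection<RemoteProcess> processes,
                 Map<String, String> actionArguments) {
      // "pidFile" is an assumed argument key used only in this sketch.
      String pidFile = actionArguments.get("pidFile");
      for (RemoteProcess process : processes) {
        process.exec("kill -STOP $(cat " + pidFile + ")");
      }
    }
  }

  // Runs the disruption against two fake nodes and returns the commands issued.
  static List<String> demo() {
    List<String> issued = new ArrayList<>();
    RemoteProcess nodeA = command -> issued.add("nodeA: " + command);
    RemoteProcess nodeB = command -> issued.add("nodeB: " + command);
    new SuspendDisruption().disrupt(
        List.of(nodeA, nodeB), Map.of("pidFile", "/tmp/example.pid"));
    return issued;
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

With the real classes, such a disruption would presumably be registered through the {service}.disruptions property and triggered by passing the value returned by getName() as the disruptionName.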
**Initialize a service for Chaos Monkey**
>Any configured service can be interacted with through ClusterDisruptor or the REST endpoints. To configure a service for
>Chaos Monkey, either provide custom disruptions or a pid file for the default disruptions:
>* {service}.pidFile - Path to the .pid file of the service
**Configurations for scheduled disruptions**
>These additional properties can be set for a certain service to start a scheduled disruption:
>* {service}.interval - Number of seconds between each disruption
>* {service}.killProbability - Number between 0 and 1 representing the chance of a kill occurring each iteration.
>* {service}.stopProbability - Number between 0 and 1 representing the chance of a stop occurring each iteration.
>* {service}.restartProbability - Number between 0 and 1 representing the chance of a restart occurring each iteration.
>* {service}.minNodesPerIteration - Minimum number of nodes affected each iteration.
>* {service}.maxNodesPerIteration - Maximum number of nodes affected each iteration.
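To make the interplay of these properties concrete, here is a minimal simulation of a single scheduled iteration. It is not the project's scheduler: it assumes one random roll per iteration that is split across the three probabilities (so they should sum to at most 1), and that the number of affected nodes is drawn uniformly between the minimum and maximum. The class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ScheduledDisruptionSketch {

  // Maps a roll in [0, 1) to an action by stacking the three probabilities.
  // Assumption: one roll per iteration, probabilities sum to at most 1.
  static String pickAction(double roll, double killProbability,
                           double stopProbability, double restartProbability) {
    if (roll < killProbability) {
      return "kill";
    }
    if (roll < killProbability + stopProbability) {
      return "stop";
    }
    if (roll < killProbability + stopProbability + restartProbability) {
      return "restart";
    }
    return "none";
  }

  // Picks between min and max nodes (inclusive) at random, capped by cluster size.
  static List<String> pickNodes(List<String> nodes, int min, int max, Random random) {
    int count = min + random.nextInt(max - min + 1);
    List<String> shuffled = new ArrayList<>(nodes);
    Collections.shuffle(shuffled, random);
    return shuffled.subList(0, Math.min(count, shuffled.size()));
  }

  public static void main(String[] args) {
    Random random = new Random();
    List<String> nodes = List.of("node1", "node2", "node3", "node4");
    // One simulated iteration with killProbability=0.1, stopProbability=0.2,
    // restartProbability=0.3, minNodesPerIteration=1, maxNodesPerIteration=2.
    String action = pickAction(random.nextDouble(), 0.1, 0.2, 0.3);
    if (action.equals("none")) {
      System.out.println("no disruption this iteration");
    } else {
      System.out.println(action + " -> " + pickNodes(nodes, 1, 2, random));
    }
  }
}
```

Under these assumptions, the example values leave a 40% chance that an iteration does nothing, which is one way to keep scheduled disruptions sparse without lengthening the interval.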
**Cluster information collector**
>By default, Chaos Monkey retrieves cluster information from Coopr.
>To get cluster information from Coopr, the following configurations need to be set:
>* cluster.info.collector.coopr.clusterId
>* cluster.info.collector.coopr.tenantId
>* cluster.info.collector.coopr.server.uri
>
>To get cluster information from other sources, include a plugin that implements ClusterInfoCollector and set the
>following configuration:
>* cluster.info.collector.class - classpath of the implementation of ClusterInfoCollector
>
>Additional properties can be passed to the ClusterInfoCollector implementation. Setting the property
>cluster.info.collector.{propertyName} in the configuration makes {propertyName} available in the properties map
>that is passed in via the initialize method.
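The prefix convention described above can be pictured as a simple filter-and-strip over the configuration. The sketch below is an assumption about how such a mapping could work, not the project's actual code; the class name, the `inventory.url` property, and the example values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CollectorPropertiesSketch {

  static final String PREFIX = "cluster.info.collector.";

  // Keeps only entries whose key starts with the collector prefix and
  // strips that prefix, yielding the map a collector might receive.
  static Map<String, String> collectorProperties(Map<String, String> conf) {
    Map<String, String> properties = new HashMap<>();
    for (Map.Entry<String, String> entry : conf.entrySet()) {
      if (entry.getKey().startsWith(PREFIX)) {
        properties.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
      }
    }
    return properties;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Hypothetical property intended for a custom collector.
    conf.put("cluster.info.collector.inventory.url", "http://example.invalid");
    // Unrelated service property; it does not carry the prefix and is skipped.
    conf.put("zookeeper.pidFile", "/tmp/zk.pid");
    System.out.println(collectorProperties(conf));
  }
}
```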
**SSH configurations**
>* username - username of the SSH profile (if different from the system user)
>* keyPassphrase - passphrase for the private key, if applicable
>* privateKey - path to the private key (default locations are checked unless specified)
## HTTP endpoints
The HTTP server listens on port 11020 and exposes the following endpoints:
>**POST /v1/services/{service}/{action}**
>{action} is one of stop, kill, terminate, start, restart, or rolling-restart.
>By default, the action is performed on all nodes configured with the service. To specify affected nodes, include
>one of the following request bodies:
>```
>{
> nodes:[,...]
>}
>```
>```
>{
> percentage:
>}
>```
>```
>{
> count:
>}
>```
>In addition to the above request bodies, rolling-restart can also be configured with:
>```
>{
> restartTime:
> delay:
>}
>```
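For callers on the JVM, a request to the disruption endpoint can be built with the JDK's `java.net.http` client (Java 11 or later). This is a sketch only: the host `localhost`, the service name `hbase-master`, the `Content-Type` header, and the quoted-key JSON body with an integer `count` are assumptions for illustration, and the request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DisruptionRequestSketch {

  // Builds POST /v1/services/{service}/{action} with an optional JSON body.
  static HttpRequest buildDisruptRequest(String baseUri, String service,
                                         String action, String jsonBody) {
    return HttpRequest.newBuilder()
        .uri(URI.create(baseUri + "/v1/services/" + service + "/" + action))
        // Assumed header; the README does not state a required content type.
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }

  public static void main(String[] args) {
    // Restart the (hypothetical) hbase-master service on two nodes.
    HttpRequest request = buildDisruptRequest(
        "http://localhost:11020", "hbase-master", "restart", "{\"count\": 2}");
    System.out.println(request.method() + " " + request.uri());
  }
}
```

To actually issue the call, the request can be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.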
>**GET /v1/nodes/{ip}/status**
>Get the status of all configured services on a given address.
>**GET /v1/status**
>Get the status of all configured services on every node of the cluster.
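The two status endpoints can be queried in the same way. The sketch below only builds the GET requests, since the response format is not documented here; the host `localhost` and the node address `10.0.0.5` are placeholder assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class StatusRequestSketch {

  // Builds GET /v1/status for the whole cluster.
  static HttpRequest clusterStatus(String baseUri) {
    return HttpRequest.newBuilder(URI.create(baseUri + "/v1/status")).GET().build();
  }

  // Builds GET /v1/nodes/{ip}/status for a single node.
  static HttpRequest nodeStatus(String baseUri, String ip) {
    return HttpRequest.newBuilder(URI.create(baseUri + "/v1/nodes/" + ip + "/status"))
        .GET()
        .build();
  }

  public static void main(String[] args) {
    String baseUri = "http://localhost:11020";
    System.out.println(clusterStatus(baseUri).uri());
    System.out.println(nodeStatus(baseUri, "10.0.0.5").uri());
  }
}
```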