# Steadybit extension-redis

A [Steadybit](https://www.steadybit.com/) extension for Redis chaos engineering.

Learn about the capabilities of this extension in our [Reliability Hub](https://hub.steadybit.com/).

## Configuration

| Environment Variable | Required | Description |
|---------------------|----------|-------------|
| `STEADYBIT_EXTENSION_ENDPOINTS_JSON` | Yes | JSON array of Redis endpoint configurations |
| `STEADYBIT_EXTENSION_DISCOVERY_INTERVAL_INSTANCE_SECONDS` | No | Interval for instance discovery (default: 30) |
| `STEADYBIT_EXTENSION_DISCOVERY_INTERVAL_DATABASE_SECONDS` | No | Interval for database discovery (default: 60) |

### Endpoint Configuration

The `STEADYBIT_EXTENSION_ENDPOINTS_JSON` environment variable should contain a JSON array of Redis endpoint configurations:

```json
[
  {
    "url": "redis://localhost:6379",
    "password": "optional-password",
    "username": "optional-username",
    "db": 0,
    "name": "my-redis-instance",
    "insecureSkipVerify": false
  }
]
```
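
To make the shape of this array concrete, here is a minimal Go sketch that decodes it. The `Endpoint` struct and the `parseEndpoints` helper are hypothetical names chosen for illustration and are not taken from the extension's source. Treating `url` as the only mandatory field is also an assumption, based on the example values above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Endpoint mirrors the fields shown in the JSON example.
// Hypothetical type for illustration, not the extension's own.
type Endpoint struct {
	URL                string `json:"url"`
	Password           string `json:"password,omitempty"`
	Username           string `json:"username,omitempty"`
	DB                 int    `json:"db"`
	Name               string `json:"name,omitempty"`
	InsecureSkipVerify bool   `json:"insecureSkipVerify"`
}

// parseEndpoints decodes the value of STEADYBIT_EXTENSION_ENDPOINTS_JSON.
// Assumption: an entry without a url is rejected.
func parseEndpoints(raw string) ([]Endpoint, error) {
	var endpoints []Endpoint
	if err := json.Unmarshal([]byte(raw), &endpoints); err != nil {
		return nil, fmt.Errorf("invalid endpoints JSON: %w", err)
	}
	for i, ep := range endpoints {
		if ep.URL == "" {
			return nil, fmt.Errorf("endpoint %d: url is required", i)
		}
	}
	return endpoints, nil
}

func main() {
	raw := `[{"url":"redis://localhost:6379","db":0,"name":"my-redis-instance"}]`
	endpoints, err := parseEndpoints(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ep := range endpoints {
		fmt.Printf("%s -> %s (db %d)\n", ep.Name, ep.URL, ep.DB)
	}
}
```

Validating the value this way before starting the container can catch quoting mistakes, which are easy to make when JSON is embedded in a shell variable.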

### TLS Configuration

For TLS connections, use the `rediss://` URL scheme:

```json
[
  {
    "url": "rediss://redis.example.com:6379",
    "password": "secret",
    "insecureSkipVerify": true
  }
]
```
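
The following Go sketch shows one way a client could derive its TLS settings from the URL scheme and the `insecureSkipVerify` flag. The `tlsConfigFor` function is a hypothetical helper, and the mapping onto Go's `tls.Config` is an assumption about how such a flag is typically used, not a description of the extension's internals.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/url"
)

// tlsConfigFor returns nil for plain redis:// URLs and a TLS
// configuration for rediss:// URLs. Hypothetical helper.
func tlsConfigFor(rawURL string, insecureSkipVerify bool) (*tls.Config, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("invalid URL: %w", err)
	}
	switch u.Scheme {
	case "redis":
		return nil, nil // no TLS
	case "rediss":
		return &tls.Config{
			ServerName: u.Hostname(),
			// When true, the server certificate is not verified.
			InsecureSkipVerify: insecureSkipVerify,
			MinVersion:         tls.VersionTLS12,
		}, nil
	default:
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	cfg, err := tlsConfigFor("rediss://redis.example.com:6379", true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("TLS enabled:", cfg != nil)
	fmt.Println("certificate verification skipped:", cfg.InsecureSkipVerify)
}
```

In Go's `crypto/tls`, skipping verification leaves the connection open to man-in-the-middle attacks, so a setting like this is usually reserved for test setups with self-signed certificates.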

## Supported Targets

### Redis Instance

Discovers Redis instances and exposes attributes like:
- `redis.host` - Redis host
- `redis.port` - Redis port
- `redis.version` - Redis version
- `redis.role` - Instance role (master/replica)
- `redis.cluster.enabled` - Cluster mode status

### Redis Database

Discovers Redis databases (db0-db15) and exposes:
- `redis.database.index` - Database index
- `redis.database.keys` - Key count in database
- `redis.database.name` - Database name (e.g., "db0")

## Supported Actions

### Attacks

#### Exhaust Connections
- **ID**: `com.steadybit.extension_redis.instance.connection-exhaustion`
- **Target**: Instance
- **Description**: Opens many connections to test connection limit handling
- **Parameters**:
  - `duration` - How long to hold connections
  - `numConnections` - Number of connections to open (default: 100)

#### Pause Clients
- **ID**: `com.steadybit.extension_redis.instance.client-pause`
- **Target**: Instance
- **Description**: Suspends all client command processing using CLIENT PAUSE
- **Parameters**:
  - `duration` - How long to pause clients
  - `pauseMode` - ALL (all commands) or WRITE (write commands only)
- **Reversibility**: Auto-reverts after timeout

#### Limit MaxMemory
- **ID**: `com.steadybit.extension_redis.instance.maxmemory-limit`
- **Target**: Instance
- **Description**: Reduces Redis maxmemory to force evictions or OOM errors
- **Parameters**:
  - `duration` - How long to apply the limit
  - `maxmemory` - Memory limit (e.g., "10mb", "1gb")
  - `evictionPolicy` - noeviction, allkeys-lru, allkeys-lfu, volatile-lru, volatile-ttl, or keep original
- **Reversibility**: Fully reversible - restores original settings on stop
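
Redis configuration distinguishes decimal from binary size suffixes: `1k` is 1000 bytes while `1kb` is 1024 bytes, and likewise for `m`/`mb` and `g`/`gb`. The sketch below converts such a value to bytes, which helps when choosing a limit relative to current usage. `parseMemory` is a hypothetical helper, and it is an assumption that the `maxmemory` parameter follows the same unit rules as `redis.conf`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// units lists Redis size suffixes; two-letter suffixes come first
// so that "mb" is not mistaken for a bare "b".
var units = []struct {
	suffix string
	factor int64
}{
	{"kb", 1024}, {"mb", 1024 * 1024}, {"gb", 1024 * 1024 * 1024},
	{"k", 1000}, {"m", 1000 * 1000}, {"g", 1000 * 1000 * 1000},
	{"b", 1},
}

// parseMemory converts a value such as "10mb" into bytes.
// Hypothetical helper; overflow is not checked.
func parseMemory(s string) (int64, error) {
	v := strings.ToLower(strings.TrimSpace(s))
	factor := int64(1)
	for _, u := range units {
		if strings.HasSuffix(v, u.suffix) {
			v = strings.TrimSuffix(v, u.suffix)
			factor = u.factor
			break
		}
	}
	n, err := strconv.ParseInt(v, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid memory size %q", s)
	}
	return n * factor, nil
}

func main() {
	for _, s := range []string{"10mb", "1gb", "1k"} {
		n, err := parseMemory(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %d bytes\n", s, n)
	}
}
```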

#### Force Cache Expiration
- **ID**: `com.steadybit.extension_redis.database.cache-expiration`
- **Target**: Database
- **Description**: Sets TTL on string keys matching a pattern to force expiration (non-string keys are skipped)
- **Parameters**:
  - `duration` - Attack duration (for tracking)
  - `pattern` - Key pattern to match (only string keys are affected)
  - `ttl` - TTL in seconds before keys expire (default: 5)
  - `maxKeys` - Maximum keys to affect (default: 100)
  - `restoreOnStop` - Restore keys with original values and TTLs when attack stops (default: false)
- **Reversibility**: Reversible when `restoreOnStop` is enabled - recreates expired keys with original values and TTLs

#### Stop Sentinel
- **ID**: `com.steadybit.extension_redis.instance.sentinel-stop`
- **Target**: Instance
- **Description**: Stops a Redis Sentinel server using DEBUG SLEEP, making it unresponsive to all clients and other Sentinels
- **Parameters**:
  - `duration` - How long the Sentinel should be unresponsive (default: 30s)
- **Reversibility**: Auto-recovers after the sleep duration

### Checks

#### Memory Usage Check
- **ID**: `com.steadybit.extension_redis.instance.check-memory`
- **Target**: Instance
- **Description**: Monitors Redis memory usage and fails if threshold exceeded
- **Parameters**:
  - `duration` - Monitoring duration
  - `maxMemoryPercent` - Max memory as % of maxmemory (default: 80%)
  - `maxMemoryBytes` - Max memory in MB (optional)
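
A percentage threshold like `maxMemoryPercent` can be reasoned about with the `used_memory` and `maxmemory` fields of the Redis `INFO memory` output. The sketch below is one possible calculation, not the extension's implementation. `memoryUsagePercent` is a hypothetical helper, and returning an error when `maxmemory` is 0 (which Redis uses to mean "no limit") is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseInfo turns INFO output ("key:value" lines, "#" section
// headers) into a map.
func parseInfo(info string) map[string]string {
	fields := map[string]string{}
	for _, line := range strings.Split(info, "\n") {
		line = strings.TrimSpace(line) // also strips the trailing \r
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if key, value, ok := strings.Cut(line, ":"); ok {
			fields[key] = value
		}
	}
	return fields
}

// memoryUsagePercent returns used_memory as a percentage of maxmemory.
// Hypothetical helper for illustration.
func memoryUsagePercent(info string) (float64, error) {
	fields := parseInfo(info)
	used, err := strconv.ParseFloat(fields["used_memory"], 64)
	if err != nil {
		return 0, fmt.Errorf("used_memory: %w", err)
	}
	limit, err := strconv.ParseFloat(fields["maxmemory"], 64)
	if err != nil {
		return 0, fmt.Errorf("maxmemory: %w", err)
	}
	if limit == 0 {
		return 0, errors.New("maxmemory is 0 (no limit), percentage is undefined")
	}
	return used / limit * 100, nil
}

func main() {
	info := "# Memory\r\nused_memory:1048576\r\nmaxmemory:4194304\r\n"
	pct, err := memoryUsagePercent(info)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("memory usage: %.1f%% of maxmemory\n", pct)
}
```

On an instance without a configured `maxmemory`, a percentage threshold has nothing to compare against, so an absolute threshold is the more meaningful setting in that case.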

#### Latency Check
- **ID**: `com.steadybit.extension_redis.instance.check-latency`
- **Target**: Instance
- **Description**: Monitors Redis response latency
- **Parameters**:
  - `duration` - Monitoring duration
  - `maxLatencyMs` - Maximum allowed latency in ms (default: 100)

#### Connection Count Check
- **ID**: `com.steadybit.extension_redis.instance.check-connections`
- **Target**: Instance
- **Description**: Monitors connected clients and fails if threshold exceeded
- **Parameters**:
  - `duration` - Monitoring duration
  - `maxConnectionsPct` - Max connections as % of maxclients (default: 80%)
  - `maxConnections` - Absolute max connections (optional)

#### Replication Lag Check
- **ID**: `com.steadybit.extension_redis.instance.check-replication`
- **Target**: Instance
- **Description**: Monitors Redis replication status and lag for replicas
- **Parameters**:
  - `duration` - Monitoring duration
  - `maxLagSeconds` - Maximum allowed replication lag (default: 10s)
  - `requireLinkUp` - Fail if master link is down (default: true)
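
The interplay of `maxLagSeconds` and `requireLinkUp` can be sketched as a small decision function over values a replica reports in `INFO replication`. The `ReplicationStatus` type and `checkReplication` function are hypothetical, and using `master_last_io_seconds_ago` as the lag measure is an assumption. The extension may measure lag differently.

```go
package main

import "fmt"

// ReplicationStatus holds values from INFO replication.
// Hypothetical type for illustration.
type ReplicationStatus struct {
	Role             string // "role": "master" or "slave"
	MasterLinkStatus string // "master_link_status": "up" or "down"
	LastIOSecondsAgo int    // "master_last_io_seconds_ago"
}

// checkReplication returns an error when the replica violates the
// given thresholds, and nil otherwise.
func checkReplication(s ReplicationStatus, maxLagSeconds int, requireLinkUp bool) error {
	if s.Role == "master" {
		return nil // nothing to evaluate on a primary
	}
	if requireLinkUp && s.MasterLinkStatus != "up" {
		return fmt.Errorf("master link is %q", s.MasterLinkStatus)
	}
	if s.LastIOSecondsAgo > maxLagSeconds {
		return fmt.Errorf("replication lag %ds exceeds limit %ds",
			s.LastIOSecondsAgo, maxLagSeconds)
	}
	return nil
}

func main() {
	status := ReplicationStatus{Role: "slave", MasterLinkStatus: "up", LastIOSecondsAgo: 12}
	if err := checkReplication(status, 10, true); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("check passed")
}
```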

## Demo Environment & Chaos Experiments

A complete demo environment with a sample application and chaos engineering experiments is available in the `demo/` directory.

### Quick Start

```bash
cd demo
docker-compose up -d
```

This starts:
- **redis-master**: Primary Redis (port 6379)
- **redis-replica**: Replica for HA testing (port 6380)
- **demo-app**: Sample app with caching (port 3400)
- **load-generator**: Continuous traffic generator

### Chaos Experiments

See [demo/CHAOS_EXPERIMENTS.md](demo/CHAOS_EXPERIMENTS.md) for detailed chaos engineering scenarios including:

**User-Facing Scenarios:**
- Cache unavailability impact
- Session loss handling
- Slow cache response
- Cache stampede (thundering herd)

**SRE/Platform Scenarios:**
- Connection pool exhaustion
- Memory pressure & eviction
- Replication lag
- Redis failover

## Installation

### Helm

```bash
helm repo add steadybit https://steadybit.github.io/helm-charts
helm repo update
helm install steadybit-extension-redis steadybit/steadybit-extension-redis \
  --set redis.auth.managementEndpoints='[{"url":"redis://redis:6379"}]'
```

### Docker

```bash
docker run -d \
  -e STEADYBIT_EXTENSION_ENDPOINTS_JSON='[{"url":"redis://redis:6379"}]' \
  -p 8083:8083 \
  ghcr.io/steadybit/extension-redis:latest
```

## Development

### Prerequisites

- Go 1.25+
- Redis instance for testing
- Docker (for demo environment)

### Build

```bash
make build
```

### Test

```bash
# Unit tests only
go test ./clients/... ./config/... ./extredis/... -v

# All tests including e2e (requires minikube)
make test
```

### Run locally

```bash
# Start Redis
./scripts/start-redis.sh

# Run extension
export STEADYBIT_EXTENSION_ENDPOINTS_JSON='[{"url":"redis://localhost:6379","password":"dev-password"}]'
make run
```

## License

MIT License - see [LICENSE](LICENSE) for details.