https://github.com/kpetremann/claw-network
Simulate network failures in a datacenter topology
- Host: GitHub
- URL: https://github.com/kpetremann/claw-network
- Owner: kpetremann
- License: mit
- Created: 2022-05-16T09:09:45.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-19T12:47:38.000Z (7 months ago)
- Last Synced: 2024-10-11T14:14:27.791Z (about 1 month ago)
- Topics: infra, infrastructure, maintenance, network, topology
- Language: Go
- Homepage:
- Size: 1.47 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
[![status](https://img.shields.io/badge/status-in%20development-orange)](https://github.com/kpetremann/claw-network/)
[![Go](https://img.shields.io/github/go-mod/go-version/kpetremann/claw-network)](https://github.com/kpetremann/claw-network/)
[![CI](https://github.com/kpetremann/claw-network/actions/workflows/go.yml/badge.svg)](https://github.com/kpetremann/claw-network/actions/workflows/go.yml)
[![GitHub](https://img.shields.io/github/license/kpetremann/claw-network)](https://github.com/kpetremann/claw-network/blob/main/LICENSE)

# Overview
ClawNetwork is a tool to simulate a network and evaluate the impact of failures on Top of Rack (ToR) switches.
It has been specially crafted for Clos matrix networks. For now, only trees are supported; cyclic graphs are not.
```
ClawNetwork is in active development.
The main features are implemented, but endpoints, structures and functions may change. It is nevertheless usable and you are free to play with it :)
```

# Use cases
## Operations
The main use case is to evaluate whether an operation on a device in your core network will impact a Top of Rack.
Such operations include upgrades, reboots, risky maintenance, etc.
## Detect anomalies / SPOF
ClawNetwork can be leveraged to detect SPOFs or anomalies such as a spine without downlinks.
# Quickstart
## From source
Run the ClawNetwork app with `go run .`
Alternatively, build the binary with `go build` and run it.
## Using Docker compose
### Default backend
Run ClawNetwork with default backend (FileRepository):
```shell
docker-compose -f compose/docker-compose.yml up -d
```

FileRepository stores the topologies in dedicated JSON files on disk.
By default, this uses the `examples/` directory provided in this repository.
> At the moment this is not customizable, but it will be very soon.
### Run with the Backend of your choice
```shell
docker-compose -f compose/docker-compose.yml -f .yml up -d
```

#### RedisJSON
> recommended backend for production if you need to store topologies
At the moment, Redis JSON is the only alternative backend:
```shell
docker-compose -f compose/docker-compose.yml -f redisjson.yml up -d
```

This backend leverages the [RedisJSON module](https://redis.io/docs/stack/json/) to store pure JSON in Redis. Persistence is enabled and forced by ClawNetwork on each change (ADD/DELETE).
## Configuration
ClawNetwork can be configured either via environment variables or via a YAML file (settings.yaml).
Available parameters (`environment variable` | `YAML key`):
- `CLAW_LISTENADDRESS` | `ListenAddress`: ClawNetwork API listen address (default: `"0.0.0.0"`)
- `CLAW_LISTENPORT` | `ListenPort`: ClawNetwork API listen port (default: `"8080"`)
- `CLAW_TOPDEVICEROLE` | `TopDeviceRole`: Role of device at the top of the topology graph (default: `"edge"`)
- `CLAW_BOTTOMDEVICEROLE` | `BottomDeviceRole`: Role of device at the bottom of the topology graph (default: `"tor"`)
- `CLAW_BACKEND` | `Backend`: Choose backend to store topologies (choices: `"file", "redis"`, default: `"file"`)
- `CLAW_BACKENDS_FILE_PATH` | `Backends.File.Path`: Directory where the file backend stores topologies (default: `"./topologies/"`)
- `CLAW_BACKENDS_REDIS_HOST` | `Backends.Redis.Host`: Redis server address (default: `"localhost"`)
- `CLAW_BACKENDS_REDIS_PORT` | `Backends.Redis.Port`: Redis server port (default: `"6379"`)
- `CLAW_BACKENDS_REDIS_PASSWORD` | `Backends.Redis.Password`: Redis password (default: `""`)
- `CLAW_BACKENDS_REDIS_DB` | `Backends.Redis.DB`: Redis DB to use (default: `0`)

# Usage
### Manage stored topologies
- GET `/topology`: list stored topologies
- GET `/topology/:topology_name`: get topology definition
- POST `/topology/:topology_name`: create a new topology
- DELETE `/topology/:topology_name`: delete a topology
- GET `/topology/details`: list stored topologies with some stats
- GET `/topology/:topology_name/details`: get topology stats

### Simulation on a stored topology
- GET `/topology/:topology_name/device/:device/down/impact`: run simulations on existing topology
- POST `/topology/custom/device/:device/down/impact`: run simulations on the topology provided in the request body

The GET endpoint runs the simulation on a stored topology, while the POST endpoint uses the topology sent with the request.
If `:device` is set to `each`, the failure impact of each device is simulated, excluding Top of Racks.
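As an illustration, the GET endpoint above could be queried from Go as sketched here. This is not part of ClawNetwork: `impactURL`, `decodeImpacts` and `impactResponse` are hypothetical helpers, the address assumes the default listen port, and the decoding assumes the response carries a top-level `scenarios_result` object whose entries have an `impacts` list, as in the example output shown in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// impactResponse mirrors only the fields used here (assumed response shape).
type impactResponse struct {
	ScenariosResult map[string]struct {
		Impacts []string `json:"impacts"`
	} `json:"scenarios_result"`
}

// impactURL builds the simulation URL for a stored topology and a device.
func impactURL(base, topology, device string) string {
	return fmt.Sprintf("%s/topology/%s/device/%s/down/impact",
		base, url.PathEscape(topology), url.PathEscape(device))
}

// decodeImpacts maps each simulated device to the ToRs it would impact.
func decodeImpacts(body []byte) (map[string][]string, error) {
	var resp impactResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make(map[string][]string, len(resp.ScenariosResult))
	for device, scenario := range resp.ScenariosResult {
		out[device] = scenario.Impacts
	}
	return out, nil
}

func main() {
	// Assumes a ClawNetwork instance listening on the default port.
	resp, err := http.Get(impactURL("http://127.0.0.1:8080", "full_topology_with_issues", "each"))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	impacts, err := decodeImpacts(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for device, tors := range impacts {
		fmt.Printf("%s down -> impacted ToRs: %v\n", device, tors)
	}
}
```

Keeping URL construction and decoding in separate functions makes both easy to check without a running server.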
### Anomaly detection
- GET `/topology/:topology_name/anomalies`: get topology anomalies
It lists all anomalies in the topology graph.
#### Link anomalies
A node is not connected properly to the graph.
For example:
- a ToR does not have any uplinks
- a spine does not have any downlinks or any uplinks
- an edge does not have any downlinks

This does not consider the status of the link; it only checks whether a link exists.
### Topology structure
The topology to provide looks like this in JSON:
```json
{
  "nodes": [
    {
      "hostname": "tor-01-01",
      "role": "tor",
      "status": true,
      "layer": 1
    },
    {
      "hostname": "fabric-1-01",
      "role": "fabric",
      "status": true,
      "layer": 2
    }
  ],
  "links": [
    {
      "south_node": "tor-01-01",
      "north_node": "fabric-1-01",
      "status": true,
      "uid": "10.0.0.0->10.0.0.1"
    }
  ]
}
```

> This structure is subject to change, as the API is not considered stable at the moment.
### Example
Topology = 4 healthy fabric nodes + 4 healthy ToR
Simulations:
- first simulation considering first fabric node as down
- second simulation considering second fabric node as down but with the first up
- ...

# Example use case
You can query the following endpoint to simulate the down impact of each device. It gets the example topology from `example/full_topology_with_issues.json`.
```shell
$ curl http://127.0.0.1:8080/topology/full_topology_with_issues/device/each/down/impact | jq
{
"scenarios_result": {
"edge-0": {
"impacts": null,
"parameters": {
"devices_down": [
"edge-0"
],
"links_down": null
}
},
"edge-1": {
"impacts": null,
"parameters": {
"devices_down": [
"edge-1"
],
"links_down": null
}
},
"fabric-1-01": {
"impacts": [
"tor-01-01"
],
"parameters": {
"devices_down": [
"fabric-1-01"
],
"links_down": null
}
},
...,
"compute_time": "89 ms"
}
```

As you can see, `tor-01-01` would be impacted if we shut down `fabric-1-01`.
The topology defined in `example/full_topology_with_issues.json` also specifies some devices as down. Here, all the fabric nodes of pod 01 have been set to down except for `fabric-1-01`. This is why a failure on this device impacts `tor-01-01`: this ToR had only one healthy uplink left.
Note: more advanced examples will be provided soon, with more complex scenarios.
# Integrations
Below are some ideas of possible integrations:
- the client pushes the topology with the simulation request. The topology is not stored.
```
+-------------------------+
|  Observability metrics  |
|  example: Prometheus    |
+-------------------------+
            ^
            |
            | get metrics
            |
            |
            |
+-----------------------+
|                       |            get impact
|  Client               |        on custom topology        +---------------+
|  => convert metrics   |--------------------------------->|  ClawNetwork  |
|     to topology       |                                  +---------------+
+-----------------------+
```

- the client provides the topologies and they are stored
```
+-------------------------+
|  Observability metrics  |
|  example: Prometheus    |
+-------------------------+
            ^
            |
            | get metrics
            |
            |
            |
+-----------------------+
|  Client               |      push topology       +---------------+      save topology       +-------------------------+
|  => convert metrics   |------------------------->|  ClawNetwork  |<------------------------>| Storage (FS, redis,...) |
|     to topology       |       get impact         +---------------+       get topology       +-------------------------+
+-----------------------+
```

- dedicated topology provider
```
                                                +---------------------+
+-------------------------+                     |  Topology provider  |
|  Observability metrics  | <------------------ |  => convert metrics |
+-------------------------+                     |     to topology     |
                                                +---------------------+
                                                           |
                                                           |
                                                           | push topology
                                                           |
                                                           |
                                                           |
                                                           v
+-----------------------+       get impact         +---------------+      save topology       +-------------------------+
|  Client               |------------------------->|  ClawNetwork  |<------------------------>| Storage (FS, redis,...) |
+-----------------------+                          +---------------+       get topology       +-------------------------+
```