# coredns-dynresolve

[![CI](https://img.shields.io/github/actions/workflow/status/libraz/coredns-dynresolve/ci.yml?branch=main&label=CI)](https://github.com/libraz/coredns-dynresolve/actions)
[![Version](https://img.shields.io/github/v/release/libraz/coredns-dynresolve?label=version)](https://github.com/libraz/coredns-dynresolve/releases)
[![codecov](https://codecov.io/gh/libraz/coredns-dynresolve/branch/main/graph/badge.svg)](https://codecov.io/gh/libraz/coredns-dynresolve)
[![License](https://img.shields.io/github/license/libraz/coredns-dynresolve)](https://github.com/libraz/coredns-dynresolve/blob/main/LICENSE)
[![Go](https://img.shields.io/badge/Go-1.25+-00ADD8?logo=go)](https://go.dev/)
[![Platform](https://img.shields.io/badge/platform-Linux%20%7C%20macOS-lightgrey)](https://github.com/libraz/coredns-dynresolve)

A [CoreDNS](https://coredns.io/) plugin for dynamic DNS resolution driven by a REST API.

Enables DNS-based service failover in on-premise environments without etcd, Consul, or Kubernetes. An external controller pushes service state via authenticated HTTP endpoints; the plugin reads that state and serves DNS.

## Background

Service failover via DNS typically requires one of:

- Kubernetes DNS + Service
- CoreDNS + etcd plugin
- Consul catalog

All of these assume a distributed system is already in place. In transitional on-premise environments — where Kubernetes is not yet deployed and the operational cost of a distributed KV store is not justified — these options are too heavy.

This plugin takes a simpler approach: an external controller (a script, systemd timer, monitoring agent, or CI pipeline) pushes service state to the plugin via a REST API. Health checking, failover decisions, and state generation are outside the plugin's scope — they belong to the external controller.
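To illustrate that division of labor, here is a minimal controller-side sketch in Python. The function names (`tcp_alive`, `build_state`), the plain TCP connect check, and the candidate list are assumptions made for this example and are not part of the plugin; only the payload shape follows the State Format section.

```python
import socket


def tcp_alive(ip, port, timeout=0.5):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_state(candidates, port, ttl=5, check=tcp_alive):
    """Keep only the candidates that pass the health check.

    Returns a payload for PUT /v1/services/{name}, or None when nothing is
    healthy, so the caller can decide what to do instead of pushing an
    empty record set.
    """
    healthy = [ip for ip in candidates if check(ip, port)]
    if not healthy:
        return None
    return {"type": "A", "records": healthy, "ttl": ttl}
```

A timer or monitoring agent could call `build_state(["10.0.0.12", "10.0.0.13"], port=6379)` on each run and push the result only when it changes.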

Design constraints:

- ServeDNS reads from in-memory state only (non-blocking)
- DNS continues to respond when the API is unreachable (fail-safe)
- Plugin bugs do not crash CoreDNS (panic-safe)
- A records only; other query types are delegated to the next plugin

## Architecture

```mermaid
graph LR
subgraph External
C[External Controller]
C -->|HTTP PUT| API
end

subgraph "CoreDNS + dynresolve"
API["APIServer<br/>auth + rate limit + TLS"] -->|write| SS[(StateStore)]
SS --> SD[ServeDNS]
CA[(Cache)] <--> SD
FB[Fallback IP] --> SD
NP[Next Plugin] --> SD
end

CL[DNS Clients] -->|query| SD
SD -->|response| CL
```

StateStore is the shared in-memory state. The APIServer writes to it; ServeDNS reads from it via the Cache.

## Fallback Chain

Each query is resolved in the following order:

```mermaid
flowchart TD
Q[DNS Query] --> CC{Cache fresh?}
CC -->|Yes| R1[Respond from cache]
CC -->|No| SL{"StateStore<br/>has data?"}
SL -->|Yes| UC[Update cache] --> R2[Respond from source]
SL -->|No| SC{Stale cache?}
SC -->|Yes| R3[Respond stale]
SC -->|No| FI{Fallback IP?}
FI -->|Yes| R4[Respond fallback]
FI -->|No| NX[Next Plugin]

style R1 fill:#2d6,stroke:#1a4,color:#fff
style R2 fill:#2d6,stroke:#1a4,color:#fff
style R3 fill:#da2,stroke:#a80,color:#fff
style R4 fill:#da2,stroke:#a80,color:#fff
style NX fill:#68f,stroke:#46c,color:#fff
```
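The same order can be written as a small Python model. This is a sketch of the flowchart only, not the plugin's Go implementation; the `resolve` function, the `CacheEntry` type, and the plain dictionaries standing in for the StateStore and Cache are assumptions for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class CacheEntry:
    records: list
    stored_at: float


def resolve(name, state, cache, cache_ttl, fallback_ip=None, now=None):
    """Return (records, source), or (None, "next") to hand off to the next plugin."""
    now = time.monotonic() if now is None else now
    entry = cache.get(name)
    # 1. Fresh cache entry.
    if entry is not None and now - entry.stored_at < cache_ttl:
        return entry.records, "cache"
    # 2. StateStore has data: refresh the cache and answer from the source.
    records = state.get(name)
    if records:
        cache[name] = CacheEntry(list(records), now)
        return list(records), "source"
    # 3. An expired cache entry is still served rather than failing.
    if entry is not None:
        return entry.records, "stale"
    # 4. Configured fallback IP.
    if fallback_ip is not None:
        return [fallback_ip], "fallback"
    # 5. Nothing to serve here.
    return None, "next"
```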

## Quick Start

### Corefile

```
service.local {
    dynresolve {
        api listen 127.0.0.1:8080
        api token <token>
        api allow 127.0.0.0/8
        api rate-limit 10
        ttl 5
        cache 100ms
        fallback 10.0.0.1
    }
    errors
}
```

With TLS:

```
service.local {
    dynresolve {
        api listen 0.0.0.0:8443
        api token {env.DYNRESOLVE_TOKEN}
        api allow 10.0.0.0/8
        api rate-limit 10
        api tls /etc/coredns/tls/cert.pem /etc/coredns/tls/key.pem
        ttl 5
        cache 100ms
        fallback 10.0.0.1
    }
    errors
}
```

### Build

```bash
make build
```

### Verify

```bash
./coredns -conf Corefile

# Push state via API (short name — zone suffix is appended automatically)
curl -X PUT -H "Authorization: Bearer <token>" \
  -d '{"type":"A","records":["10.0.0.12"],"ttl":5}' \
  http://127.0.0.1:8080/v1/services/valkey

# Resolve via DNS
dig @127.0.0.1 valkey.service.local A +short
# 10.0.0.12
```
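A controller written in Python could issue the same `PUT` with the standard library. The sketch below is one possible shape, not an official client; the function names `put_service_request` and `send` are assumptions, and the base URL and token are whatever your Corefile configures.

```python
import json
import urllib.request


def put_service_request(base_url, token, name, records, ttl=5):
    """Build (but do not send) the PUT request from the curl example."""
    body = json.dumps({"type": "A", "records": records, "ttl": ttl}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/services/{name}",
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request, timeout=2.0):
    """Send the request and return (status, decoded JSON body or None).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        raw = resp.read()
        return resp.status, (json.loads(raw) if raw else None)
```

Usage against the Quick Start configuration would be `send(put_service_request("http://127.0.0.1:8080", token, "valkey", ["10.0.0.12"]))`.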

## Configuration

### Core

| Directive | Default | Description |
|---|---|---|
| `ttl <seconds>` | `5` | Default DNS response TTL |
| `cache <duration>` | `100ms` | In-memory cache TTL (supports stale serving) |
| `fallback <ip>` | none | IPv4 address returned when all other sources fail |

### REST API (required)

| Directive | Default | Description |
|---|---|---|
| `api listen <addr>` | required | Listen address (e.g., `127.0.0.1:8080`) |
| `api token <token>` | required | Bearer token. Supports `$ENV_VAR` and `{env.VAR}` syntax |
| `api allow <cidr...>` | allow all | Allowed client CIDRs (e.g., `127.0.0.0/8 10.0.0.0/8`) |
| `api rate-limit <n>` | `10` | Maximum write requests per second |
| `api tls <cert> <key>` | none | TLS certificate and key paths. Enables HTTPS (TLS 1.2+) |
| `api persist <path>` | none | File path for state persistence. On each write, state is saved to disk (atomic write). On startup, state is restored from this file if it exists. Disabled by default (in-memory only) |
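
The `api persist` row mentions an atomic write. A common way to get that property is to write a temporary file in the same directory and rename it over the target. The Python sketch below shows the general pattern only; the plugin is written in Go and its on-disk format is not documented here, so the JSON layout and the function names are assumptions.

```python
import json
import os
import tempfile


def save_atomically(path, state):
    """Write state to a temp file, then rename it over path in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk first
        os.replace(tmp, path)  # readers see the old file or the new one, never half
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise


def load_if_present(path):
    """Restore state on startup; an absent file means an empty state."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```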

### API Endpoints

| Method | Path | Description |
|---|---|---|
| `GET` | `/v1/services` | List all services |
| `GET` | `/v1/services/{name}` | Get a single service |
| `PUT` | `/v1/services/{name}` | Add or update a service |
| `DELETE` | `/v1/services/{name}` | Delete a service |
| `POST` | `/v1/services/{name}/records` | Add records to a service |
| `DELETE` | `/v1/services/{name}/records` | Remove records from a service |
| `PUT` | `/v1/state` | Bulk replace all services |

All endpoints require an `Authorization: Bearer <token>` header.

Write endpoints (`PUT`, `POST`, `DELETE`) are subject to rate limiting. Read endpoints (`GET`) are not.

Service names can be specified as short names (e.g., `valkey`) or full names (e.g., `valkey.service.local`). The zone suffix is appended automatically based on the CoreDNS server block. Responses include both `name` (short) and `fqdn` (fully qualified).
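
The naming rules can be modeled in a few lines of Python, which may be useful as a client-side pre-check. This is a sketch of the documented behavior, not the plugin's code: the helper names are assumptions, and the character and length limits are taken from the Name validation row in the Security section.

```python
import re

LABEL = re.compile(r"[a-z0-9-]{1,63}")


def to_fqdn(name, zone):
    """Expand a short name with the zone suffix; leave full names unchanged."""
    if name.endswith("." + zone):
        return name
    return f"{name}.{zone}"


def short_name(fqdn, zone):
    """Strip the zone suffix, giving the `name` field seen in responses."""
    suffix = "." + zone
    return fqdn[: -len(suffix)] if fqdn.endswith(suffix) else fqdn


def is_valid_name(fqdn):
    """Check label characters, label length, and total length."""
    if not 1 <= len(fqdn) <= 253:
        return False
    return all(LABEL.fullmatch(label) for label in fqdn.split("."))
```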

### State Format

Used by `PUT /v1/services/{name}` and `PUT /v1/state`.

Single service:

```json
{"type": "A", "records": ["10.0.0.12"], "ttl": 5}
```

Response:

```json
{"name": "valkey", "fqdn": "valkey.service.local", "type": "A", "records": ["10.0.0.12"], "ttl": 5}
```

Bulk state request (short names):

```json
{
  "services": {
    "valkey": {"type": "A", "records": ["10.0.0.12"], "ttl": 5},
    "web": {"type": "A", "records": ["10.0.0.20", "10.0.0.21"], "ttl": 10}
  }
}
```

Bulk state response:

```json
{"zone": "service.local", "count": 2}
```

List services response (`GET /v1/services`):

```json
{
  "zone": "service.local",
  "services": {
    "valkey": {"type": "A", "records": ["10.0.0.12"], "ttl": 5},
    "web": {"type": "A", "records": ["10.0.0.20", "10.0.0.21"], "ttl": 10}
  }
}
```

| Field | Specification |
|---|---|
| `type` | `"A"` only. Other types are rejected with 400 |
| `records` | IPv4 addresses. Invalid entries are rejected with 400 |
| `ttl` | Seconds. `0` falls back to the plugin-level default |
| names | Short name or FQDN (without trailing dot). Zone suffix appended automatically |
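
A controller can check a payload against these rules before sending it, which avoids a round trip that would end in a 400. The following Python sketch is an assumption-laden pre-check, not the server's validator: `validate_service` is a hypothetical helper, the 64-record limit comes from the Security section, and rejecting negative TTLs is a guess rather than documented behavior.

```python
import ipaddress

MAX_RECORDS = 64  # per-service limit listed in the Security section


def validate_service(payload):
    """Return a list of problems; an empty list means the payload looks fine."""
    problems = []
    if payload.get("type") != "A":
        problems.append('type must be "A"')
    records = payload.get("records")
    if not isinstance(records, list):
        problems.append("records must be a list")
        records = []
    if len(records) > MAX_RECORDS:
        problems.append(f"at most {MAX_RECORDS} records per service")
    for value in records:
        try:
            if not isinstance(value, str):
                raise ValueError
            ipaddress.IPv4Address(value)
        except ValueError:
            problems.append(f"not an IPv4 address: {value!r}")
    ttl = payload.get("ttl", 0)
    # Assumption: negative TTLs are treated as invalid on the client side.
    if isinstance(ttl, bool) or not isinstance(ttl, int) or ttl < 0:
        problems.append("ttl must be a non-negative integer (0 = plugin default)")
    return problems
```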

### Record-Level Operations

Used by `POST /v1/services/{name}/records` (add) and `DELETE /v1/services/{name}/records` (remove).

```json
{"records": ["10.0.0.13", "10.0.0.14"], "ttl": 5}
```

| Field | Specification |
|---|---|
| `records` | IPv4 addresses to add or remove |
| `ttl` | (add only) Optional. Updates TTL if non-zero |

`POST` creates the service if it does not exist. Duplicate records are ignored.

`DELETE` removes the specified records. If all records are removed, the service is deleted (returns 204).
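
The add and remove semantics described above can be modeled on a plain dictionary. This Python sketch only mirrors the documented behavior (create on first add, ignore duplicates, delete the service when it becomes empty); the function names are hypothetical, and storing `ttl` as `0` for a new service without an explicit TTL is an assumption based on the State Format table.

```python
def add_records(services, name, records, ttl=0):
    """Model of POST /v1/services/{name}/records."""
    # Create the service if it does not exist yet (ttl 0 = plugin default).
    svc = services.setdefault(name, {"type": "A", "records": [], "ttl": 0})
    for ip in records:
        if ip not in svc["records"]:  # duplicates are ignored
            svc["records"].append(ip)
    if ttl:  # only a non-zero ttl updates the stored value
        svc["ttl"] = ttl
    return svc


def remove_records(services, name, records):
    """Model of DELETE /v1/services/{name}/records.

    Returns the remaining service, or None when the last record was removed
    and the service itself was deleted (the 204 case). A missing service
    raises KeyError, standing in for "not found".
    """
    svc = services[name]
    svc["records"] = [ip for ip in svc["records"] if ip not in records]
    if not svc["records"]:
        del services[name]
        return None
    return svc
```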

## Design Principles

| Principle | Detail |
|---|---|
| Persist-safe | Write errors to the persist file are logged but do not affect API responses |
| Fail-safe | Serves stale data or fallback IP on source failure. DNS never stops responding |
| Non-blocking | ServeDNS reads from in-memory StateStore only. HTTP I/O runs in a separate goroutine |
| Panic-safe | ServeDNS includes `recover()`. Plugin bugs do not crash CoreDNS |
| A records only | Other query types are delegated to the next plugin |
| Separation of concerns | Health check and failover logic belong to the external controller |

## Security

### DNS

Each CoreDNS server block creates an independent plugin instance with its own StateStore. Zone restriction applies to both DNS queries and API writes.

```
service.local {
dynresolve { ... } # Instance A — only service.local names
}

infra.local {
dynresolve { ... } # Instance B — only infra.local names
}
```

### REST API

| Layer | Mechanism |
|---|---|
| Transport | `api tls` enables HTTPS (TLS 1.2+). Required when listening on non-loopback addresses |
| Network | `api listen` binds to a specific address. Use `127.0.0.1` for local-only access |
| IP restriction | `api allow <cidr...>` restricts by source IP. Disallowed IPs receive 403 |
| Authentication | `api token` requires an `Authorization: Bearer <token>` header. Constant-time comparison. Invalid tokens receive 401 |
| Rate limiting | `api rate-limit <n>` limits write operations per second. Excess requests receive 429 |
| Zone enforcement | Service names are automatically scoped to the CoreDNS server block zone. Short names are expanded with the zone suffix |
| Name validation | Service names must be valid DNS labels (`[a-z0-9-]`, 1-63 chars per label, max 253 chars total) |
| Record limit | Maximum 64 A records per service. Excess records are rejected with 400 |
| Body size | 1 MB (single service) / 10 MB (bulk state) |
| Audit logging | All write operations (PUT/POST/DELETE) are logged with source IP, service name, and record values |
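
On the client side, the status codes in this table suggest a simple policy: retry on 429, and treat 400, 401, and 403 as errors that a retry will not fix. The Python sketch below shows one way to do that with exponential backoff; `push_with_backoff`, the attempt count, and the delays are assumptions for illustration, not values recommended by the plugin.

```python
import time


def push_with_backoff(send, attempts=4, base_delay=0.2, sleep=time.sleep):
    """Call send() until it returns something other than 429.

    send is any zero-argument callable that performs one write and returns
    the HTTP status code. The delay doubles after each rate-limited attempt.
    """
    status = None
    for attempt in range(attempts):
        status = send()
        if status != 429:
            return status  # success, or an error that retrying will not fix
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status
```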

### Token Management

The token can be specified as:

- Plaintext in Corefile: `api token mysecret`
- Environment variable: `api token $DYNRESOLVE_TOKEN` or `api token {env.DYNRESOLVE_TOKEN}`

The environment variable form is recommended for production to avoid storing secrets in configuration files.
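
The same advice applies to the controller. As a sketch, a Python client might read the token from the environment and refuse to start without it; `bearer_header` is a hypothetical helper, and reusing the `DYNRESOLVE_TOKEN` variable name on the client side is an assumption.

```python
import os


def bearer_header(env=os.environ, var="DYNRESOLVE_TOKEN"):
    """Build the Authorization header from an environment variable."""
    token = env.get(var, "").strip()
    if not token:
        # Fail early instead of sending requests that will get a 401.
        raise RuntimeError(f"{var} is not set")
    return {"Authorization": f"Bearer {token}"}
```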

### Known Limitations

- **Single static token.** Token rotation requires Corefile edit + CoreDNS restart. Per-client tokens are not supported. For multi-client access control, use a reverse proxy.
- **No replay protection.** Captured requests can be replayed. TLS mitigates network-level capture. For stronger guarantees, use mTLS via a reverse proxy.
- **`PUT /v1/state` is a full replacement.** Replaces all services within the zone. Zone validation applies to each entry. For granular changes, use `PUT /v1/services/{name}`, `POST /v1/services/{name}/records`, or `DELETE /v1/services/{name}/records`.
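
When a full replacement is too coarse, a controller can compare the current and desired state and issue per-service calls instead. The Python sketch below assumes `current` is the `services` map returned by `GET /v1/services` and `desired` uses the same shape; `plan_changes` is a hypothetical helper that only builds the list of calls and does not send them.

```python
def plan_changes(current, desired):
    """Return (method, path, body) tuples that turn current into desired."""
    ops = []
    # Services that should no longer exist.
    for name in sorted(current.keys() - desired.keys()):
        ops.append(("DELETE", f"/v1/services/{name}", None))
    # Services that are new or whose definition changed.
    for name in sorted(desired):
        if current.get(name) != desired[name]:
            ops.append(("PUT", f"/v1/services/{name}", desired[name]))
    return ops
```

Unchanged services produce no calls, which also keeps the controller well under the write rate limit.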

## Building

Default base: **CoreDNS v1.12.1**. Override with `make build COREDNS_VERSION=v1.x.x`.

```bash
git clone https://github.com/libraz/coredns-dynresolve.git
cd coredns-dynresolve
make build
```

### Makefile Targets

| Target | Description |
|---|---|
| `make build` | Build custom CoreDNS binary with dynresolve |
| `make test` | Run unit tests with race detector |
| `make lint` | Run golangci-lint |
| `make coverage` | Run tests with coverage report |
| `make integration-test` | Run integration tests (requires `make build`) |
| `make clean` | Remove build artifacts |

### RPM / DEB Packages

```bash
make pkg-rpm-el9 # AlmaLinux 9
make pkg-rpm-el10 # AlmaLinux 10
make pkg-deb-jammy # Ubuntu 22.04
make pkg-deb-noble # Ubuntu 24.04
make pkg-all # All of the above
```

## Testing

### Unit Tests

```bash
make test
```

Covers: config parsing, validation, cache, fallback chain, panic recovery, API handlers, middleware (auth/IP/rate limit), StateStore concurrency.

### Integration Tests

39 tests. They start a real CoreDNS process and verify the full stack via `dns.Exchange` and HTTP.

```bash
make build
make integration-test
```

| Category | Coverage |
|---|---|
| Basic operations | PUT then resolve, multiple records, fallback, TCP, EDNS0, delete |
| API functionality | GET/LIST/PUT/DELETE, bulk replace, validation errors |
| Short names | PUT/GET/DELETE with short names, full name interop, bulk state with short names, zone in responses |
| Record-level | Add records, add to new service, deduplication, partial remove, full remove (service deletion), not found, invalid IP, empty records |
| Security | Auth required, wrong token, invalid IP, invalid type |
| Concurrency | Parallel queries, concurrent API writes + DNS reads |
| Scale | 50 records per service, 200 services bulk |
| Persistence | PUT creates file, restore on restart, delete updates file |
| Edge cases | Case sensitivity, unsupported query types, cross-contamination, default TTL |

Integration tests use the `integration` build tag and are excluded from CI.

## License

[Apache License 2.0](LICENSE)