https://github.com/astrateam-net/k8s-cluster
https://github.com/astrateam-net/k8s-cluster
kubesearch
Last synced: 4 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/astrateam-net/k8s-cluster
- Owner: astrateam-net
- License: mit
- Created: 2025-04-08T20:11:30.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-01-19T22:32:39.000Z (4 months ago)
- Last Synced: 2026-01-20T00:41:34.540Z (4 months ago)
- Topics: kubesearch
- Language: YAML
- Homepage:
- Size: 49.3 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 72
-
Metadata Files:
- Readme: README-DEBUG.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
README
# Kubernetes Cluster Debugging Guide
## Common Issues and Solutions
### 1. Architecture Mismatch (exec format error)
**Problem**: Pods crash with `exec /nfsplugin: exec format error` or similar errors.
**Root Cause**: Images are downloaded with wrong architecture (e.g., ARM instead of AMD64).
**Solution**:
1. Find the correct digest for AMD64:
```bash
# For nfsplugin example:
curl -sL -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
https://registry.k8s.io/v2/sig-storage/nfsplugin/manifests/v4.12.1 | \
python3 -c "import sys, json; data = json.load(sys.stdin); amd64 = [m for m in data['manifests'] if m['platform']['architecture'] == 'amd64'][0]; print(amd64['digest'])"
```
2. Update HelmRelease with specific digest:
```yaml
values:
nfs: # or appropriate section
image:
repository: registry.k8s.io/sig-storage/nfsplugin
tag: v4.12.1@sha256:8835d96f4389cb1ea8996b4bf9cad8af960f52508c9b8372a877c6656fd92264
```
3. Delete incorrect images on all nodes:
```bash
# See section "Debug Nodes"
```
### 2. Spegel Cache Issues
**Problem**: Spegel serves wrong image digests even after clearing.
**Solution**:
```bash
# Delete all Spegel pods to clear cache
kubectl delete pod -n kube-system -l app=spegel
# Wait for new pods to restart
kubectl get pods -n kube-system | grep spegel
```
## Debugging Methods
### Launch Debug Pods on All Nodes
**Quick Method**:
```bash
# Apply debug pods to all nodes
kubectl apply -f kubernetes/temp/debug-all-nodes.yaml
# Wait for pods to start
kubectl get pods | grep debug
```
**Manual Method**:
```bash
for node in k8s-01 k8s-02 k8s-03; do
kubectl run debug-$node --image=raesene/alpine-containertools:latest \
--node-selector kubernetes.io/hostname=$node \
--restart=Never \
--overrides='{
"spec": {
"hostNetwork": true,
"containers": [{
"name": "debugger",
"image": "raesene/alpine-containertools:latest",
"command": ["sleep", "3600"],
"env": [{"name": "CONTAINER_RUNTIME_ENDPOINT", "value": "unix:///host/run/containerd/containerd.sock"}],
"securityContext": {"privileged": true},
"volumeMounts": [{"name": "sock", "mountPath": "/host/run/containerd"}]
}],
"volumes": [{"name": "sock", "hostPath": {"path": "/run/containerd", "type": "Directory"}}]
}
}'
done
```
### Common Debug Commands
**List container images on a node**:
```bash
# Replace debug-k8s-01 with appropriate pod name
kubectl exec debug-k8s-01 -- crictl images
```
**Find and remove specific images**:
```bash
# List images containing "sig-storage"
kubectl exec debug-k8s-01 -- crictl images | grep sig-storage
# Remove specific image by ID
kubectl exec debug-k8s-01 -- crictl rmi
# Remove all images matching pattern
kubectl exec debug-k8s-01 -- sh -c 'crictl images | grep sig-storage | awk "{print \$3}" | xargs -r crictl rmi'
```
**Check image architecture**:
```bash
# Check manifest architecture
kubectl exec debug-k8s-01 -- crictl inspecti | grep architecture
```
**Remove images on all nodes**:
```bash
for pod in debug-k8s-01 debug-k8s-02 debug-k8s-03; do
echo "=== $pod ==="
kubectl exec $pod -- sh -c 'crictl images | grep '
kubectl exec $pod -- sh -c 'crictl images | grep | awk "{print \$3}" | xargs -r crictl rmi'
done
```
## Using the debug-node Taskfile Task
```bash
# Open interactive debug shell on a specific node
task talos:debug-node NODE=k8s-01
# Inside the debug shell, you can use:
# - crictl images # List images
# - crictl rmi # Remove image
# - crictl rmi --prune # Remove all unused images
# - uname -m # Check node architecture
```
## When to Use Specific Image Digests
Always specify image digests when:
1. Cluster has nodes with different architectures (AMD64, ARM, etc.)
2. Spegel serves wrong images from cache
3. Image repository has multi-architecture manifests
4. You encounter "exec format error"
**How to find correct digest**:
1. Query the registry for manifest list
2. Filter for target architecture (usually amd64)
3. Extract the digest
4. Append to image tag: `image:tag@sha256:digest`
## Troubleshooting Checklist
- [ ] Check pod logs for error messages
- [ ] Verify pod is running on correct node
- [ ] Check image digest matches expected architecture
- [ ] Clear Spegel cache if serving wrong images
- [ ] Remove incorrect images from all nodes
- [ ] Verify HelmRelease values specify correct image with digest
- [ ] Check node architecture matches image architecture
- [ ] Restart affected pods after image fixes