{"id":27397988,"url":"https://github.com/yafeiaa/coredog","last_synced_at":"2026-01-12T08:26:49.928Z","repository":{"id":230192669,"uuid":"778614766","full_name":"yafeiaa/coredog","owner":"yafeiaa","description":"🐶Coredog is an open-source project designed to monitor and manage core dumps in a Kubernetes cluster. It automatically detects core files generated by applications running in the cluster, uploads them to an S3-compatible object storage system, and provides pre-signed S3 download links to all developers through instant messaging software.","archived":false,"fork":false,"pushed_at":"2024-08-21T12:09:11.000Z","size":320,"stargazers_count":4,"open_issues_count":8,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T01:32:06.363Z","etag":null,"topics":["corefile","kubernetes","monitor"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/r/coderflyfyf/coredog","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yafeiaa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-28T03:50:08.000Z","updated_at":"2024-09-03T06:14:05.000Z","dependencies_parsed_at":"2024-06-19T12:40:52.007Z","dependency_job_id":"8d961d13-2bc4-4342-b1d1-a10a004b6885","html_url":"https://github.com/yafeiaa/coredog","commit_stats":null,"previous_names":["dominecore/coredog","yafeiaa/coredog"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/yafeiaa/coredog","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafeiaa%2Fcoredog","tags_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafeiaa%2Fcoredog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafeiaa%2Fcoredog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafeiaa%2Fcoredog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yafeiaa","download_url":"https://codeload.github.com/yafeiaa/coredog/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafeiaa%2Fcoredog/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264721602,"owners_count":23653965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corefile","kubernetes","monitor"],"created_at":"2025-04-14T01:20:38.363Z","updated_at":"2026-01-05T10:08:51.933Z","avatar_url":"https://github.com/yafeiaa.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CoreDog 🐶\n\nAutomatic core dump collection for Kubernetes\n\n## Introduction\n\nWhen an application crashes, CoreDog automatically collects the core dump file, uploads it to object storage, and sends a notification.\n\n**Key features**:\n- 🎯 Webhook-based automatic volume injection (opt-in, enabled via annotation)\n- 🔍 Accurate identification of the crashed Pod and container\n- 📦 Automatic upload to S3/COS/OSS\n- 🧹 Automatic cleanup of local files after upload\n- 🔔 Instant notifications via WeCom (Enterprise WeChat)/Slack\n- 🔗 [Optional] Automatic reporting to CoreSight to trigger automated analysis\n\n## Quick Start\n\n### 1. Install CoreDog\n\n```bash\n# Edit the configuration\nvim charts/values.yaml\n# Fill in the S3 credentials and notification channels (see Configuration below)\n\n# Install\nhelm install coredog ./charts -n coredog-system --create-namespace\n```\n\n### 2. 
Configure the nodes\n\n**Run on every Kubernetes node**:\n\n```bash\nsudo su -\n\n# ⚠️ Important: the path must match the mount path inside the container\n# If coredog.io/path=\"/corefile\", configure:\necho '/corefile/core.%e.%p.%h.%t' \u003e /proc/sys/kernel/core_pattern\n\n# Persist across reboots\necho 'kernel.core_pattern=/corefile/core.%e.%p.%h.%t' \u003e\u003e /etc/sysctl.conf\nsysctl -p\n\n# Verify\ncat /proc/sys/kernel/core_pattern\n# Expected output: /corefile/core.%e.%p.%h.%t\n```\n\n**Notes**:\n- The volume is mounted at `/corefile` inside the container\n- The kernel pattern also points to `/corefile/core.xxx`\n- Because of the hostPath volume mapping, the file is actually written on the host at `/data/coredog-system/dumps/\u003cns\u003e/\u003cpod\u003e/core.xxx`\n\n### 3. Onboard your application\n\nAdd the following annotations to your application's Deployment/StatefulSet:\n\n```yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: my-app\nspec:\n  template:\n    metadata:\n      annotations:\n        # ⚠️ Required: enable CoreDog\n        coredog.io/inject: \"true\"\n        # ⚠️ Required: mount path\n        coredog.io/path: \"/corefile\"\n        # Optional: containers to monitor (all containers if omitted)\n        coredog.io/container: \"app\"\n    spec:\n      containers:\n      - name: app\n        image: my-app:v1\n        command:\n          - bash\n          - -c\n          - |\n            ulimit -c unlimited  # ⚠️ Required\n            exec /app/server\n```\n\n**That's it!** Core dumps are collected automatically when the application crashes.\n\n## Configuration\n\n### Choosing a storage backend\n\nCoreDog supports three storage backends:\n\n| Backend | Protocol | Advantages | Use cases |\n|-----|----------|------|--------|\n| **S3** | `s3` | Standard S3 API, broad compatibility | AWS users, general use |\n| **COS** | `cos` | Native to Tencent Cloud, performance-optimized | Tencent Cloud environments |\n| **CFS** | `cfs` | File storage with a POSIX interface | Workloads needing file system semantics; local or dedicated-line uploads |\n\n#### S3 example\n\n```yaml\nStorageConfig:\n  protocol: s3\n  s3AccesskeyID: \"your_ak\"\n  s3SecretAccessKey: \"your_sk\"\n  s3Region: \"us-east-1\"\n  S3Bucket: \"my-bucket\"\n  S3Endpoint: \"s3.amazonaws.com\"\n```\n\n#### COS example\n\n```yaml\nStorageConfig:\n  protocol: cos\n  s3AccesskeyID: \"your_ak\"\n  s3SecretAccessKey: \"your_sk\"\n  s3Region: \"ap-nanjing\"\n  S3Bucket: \"my-bucket-1234567890\"\n  S3Endpoint: \"cos.ap-nanjing.myqcloud.com\"\n```\n\n#### CFS example\n\n**Prerequisite**: CFS 
is already mounted on the cluster nodes\n\n```yaml\nStorageConfig:\n  protocol: cfs\n  CFSMountPath: \"/mnt/cfs\"      # CFS mount path\n  StoreDir: \"corefiles\"          # Storage directory inside CFS\n  DeleteLocalCorefile: true\n```\n\n**Configure the Watcher Pod's volume mount** (in the Helm values):\n\n```yaml\ncorefileVolume:\n  type: hostPath\n  hostPath:\n    path: /mnt/cfs                # Same as CFSMountPath\n```\n\n### Required settings in values.yaml\n\nEdit `charts/values.yaml`:\n\n```yaml\nconfig:\n  coredog: |-\n    StorageConfig:\n      # Storage protocol\n      protocol: s3                           # s3: S3/COS, cfs: CFS (default: s3)\n      \n      # === When using S3 or COS storage ===\n      s3AccesskeyID: \"YOUR_ACCESS_KEY\"\n      s3SecretAccessKey: \"YOUR_SECRET_KEY\"\n      s3Region: \"ap-nanjing\"\n      S3Bucket: \"your-bucket\"\n      S3Endpoint: \"cos.ap-nanjing.myqcloud.com\"  # COS endpoint shown; for S3, use the S3 endpoint\n      \n      # === When using CFS storage ===\n      CFSMountPath: \"/mnt/cfs\"               # CFS mount path\n      \n      # Common settings\n      StoreDir: corefiles                    # Storage directory\n      DeleteLocalCorefile: true              # Delete the local file after upload (strongly recommended)\n    \n    # ⚠️ Notification channels (configure at least one)\n    NoticeChannel:\n      - chan: wechat\n        webhookurl: \"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY\"\n    \n    # [Optional] CoreSight integration: automatic core dump analysis\n    # CoreSight:\n    #   enabled: true\n    #   apiUrl: \"http://coresight-api:8000\"\n    #   token: \"your-agent-token\"\n```\n\n### Custom handler configuration\n\nAfter detecting a core dump, CoreDog can run a custom shell script, which can optionally replace the default notification and CoreSight reporting behavior.\n\n```yaml\nCustomHandler:\n  enabled: true                    # Enable the custom handler\n  timeout: 300                     # Script timeout (seconds)\n  skipDefaultNotify: true          # Skip the default notification (WeCom/Slack)\n  skipCoreSight: true              # Skip CoreSight reporting\n  script: |\n    #!/bin/bash\n    # Send to a custom webhook\n    curl -X POST \"https://your-api.com/coredump\" \\\n      -H \"Content-Type: 
application/json\" \\\n      -d \"{\\\"url\\\": \\\"$COREDUMP_URL\\\", \\\"pod\\\": \\\"$POD_NAMESPACE/$POD_NAME\\\"}\"\n```\n\n**Available environment variables**:\n\n| Variable | Description | Example |\n|-----|------|------|\n| `COREDUMP_FILE` | Local file path | `/corefile/core.bash.123` |\n| `COREDUMP_URL` | URL after upload | `https://s3.xxx/corefiles/xxx` |\n| `COREDUMP_FILENAME` | File name | `core.bash.123` |\n| `COREDUMP_MD5` | File MD5 | `abc123...` |\n| `COREDUMP_SIZE` | File size (bytes) | `1234567` |\n| `COREDUMP_EXECUTABLE` | Executable path | `/usr/bin/bash` |\n| `POD_NAME` | Pod name | `my-app-xxx` |\n| `POD_NAMESPACE` | Namespace | `default` |\n| `POD_UID` | Pod UID | `abc-123-xxx` |\n| `POD_NODE_IP` | Node IP | `10.0.0.1` |\n| `POD_IMAGE` | Container image | `my-app:v1` |\n| `POD_CONTAINER` | Container name | `app` |\n| `HOST_IP` | Host IP | `10.0.0.1` |\n\n\u003e **Note**: Some Pod information (such as `POD_IMAGE` and `POD_NODE_IP`) may be empty in certain cases.\n\u003e In scripts, check for empty variables with `[ -z \"$POD_IMAGE\" ]`, or supply a default with `${POD_NAME:-unknown}`.\n\n### Annotations\n\n| Annotation | Required | Description | Example |\n|-----------|------|------|------|\n| `coredog.io/inject` | ✅ | Enables injection | `\"true\"` |\n| `coredog.io/path` | ✅ | Core dump mount path | `\"/corefile\"` |\n| `coredog.io/container` | ❌ | Target containers (comma-separated); omit to include all containers | `\"app,worker\"` |\n\n### Path safety restrictions\n\nThe following paths are not allowed (for security reasons):\n- `/`, `/etc`, `/usr`, `/bin`, `/sbin`, `/var`, `/root`, `/home`, `/boot`\n\nRecommended:\n- `/corefile` ✅\n- `/data/dumps` ✅\n- `/app/coredumps` ✅\n\n## Usage scenarios\n\n### Scenario 1: Single-container application\n\n```yaml\nmetadata:\n  annotations:\n    coredog.io/inject: \"true\"\n    coredog.io/path: \"/corefile\"\nspec:\n  containers:\n  - name: app\n    image: my-app:v1\n```\n\n### Scenario 2: Multi-container Pod, monitoring specific containers only\n\n```yaml\nmetadata:\n  annotations:\n    coredog.io/inject: \"true\"\n    coredog.io/path: \"/corefile\"\n    coredog.io/container: \"gamesvr,dbproxy\"  # Monitor only the business containers\nspec:\n  containers:\n  - name: gamesvr      # ✅ Injected\n  - name: dbproxy      # ✅ Injected\n  - name: nginx        # ❌ Not injected\n  - name: metrics      # ❌ Not injected\n```\n\n### Scenario 3: Custom path\n\n```yaml\nmetadata:\n  annotations:\n    
coredog.io/inject: \"true\"\n    coredog.io/path: \"/data/dumps\"  # Custom path\nspec:\n  containers:\n  - name: app\n    command:\n      - bash\n      - -c\n      - |\n        ulimit -c unlimited\n        cd /data/dumps  # Keep the path consistent\n        exec /app/server\n```\n\n### Scenario 4: Custom handler, sending to a custom system\n\n```yaml\n# values.yaml\nconfig:\n  coredog: |-\n    # ... storage configuration ...\n    \n    # Enable the custom handler in place of the default notification\n    CustomHandler:\n      enabled: true\n      skipDefaultNotify: true    # Do not send WeCom/Slack notifications\n      skipCoreSight: false       # Still report to CoreSight\n      timeout: 60\n      script: |\n        #!/bin/bash\n        # Send to the internal alerting system\n        curl -X POST \"https://alert.internal.com/api/coredump\" \\\n          -H \"Authorization: Bearer $ALERT_TOKEN\" \\\n          -H \"Content-Type: application/json\" \\\n          -d \"{\n            \\\"severity\\\": \\\"critical\\\",\n            \\\"source\\\": \\\"coredog\\\",\n            \\\"pod\\\": \\\"$POD_NAMESPACE/$POD_NAME\\\",\n            \\\"node\\\": \\\"$POD_NODE_IP\\\",\n            \\\"file_url\\\": \\\"$COREDUMP_URL\\\",\n            \\\"executable\\\": \\\"$COREDUMP_EXECUTABLE\\\"\n          }\"\n```\n\n## Architecture\n\n```\nPod created → Webhook intercepts → Check annotations\n                            ↓\n                    inject=true and path set?\n                            ↓\n                    Inject volume and volumeMount\n                            ↓\n                    hostPath: /data/coredog-system/dumps/\u003cns\u003e/\u003cpod\u003e/\n                    mountPath: \u003cpath annotation\u003e\n                            ↓\nApp crashes → Core dump generated → /data/coredog-system/dumps/\u003cns\u003e/\u003cpod\u003e/core.xxx\n                            ↓\n                    Watcher detects the file\n                            ↓\n                    Parse from path: namespace + podname\n                            ↓\n                    Upload to S3 → Delete local file → Send notification\n```\n\n## Verification and testing\n\n### Verify injection\n\n```bash\n# Create a test Pod\nkubectl run test --image=ubuntu \\\n  
--annotations=\"coredog.io/inject=true\" \\\n  --annotations=\"coredog.io/path=/corefile\" \\\n  -- sleep 3600\n\n# Check whether the injection succeeded\nkubectl get pod test -o yaml | grep -A 5 coredog-corefile\n\n# You should see:\n# - name: coredog-corefile\n#   hostPath:\n#     path: /data/coredog-system/dumps/default/test\n```\n\n### Test collection\n\n```bash\n# Create a test application that crashes\ncat \u003c\u003cEOF | kubectl apply -f -\napiVersion: v1\nkind: Pod\nmetadata:\n  name: crash-test\n  annotations:\n    coredog.io/inject: \"true\"\n    coredog.io/path: \"/corefile\"\nspec:\n  containers:\n  - name: app\n    image: ubuntu:22.04\n    command:\n      - bash\n      - -c\n      - |\n        ulimit -c unlimited\n        sleep 5\n        kill -11 \\$\\$  # Trigger a segmentation fault\nEOF\n\n# Follow the collection logs\nkubectl logs -n coredog-system -l app.kubernetes.io/component=watcher -f\n```\n\n**Expected output**:\n```\nlevel=info msg=\"capture a file:/corefile/core.bash.xxx\"\nlevel=info msg=\"resolved pod from webhook path: default/crash-test\"\nlevel=info msg=\"deleted local corefile: /corefile/core.bash.xxx\"\n```\n\n## Troubleshooting\n\n### Pod not injected\n\n**Check**:\n```bash\n# 1. View the webhook logs\nkubectl logs -n coredog-system -l app.kubernetes.io/component=webhook\n\n# You should see something like:\n# Skip injection for pod default/my-pod - Reason: annotation coredog.io/path is required but not set\n```\n\n**Common causes**:\n- ❌ Missing `coredog.io/inject: \"true\"`\n- ❌ Missing `coredog.io/path`\n- ❌ `coredog.io/path` is set to a disallowed path (such as `/etc`)\n\n### Core dump not detected\n\n**Check**:\n```bash\n# 1. Verify the node configuration\ncat /proc/sys/kernel/core_pattern\n# Should match the pattern set in the node configuration step, e.g. /corefile/core.%e.%p.%h.%t\n\n# 2. Verify ulimit\nkubectl exec \u003cpod\u003e -c \u003ccontainer\u003e -- bash -c \"ulimit -c\"\n# Should be: unlimited\n\n# 3. 
Check the watcher logs\nkubectl logs -n coredog-system -l app.kubernetes.io/component=watcher -f\n```\n\n### Pod information not resolved\n\n**Symptom**: the notification shows `[/] core: xxx` instead of `[namespace/podname]`\n\n**Cause**: the path format is not what CoreDog expects\n\n**Fix**:\n- Confirm the Pod has the correct annotations\n- Confirm the Webhook is working\n- Check that the actual file path is `/data/coredog-system/dumps/\u003cns\u003e/\u003cpod\u003e/core.xxx`\n\n### Local files not cleaned up\n\n**Check**:\n```bash\n# View the configuration\nkubectl get cm -n coredog-system coredog -o yaml | grep DeleteLocalCorefile\n# Should be: true\n\n# Look for deletion records in the logs\nkubectl logs -n coredog-system -l app.kubernetes.io/component=watcher | grep \"deleted local corefile\"\n```\n\n## Notification configuration\n\n### WeCom (Enterprise WeChat)\n\n```yaml\nNoticeChannel:\n  - chan: wechat\n    webhookurl: \"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx\"\n    keyword: \"\"  # No filtering\n```\n\n### Slack\n\n```yaml\nNoticeChannel:\n  - chan: slack\n    webhookurl: \"https://hooks.slack.com/services/xxx\"\n```\n\n### Multiple channels with filtering\n\n```yaml\nNoticeChannel:\n  # All environments\n  - chan: wechat\n    webhookurl: \"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=ALL\"\n    keyword: \"\"\n  \n  # Notify for production only\n  - chan: wechat\n    webhookurl: \"https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=PROD\"\n    keyword: \"production\"\n```\n\n### Custom message\n\n```yaml\nmessageTemplate: |\n  🚨 Application crashed\n  Pod: {pod.namespace}/{pod.name}\n  Node: {pod.node}\n  File: {corefile.filename}\n  Download: {corefile.url}\n```\n\n**Available variables**:\n- `{pod.namespace}`, `{pod.name}`, `{pod.uid}`, `{pod.node}`\n- `{host.ip}`\n- `{corefile.path}`, `{corefile.filename}`, `{corefile.url}`\n\n## Operations\n\n### List Pods with CoreDog enabled\n\n```bash\nkubectl get pods -A -o json | jq -r '.items[] | select(.metadata.annotations[\"coredog.io/inject\"] == \"true\") | \"\\(.metadata.namespace)/\\(.metadata.name)\"'\n```\n\n### Enable in bulk\n\n```bash\nkubectl patch deployment my-app -p '{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"coredog.io/inject\":\"true\",\"coredog.io/path\":\"/corefile\"}}}}}'\n```\n\n### Upgrade\n\n```bash\nhelm upgrade coredog ./charts -n 
coredog-system\n```\n\n### Uninstall\n\n```bash\nhelm uninstall coredog -n coredog-system\nkubectl delete mutatingwebhookconfiguration coredog\n```\n\n## Documentation\n\n- [Troubleshooting guide](docs/troubleshooting.md)\n- [CoreSight integration guide](CORESIGHT_INTEGRATION.md) - Optional: automatic core dump analysis\n\n## License\n\nApache License 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyafeiaa%2Fcoredog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyafeiaa%2Fcoredog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyafeiaa%2Fcoredog/lists"}