https://github.com/mchmarny/disco
Utility for bulk image, license, package, and vulnerability discovery in containerized workloads on GCP. Includes CLI and Service with custom metrics and BigQuery data exports.
- Host: GitHub
- URL: https://github.com/mchmarny/disco
- Owner: mchmarny
- License: apache-2.0
- Created: 2022-12-23T13:18:03.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-21T13:38:48.000Z (12 months ago)
- Last Synced: 2023-11-21T14:38:27.755Z (12 months ago)
- Language: Go
- Homepage:
- Size: 57.8 MB
- Stars: 13
- Watchers: 2
- Forks: 3
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE-OF-CONDUCT.md
[![](https://github.com/mchmarny/disco/actions/workflows/on-push.yaml/badge.svg?branch=main)](https://github.com/mchmarny/disco/actions/workflows/on-push.yaml)
[![](https://github.com/mchmarny/disco/actions/workflows/on-tag.yaml/badge.svg)](https://github.com/mchmarny/disco/actions/workflows/on-tag.yaml)
[![](https://codecov.io/gh/mchmarny/disco/branch/main/graph/badge.svg?token=9HLYDZZADN)](https://codecov.io/gh/mchmarny/disco)
[![version](https://img.shields.io/github/release/mchmarny/disco.svg?label=version)](https://github.com/mchmarny/disco/releases/latest)
[![](https://img.shields.io/github/go-mod/go-version/mchmarny/disco.svg?label=go)](https://github.com/mchmarny/disco)
[![](https://goreportcard.com/badge/github.com/mchmarny/disco)](https://goreportcard.com/report/github.com/mchmarny/disco)
[![](https://img.shields.io/badge/License-Apache%202.0-blue.svg?label=license)](https://github.com/mchmarny/disco/blob/main/LICENSE)

# disco

Utility for bulk image, license, package, and vulnerability discovery in containerized workloads on GCP.
![](docs/img/dashboard.png)
> Note: this is a personal project, not an official Google product.
Features:

* Discover currently deployed container images in Cloud Run, GCF, and GKE
  * supports multiple projects and regions
  * resolves deployed images to their digests
* Report on vulnerabilities, packages, or licenses in these images
  * scans base images and packages
  * supports filters (e.g. CVE, package name, license type)
* Available as CLI or Service (for continuous discovery)

![](docs/img/disco.gif)
Additionally, when deployed as a service, `disco` will:

* Publish custom metrics (time-series) in Cloud Monitoring to support:
  * custom charts and dashboards (e.g. image vulnerability over time)
  * metric threshold alerts (e.g. page on `CRITICAL` vulnerability in project `X`)
* Export image license, package, and vulnerability data to BigQuery
  * query data using SQL (e.g. package versions over time)
  * create ML models (e.g. vulnerability source classification model)
  * build custom visualizations using Google Sheets, Data Studio, or Looker
* Archive raw license, package, and vulnerability scanner outputs into GCS bucket
  * each file stored in "folder" named after image SHA

## Why
It's easy to end up with a large number of containerized workloads across many GCP projects and regions: Cloud Run, GKE, or even Cloud Functions (yes, those end up running as a container too). You can scan these containers in Artifact Registry using [Container Analysis](https://cloud.google.com/container-analysis/docs/container-analysis) service, but currently it only [covers base OS](https://cloud.google.com/container-analysis/docs/os-overview). It's also not easy to know which of these images (and which versions) are actually being used in active services. Services like Cloud Run also support [multiple revisions](https://cloud.google.com/run/docs/managing/revisions), each potentially using a different version of an image, so identifying container images currently underpinning your services can get complicated.
`disco` provides an easy way to `disco`ver which of these container images are currently deployed, and automates the vulnerability/license scanning.
## Install
You can use `disco` either as CLI or Service:
* [CLI](docs/CLI.md) - Supports most common distribution methods (Homebrew, RPM, DEB, Go install, Binary etc).
* [Service](docs/SERVICE.md) - Deploys as a Cloud Run service via Terraform.

## Usage
### CLI
```shell
disco command [command options] [arguments...]
```

> You can use the `--help` flag at any level to get more information about the runtime, commands, or `disco` itself.
#### Images
Discover deployed images from a specific runtime. To see all of the commands available for `img` (or `image`):
```shell
disco image --help
```

To discover container images currently deployed in all of the supported runtimes:
```shell
disco img
```

Options:

* `--output` - path where to save the output (stdout by default)
* `--format` - output format (`yaml` or `json` which is the default)
* `--project` - scope discovery to a single project using project ID

The resulting report in JSON format will look something like this (abbreviated):
```json
{
  "meta": {
    "kind": "image",
    "version": "v0.3.19-next",
    "created": "2022-12-28T21:20:15Z"
  },
  "items": [
    {
      "uri": "us-west1-docker.pkg.dev/cloudy-demos/gcf-artifacts/test--func@sha256:d22bfc69913190ff9d274553bc55f782b5056b0d2ed62b52eb327a34c90d7203",
      "context": {
        "container-name": "test--func-1",
        "location-id": "us-west1",
        "location-name": "Oregon",
        "project-id": "cloudy-demos",
        "project-number": "799736955886",
        "runtime": "gcf",
        "service-id": "projects/cloudy-demos/locations/us-west1/services/test-func",
        "service-name": "test-func",
        "service-revision": "projects/cloudy-demos/locations/us-west1/services/test-func/revisions/test-func-00001-fiz"
      }
    },
    ...
  ]
}
```

#### Vulnerabilities

Discover potential vulnerabilities in container images. To see all of the commands available for `vul` (or `vulnerability`):
```shell
disco vulnerability --help
```

Options:

* `--file` - image list input file path to serve as a source (instead of discovery) (e.g. `disco img --output images.json`)
* `--image` - specific image URI to scan. Note: `--file` and `--image` are mutually exclusive
* `--output` - saves report to file at this path (stdout by default)
* `--format` - output format (`yaml` or `json` which is the default)
* `--project` - during discovery, runs only on specific project (project ID)
* `--min-severity` - minimum severity of vulnerability to include in report (e.g. low, medium, high, critical, default: all)
* `--cve` - filter results on a specific CVE ID (e.g. `CVE-2020-22046`)
* `--target` - target data store to save the results to (e.g. `bq://my-project.some-dataset` or `bq://my-project.some-dataset.table-name`)

> Using the `cve` filter you can quickly check if any of the currently deployed images have a vulnerability.

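To illustrate what a minimum-severity filter amounts to, here is a minimal Go sketch that applies one to a report saved with `--output`. It is not `disco`'s own implementation. The struct fields mirror only the keys visible in the sample report in this section, the severity ordering is an assumption based on the values listed for `--min-severity`, and the second CVE ID in the inline data is a made-up placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Report mirrors only the fields visible in the abbreviated sample report.
type Report struct {
	Items []Item `json:"items"`
}

// Item is one scanned image and its findings.
type Item struct {
	Image           string          `json:"image"`
	Vulnerabilities []Vulnerability `json:"vulnerabilities"`
}

// Vulnerability is a subset of the per-vulnerability fields.
type Vulnerability struct {
	Source   string `json:"source"`
	Severity string `json:"severity"`
	Package  string `json:"package"`
	Version  string `json:"version"`
}

// severityRank is an assumed ordering (low < medium < high < critical).
var severityRank = map[string]int{"low": 1, "medium": 2, "high": 3, "critical": 4}

// atLeast returns vulnerabilities ranked at or above minSeverity.
// Entries with an unknown severity are skipped; an unknown minSeverity
// has rank 0 and therefore keeps every known severity.
func atLeast(r Report, minSeverity string) []Vulnerability {
	threshold := severityRank[strings.ToLower(minSeverity)]
	var out []Vulnerability
	for _, item := range r.Items {
		for _, v := range item.Vulnerabilities {
			rank, ok := severityRank[strings.ToLower(v.Severity)]
			if ok && rank >= threshold {
				out = append(out, v)
			}
		}
	}
	return out
}

func main() {
	// Inline stand-in for the contents of a saved report file.
	data := []byte(`{"items":[{"image":"example","vulnerabilities":[
		{"source":"CVE-2021-28165","severity":"HIGH","package":"org.eclipse.jetty:jetty-util","version":"9.4.31.v20200723"},
		{"source":"CVE-0000-0000","severity":"LOW","package":"example-pkg","version":"1.0.0"}]}]}`)

	var r Report
	if err := json.Unmarshal(data, &r); err != nil {
		panic(err)
	}
	for _, v := range atLeast(r, "high") {
		fmt.Println(v.Source, v.Severity, v.Package, v.Version)
	}
}
```

In practice the `--min-severity` and `--cve` flags do this filtering for you; a sketch like this is only useful when post-processing reports that were already saved.
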
The resulting report in JSON format will look something like this (abbreviated):
```json
{
  "meta": {
    "kind": "vulnerability",
    "version": "v0.3.19-next",
    "created": "2022-12-28T21:32:34Z",
    "count": 5
  },
  "items": [
    {
      "image": "gcr.io/cloudy-demos/hello-broken@sha256:0900c08e7d40f94...",
      "context": {
        "container-name": "hello-broken-1",
        "location-id": "us-central1",
        "location-name": "Iowa",
        ...
      },
      "vulnerabilities": [
        {
          "source": "CVE-2021-28165",
          "severity": "HIGH",
          "package": "org.eclipse.jetty:jetty-util",
          "version": "9.4.31.v20200723",
          "title": "jetty: Resource exhaustion when receiving an invalid large TLS frame",
          "description": "In Eclipse Jetty 7.2.2 to 9.4.38, 10.0.0.alpha0 to 10.0.1, and 11.0.0.alpha0 to 11.0.1, CPU usage can reach 100% upon receiving a large invalid TLS frame.",
          "url": "https://avd.aquasec.com/nvd/cve-2021-28165",
          "updated": "2022-07-29T17:05:00Z"
        },
        ...
      ]
    },
    ...
  ]
}
```

#### Licenses

Discover licenses for OS and packages used in container images. To see all of the commands available for `lic` or `license`:
```shell
disco license --help
```

Options:

* `--file` - image list input file path to serve as a source (instead of discovery) (e.g. `disco img --output images.json`)
* `--image` - specific image URI to scan. Note: `--file` and `--image` are mutually exclusive
* `--output` - saves report to file at this path (stdout by default)
* `--format` - output format (`yaml` or `json` which is the default)
* `--project` - during discovery, runs only on specific project (project ID)
* `--type` - license type filter (supports prefix: e.g. `apache`, `bsd`, `mit`, etc.)
* `--target` - target data store to save the results to (e.g. `bq://my-project`)

> Using the `type` filter you can quickly check if any of your currently deployed images are using a specific license.

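Because `--type` is described as a prefix filter, a value such as `gpl` would be expected to match a license named `GPL-2.0`. The following Go sketch shows one plausible reading of that behavior. Case-insensitive matching is an assumption, and the `License` type and function names are hypothetical, not taken from the `disco` code base.

```go
package main

import (
	"fmt"
	"strings"
)

// License mirrors one entry of the "licenses" array in a license report.
type License struct {
	Name   string
	Source string
}

// filterByType keeps licenses whose name starts with prefix.
// Matching ignores case, which is an assumption about the --type flag.
func filterByType(in []License, prefix string) []License {
	want := strings.ToLower(prefix)
	var out []License
	for _, l := range in {
		if strings.HasPrefix(strings.ToLower(l.Name), want) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	licenses := []License{
		{Name: "GPL-2.0", Source: "alpine-baselayout-data"},
		{Name: "MIT", Source: "alpine-keys"},
	}
	for _, l := range filterByType(licenses, "gpl") {
		fmt.Printf("%s (%s)\n", l.Name, l.Source)
	}
}
```
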
The resulting report in JSON format will look something like this (abbreviated):
```json
{
  "meta": {
    "kind": "license",
    "version": "v0.3.19-next",
    "created": "2022-12-28T21:23:20Z"
  },
  "items": [
    {
      "image": "us-docker.pkg.dev/cloudrun/container/hello@sha256:2e70803dbc92...",
      "context": {
        "container-name": "hello-1",
        "project-id": "cloudy-demos",
        "project-number": "799736955886",
        ...
      },
      "licenses": [
        {
          "name": "GPL-2.0",
          "source": "alpine-baselayout-data"
        },
        {
          "name": "MIT",
          "source": "alpine-keys"
        },
        ...
      ]
    },
    ...
  ]
}
```

#### Packages

Discover packages used in container images. To see all of the commands available for `pkg` (or `packages`):
```shell
disco packages --help
```

Options:

* `--file` - image list input file path to serve as a source (instead of discovery) (e.g. `disco img --output images.json`)
* `--image` - specific image URI to scan. Note: `--file` and `--image` are mutually exclusive
* `--output` - saves report to file at this path (stdout by default)
* `--format` - output format (`yaml` or `json` which is the default)
* `--project` - during discovery, runs only on specific project (project ID)
* `--name` - package name filter (uses contains, e.g. libgcc, gobinary, express, etc.)
* `--target` - target data store to save the results to (e.g. `bq://my-project`)

> Using the `name` filter you can quickly check if any of your currently deployed images are using a specific package.

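The `--target` examples in this document take three shapes: `bq://project`, `bq://project.dataset`, and `bq://project.dataset.table`. As a rough illustration of how such a value breaks down, the Go sketch below splits it into its parts. The `Target` type and `parseTarget` function are hypothetical, and the sketch assumes that project, dataset, and table names contain no dots.

```go
package main

import (
	"fmt"
	"strings"
)

// Target holds the parts of a bq:// target. Dataset and Table may be empty.
type Target struct {
	Project string
	Dataset string
	Table   string
}

// parseTarget splits "bq://project[.dataset[.table]]" into its parts.
func parseTarget(uri string) (Target, error) {
	const scheme = "bq://"
	if !strings.HasPrefix(uri, scheme) {
		return Target{}, fmt.Errorf("unsupported target %q: expected %s prefix", uri, scheme)
	}
	parts := strings.Split(strings.TrimPrefix(uri, scheme), ".")
	if len(parts) > 3 {
		return Target{}, fmt.Errorf("invalid target %q: too many segments", uri)
	}
	for _, p := range parts {
		if p == "" {
			return Target{}, fmt.Errorf("invalid target %q: empty segment", uri)
		}
	}
	t := Target{Project: parts[0]}
	if len(parts) > 1 {
		t.Dataset = parts[1]
	}
	if len(parts) > 2 {
		t.Table = parts[2]
	}
	return t, nil
}

func main() {
	for _, uri := range []string{
		"bq://my-project",
		"bq://my-project.some-dataset",
		"bq://my-project.some-dataset.table-name",
	} {
		t, err := parseTarget(uri)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", t)
	}
}
```
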
The resulting report in JSON format will look something like this (abbreviated):
```json
{
  "meta": {
    "kind": "package",
    "version": "v0.9.4",
    "created": "2023-01-08T00:37:26Z"
  },
  "items": [
    {
      "image": "us-central1-docker.pkg.dev/cloudy-labz/gcf-artifacts/test--go119@sha256:80be8e3c174...",
      "context": {
        "container-name": "test--go119",
        "location-id": "us-central1",
        "location-name": "Iowa",
        ...
      },
      "packages": [
        {
          "package": "minipass-sized",
          "version": "1.0.3",
          "source": "pkg:npm/minipass-sized@1.0.3",
          "license": "ISC",
          "format": "SPDX-2.2",
          "provider": "trivy"
        },
        ...
      ],
      ...
    }
  ]
}
```

### Service

> Instructions on how to deploy the `disco` service are [here](docs/SERVICE.md).
When running as a service, `disco` automatically exports metrics and report data:
#### Metrics
`disco` metrics can be found in [Metrics Explorer](https://cloud.google.com/monitoring/charts/metrics-explorer):
![](docs/img/metric-explore.png)
Custom time-series metrics created by `disco`:
* `disco/vulnerability/image` - count of images scanned for vulnerability (labels: project, version)
* `disco/vulnerability/severity` - vulnerability severity count (labels: project, version, kind)
* `disco/license/image` - count of images scanned for licenses (labels: project, version)
* `disco/license/count` - count of licenses (labels: project, version)
* `disco/package/image` - count of images scanned for packages (labels: project, version)
* `disco/package/count` - count of packages (labels: project, version)

> Licenses and packages have too high a cardinality for more detailed labels.

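In Cloud Monitoring, each distinct combination of label values becomes its own time series, which is why low-cardinality values such as severity work well as labels while license or package names do not. The Go sketch below shows the kind of aggregation behind a severity count metric: findings are tallied into one count per project and severity. The types are hypothetical, and how `disco` maps these values onto its `version` and `kind` labels is not shown here.

```go
package main

import "fmt"

// Finding is a hypothetical, flattened view of one vulnerability in one project.
type Finding struct {
	Project  string
	Severity string
}

// seriesKey identifies one time series: one count per project and severity.
type seriesKey struct {
	Project  string
	Severity string
}

// countBySeverity tallies findings into one data point per (project, severity) pair.
func countBySeverity(findings []Finding) map[seriesKey]int {
	counts := make(map[seriesKey]int)
	for _, f := range findings {
		counts[seriesKey{Project: f.Project, Severity: f.Severity}]++
	}
	return counts
}

func main() {
	findings := []Finding{
		{Project: "cloudy-demos", Severity: "HIGH"},
		{Project: "cloudy-demos", Severity: "HIGH"},
		{Project: "cloudy-demos", Severity: "CRITICAL"},
	}
	// Map iteration order is not deterministic, so the line order may vary.
	for k, n := range countBySeverity(findings) {
		fmt.Printf("project=%s severity=%s count=%d\n", k.Project, k.Severity, n)
	}
}
```
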
![](docs/img/metrics.png)
#### Data
The `disco` service automatically exports its data to three BigQuery tables.
Common elements:
* `batch_id` is the unique ID of each discovery operation
* `image` is the image URI sans tag or sha
* `sha` is the image digest prefixed with `sha:`
* `updated` is the timestamp when the data element was extracted

**licenses**

```shell
{name: "batch_id", type: "integer", required: true},
{name: "image", type: "string", required: true},
{name: "sha", type: "string"},
{name: "name", type: "string", required: true},
{name: "package", type: "string"},
{name: "updated", type: "timestamp", required: true}
```

**vulnerabilities**

```shell
{name: "batch_id", type: "integer", required: true},
{name: "image", type: "string", required: true},
{name: "sha", type: "string"},
{name: "cve", type: "string", required: true},
{name: "severity", type: "string"},
{name: "package", type: "string"},
{name: "version", type: "string"},
{name: "title", type: "string"},
{name: "description", type: "string"},
{name: "url", type: "string"},
{name: "updated", type: "timestamp", required: true}
```

**packages**

```shell
{name: "batch_id", type: "integer", required: true},
{name: "image", type: "string", required: true},
{name: "sha", type: "string"},
{name: "format", type: "string", required: true},
{name: "provider", type: "string", required: true},
{name: "package", type: "string", required: true},
{name: "version", type: "string"},
{name: "source", type: "string"},
{name: "license", type: "string"},
{name: "updated", type: "timestamp", required: true}
```

You can use these in your custom queries:

![](docs/img/query.png)
Sample SQL queries are available [here](docs/QUERIES.md).

You can also use this data in Google Sheets, Data Studio, or Looker reports:

![](docs/img/sheets.png)
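
The `image` and `sha` columns described under the common elements correspond to the two halves of a digest-pinned image URI. As a rough sketch of that split, and not the exporter's actual code, the Go function below separates a reference into the URI without tag or digest and the digest itself. The function name is hypothetical, and the digest is returned exactly as it appears in the URI, without assuming any particular stored prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// splitImageRef separates an image reference into the image URI
// (without tag or digest) and the digest (empty if none is present).
func splitImageRef(ref string) (image, digest string) {
	image = ref
	// Everything after the last "@" is the digest.
	if i := strings.LastIndex(ref, "@"); i >= 0 {
		image, digest = ref[:i], ref[i+1:]
	}
	// A colon after the last "/" marks a tag; a colon before it is a registry port.
	slash := strings.LastIndex(image, "/")
	if colon := strings.LastIndex(image, ":"); colon > slash {
		image = image[:colon]
	}
	return image, digest
}

func main() {
	ref := "us-west1-docker.pkg.dev/cloudy-demos/gcf-artifacts/test--func@sha256:d22bfc69913190ff9d274553bc55f782b5056b0d2ed62b52eb327a34c90d7203"
	image, digest := splitImageRef(ref)
	fmt.Println("image:", image)
	fmt.Println("sha:  ", digest)
}
```
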
## Disclaimer
This is my personal project and it does not represent my employer. While I do my best to ensure that everything works, I take no responsibility for issues caused by this code.