{"id":15222070,"url":"https://github.com/googlecloudplatform/gke-monitoring-tutorial","last_synced_at":"2025-10-03T15:30:26.716Z","repository":{"id":33251383,"uuid":"142072071","full_name":"GoogleCloudPlatform/gke-monitoring-tutorial","owner":"GoogleCloudPlatform","description":"This project walks you through setting up monitoring and visualizing metrics from a Kubernetes Engine cluster. The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitoring capabilities of Stackdriver.","archived":true,"fork":false,"pushed_at":"2022-03-04T21:48:46.000Z","size":275,"stargazers_count":109,"open_issues_count":9,"forks_count":63,"subscribers_count":18,"default_branch":"master","last_synced_at":"2024-12-18T08:40:49.972Z","etag":null,"topics":["gke","gke-helmsman","google-cloud-platform","kubernetes","kubernetes-engine","monitoring","stackdriver"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GoogleCloudPlatform.png","metadata":{"files":{"readme":"README-QWIKLABS.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-23T21:36:54.000Z","updated_at":"2024-06-18T13:35:42.000Z","dependencies_parsed_at":"2022-07-21T06:18:17.313Z","dependency_job_id":null,"html_url":"https://github.com/GoogleCloudPlatform/gke-monitoring-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fgke-monitoring-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fgke-monitoring-tutorial/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fgke-monitoring-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fgke-monitoring-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GoogleCloudPlatform","download_url":"https://codeload.github.com/GoogleCloudPlatform/gke-monitoring-tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235146516,"owners_count":18943273,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gke","gke-helmsman","google-cloud-platform","kubernetes","kubernetes-engine","monitoring","stackdriver"],"created_at":"2024-09-28T15:10:15.029Z","updated_at":"2025-10-03T15:30:26.237Z","avatar_url":"https://github.com/GoogleCloudPlatform.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Monitoring with Stackdriver on Kubernetes Engine\n\n## Table of Contents\n* [Introduction](#introduction)\n* [Architecture](#architecture)\n* [Initial Setup](#initial-setup)\n  * [Configure gcloud](#configure-gcloud)\n* [Tools](#tools)\n  * [Install Cloud SDK](#install-cloud-sdk)\n  * [Install Kubectl](#install-kubectl-cli)\n  * [Install Terraform](#install-terraform)\n  * [Configure Authentication](#configure-authentication)\n* [Deployment](#deployment)\n  * [Create a new Stackdriver Account](#create-a-new-stackdriver-account)\n  * [Deploying the cluster](#deploying-the-cluster)\n  * [How does Terraform 
work?](#how-does-terraform-work)\n* [Validation](#validation)\n  * [Using Stackdriver Kubernetes Monitoring](#using-stackdriver-kubernetes-monitoring)\n    * [Native Prometheus integration](#native-prometheus-integration)\n* [Teardown](#teardown)\n* [Troubleshooting](#troubleshooting)\n* [Relevant Material](#relevant-material)\n\n## Introduction\n[Stackdriver Kubernetes Monitoring](https://cloud.google.com/monitoring/kubernetes-engine/) is a new Stackdriver feature that more tightly integrates with GKE to better show you key stats about your cluster and the workloads and services running in it. Included in the new feature is functionality to import, as native Stackdriver metrics, metrics from pods with Prometheus endpoints. This allows you to use Stackdriver native alerting functionality with your Prometheus metrics without any additional workload.\n\nThis tutorial will walk you through setting up Monitoring and visualizing metrics from a Kubernetes Engine cluster.  It makes use of [Terraform](https://www.terraform.io/), a declarative [Infrastructure as Code](https://en.wikipedia.org/wiki/Infrastructure_as_Code) tool that enables configuration files to be used to automate the deployment and evolution of infrastructure in the cloud.  The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitoring capabilities of Stackdriver.\n\n**Note:** The setup of the Stackdriver Monitoring workspace is not automated with a script because it is currently not supported through Terraform or via the gcloud command line tool.\n\n## Architecture\n\nThe tutorial will create a Kubernetes Engine cluster that has a sample application deployed to it.  The logging and metrics for the cluster are loaded into Stackdriver Logging by default.  
In the tutorial, a Stackdriver Monitoring account will be set up to view the metrics captured.\n\n![Monitoring Architecture](docs/architecture.png)\n\n## Initial Setup\n\n### Configure gcloud\n\nAll the tools for the demo are installed. When using Cloud Shell, execute the following\ncommand to set up the gcloud CLI. When executing this command, set up your region\nand zone.\n\n```console\ngcloud init\n```\n\n### Tools\n1. [Terraform \u003e= 0.11.7](https://www.terraform.io/downloads.html)\n2. [Google Cloud SDK version \u003e= 204.0.0](https://cloud.google.com/sdk/docs/downloads-versioned-archives)\n3. [kubectl matching the latest GKE version](https://kubernetes.io/docs/tasks/tools/install-kubectl/)\n\nYou can obtain a [free trial of GCP](https://cloud.google.com/free/) if you need one.\n\n#### Install Cloud SDK\nThe Google Cloud SDK is used to interact with your GCP resources.\n[Installation instructions](https://cloud.google.com/sdk/downloads) for multiple platforms are available online.\n\n#### Install kubectl CLI\n\nThe kubectl CLI is used to interact with both Kubernetes Engine and Kubernetes in general.\n[Installation instructions](https://cloud.google.com/kubernetes-engine/docs/quickstart)\nfor multiple platforms are available online.\n\n#### Install Terraform\n\nTerraform is used to automate the manipulation of cloud infrastructure. Its\n[installation instructions](https://www.terraform.io/intro/getting-started/install.html) are also available online.\n\n### Configure Authentication\n\nThe Terraform configuration will execute against your GCP environment and create a Kubernetes Engine cluster running a simple application.  The configuration will use your personal account to build out these resources.  
To set up the default account the configuration will use, run the following command to select the appropriate account:\n\n```console\n$ gcloud auth application-default login\n```\n\n## Deployment\n\nIn this section we will create a Stackdriver Monitoring account so that we can explore the capabilities of the Monitoring console.\n\n### Create a new Stackdriver Account\n\nThe following steps are used to set up a Stackdriver Monitoring account.\n1. Visit the **Monitoring** section of the GCP Console.  This will launch the process of creating a new Monitoring console if you have not created one before.\n2. On the **Create your free StackDriver account** page select the project you created earlier.  **Note:** You cannot change this setting once it is created.\n3. Click on the **Create Account** button.\n4. On the next page, **Add Google Cloud Platform projects to monitor**, you can leave the selection as it is. The project is already selected, so it isn't necessary to select any other projects.  **Note:** You can add and remove projects at a later date if necessary.\n5. Click the **Continue** button.\n6. On the **Monitor AWS accounts** page you can choose to specify your AWS account information or skip this step.\n7. For this tutorial's purposes you can click the **Skip AWS Setup** button.\n8. On the **Install the Stackdriver Agents** page you are provided with a script that can be used to add the Stackdriver Monitoring and Logging agents on each of your VM instances.  **Note:** The tracking of VMs is not automatic like it is for Kubernetes Engine.  For the purposes of this tutorial this script is not needed.\n9. Click the **Continue** button.\n10. On the **Get Reports by Email** page you can select any of the options depending on whether you want to receive the reports.  For the purposes of this demo we will not be using the reports.\n11. Click the **Continue** button.\n12. The actual creation of the account and underlying resources takes a few minutes.  
Once completed, you can press the **Launch monitoring** button.\n\n### Deploying the cluster\n\nThe infrastructure and Stackdriver alert policy required by this project can be deployed by executing:\n```console\nmake create\n```\n\nThis will:\n1. Read your project \u0026 zone configuration to generate a couple of config files:\n  * `./terraform/terraform.tfvars` for Terraform variables\n  * `./manifests/prometheus-service-sed.yaml` for the Prometheus policy to be created in Stackdriver\n2. Run `terraform init` to prepare Terraform to create the infrastructure\n3. Run `terraform apply` to actually create the infrastructure \u0026 Stackdriver alert policy\n\nIf you need to override any of the defaults in the Terraform variables file, simply replace the desired value(s) to the right of the equals sign(s). Be sure your replacement values are still double-quoted.\n\nIf no errors are displayed then after a few minutes you should see your Kubernetes Engine cluster in the [GCP Console](https://console.cloud.google.com/kubernetes).\n\n### How does Terraform work?\n\nFollowing the principles of [Infrastructure as Code](https://en.wikipedia.org/wiki/Infrastructure_as_Code) and [Immutable Infrastructure](https://www.oreilly.com/ideas/an-introduction-to-immutable-infrastructure), Terraform supports the writing of declarative descriptions of the desired state of infrastructure. When the descriptor is applied, Terraform uses GCP APIs to provision and update resources to match. Terraform compares the desired state with the current state so incremental changes can be made without deleting everything and starting over.  For instance, Terraform can build out GCP projects and compute instances, etc., even set up a Kubernetes Engine cluster and deploy applications to it. When requirements change, the descriptor can be updated and Terraform will adjust the cloud infrastructure accordingly.\n\nThis example will start up a Kubernetes Engine cluster and deploy a simple sample application to it. 
By default, Kubernetes Engine clusters in GCP are provisioned with a pre-configured [Fluentd](https://www.fluentd.org/)-based collector that forwards logs to Stackdriver.\n\n## Validation\n\nIf no errors are displayed during deployment, after a few minutes you should see your Kubernetes Engine cluster in the GCP Console with the sample application deployed.\n\nIn order to validate that resources are installed and working correctly, run:\n\n```console\nmake validate\n```\n\n### Using Stackdriver Kubernetes Monitoring\n\nFor a thorough guide on how to observe your cluster with the new Stackdriver Kubernetes UI, see [Observing Your Kubernetes Clusters](https://cloud.google.com/monitoring/kubernetes-engine/observing).\n\n#### Native Prometheus integration\n\nThe Terraform code includes a Stackdriver alerting policy that watches a metric that was originally imported from a Prometheus endpoint.\nFrom the Stackdriver main page, click on `Alerting` then `Policies Overview` to show all the policies, including the alerting policy called `Prometheus mem alloc`. Clicking on the policy will provide much more detail.\n\n\n## Teardown\n\nWhen you are finished with this example and ready to clean up the resources that were created, so that you avoid accruing charges, you can run the following command to remove all resources:\n\n```\n$ make teardown\n```\n\nThis command uses the `terraform destroy` command to remove the infrastructure. Terraform tracks the resources it creates so it is able to tear them all back down.\n\n## Troubleshooting\n\n**The install script fails with a `Permission denied` error when running Terraform.**\nThe credentials that Terraform is using do not provide the\nnecessary permissions to create resources in the selected projects. Ensure\nthat the account listed in `gcloud config list` has necessary permissions to\ncreate resources. 
If it does, regenerate the application default credentials\nusing `gcloud auth application-default login`.\n\n**Metrics not appearing or Uptime Checks not executing**\nAfter the scripts execute it may take a few minutes for the Metrics or Uptime Checks to appear.  Configure the items and give the system some time to generate metrics and checks as they sometimes take time to complete.\n\n## Relevant Material\n* [Stackdriver Kubernetes Monitoring](https://cloud.google.com/monitoring/kubernetes-engine/)\n* [Terraform Google Cloud Provider](https://www.terraform.io/docs/providers/google/index.html)\n\n\n**This is not an officially supported Google product**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fgke-monitoring-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgooglecloudplatform%2Fgke-monitoring-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fgke-monitoring-tutorial/lists"}