Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/GoogleCloudPlatform/gke-monitoring-tutorial
This project walks you through setting up monitoring and visualizing metrics from a Kubernetes Engine cluster. The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitoring capabilities of Stackdriver.
https://github.com/GoogleCloudPlatform/gke-monitoring-tutorial
gke gke-helmsman google-cloud-platform kubernetes kubernetes-engine monitoring stackdriver
Last synced: 12 days ago
JSON representation
This project walks you through setting up monitoring and visualizing metrics from a Kubernetes Engine cluster. The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitoring capabilities of Stackdriver.
- Host: GitHub
- URL: https://github.com/GoogleCloudPlatform/gke-monitoring-tutorial
- Owner: GoogleCloudPlatform
- License: apache-2.0
- Archived: true
- Created: 2018-07-23T21:36:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-03-04T21:48:46.000Z (over 2 years ago)
- Last Synced: 2024-03-16T23:44:03.840Z (8 months ago)
- Topics: gke, gke-helmsman, google-cloud-platform, kubernetes, kubernetes-engine, monitoring, stackdriver
- Language: Shell
- Homepage:
- Size: 269 KB
- Stars: 108
- Watchers: 18
- Forks: 65
- Open Issues: 9
-
Metadata Files:
- Readme: README-QWIKLABS.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - GoogleCloudPlatform/gke-monitoring-tutorial - This project walks you through setting up monitoring and visualizing metrics from a Kubernetes Engine cluster. The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitor (Shell)
README
# Monitoring with Stackdriver on Kubernetes Engine
## Table of Contents
* [Introduction](#introduction)
* [Architecture](#architecture)
* [Initial Setup](#initial-setup)
* [Configure gcloud](#configure-gcloud)
* [Tools](#tools)
* [Install Cloud SDK](#install-cloud-sdk)
* [Install Kubectl](#install-kubectl-cli)
* [Install Terraform](#install-terraform)
* [Configure Authentication](#configure-authentication)
* [Deployment](#deployment)
* [Create a new Stackdriver Account](#create-a-new-stackdriver-account)
* [Deploying the cluster](#deploying-the-cluster)
* [How does Terraform work?](#how-does-terraform-work)
* [Validation](#validation)
* [Using Stackdriver Kubernetes Monitoring](#using-stackdriver-kubernetes-monitoring)
* [Native Prometheus integration](#native-prometheus-integration)
* [Teardown](#teardown)
* [Troubleshooting](#troubleshooting)
* [Relevant Material](#relevant-material)## Introduction
[Stackdriver Kubernetes Monitoring](https://cloud.google.com/monitoring/kubernetes-engine/) is a new Stackdriver feature that more tightly integrates with GKE to better show you key stats about your cluster and the workloads and services running in it. Included in the new feature is functionality to import, as native Stackdriver metrics, metrics from pods with Prometheus endpoints. This allows you to use Stackdriver native alerting functionality with your Prometheus metrics without any additional workload.This tutorial will walk you through setting up Monitoring and visualizing metrics from a Kubernetes Engine cluster. It makes use of [Terraform](https://www.terraform.io/), a declarative [Infrastructure as Code](https://en.wikipedia.org/wiki/Infrastructure_as_Code) tool that enables configuration files to be used to automate the deployment and evolution of infrastructure in the cloud. The logs from the Kubernetes Engine cluster will be leveraged to walk through the monitoring capabilities of Stackdriver.
**Note:** The setup of the Stackdriver Monitoring workspace is not automated with a script because it is currently not supported through Terraform or via the gcloud command line tool.
## Architecture
The tutorial will create a Kubernetes Engine cluster that has a sample application deployed to it. The logging and metrics for the cluster are loaded into Stackdriver Logging by default. In the tutorial a Stackdriver Monitoring account will be setup to view the metrics captured.
![Monitoring Architecture](docs/architecture.png)
## Initial Setup
### Configure gcloud
All the tools for the demo are installed. When using Cloud Shell execute the following
command in order to setup gcloud cli. When executing this command please setup your region
and zone.```console
gcloud init
```### Tools
1. [Terraform >= 0.11.7](https://www.terraform.io/downloads.html)
2. [Google Cloud SDK version >= 204.0.0](https://cloud.google.com/sdk/docs/downloads-versioned-archives)
3. [kubectl matching the latest GKE version](https://kubernetes.io/docs/tasks/tools/install-kubectl/)You can obtain a [free trial of GCP](https://cloud.google.com/free/) if you need one
#### Install Cloud SDK
The Google Cloud SDK is used to interact with your GCP resources.
[Installation instructions](https://cloud.google.com/sdk/downloads) for multiple platforms are available online.#### Install kubectl CLI
The kubectl CLI is used to interteract with both Kubernetes Engine and kubernetes in general.
[Installation instructions](https://cloud.google.com/kubernetes-engine/docs/quickstart)
for multiple platforms are available online.#### Install Terraform
Terraform is used to automate the manipulation of cloud infrastructure. Its
[installation instructions](https://www.terraform.io/intro/getting-started/install.html) are also available online.### Configure Authentication
The Terraform configuration will execute against your GCP environment and create a Kubernetes Engine cluster running a simple application. The configuration will use your personal account to build out these resources. To setup the default account the configuration will use, run the following command to select the appropriate account:
```console
$ gcloud auth application-default login
```## Deployment
In this section we will create a Stackdriver Monitoring account so that we can explore the capabilities of the Monitoring console.
### Create a new Stackdriver Account
The following steps are used to setup a Stackdriver Monitoring account.
1. Visit the **Monitoring** section of the GCP Console. This will launch the process of creating a new Monitoring console if you have not created one before.
2. On the **Create your free StackDriver account** page select the project you created earlier. **Note:** You cannot change this setting once it is created.
3. Click on the **Create Account** button.
4. On the next page, **Add Google Cloud Platform projects to monitor** you can leave this alone since the project is already selected it isn't necessary to select any other projects. **Note:** You can add and remove projects at a later date if necessary.
5. Click the **Continue** button.
6. On the **Monitor AWS accounts** page you can choose to specify your AWS account information or skip this step.
7. For this tutorials purposes you can click the **Skip AWS Setup** button.
8. On the **Install the Stackdriver Agents** page you are provided with a script that can be used to add the Stackdriver Monitoring and Logging agents on each of your VM instances. **Note:** The tracking of VM's is not automatic like it is for Kubernetes Engine. For the purposes of this tutorial this script is not needed.
9. Click the **Continue** button.
10. On the **Get Reports by Email** page you can simply select any of the options depending on whether you want to receive the reports. For the purposes of this demo we will not be using the reports.
11. Click the **Continue** button.
12. The actual creation of the account and underlying resources takes a few minutes. Once completed you can press the **Launch monitoring** button.### Deploying the cluster
The infrastructure and Stackdriver alert policy required by this project can be deployed by executing:
```console
make create
```This will:
1. Read your project & zone configuration to generate a couple config files:
* `./terraform/terraform.tfvars` for Terraform variables
* `./manifests/prometheus-service-sed.yaml` for the Prometeus policy to be created in Stackdriver
2. Run `terraform init` to prepare Terraform to create the infrastructure
3. Run `terraform apply` to actually create the infrastructure & Stackdriver alert policyIf you need to override any of the defaults in the Terraform variables file, simply replace the desired value(s) to the right of the equals sign(s). Be sure your replacement values are still double-quoted.
If no errors are displayed then after a few minutes you should see your Kubernetes Engine cluster in the [GCP Console](https://console.cloud.google.com/kubernetes).
### How does Terraform work?
Following the principles of [Infrastructure as Code](https://en.wikipedia.org/wiki/Infrastructure_as_Code) and [Immutable Infrastructure](https://www.oreilly.com/ideas/an-introduction-to-immutable-infrastructure), Terraform supports the writing of declarative descriptions of the desired state of infrastructure. When the descriptor is applied, Terraform uses GCP APIs to provision and update resources to match. Terraform compares the desired state with the current state so incremental changes can be made without deleting everything and starting over. For instance, Terraform can build out GCP projects and compute instances, etc., even set up a Kubernetes Engine cluster and deploy applications to it. When requirements change, the descriptor can be updated and Terraform will adjust the cloud infrastructure accordingly.
This example will start up a Kubernetes Engine cluster and deploy a simple sample application to it. By default, Kubernetes Engine clusters in GCP are provisioned with a pre-configured [Fluentd](https://www.fluentd.org/)-based collector that forwards logs to Stackdriver.
## Validation
If no errors are displayed during deployment, after a few minutes you should see your Kubernetes Engine cluster in the GCP Console with the sample application deployed.
In order to validate that resources are installed and working correctly, run:
```console
make validate
```### Using Stackdriver Kubernetes Monitoring
For a thorough guide on how to observe your cluster with the new Stackdriver Kubernetes UI, see [Observing Your Kubernetes Clusters](https://cloud.google.com/monitoring/kubernetes-engine/observing).
#### Native Prometheus integration
The Terraform code included a Stackdriver alerting policy that is watching a metric that was originally imported from a Prometheus endpoint.
From the Stackdriver main page, click on `Alerting` then `Policies Overview` to show all the policies, including the alerting policy called `Prometheus mem alloc`. Clicking on the policy will provide much more detail.## Teardown
When you are finished with this example, and you are ready to clean up the resources that were created so that you avoid accruing charges, you can run the following command to remove all resources :
```
$ make teardown
```This command uses the `terraform destroy` command to remove the infrastructure. Terraform tracks the resources it creates so it is able to tear them all back down.
## Troubleshooting
** The install script fails with a `Permission denied` when running Terraform.**
The credentials that Terraform is using do not provide the
necessary permissions to create resources in the selected projects. Ensure
that the account listed in `gcloud config list` has necessary permissions to
create resources. If it does, regenerate the application default credentials
using `gcloud auth application-default login`.** Metrics Not Appearing or Uptime Checks not executing **
After the scripts execute it may take a few minutes for the Metrics or Uptime Checks to appear. Configure the items and give the system some time to generate metrics and checks as they someimes take time to complete.## Relevant Material
* [Stackdriver Kubernetes Monitoring](https://cloud.google.com/monitoring/kubernetes-engine/)
* [Terraform Google Cloud Provider](https://www.terraform.io/docs/providers/google/index.html)**This is not an officially supported Google product**