{"id":13839295,"url":"https://github.com/Yelp/clusterman","last_synced_at":"2025-07-11T03:32:10.965Z","repository":{"id":38159697,"uuid":"221807101","full_name":"Yelp/clusterman","owner":"Yelp","description":"Cluster Autoscaler for Kubernetes and Mesos","archived":true,"fork":false,"pushed_at":"2024-01-04T15:17:42.000Z","size":175456,"stargazers_count":294,"open_issues_count":28,"forks_count":21,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-12T22:35:43.332Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yelp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"code-of-conduct.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-11-14T23:49:29.000Z","updated_at":"2025-02-12T12:46:59.000Z","dependencies_parsed_at":"2023-02-18T00:15:36.192Z","dependency_job_id":"718f12d6-17d5-4e8d-8c68-54c0cb401e17","html_url":"https://github.com/Yelp/clusterman","commit_stats":null,"previous_names":[],"tags_count":195,"template":false,"template_full_name":null,"purl":"pkg:github/Yelp/clusterman","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yelp%2Fclusterman","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yelp%2Fclusterman/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yelp%2Fclusterman/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yelp%2Fclusterman/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yelp","download_url":"https://codeload.github.com/Yelp/clusterman/tar.gz/refs
/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yelp%2Fclusterman/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264721346,"owners_count":23653923,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T17:00:18.098Z","updated_at":"2025-07-11T03:32:05.946Z","avatar_url":"https://github.com/Yelp.png","language":"Python","funding_links":[],"categories":["Cluster"],"sub_categories":[],"readme":"[![CI](https://github.com/Yelp/clusterman/actions/workflows/ci.yaml/badge.svg)](https://github.com/Yelp/clusterman/actions/workflows/ci.yaml)\n[![Documentation Status](https://readthedocs.org/projects/clusterman/badge/?version=latest)](https://clusterman.readthedocs.io/en/latest/?badge=latest)\n\n# Clusterman - Autoscale and Manage your compute clusters\n\n![Clusterman Logo](https://raw.githubusercontent.com/Yelp/clusterman/master/clusterman_logo.png)\n\nClusterman (the Cluster Manager) is an autoscaling engine for Mesos\nand Kubernetes clusters. 
It looks at metrics and can launch or terminate\ncompute to meet the needs of your workloads, similarly to the official\n[Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler).\nIt provides the following features:\n\n* Customizable metrics: All metrics for Clusterman are stored in an\n  external datastore and are automatically loaded into the signals\n  that need them.\n* Pluggable autoscaling signals: Your team knows how the application\n  you're running should scale in response to metrics, so your team\n  should own the signal that tells Clusterman what to do.\n* Full-featured simulation environment: Want to know how the autoscaler\n  is going to respond to production traffic before you deploy changes?\n  The Clusterman simulation environment lets you do this.  You can also\n  simulate future traffic to predict increases in usage or cost\n  before they happen.\n\nFor more information, see the [Clusterman documentation](https://clusterman.readthedocs.io/en/latest/).\n\n## Getting Started\n\nYou can try out Clusterman in a local development environment against\na Dockerized Mesos cluster by running the following commands:\n\n    make example\n    clusterman status --cluster local-dev -v\n\nAll of the Clusterman CLI commands should work in the above environment.\nYou can see examples of the Clusterman services by running:\n\n    make itest-external\n\n## Components\n\n![Architecture Diagram](https://github.com/Yelp/clusterman/blob/master/images/architecture-diagram.png?raw=true)\n\nClusterman is made up of the following components:\n\n* Metrics Data Store: All relevant data used by scaling signals is written\n  to a single data store, which serves as the single source of truth about historical\n  cluster state.  At Yelp, we use AWS DynamoDB for this datastore.  
Metrics are\n  written to the datastore via a separate metrics library.\n* Pluggable Signals: _Metrics_ (from the data store) are consumed by _signals_\n  (small bits of code that produce resource requests).  Signals\n  run in separate processes configured by [supervisord](http://supervisord.org)\n  and use Unix sockets to communicate.\n* Core Autoscaler: The autoscaler logic consumes resource requests from the\n  signals and combines them to determine how many resources to request from, or\n  release back to, the cloud provider.\n* Resource Groups and Pools: Each autoscaler instance manages exactly one\n  \"pool\", that is, a logical grouping of machines in a cluster.  Pools consist\n  of \"resource groups\", such as a Spot Fleet Request (SFR) or Auto Scaling Group\n  (ASG) from AWS EC2.\n* Configuration: Clusterman stores global configuration values in a file called\n  `clusterman.yaml`, and per-pool configuration in `clusterman-clusters/\u003ccluster-name\u003e/\u003cpool-name\u003e.(mesos|kubernetes)`.\n  These config files tell the Clusterman services when and how to run, and they\n  serve as the glue to hook up an autoscaler with its signals.  Configure the\n  path to `clusterman.yaml` with the `--env-config-path` flag, and the path to\n  `clusterman-clusters` with `--cluster-config-directory`.\n* An Autoscaling Simulation Environment: Clusterman comes with a complete\n  simulation environment for running tests with your signals on historical data\n  before they are deployed.  This environment can produce information about the\n  cost of your cluster, as well as whether it is over- or under-provisioned.\n\nClusterman offers two main ways of interacting with the clusters it manages: the\nClusterman CLI and the Clusterman service.  The CLI provides a set of command-line tools for viewing and managing\nthe state of the cluster; type `clusterman --help` to see a list of available\nsubcommands.  
See the Clusterman documentation for more details.\n\nThe Clusterman service runs as three long-running processes: the first\nprocess collects spot instance pricing data from AWS (not required if you\naren't using AWS, spot instances, or the Clusterman simulator); the second\nprocess queries each pool in a cluster to collect metadata and system\nmetrics about the pool; and the third process autoscales each of the pools.\n\n## Integrating Clusterman\n\nAt Yelp, we use [PaaSTA](https://github.com/Yelp/PaaSTA), our\nplatform-as-a-service, to manage Clusterman.  If you use PaaSTA, setting up\nClusterman should be relatively straightforward.  Otherwise, you will need\nto provide additional tooling to deploy the Clusterman code or Docker image\nto your environment.\n\nIf you'd like to use Clusterman in your environment, you will need the\nfollowing components set up:\n\n* A metrics datastore with the appropriate tables.  See `examples/terraform/clusterman.tf`\n  for a Terraform representation of the schema in DynamoDB.\n* A `clusterman_metrics` library that can communicate with your chosen metrics\n  datastore.  A reference copy of the metrics library in `examples/clusterman_metrics`\n  can communicate with AWS DynamoDB.\n* Code to run the autoscaler service.  At Yelp, we use an internal\n  batch library called `yelp_batch` for this task; however, the same goal\n  can be achieved by simply running the code in an infinite `while`\n  loop.  See the sample code in `examples/batch` for a place to start.\n* Configuration files.  Clusterman uses one \"master\" configuration file as well\n  as a configuration file for each pool that it autoscales.  You can see examples of\n  these config files in `acceptance/srv-configs`, and the config file schema in\n  `examples/schemas`.\n\nTo build a Debian package for the Clusterman CLI, run `make package`.  
To build\nan example Docker image that can run the Clusterman batch code, run `make cook-image-external`.\n\nClusterman uses EC2 tags to find the resource groups that it manages.\nTo make a resource group discoverable by Clusterman, add a\ntag like the following to your ASG or SFR, where the tag value is a JSON string naming the cluster and pool:\n\n    tag-name: \"{\\\"paasta_cluster\\\": \\\"cluster-name\\\", \\\"pool\\\": \\\"pool-name\\\"}\"\n\nYou can specify the value of `tag-name` in your configuration file for the pool, using the `sfr` or `asg` key depending on the resource group type:\n\n    resource_groups:\n      - sfr:  # or asg\n          tag: tag-name\n\n## Design Goals\n\nClusterman is designed to support a wide range of cluster autoscaling needs at\nYelp.  We run many different types of workloads (long-running services, batch\njobs, machine learning tasks, databases, etc.) on top of Kubernetes and Mesos,\nand each of these workloads has a different set of scaling requirements.\nClusterman is intended to be a unified system that can accommodate each of these\nworkloads.  To that end, Clusterman's design goals are:\n\n* A modular design that separates cloud API calls from signal evaluation and\n  the core autoscaling loop\n* Unified autoscaling logic for a multi-tenant cluster\n* Client-owned scaling signals for requesting resources\n* A command-line interface for managing and interacting with clusters\n* A simulation environment for performing cost and behaviour analysis\n\n## Licence\n\nClusterman is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0\n\n## Contributing\n\nEveryone is encouraged to contribute to Clusterman by forking the\n[GitHub repository](https://github.com/Yelp/clusterman) and making a pull request or\nopening an issue.  
Please read our [Code of Conduct](https://github.com/Yelp/clusterman/blob/master/code-of-conduct.md).\n\n### Instructions for Yelp developers\n\n1) Make your changes, push a branch to GitHub, and create a pull request.\n2) Once your PR is approved, merge your changes to master.\n\nA Jenkins pipeline polls GitHub and brings any changes into our internal version. Jenkins then builds and deploys Clusterman as usual.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYelp%2Fclusterman","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYelp%2Fclusterman","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYelp%2Fclusterman/lists"}