{"id":13883439,"url":"https://github.com/airbnb/smartstack-cookbook","last_synced_at":"2025-12-17T16:21:07.555Z","repository":{"id":11357858,"uuid":"13790787","full_name":"airbnb/smartstack-cookbook","owner":"airbnb","description":"The chef recipes for running and testing Airbnb's SmartStack","archived":true,"fork":false,"pushed_at":"2020-07-28T21:34:48.000Z","size":47,"stargazers_count":245,"open_issues_count":10,"forks_count":45,"subscribers_count":30,"default_branch":"master","last_synced_at":"2024-09-17T03:26:32.451Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/airbnb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-10-23T01:40:28.000Z","updated_at":"2024-04-21T12:22:09.000Z","dependencies_parsed_at":"2022-09-13T08:23:08.597Z","dependency_job_id":null,"html_url":"https://github.com/airbnb/smartstack-cookbook","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%2Fsmartstack-cookbook","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%2Fsmartstack-cookbook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%2Fsmartstack-cookbook/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%2Fsmartstack-cookbook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/airbnb","download_url":"https://codeload.github.com/airbnb/smartstack-cookbook/tar.gz/refs/heads/master","host":{"name":"Gi
tHub","url":"https://github.com","kind":"github","repositories_count":226159967,"owners_count":17582856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-06T09:01:31.796Z","updated_at":"2025-12-17T16:21:02.531Z","avatar_url":"https://github.com/airbnb.png","language":"Ruby","funding_links":[],"categories":["Ruby","Capabilities"],"sub_categories":["Configuration \u0026 Discovery"],"readme":"# Description #\n\nThis cookbook configures Airbnb's SmartStack.\nSmartStack is our service registration, discovery and monitoring platform.\nIt allows you to quickly and reliably connect to other services that you need, and for others to connect to your service.\n\n# Getting started with this cookbook #\n\nThis cookbook contains everything you need to get SmartStack up and running, both in development and in production.\n\n## Production Use ##\n\n### Set up zookeeper ###\n\nIf you are ready to install SmartStack on your machines, you will first need to do a bit of prep.\nFirst, you will need [Zookeeper](https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription) running in your infrastructure.\nWe recommend using an [existing cookbook](https://github.com/SimpleFinance/chef-zookeeper).\nFor now, you can just set up a single machine, but for production use we recommend an [ensemble](http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup) of at least 3 nodes managed with [exhibitor](https://github.com/Netflix/exhibitor/wiki).\n\n### Configure chef ###\n\nIn your role, environment file, or infrastructure repo:\n\n* set 
`node.zookeeper.smartstack_cluster` to a list of the zookeeper machines you'll be using for smartstack.\n* create a services hash in `smartstack/attributes/services.rb` and `ports.rb` describing how you want your services configured. more information is [below](#configuring-smartstack)\n* enable the services you want:\n  * where the service is running, add it to `node.nerve.enabled_services`\n  * where it is being consumed, add it to `node.synapse.enabled_services`\n\nThat's all!\nSee the more extensive documentation below if you need additional help.\n\n## Dev and Testing ##\n\nThis cookbook is configured to be easy to run in dev using [vagrant](http://www.vagrantup.com/).\nTo get started:\n\n* Install [Virtualbox](https://www.virtualbox.org/wiki/Downloads); it's free!\n* Install [Vagrant](http://downloads.vagrantup.com/); this cookbook has been tested with v1.3.5\n* Install the [berkshelf](http://berkshelf.com/) plugin for vagrant: `vagrant plugin install vagrant-berkshelf`\n* Bring up SmartStack in a VM: `vagrant up`\n\nThis will bring up an Ubuntu VM configured with Zookeeper, SmartStack, and a few sample services.\nThe SmartStack integration tests will automatically run inside the Vagrant VM.\n\n# How SmartStack Works #\n\n## Synapse ##\n\n[Synapse](https://github.com/airbnb/synapse) is a service discovery platform.\nIt lets you reliably connect to an available worker for a given service.\nYou don't have to worry about discovery within your application, and you can easily do the same thing in dev as in prod.\n\n### How to use synapse ###\n\nUsing synapse to talk to a service is easy.\nJust specify that you would like to do so in your role file.\nYou'll need to add a `'synapse' =\u003e {'enabled_services' =\u003e ['desired_service']}` section to your `default_attributes` section:\n\n```ruby\nname 'myrole'\ndescription 'my role file'\n\ndefault_attributes({\n  'synapse' =\u003e { 'enabled_services' =\u003e [ 'service1', 'service2' ] }\n})\n\nrun_list(\n  
'recipe[smartstack]',\n  'recipe[myrole]'\n)\n```\n \nOnce you've done this and reconverged your boxes, the service will be available to you on `localhost` at its synapse port.\nIf you are writing out a config file in chef and need to specify the port to use, just use `node.smartstack.services.desired_service.local_port` in your config.\nYou can manually look up your synapse port in `attributes/ports.rb` in this cookbook.\n\n### How synapse works ###\n\nFor every enabled service, synapse looks up a list of available servers which run the service in Zookeeper.\nIt then configures a local haproxy to forward requests for `localhost`:`synapse_port` to one of those backends (by default, in a round-robin fashion).\nWhenever the list of servers for the service changes in zookeeper, synapse reconfigures haproxy to reflect the latest information.\n\nIf synapse is not running, haproxy is still running, containing the latest set of servers.\nSo, even with synapse or zookeeper broken, the list of servers remains reasonably current unless there's massive change.\n\n### How to troubleshoot synapse ###\n\nThe immediate course of action is to visit the haproxy stats page.\nThis is accessible at `your.box:3212` -- just hit it in your web browser.\nThe stats page will show you all of your enabled services and the backends for those services.\nYou'll be able to see many per-service and per-backend stats, including the current status and insight into processed requests and how they are doing.\n\nYou can restart synapse via the usual way with runit: `sv restart synapse`.\nYou can also safely reload haproxy if you suspect issues there -- existing connections will be unaffected.\n\n## Nerve ##\n\n[Nerve](https://github.com/airbnb/nerve) is the registration component for synapse.\nIt takes care of creating entries for your services in Zookeeper.\nYour service will be published in zookeeper only when it passes the configured health checks.\nWhen your service stops passing health checks, it 
will be removed, and placed in maintenance mode in all of its synapse consumers.\n\n### Using Nerve ###\n\nUsing nerve is as simple as [using synapse](#how-to-use-synapse).\nYou just add a `'nerve' =\u003e {'enabled_services' =\u003e ['your_service']}` section to your `default_attributes` in your role file:\n\n```ruby\nname 'myservice'\ndescription 'sets up myservice'\n\ndefault_attributes({\n  'nerve' =\u003e { 'enabled_services' =\u003e [ 'myservice' ] }\n})\n\nrun_list(\n  'recipe[smartstack]',\n  'recipe[myservice]'\n)\n```\n\nHowever, you would normally do this if you are writing a role file for your service.\nThis probably means that you wrote the service as well.\nIn this case, you'll need to write the [nerve/synapse configuration](#configuring-smartstack) for the service.\nYou'll also want to make sure that your service has the correct endpoints for [health](#health-checks) and [connectivity](#connectivity-checks) checks.\n\nOnce nerve is configured to check your service on your boxes, it will start making health checks.\nYou can see the health checks being made in nerve's log, in `/etc/service/nerve/log`.\n\n### Configuring Smartstack ###\n\nSmartstack configuration lives in two files in this cookbook.\nThe first file is `attributes/ports.rb`.\nThis just contains a port reservation for your service.\n\nThe second, more important file is `attributes/services.rb`.\nLet's take a look at an example:\n\n```ruby\n  'ssspy' =\u003e {\n    'synapse' =\u003e {\n      'server_options' =\u003e 'check inter 30s downinter 2s fastinter 2s rise 3 fall 1',\n      'discovery' =\u003e { 'method' =\u003e 'zookeeper', },\n      'listen' =\u003e [\n        'mode http',\n        'option httpchk GET /ping',\n      ],\n    },\n    'nerve' =\u003e {\n      'port' =\u003e 3260,\n      'check_interval' =\u003e 2,\n      'checks' =\u003e [\n        { 'type' =\u003e 'http', 'uri' =\u003e '/health', 'timeout' =\u003e 0.5, 'rise' =\u003e 2, 'fall' =\u003e 1 },\n      ]\n    },\n  
},\n```\n\nAs you can see, there are several sections here.\nLet's start with the nerve config:\n\n```ruby\n    'nerve' =\u003e {\n      'port' =\u003e 3260,\n      'check_interval' =\u003e 2,\n      'checks' =\u003e [\n        { 'type' =\u003e 'http', 'uri' =\u003e '/health', 'timeout' =\u003e 0.5, 'rise' =\u003e 2, 'fall' =\u003e 1 },\n      ]\n    },\n```\n\nNerve here is configured to make its health checks on port 3260.\nThis means that `ssspy` is properly running on its own synapse port locally.\nThe checks happen every 2 seconds, and there's only one check -- an http check to the `/health` endpoint.\n\nThis is the most common configuration.\nHowever, sometimes you might see multiple checks defined per service.\nFor instance, here is the config for `flog_thrift`:\n\n```ruby\n    'nerve' =\u003e {\n      'port' =\u003e 4567,\n      'check_interval' =\u003e 1,\n      'checks' =\u003e [\n        { 'type' =\u003e 'tcp', 'timeout' =\u003e 1, 'rise' =\u003e 5, 'fall' =\u003e 2 },\n        { 'type' =\u003e 'http', 'port' =\u003e 8422, 'uri' =\u003e '/health', 'timeout' =\u003e 1, 'rise' =\u003e 5, 'fall' =\u003e 2 },\n      ]\n    },\n```\n\nFor `flog_thrift` to be up, it has to both be listening on its thrift port via TCP and also pass its http health check.\n\nLet's look at ssspy's synapse config:\n\n```ruby\n    'synapse' =\u003e {\n      'server_options' =\u003e 'check inter 30s downinter 2s fastinter 2s rise 3 fall 1',\n      'discovery' =\u003e {\n        'method' =\u003e 'zookeeper',\n        'hosts' =\u003e []\n      },\n      'listen' =\u003e [\n        'mode http',\n        'option httpchk GET /ping',\n      ],\n    },\n```\n\nThe `server_options` directive tells haproxy to run checks on each backend with proper check intervals.\nYou can read more about the [haproxy check options](https://code.google.com/p/haproxy-docs/wiki/ServerOptions).\nThe `discovery` section tells us how synapse will find ssspy; in this case, via zookeeper.\n\nFinally, the `listen` section 
contains additional haproxy configuration.\nIt specifies how haproxy will conduct its own health checks.\nSSSPy is following convention by properly implementing a `/ping` endpoint for [connectivity checks](#connectivity-checks).\n\n### Health Checks ###\n\nNobody wants your service to receive traffic when it's not actually functional.\nYour consumers do not want that, because they want their service calls to work.\nAnd you don't want that, because you also want your service to work.\n\nYou can make sure that a broken service instance won't receive traffic by making your `/health` checks fail when your service is broken.\nSimply return a non-`200` status code.\nHere is an example from [optica](https://github.com/airbnb/optica), a simple Sinatra service:\n\n```ruby\n  get '/health' do\n    if settings.store.healthy?\n      content_type 'text/plain', :charset =\u003e 'utf-8'\n      return \"OK\"\n    else\n      halt(503)\n    end\n  end\n```\n\nThe `healthy?` function does [real work](https://github.com/airbnb/optica/blob/164ee747425eb823994345203fd40089751724f5/store.rb#L94) to make sure the service actually functions.\nOnly nerve will ever hit that endpoint, so you can and should feel free to make it take some time.\n\n### Connectivity Checks ###\n\nIf a particular backend for your service passes its [health checks](#health-checks), it might still be unavailable to consumers.\nOne example is a network partition -- synapse has discovered your service, but can't actually reach it.\nTo prevent such problems, we configure the haproxy on the consumer end to do connectivity checks when possible.\n\nWe do this by using [haproxy's built-in checking mechanism](http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#5-check).\nTo distinguish between health checks made by nerve and connectivity checks made by haproxy on the synapse end, we define a `/ping` endpoint.\nThis endpoint should *always* return `200` with a conventional text body of `PONG`.\n\nBecause the 
number of machines making connectivity checks may be large, you should strive to make the `/ping` check as lightweight as possible.\n\n## Zookeeper and Smartstack ##\n\nSmartstack cannot function without [zookeeper](https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription).\nThis shared file-like store provides the correct semantics for ensuring that service information is correct and distributed across our infrastructure.\nWe use zookeeper because it provides the [ephemeral nodes](http://zookeeper.apache.org/doc/r3.2.1/zookeeperProgrammers.html#Ephemeral+Nodes) nerve uses to register services.\nIts distributed nature prevents it from becoming a scaling choke point or a single point of failure in our infrastructure.\n\n### Debugging Smartstack ###\n\nYou would like to use your service from another service, but something is not working.\nThese instructions will tell you how to debug the situation.\n\nFirst, on a consumer box (a box which has `the_service` in its `'synapse' =\u003e { 'enabled_services'`), go to port 3212 in your browser.\nYou'll see the haproxy stats page.\nThere should be a section for `the_service` containing the boxes providing `the_service`.\n\nIf the section exists and contains some boxes, but they are all in red, those boxes are failing connectivity checks.\nYou should double-check your security group settings with SRE.\nIf the section is not there at all, or is missing some boxes, then there are two possible reasons:\n1. the service is not properly discovered\n2. the service is not properly registered\n\nTo check if it's (1), check `synapse` on the consumer box.\n1. It should be running; check with `sv s synapse`\n2. Try restarting it with `sv restart synapse`\n3. Check the synapse logs in `/etc/service/synapse/log/current` for anything unusual\n\nIf it looks like synapse is working, then the problem is probably (2) -- no registration.\nTo debug, follow these steps:\n\n1. Check the service on one of its instances\n  * Is it running? 
Is it insta-crashing? Watch `sv s the_service`\n2. If it's insta-crashing, figure out why\n  * Check `/etc/service/the_service/logs/current`\n  * Run it live; `sv down the_service; cd /etc/service/the_service; ./run`\n3. If it's running, is it passing health checks?\n  * `curl -D - localhost:32xx/health` and ensure you get a 200\n4. Is it passing health checks from a remote box?\n  * a failure here happens if you accidentally bind only to `lo` in your service\n  * run the health check `curl` from another box\n5. Is nerve running?\n  * `sv s nerve`; if something is wrong with nerve, alert SRE\n\n\nYou can also debug smartstack by directly looking in zookeeper for registered services, and watching how that list changes over time.\nYou can do this via an exhibitor UI.\nAnother way is to use a zkCli client and connect directly to one of the machines in the cluster.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairbnb%2Fsmartstack-cookbook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fairbnb%2Fsmartstack-cookbook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairbnb%2Fsmartstack-cookbook/lists"}