https://github.com/open-guides/og-aws
📙 Amazon Web Services - a practical guide
- Host: GitHub
- URL: https://github.com/open-guides/og-aws
- Owner: open-guides
- License: cc-by-4.0
- Created: 2016-07-13T17:30:16.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2024-08-16T02:51:54.000Z (almost 2 years ago)
- Last Synced: 2025-04-08T21:17:49.938Z (about 1 year ago)
- Language: Shell
- Homepage:
- Size: 7.32 MB
- Stars: 35,925
- Watchers: 1,207
- Forks: 3,892
- Open Issues: 159
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
- Authors: AUTHORS.md
README

The Open Guide to Amazon Web Services
=====================================
[Join us on Slack!](http://slackhatesthe.cloud)
[Credits](AUTHORS.md) ∙ [Contributing guidelines](CONTRIBUTING.md)
Table of Contents
-----------------
**Purpose**
- [Why an Open Guide?](#why-an-open-guide)
- [Scope](#scope)
- [Legend](#legend)
**AWS in General**
- [General Information](#general-information)
- [Learning and Career Development](#learning-and-career-development)
- [Managing AWS](#managing-aws)
- [Managing Servers and Applications](#managing-servers-and-applications)
| Specific AWS Services | Basics | Tips | Gotchas |
|---------------------------------------|--------------------------------|-------------------------------|------------------------------------------------|
| [ALB](#alb) | [📗](#alb-basics) | [📘](#alb-tips) | [📙](#alb-gotchas-and-limitations) |
| [AMIs](#amis) | [📗](#ami-basics) | [📘](#ami-tips) | [📙](#ami-gotchas-and-limitations) |
| [API Gateway](#api-gateway) | [📗](#api-gateway-basics) | [📘](#api-gateway-tips) | [📙](#api-gateway-gotchas-and-limitations) |
| [Auto Scaling](#auto-scaling) | [📗](#auto-scaling-basics) | [📘](#auto-scaling-tips) | [📙](#auto-scaling-gotchas-and-limitations) |
| [Batch](#batch) | [📗](#batch-basics) | [📘](#batch-tips) | |
| [Certificate Manager](#certificate-manager) | [📗](#certificate-manager-basics) | [📘](#certificate-manager-tips) | [📙](#certificate-manager-gotchas-and-limitations) |
| [CLB (ELB)](#clb) | [📗](#clb-basics) | [📘](#clb-tips) | [📙](#clb-gotchas-and-limitations) |
| [CloudFront](#cloudfront) | [📗](#cloudfront-basics) | [📘](#cloudfront-tips) | [📙](#cloudfront-gotchas-and-limitations) |
| [CloudFormation](#cloudformation) | [📗](#cloudformation-basics) | [📘](#cloudformation-tips) | [📙](#cloudformation-gotchas-and-limitations) |
| [CloudWatch](#cloudwatch) | [📗](#cloudwatch-basics) | [📘](#cloudwatch-tips) | [📙](#cloudwatch-gotchas-and-limitations) |
| [Device Farm](#device-farm) | [📗](#device-farm-basics) | [📘](#device-farm-tips) | [📙](#device-farm-gotchas-and-limitations) |
| [DirectConnect](#directconnect) | [📗](#directconnect-basics) | [📘](#directconnect-tips) | |
| [DynamoDB](#dynamodb) | [📗](#dynamodb-basics) | [📘](#dynamodb-tips) | [📙](#dynamodb-gotchas-and-limitations) |
| [EBS](#ebs) | [📗](#ebs-basics) | [📘](#ebs-tips) | [📙](#ebs-gotchas-and-limitations) |
| [EC2](#ec2) | [📗](#ec2-basics) | [📘](#ec2-tips) | [📙](#ec2-gotchas-and-limitations) |
| [ECS](#ecs) | [📗](#ecs-basics) | [📘](#ecs-tips) | |
| [EKS](#eks) | [📗](#eks-basics) | [📘](#eks-tips) | [📙](#eks-gotchas-and-limitations) |
| [EFS](#efs) | [📗](#efs-basics) | [📘](#efs-tips) | [📙](#efs-gotchas-and-limitations) |
| [Elastic Beanstalk](#elastic-beanstalk) | [📗](#elastic-beanstalk-basics) | [📘](#elastic-beanstalk-tips) | [📙](#elastic-beanstalk-gotchas-and-limitations) |
| [Elastic IPs](#elastic-ips) | [📗](#elastic-ip-basics) | [📘](#elastic-ip-tips) | [📙](#elastic-ip-gotchas-and-limitations) |
| [ElastiCache](#elasticache) | [📗](#elasticache-basics) | [📘](#elasticache-tips) | [📙](#elasticache-gotchas-and-limitations) |
| [EMR](#emr) | [📗](#emr-basics) | [📘](#emr-tips) | [📙](#emr-gotchas-and-limitations) |
| [Fargate](#fargate) | [📗](#fargate-basics) | [📘](#fargate-tips) | [📙](#fargate-gotchas-and-limitations) |
| [Glacier](#glacier) | [📗](#glacier-basics) | [📘](#glacier-tips) | [📙](#glacier-gotchas-and-limitations) |
| [IoT](#iot) | [📗](#iot-basics) | [📘](#iot-tips) | [📙](#iot-gotchas-and-limitations) |
| [Kinesis Firehose](#kinesis-firehose) | | | [📙](#kinesis-firehose-gotchas-and-limitations) |
| [Kinesis Streams](#kinesis-streams) | [📗](#kinesis-streams-basics) | [📘](#kinesis-streams-tips) | [📙](#kinesis-streams-gotchas-and-limitations) |
| [KMS](#kms) | [📗](#kms-basics) | [📘](#kms-tips) | [📙](#kms-gotchas-and-limitations) |
| [Lambda](#lambda) | [📗](#lambda-basics) | [📘](#lambda-tips) | [📙](#lambda-gotchas-and-limitations) |
| [Load Balancers](#load-balancers) | [📗](#load-balancer-basics) | [📘](#load-balancer-tips) | [📙](#load-balancer-gotchas-and-limitations) |
| [Mobile Hub](#mobile-hub) | [📗](#mobile-hub-basics) | [📘](#mobile-hub-tips) | [📙](#mobile-hub-gotchas-and-limitations) |
| [OpsWorks](#opsworks) | [📗](#opsworks-basics) | [📘](#opsworks-tips) | [📙](#opsworks-gotchas-and-limitations) |
| [Quicksight](#quicksight) | [📗](#quicksight-basics) | | [📙](#quicksight-gotchas-and-limitations) |
| [RDS](#rds) | [📗](#rds-basics) | [📘](#rds-tips) | [📙](#rds-gotchas-and-limitations) |
| [RDS Aurora](#rds-aurora) | [📗](#rds-aurora-basics) | [📘](#rds-aurora-tips) | [📙](#rds-aurora-gotchas-and-limitations) |
| [RDS Aurora MySQL](#rds-aurora-mysql) | [📗](#rds-aurora-mysql-basics) | [📘](#rds-aurora-mysql-tips) | [📙](#rds-aurora-mysql-gotchas-and-limitations) |
| [RDS Aurora PostgreSQL](#rds-aurora-postgresql) | [📗](#rds-aurora-postgresql-basics) | [📘](#rds-aurora-postgresql-tips) | [📙](#rds-aurora-postgresql-gotchas-and-limitations) |
| [RDS MySQL and MariaDB](#rds-mysql-and-mariadb) | [📗](#rds-mysql-and-mariadb-basics) | [📘](#rds-mysql-and-mariadb-tips) | [📙](#rds-mysql-and-mariadb-gotchas-and-limitations) |
| [RDS PostgreSQL](#rds-postgresql) | [📗](#rds-postgresql-basics) | [📘](#rds-postgresql-tips) | [📙](#rds-postgresql-gotchas-and-limitations) |
| [RDS SQL Server](#rds-sql-server) | [📗](#rds-sql-server-basics) | [📘](#rds-sql-server-tips) | [📙](#rds-sql-server-gotchas-and-limitations) |
| [Redshift](#redshift) | [📗](#redshift-basics) | [📘](#redshift-tips) | [📙](#redshift-gotchas-and-limitations) |
| [Route 53](#route-53) | [📗](#route-53-basics) | [📘](#route-53-tips) | [📙](#route-53-gotchas-and-limitations) |
| [S3](#s3) | [📗](#s3-basics) | [📘](#s3-tips) | [📙](#s3-gotchas-and-limitations) |
| [Security and IAM](#security-and-iam) | [📗](#security-and-iam-basics) | [📘](#security-and-iam-tips) | [📙](#security-and-iam-gotchas-and-limitations) |
| [SES](#ses) | [📗](#ses-basics) | [📘](#ses-tips) | [📙](#ses-gotchas-and-limitations) |
| [SNS](#sns) | [📗](#sns-basics) | [📘](#sns-tips) | [📙](#sns-gotchas-and-limitations) |
| [SQS](#sqs) | [📗](#sqs-basics) | [📘](#sqs-tips) | [📙](#sqs-gotchas-and-limitations) |
| [Step Functions](#step-functions) | [📗](#step-functions-basics) | [📘](#step-functions-tips) | [📙](#step-functions-gotchas-and-limitations) |
| [WAF](#waf) | [📗](#waf-basics) | [📘](#waf-tips) | [📙](#waf-gotchas-and-limitations) |
| [VPCs, Network Security, and Security Groups](#vpcs-network-security-and-security-groups) | [📗](#vpc-basics) | [📘](#vpc-and-network-security-tips) | [📙](#vpc-and-network-security-gotchas-and-limitations) |
**Special Topics**
- [High Availability](#high-availability)
- [Billing and Cost Management](#billing-and-cost-management)
- [Further Reading](#further-reading)
**Legal**
- [Disclaimer](#disclaimer)
- [License](#license)
**Figures and Tables**
- [Figure: Tools and Services Market Landscape](#tools-and-services-market-landscape): A selection of third-party companies/products
- [Figure: AWS Data Transfer Costs](#aws-data-transfer-costs): Visual overview of data transfer costs
- [Table: Service Matrix](#service-matrix): How AWS services compare to alternatives
- [Table: AWS Product Maturity and Releases](#aws-product-maturity-and-releases): AWS product releases
- [Table: Storage Durability, Availability, and Price](#storage-durability-availability-and-price): A quantitative comparison
Why an Open Guide?
------------------
A lot of information on AWS is already written. Most people learn AWS by reading a blog or a "[getting started guide](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html)" and referring to the [standard AWS references](https://aws.amazon.com/documentation/). Nonetheless, trustworthy and practical information and recommendations aren't easy to come by. AWS's own documentation is a great but sprawling resource that few have time to read fully, and it includes only official facts, so it omits the experiences of engineers. The information in blogs or [Stack Overflow](http://stackoverflow.com/questions/tagged/amazon-web-services) is also not consistently up to date.
This guide is by and for engineers who use AWS. It aims to be a useful, living reference that consolidates links, tips, gotchas, and best practices. It arose from discussion and editing over beers by [several engineers](AUTHORS.md) who have used AWS extensively.
Before using the guide, please read the [**license**](#license) and [**disclaimer**](#disclaimer).
[Back to top :arrow_up:](#table-of-contents)
### Please help!
**This is an early in-progress draft!** It's our first attempt at assembling this information, so it is still far from comprehensive and likely to have omissions or errors.
Please help by [**joining the Slack channel**](http://slackhatesthe.cloud) (we like to talk about AWS in general, even if you only have questions; discussion helps the community and guides improvements) and [**contributing to the guide**](CONTRIBUTING.md). This guide is *open to contributions*, so unlike a blog, it can keep improving. Like any open source effort, we combine efforts but also review to ensure high quality.
Scope
-----
- Currently, this guide covers selected "core" services, such as EC2, S3, Load Balancers, EBS, and IAM, and partial details and tips around other services. We expect it to expand.
- It is not a tutorial, but rather a collection of information you can read and return to. It is for both beginners and the experienced.
- The goal of this guide is to be:
    - **Brief:** Keep it dense and use links
    - **Practical:** Basic facts, concrete details, advice, gotchas, and other "folk knowledge"
    - **Current:** We can keep updating it, and anyone can contribute improvements
    - **Thoughtful:** The goal is to be helpful rather than present dry facts. Thoughtful opinion with rationale is welcome. Suggestions, notes, and opinions based on real experience can be extremely valuable. (We believe this is possible with a guide of this format, unlike in some [other venues](http://meta.stackexchange.com/questions/201994/is-there-a-place-to-ask-opinion-based-questions).)
- This guide is not sponsored by AWS or AWS-affiliated vendors. It is written by and for engineers who use AWS.
Legend
------
- 📒 Marks standard/official AWS pages and docs
- 🔹 Important or often overlooked tip
- ❗ "Serious" gotcha (used where risks or time or resource costs are significant: critical security risks, mistakes with significant financial cost, or poor architectural choices that are fundamentally difficult to correct)
- 🔸 "Regular" gotcha, limitation, or quirk (used where consequences are things not working, breaking, or not scaling gracefully)
- 📜 Undocumented feature (folklore)
- 🐥 Relatively new (and perhaps immature) services or features
- ⏱ Performance discussions
- ⛓ Lock-in: Products or decisions that are likely to tie you to AWS in a new or significant way; that is, later moving to a non-AWS alternative would be costly in terms of engineering effort
- 🚪 Alternative non-AWS options
- 💸 Cost issues, discussion, and gotchas
- 🕍 A mild warning attached to "full solution" or opinionated frameworks that may take significant time to understand and/or might not fit your needs exactly; the opposite of a point solution (the cathedral is a nod to [Raymond's metaphor](https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar))
- 📗📘📙 Colors indicate basics, tips, and gotchas, respectively.
- 🚧 Areas where correction or improvement is needed (possibly with a link to an issue; do help!)
General Information
-------------------
### When to Use AWS
- [AWS](https://en.wikipedia.org/wiki/Amazon_Web_Services) is the dominant public cloud computing provider.
- In general, "[cloud computing](https://en.wikipedia.org/wiki/Cloud_computing)" can refer to one of three types of cloud: "public," "private," and "hybrid." AWS is a public cloud provider, since anyone can use it. Private clouds are within a single (usually large) organization. Many companies use a hybrid of private and public clouds.
- The core features of AWS are [infrastructure-as-a-service](https://en.wikipedia.org/wiki/Cloud_computing#Infrastructure_as_a_service_.28IaaS.29) (IaaS), that is, virtual machines and supporting infrastructure. Other cloud service models include [platform-as-a-service](https://en.wikipedia.org/wiki/Cloud_computing#Platform_as_a_service_.28PaaS.29) (PaaS), which typically are more fully managed services that deploy customers' applications, or [software-as-a-service](https://en.wikipedia.org/wiki/Cloud_computing#Software_as_a_service_.28SaaS.29) (SaaS), which are cloud-based applications. AWS does offer a few products that fit into these other models, too.
- In business terms, with infrastructure-as-a-service you have a variable cost model: it is [OpEx, not CapEx](http://www.investopedia.com/ask/answers/020915/what-difference-between-capex-and-opex.asp) (though some [pre-purchased contracts](https://aws.amazon.com/ec2/purchasing-options/reserved-instances/) are still CapEx).
- AWS's TTM revenue was [**$37.549 billion**](https://ir.aboutamazon.com/news-release/news-release-details/2020/Amazoncom-Announces-First-Quarter/default.aspx) as of Q1 2020 according to their earnings results (slide 14 in the linked deck), or roughly **14%** of Amazon.com's total revenue (slide 11 in the same deck) for the same TTM period.
- **Main reasons to use AWS:**
    - If your company is building systems or products that may need to scale
    - and you have technical know-how
    - and you want the most flexible tools
    - and you're not significantly tied into different infrastructure already
    - and you don't have internal, regulatory, or compliance reasons you can't use a public cloud-based solution
    - and you're not on a Microsoft-first tech stack
    - and you don't have a specific reason to use Google Cloud
    - and you can afford, manage, or negotiate its somewhat higher costs
    - ... then AWS is likely a good option for your company.
- Each of those reasons above might point to situations where other services are preferable. In practice, many, if not most, tech startups as well as a number of modern large companies can or already do benefit from using AWS. Many large enterprises are partly migrating internal infrastructure to Azure, Google Cloud, and AWS.
- **Costs:** Billing and cost management are such big topics that we have [an entire section on this](#billing-and-cost-management).
- 🔹**EC2 vs. other services:** Most users of AWS are most familiar with [EC2](#ec2), AWS' flagship virtual server product, and possibly a few others like S3 and CLBs. But AWS products now extend far beyond basic IaaS, and often companies do not properly understand or appreciate all the many AWS services and how they can be applied, due to the [sharply growing](#which-services-to-use) number of services, their novelty and complexity, branding confusion, and fear of ⛓lock-in to proprietary AWS technology. Although a bit daunting, it's important for technical decision-makers in companies to understand the breadth of the AWS services and make informed decisions. (We hope this guide will help.)
- 🚪**AWS vs. other cloud providers:** While AWS is the dominant IaaS provider (31% market share in [this 2016 estimate](https://www.srgresearch.com/articles/aws-remains-dominant-despite-microsoft-and-google-growth-surges)), there is significant competition, and some alternatives are better suited to some companies. [This Gartner report](https://www.gartner.com/doc/reprints?id=1-2G2O5FC&ct=150519&st=sb) has a good overview of the major cloud players:
    - [**Google Cloud Platform**](https://cloud.google.com/). GCP arrived later to market than AWS, but has vast resources and is now used widely by many companies, including a few large ones. It is gaining market share. Not all AWS services have similar or analogous services in GCP, and vice versa. In particular, GCP offers some more advanced machine learning-based services like the [Vision](https://cloud.google.com/vision/), [Speech](https://cloud.google.com/speech/), and [Natural Language](https://cloud.google.com/natural-language/) APIs. It's not common to switch once you're up and running, but it does happen: [Spotify migrated](http://www.wsj.com/articles/google-cloud-lures-amazon-web-services-customer-spotify-1456270951) from AWS to Google Cloud. There is more discussion [on Quora](https://www.quora.com/What-are-the-reasons-to-choose-AWS-over-Google-Cloud-or-vice-versa-for-a-high-traffic-web-application) about relative benefits. Of particular note is that VPCs in GCP are [global by default](https://cloud.google.com/vpc/) with subnetworks per region, while AWS' VPCs have to live within a particular region. This gives GCP an edge if you're designing applications with geo-replication from the beginning. It's also possible to [share one GCP VPC](https://cloud.google.com/compute/docs/shared-vpc/) between multiple projects (roughly analogous to AWS accounts), while in AWS you'd have to peer them. It's also possible to [peer GCP VPCs](https://cloud.google.com/compute/docs/vpc/vpc-peering) in a similar manner to how it's done in AWS.
    - [**Microsoft Azure**](https://azure.microsoft.com/en) is the de facto choice for companies and teams that are focused on a Microsoft stack, and it has now placed significant emphasis on Linux as well.
    - In **China**, AWS' footprint is relatively small. The market is dominated by Alibaba's [Alibaba Cloud](https://www.alibabacloud.com/), formerly called [Aliyun](https://intl.aliyun.com/).
    - Companies at (very) large scale may want to reduce costs by managing their own infrastructure. For example, [Dropbox migrated](https://news.ycombinator.com/item?id=11282948) to their own infrastructure.
    - Other cloud providers such as [Digital Ocean](https://www.digitalocean.com/) offer similar services, sometimes with greater ease of use, more personalized support, or lower cost. However, none of these match the breadth of products, mind-share, and market domination AWS now enjoys.
    - Traditional managed hosting providers such as [Rackspace](https://www.rackspace.com/) offer cloud solutions as well.
- 🚪**AWS vs. PaaS:** If your goal is just to put up a single service that does something relatively simple, and you're trying to minimize time managing operations engineering, consider a [platform-as-a-service](https://en.wikipedia.org/wiki/Platform_as_a_service) such as [Heroku](https://www.heroku.com/). The AWS approach to PaaS, Elastic Beanstalk, is arguably more complex, especially for simple use cases.
- 🚪**AWS vs. web hosting:** If your main goal is to host a website or blog, and you don't expect to be building an app or more complex service, you may wish to consider one of the myriad [web hosting services](https://www.google.com/search?q=web+hosting).
- 🚪**AWS vs. managed hosting:** Traditionally, many companies pay [managed hosting](https://en.wikipedia.org/wiki/Dedicated_hosting_service) providers to maintain physical servers for them, then build and deploy their software on top of the rented hardware. This makes sense for businesses that want direct control over hardware, due to legacy, performance, or special compliance constraints, but is usually considered old fashioned or unnecessary by many developer-centric startups and younger tech companies.
- **Complexity:** AWS will let you build and scale systems to the size of the largest companies, but the complexity of the services when used at scale requires significant depth of knowledge and experience. Even very simple use cases often require more knowledge to do "right" in AWS than in a simpler environment like Heroku or Digital Ocean. (This guide may help!)
- **Geographic locations:** AWS has data centers in [over a dozen geographic locations](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions), known as **regions**, in Europe, East Asia, North and South America, and now Australia and India. It also has many more **edge locations** globally for reduced latency of services like CloudFront.
    - See the [current list](https://aws.amazon.com/about-aws/global-infrastructure/) of regions and edge locations, including upcoming ones.
    - If your infrastructure needs to be in close physical proximity to another service for latency or throughput reasons (for example, latency to an ad exchange), viability of AWS may depend on the location.
- ⛓**Lock-in:** As you use AWS, it's important to be aware when you are depending on AWS services that do not have equivalents elsewhere.
    - Lock-in may be completely fine for your company, or a significant risk. It's important from a business perspective to make this choice explicitly, and consider the cost, operational, business continuity, and competitive risks of being tied to AWS. AWS is such a dominant and reliable vendor that many companies are comfortable with using AWS to its full extent. Others can tell stories about the [dangers of "cloud jail" when costs spiral](http://firstround.com/review/the-three-infrastructure-mistakes-your-company-must-not-make/).
    - Generally, the more AWS services you use, the more lock-in you have to AWS; that is, the more engineering resources (time and money) it will take to change to other providers in the future.
    - Basic services like virtual servers and standard databases are usually easy to migrate to other providers or on premises. Others like load balancers and IAM are specific to AWS but have close equivalents from other providers. The key thing to consider is whether engineers are architecting systems around specific AWS services that are not open source or relatively interchangeable. For example, Lambda, API Gateway, Kinesis, Redshift, and DynamoDB do not have substantially equivalent open source or commercial service equivalents, while EC2, RDS (MySQL or Postgres), EMR, and ElastiCache more or less do. (See more [below](#which-services-to-use), where these are noted with ⛓.)
- **Combining AWS and other cloud providers:** Many customers combine AWS with other non-AWS services. For example, legacy systems or secure data might be in a managed hosting provider, while other systems are AWS. Or a company might only use S3 with another provider doing everything else. However, small startups or projects starting fresh will typically stick to AWS or Google Cloud only.
- **Hybrid cloud:** In larger enterprises, it is common to have [hybrid deployments](https://aws.amazon.com/enterprise/hybrid/) encompassing private cloud or on-premises servers and AWS, or other enterprise cloud providers like [IBM](https://www.ibm.com/it-infrastructure/solutions/hybrid-cloud)/[Bluemix](http://www.ibm.com/cloud-computing/bluemix/hybrid/), [Microsoft](https://www.microsoft.com/en-us/cloud-platform/hybrid-cloud)/[Azure](https://azure.microsoft.com/en-us/overview/azure-stack/), [NetApp](http://www.netapp.com/us/solutions/cloud/hybrid-cloud/), or [EMC](http://www.emc.com/en-us/cloud/hybrid-cloud-computing/index.htm).
- **Major customers:** Who uses AWS, Azure, and Google Cloud?
    - AWS's [list of customers](https://aws.amazon.com/solutions/case-studies/) includes large numbers of mainstream online properties and major brands, such as Netflix, Pinterest, Spotify (moving to Google Cloud), Airbnb, Expedia, Yelp, Zynga, Comcast, Nokia, and Bristol-Myers Squibb.
    - Azure's [list of customers](https://azure.microsoft.com/en-us/case-studies/) includes companies such as NBC Universal, 3M, and Honeywell Inc.
    - Google Cloud's [list of customers](https://cloud.google.com/customers/) is large as well, and includes a few mainstream sites, such as [Snapchat](http://www.businessinsider.com/snapchat-is-built-on-googles-cloud-2014-1), Best Buy, Domino's, and Sony Music.
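
The "main reasons to use AWS" chain above reads as a conjunction of conditions, so it can be sketched as a simple checklist. The Python below is an illustrative sketch only: `CompanyProfile`, its field names, and both helper functions are hypothetical names introduced here, and the labels paraphrase the list rather than define any official decision procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompanyProfile:
    """Hypothetical answers to the questions raised in the list above."""
    may_need_to_scale: bool
    has_technical_know_how: bool
    wants_most_flexible_tools: bool
    tied_to_other_infrastructure: bool
    barred_from_public_cloud: bool      # internal, regulatory, or compliance reasons
    microsoft_first_stack: bool
    has_reason_for_google_cloud: bool
    can_handle_higher_costs: bool       # afford, manage, or negotiate


def unmet_conditions(profile: CompanyProfile) -> List[str]:
    """Return the conditions from the list that this profile does not satisfy."""
    checks = {
        "may need to scale": profile.may_need_to_scale,
        "has technical know-how": profile.has_technical_know_how,
        "wants the most flexible tools": profile.wants_most_flexible_tools,
        "not tied into different infrastructure": not profile.tied_to_other_infrastructure,
        "no reason it can't use a public cloud": not profile.barred_from_public_cloud,
        "not on a Microsoft-first tech stack": not profile.microsoft_first_stack,
        "no specific reason to use Google Cloud": not profile.has_reason_for_google_cloud,
        "can afford, manage, or negotiate costs": profile.can_handle_higher_costs,
    }
    return [label for label, ok in checks.items() if not ok]


def aws_is_likely_good_option(profile: CompanyProfile) -> bool:
    """True only when every condition in the chain holds."""
    return not unmet_conditions(profile)


if __name__ == "__main__":
    startup = CompanyProfile(
        may_need_to_scale=True,
        has_technical_know_how=True,
        wants_most_flexible_tools=True,
        tied_to_other_infrastructure=False,
        barred_from_public_cloud=False,
        microsoft_first_stack=False,
        has_reason_for_google_cloud=False,
        can_handle_higher_costs=True,
    )
    print(aws_is_likely_good_option(startup))  # True
    print(unmet_conditions(startup))           # []
```

Returning the unmet conditions, rather than a bare yes or no, mirrors the guide's point that each failed condition may point to a situation where another service is preferable.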
[Back to top :arrow_up:](#table-of-contents)
### Which Services to Use
- AWS offers a *lot* of different services, [about a hundred](https://aws.amazon.com/products/) at last count.
- Most customers use a few services heavily, a few services lightly, and the rest not at all. What services you'll use depends on your use cases. Choices differ substantially from company to company.
- **Immature and unpopular services:** Just because AWS has a service that sounds promising, it doesn't mean you should use it. Some services are very narrow in use case, not mature, are overly opinionated, or have limitations, so building your own solution may be better. We try to give a sense for this by breaking products into categories.
- **Must-know infrastructure:** Most typical small to medium-size users will focus on the following services first. If you manage use of AWS systems, you likely need to know at least a little about all of these. (Even if you don't use them, you should learn enough to make that choice intelligently.)
    - [IAM](#security-and-iam): User accounts and identities (you need to think about accounts early on!)
    - [EC2](#ec2): Virtual servers and associated components, including:
        - [AMIs](#amis): Machine Images
        - [Load Balancers](#load-balancers): CLBs and ALBs
        - [Autoscaling](#auto-scaling): Capacity scaling (adding and removing servers based on load)
        - [EBS](#ebs): Network-attached disks
        - [Elastic IPs](#elastic-ips): Assigned IP addresses
    - [S3](#s3): Storage of files
    - [Route 53](#route-53): DNS and domain registration
    - [VPC](#vpcs-network-security-and-security-groups): Virtual networking, network security, and co-location; you automatically use it
    - [CloudFront](#cloudfront): CDN for hosting content
    - [CloudWatch](#cloudwatch): Alerts, paging, monitoring
- **Managed services:** Existing software solutions you could run on your own, but with managed deployment:
    - [RDS](#rds): Managed relational databases (managed MySQL, Postgres, and Amazon's own Aurora database)
    - [EMR](#emr): Managed Hadoop
    - [Elasticsearch](https://aws.amazon.com/elasticsearch-service/): Managed Elasticsearch
    - [ElastiCache](https://aws.amazon.com/elasticache/): Managed Redis and Memcached
- **Optional but important infrastructure:** These are key and useful infrastructure components that are less widely known and used. You may have legitimate reasons to prefer alternatives, so evaluate with care to be sure they fit your needs:
    - ⛓[Lambda](#lambda): Running small, fully managed tasks "serverless"
    - [CloudTrail](https://aws.amazon.com/cloudtrail/): AWS API logging and audit (often neglected but important)
    - ⛓🕍[CloudFormation](#cloudformation): Templatized configuration of collections of AWS resources
    - 🕍[Elastic Beanstalk](https://aws.amazon.com/elasticbeanstalk/): Fully managed (PaaS) deployment of packaged Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker applications
    - 🐥[EFS](#efs): Network filesystem compatible with NFSv4.1
    - ⛓🕍[ECS](#ecs): Docker container/cluster management (note Docker can also be used directly, without ECS)
    - 🕍[EKS](#eks): Kubernetes (K8s) Docker container/cluster management
    - ⛓[ECR](https://aws.amazon.com/ecr/): Hosted private Docker registry
    - 🐥[Config](https://aws.amazon.com/config/): AWS configuration inventory, history, change notifications
    - 🐥[X-Ray](https://aws.amazon.com/xray/): Trace analysis and debugging for distributed applications such as microservices
- **Special-purpose infrastructure:** These services are focused on specific use cases and should be evaluated if they apply to your situation. Many also are proprietary architectures, so tend to tie you to AWS.
    - ⛓[DynamoDB](#dynamodb): Low-latency NoSQL key-value store
    - ⛓[Glacier](#glacier): Slow and cheap alternative to S3
    - ⛓[Kinesis](https://aws.amazon.com/kinesis/): Streaming (distributed log) service
    - ⛓[SQS](https://aws.amazon.com/sqs/): Message queueing service
    - ⛓[Redshift](#redshift): Data warehouse
    - 🐥[QuickSight](https://aws.amazon.com/quicksight/): Business intelligence service
    - [SES](https://aws.amazon.com/ses/): Send and receive e-mail for marketing or transactions
    - ⛓[API Gateway](https://aws.amazon.com/api-gateway/): Proxy, manage, and secure API calls
    - ⛓[IoT](#iot): Manage bidirectional communication over HTTP, WebSockets, and MQTT between AWS and clients (often but not necessarily "things" like appliances or sensors)
    - ⛓[WAF](https://aws.amazon.com/waf/): Web firewall for CloudFront to deflect attacks
    - ⛓[KMS](#kms): Store and manage encryption keys securely
    - [Inspector](https://aws.amazon.com/inspector/): Security audit
    - [Trusted Advisor](https://aws.amazon.com/premiumsupport/trustedadvisor/): Automated tips on reducing cost or making improvements
    - 🐥[Certificate Manager](https://aws.amazon.com/certificate-manager/): Manage SSL/TLS certificates for AWS services
    - 🐥⛓[Fargate](https://aws.amazon.com/fargate/): Docker container management, backend for ECS and EKS
- **Compound services:** These are similarly specific, but are full-blown services that tackle complex problems and may tie you in. Usefulness depends on your requirements. If you have large or significant need, you may have these already managed by in-house systems and engineering teams.
- [Machine Learning](https://aws.amazon.com/machine-learning/): Machine learning model training and classification
- [Lex](https://aws.amazon.com/lex/): Automatic speech recognition (ASR) and natural language understanding (NLU)
- [Polly](https://aws.amazon.com/polly/): Text-to-speech engine in the cloud
- [Rekognition](https://aws.amazon.com/rekognition/): Service for image recognition
- ⛓🕍[Data Pipeline](https://aws.amazon.com/datapipeline/): Managed ETL service
- ⛓🕍[SWF](https://aws.amazon.com/swf/): Managed state tracker for distributed polyglot job workflow
- ⛓🕍[Lumberyard](https://aws.amazon.com/lumberyard/): 3D game engine
- **Mobile/app development:**
- [SNS](https://aws.amazon.com/sns/): Manage app push notifications and other end-user notifications
- ⛓🕍[Cognito](https://aws.amazon.com/cognito/): User authentication via Facebook, Twitter, etc.
- [Device Farm](https://aws.amazon.com/device-farm/): Cloud-based device testing
- [Mobile Analytics](https://aws.amazon.com/mobileanalytics/): Analytics solution for app usage
- 🕍[Mobile Hub](https://aws.amazon.com/mobile/): Comprehensive, managed mobile app framework
- **Enterprise services:** These are relevant if you have significant corporate cloud-based or hybrid needs. Many smaller companies and startups use other solutions, like Google Apps or Box. Larger companies may also have their own non-AWS IT solutions.
- [AppStream](https://aws.amazon.com/appstream/): Windows apps in the cloud, with access from many devices
- [Workspaces](https://aws.amazon.com/workspaces/): Windows desktop in the cloud, with access from many devices
- [WorkDocs](https://aws.amazon.com/workdocs/) (formerly Zocalo): Enterprise document sharing
- [WorkMail](https://aws.amazon.com/workmail/): Enterprise managed e-mail and calendaring service
- [Directory Service](https://aws.amazon.com/directoryservice/): Microsoft Active Directory in the cloud
- [Direct Connect](https://aws.amazon.com/directconnect/): Dedicated network connection between office or data center and AWS
- [Storage Gateway](https://aws.amazon.com/storagegateway/): Bridge between on-premises IT and cloud storage
- [Service Catalog](https://aws.amazon.com/servicecatalog/): IT service approval and compliance
- **Probably-don't-need-to-know services:** Bottom line, our informal polling indicates these services are just not broadly used, often for good reasons:
- [Snowball](https://aws.amazon.com/importexport/): If you want to ship petabytes of data into or out of Amazon using a physical appliance, read on.
- [Snowmobile](https://aws.amazon.com/snowmobile/): Appliances are great, but if you've got exabyte scale data to get into Amazon, nothing beats a tractor trailer full of drives.
- [CodeCommit](https://aws.amazon.com/codecommit/): Git service. You’re probably already using GitHub or your own solution ([Stackshare](http://stackshare.io/stackups/github-vs-bitbucket-vs-aws-codecommit) has informal stats).
- 🕍[CodePipeline](https://aws.amazon.com/codepipeline/): Continuous integration. You likely have another solution already.
- 🕍[CodeDeploy](https://aws.amazon.com/codedeploy/): Deployment of code to EC2 servers. Again, you likely have another solution.
- 🕍[OpsWorks](https://aws.amazon.com/opsworks/): Management of your deployments using Chef or (as of November 2017) Puppet Enterprise.
- [AWS in Plain English](https://www.expeditedssl.com/aws-in-plain-english) offers friendlier explanations of what all the other services are.
[Back to top :arrow_up:](#table-of-contents)
### Tools and Services Market Landscape
There are now enough cloud and “big data” enterprise companies and products that few can keep up with the market landscape.
We’ve assembled a landscape of a few of the services. This is far from complete, but tries to emphasize services that are popular with AWS practitioners: services that specifically help with AWS, complement it, or are tools almost anyone using AWS must learn.

🚧 *Suggestions to improve this figure? Please [file an issue](CONTRIBUTING.md).*
[Back to top :arrow_up:](#table-of-contents)
### Common Concepts
- 📒 The AWS [**General Reference**](https://docs.aws.amazon.com/general/latest/gr/Welcome.html) covers a bunch of common concepts that are relevant for multiple services.
- AWS allows deployments in [**regions**](https://docs.aws.amazon.com/general/latest/gr/rande.html), which are isolated geographic locations that help you reduce latency or offer additional redundancy. Regions contain availability zones (AZs), which are typically the first tool of choice for [high availability](#high-availability). AZs are [physically separate from one another](https://www.youtube.com/watch?v=JIQETrFC_SQ&feature=youtu.be&t=1428) even within the same region, and [may span multiple physical data centers](https://blog.rackspace.com/aws-101-regions-availability-zones). While they are connected via low-latency links, natural disasters afflicting one should not affect others.
- Each service has API **endpoints** for each region. Endpoints differ from service to service and not all services are available in each region, as listed in [these tables](https://docs.aws.amazon.com/general/latest/gr/rande.html).
- [**Amazon Resource Names (ARNs)**](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) are specially formatted identifiers for identifying resources. They start with 'arn:' and are used in many services, and in particular for IAM policies.
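To make the ARN format above concrete, here is a minimal Python sketch that splits an ARN into its colon-separated fields (`arn:partition:service:region:account-id:resource`). The `Arn` type, the `parse_arn` helper, and the account ID and resource names are illustrative assumptions, not part of any AWS SDK.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str   # e.g. "aws" or "aws-cn"
    service: str     # e.g. "iam" or "s3"
    region: str      # empty for global resources
    account_id: str  # empty for some resources, such as S3 buckets
    resource: str    # service-specific; may itself contain ":" or "/"


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its fields (hypothetical helper, not an AWS API)."""
    # Split at most 5 times so the resource part keeps any colons it contains.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


if __name__ == "__main__":
    # Made-up account ID and names, for illustration only.
    print(parse_arn("arn:aws:iam::123456789012:user/example-user"))
    print(parse_arn("arn:aws:s3:::example-bucket/some/key"))
```

Note that the region and account fields are empty for some services, which is why the sketch splits on position rather than skipping empty fields.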
[Back to top :arrow_up:](#table-of-contents)
### Service Matrix
Many services within AWS can at least be compared with Google Cloud offerings or with internal Google services. And often times you could assemble the same thing yourself with open source software. This table is an effort at listing these rough correspondences. (Remember that this table is imperfect as in almost every case there are subtle differences of features!)
| Service                       | AWS                                                                          | Google Cloud                 | Google Internal | Microsoft Azure                    | Other providers                   | Open source “build your own”                               | Openstack                                                  |
|-------------------------------|------------------------------------------------------------------------------|------------------------------|-----------------|------------------------------------|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------|
| Virtual server | EC2 | Compute Engine (GCE) | | Virtual Machine | DigitalOcean | OpenStack | Nova |
| PaaS | Elastic Beanstalk | App Engine | App Engine | Web Apps | Heroku, AppFog, OpenShift | Meteor, AppScale, Cloud Foundry, Convox |
| Serverless, microservices | Lambda, API Gateway | Functions | | Function Apps | PubNub Blocks, Auth0 Webtask | Kong, Tyk | Qinling |
| Container, cluster manager | ECS, EKS, Fargate | Container Engine, Kubernetes | Borg or Omega | Container Service | | Kubernetes, Mesos, Aurora | Zun |
| Object storage | S3 | Cloud Storage | GFS | Storage Account | DigitalOcean Spaces | Swift, HDFS, Minio | Swift |
| Block storage | EBS | Persistent Disk | | Storage Account | DigitalOcean Volumes | NFS | Cinder |
| SQL datastore | RDS | Cloud SQL | | SQL Database | | MySQL, PostgreSQL | Trove (stores NoSQL as well) |
| Sharded RDBMS | | Cloud Spanner | F1, Spanner | Azure Database for PostgreSQL - Hyperscale (Citus) | | Crate.io, CockroachDB |
| Bigtable | | Cloud Bigtable | Bigtable | | | HBase |
| Key-value store, column store | DynamoDB | Cloud Datastore | Megastore | Tables, DocumentDB | | Cassandra, CouchDB, RethinkDB, Redis |
| Memory cache | ElastiCache | App Engine Memcache | | Redis Cache | | Memcached, Redis |
| Search | CloudSearch, Elasticsearch (managed) | | | Search | Algolia, QBox, Elastic Cloud | Elasticsearch, Solr |
| Data warehouse | Redshift | BigQuery | Dremel | SQL Data Warehouse | Oracle, IBM, SAP, HP, many others | Greenplum |
| Business intelligence | QuickSight | Data Studio 360 | | Power BI | Tableau | |
| Lock manager | [DynamoDB (weak)](https://gist.github.com/ryandotsmith/c95fd21fab91b0823328) | | Chubby | Lease blobs in Storage Account | | ZooKeeper, Etcd, Consul |
| Message broker | SQS, SNS, IoT | Pub/Sub | PubSub2 | Service Bus | | RabbitMQ, Kafka, 0MQ |
| Streaming, distributed log | Kinesis | Dataflow | PubSub2 | Event Hubs | | Kafka Streams, Apex, Flink, Spark Streaming, Storm |
| MapReduce | EMR | Dataproc | MapReduce | HDInsight, DataLake Analytics | Qubole | Hadoop |
| Monitoring | CloudWatch | Stackdriver Monitoring | Borgmon | Monitor | | Prometheus(?) |
| Tracing                       | X-Ray                                                                        | Stackdriver Trace            |                 | Monitor (Application Insights)     | DataDog, New Relic, Epsagon       | Zipkin, Jaeger, Appdash                                    |
| Metric management | | | Borgmon, TSDB | Application Insights | | Graphite, InfluxDB, OpenTSDB, Grafana, Riemann, Prometheus |
| CDN | CloudFront | Cloud CDN | | CDN | Akamai, Fastly, Cloudflare, Limelight Networks | Apache Traffic Server |
| Load balancer | CLB/ALB | Load Balancing | GFE | Load Balancer, Application Gateway | | nginx, HAProxy, Apache Traffic Server |
| DNS | Route53 | DNS | | DNS | | bind |
| Email | SES | | | | Sendgrid, Mandrill, Postmark | |
| Git hosting | CodeCommit | Cloud Source Repositories | | Visual Studio Team Services | GitHub, BitBucket | GitLab |
| User authentication | Cognito | Firebase Authentication | | Azure Active Directory | | oauth.io |
| Mobile app analytics | Mobile Analytics | Firebase Analytics | | HockeyApp | Mixpanel | |
| Mobile app testing | Device Farm | Firebase Test Lab | | Xamarin Test Cloud | BrowserStack, Sauce Labs, Testdroid |
| Managing SSL/TLS certificates | Certificate Manager | | | | Let's Encrypt, Comodo, Symantec, GlobalSign |
| Automatic speech recognition and natural language understanding | Transcribe (ASR), Lex (NLU) | Cloud Speech API, Natural Language API | | Cognitive services | AYLIEN Text Analysis API, Ambiverse Natural Language Understanding API |Stanford's Core NLP Suite, Apache OpenNLP, Apache UIMA, spaCy |
| Text-to-speech engine in the cloud | Polly | | | |Nuance, Vocalware, IBM | Mimic, eSpeak, MaryTTS |
| Image recognition | Rekognition | Vision API | |Cognitive services | IBM Watson, Clarifai |TensorFlow, OpenCV |
| OCR (Text recognition) | Textract (documents), Rekognition (photographs) | Cloud Vision API | | Computer Vision API | | Tesseract |
| Language Translation | Translate | Translate | | Translator Text API | | Apertium |
| File Share and Sync | WorkDocs | Google Docs | |OneDrive | Dropbox, Box, Citrix File Share |ownCloud |
| Machine Learning | SageMaker, DeepLens, ML | ML Engine, Auto ML | |ML Studio | Watson ML | |
| Data Loss Prevention | Macie | Cloud Data Loss Prevention | | Azure Information Protection | | |
🚧 [*Please help fill this table in.*](CONTRIBUTING.md)
Selected resources with more detail on this chart:
- Google internal: [MapReduce](http://research.google.com/archive/mapreduce.html), [Bigtable](http://research.google.com/archive/bigtable.html), [Spanner](http://research.google.com/archive/spanner.html), [F1 vs Spanner](http://highscalability.com/blog/2013/10/8/f1-and-spanner-holistically-compared.html), [Bigtable vs Megastore](http://perspectives.mvdirona.com/2008/07/google-megastore/)
[Back to top :arrow_up:](#table-of-contents)
### AWS Product Maturity and Releases
It’s important to know the maturity of each AWS product. Here is a mostly complete list of first release date, with links to the [release notes](https://aws.amazon.com/releasenotes/). Most recently released services are first. Not all services are available in all regions; see [this table](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).
| Service | Original release | Availability | CLI Support | HIPAA Compliant | PCI-DSS Compliant |
|------------------------------------------------------------------------------------------------------------|------------------|-------------------------------------------------------------------------------|:-----------:|:---------------:|:-----------------:|
| 🐥[X-Ray](https://aws.amazon.com/releasenotes/AWS-X-Ray?browse=1) | 2016-12 | General | ✓ | ✓ | ✓ |
| 🐥[Lex](https://aws.amazon.com/releasenotes/Amazon-Lex?browse=1) | 2016-11 | Preview | | | |
| 🐥[Polly](https://aws.amazon.com/releasenotes/Amazon-Polly?browse=1) | 2016-11 | General | ✓ | ✓ | ✓ |
| 🐥[Rekognition](https://aws.amazon.com/releasenotes/Amazon-Rekognition?browse=1) | 2016-11 | General | ✓ | ✓ | ✓ |
| 🐥[Athena](http://docs.aws.amazon.com/athena/latest/ug/what-is.html) | 2016-11 | General | ✓ | ✓ | ✓ |
| 🐥[Batch](http://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html) | 2016-11 | General | ✓ | ✓ | ✓ |
| 🐥[Database Migration Service](https://aws.amazon.com/releasenotes/AWS-Database-Migration-Service?browse=1) | 2016-03 | General | | ✓ | ✓ |
| 🐥[Certificate Manager](https://aws.amazon.com/blogs/aws/new-aws-certificate-manager-deploy-ssltls-based-apps-on-aws/) | 2016-01 | General | ✓ | ✓ | ✓ |
| 🐥[IoT](https://aws.amazon.com/blogs/aws/aws-iot-now-generally-available/) | 2015-08 | General | ✓ | ✓ | ✓[13](#user-content-pci-iot) |
| 🐥[WAF](https://aws.amazon.com/releasenotes/AWS-WAF?browse=1) | 2015-10 | General | ✓ | ✓ | ✓ |
| 🐥[Data Pipeline](https://aws.amazon.com/releasenotes/AWS-Data-Pipeline?browse=1) | 2015-10 | General | ✓ | | |
| 🐥[Elasticsearch](https://aws.amazon.com/releasenotes/Amazon-Elasticsearch-Service?browse=1) | 2015-10 | General | ✓ | ✓ | ✓ |
| 🐥[Aurora](https://aws.amazon.com/releasenotes/2775579329314699) | 2015-07 | General | ✓ | ✓[3](#user-content-hipaa-aurora) | ✓[3](#user-content-hipaa-aurora) |
| 🐥[Service Catalog](https://aws.amazon.com/releasenotes/AWS-Service-Catalog?browse=1) | 2015-07 | General | ✓ | ✓ | ✓ |
| 🐥[Device Farm](https://aws.amazon.com/releasenotes/AWS-Device-Farm?browse=1) | 2015-07 | General | ✓ | | |
| 🐥[CodePipeline](https://aws.amazon.com/releasenotes/AWS-CodePipeline?browse=1) | 2015-07 | General | ✓ | ✓ | |
| 🐥[CodeCommit](https://aws.amazon.com/releasenotes/AWS-CodeCommit?browse=1) | 2015-07 | General | ✓ | ✓ | ✓ |
| 🐥[API Gateway](https://aws.amazon.com/releasenotes/Amazon-API-Gateway?browse=1) | 2015-07 | General | ✓ | ✓[1](#user-content-hipaa-apigateway) | ✓ |
| 🐥[Config](https://aws.amazon.com/releasenotes/AWS-Config?browse=1) | 2015-06 | General | ✓ | ✓ | ✓ |
| 🐥[EFS](https://aws.amazon.com/releasenotes/Amazon-EFS?browse=1) | 2015-05 | General | ✓ | ✓ | ✓ |
| 🐥[Machine Learning](https://aws.amazon.com/releasenotes/AmazonML?browse=1) | 2015-04 | General | ✓ | | |
| [Lambda](https://aws.amazon.com/releasenotes/AWS-Lambda?browse=1) | 2014-11 | General | ✓ | ✓ | ✓ |
| [ECS](https://aws.amazon.com/ecs/release-notes/) | 2014-11 | General | ✓ | ✓ | ✓ |
| [EKS](https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html) | 2018-06 | General | ✓[12](#user-content-eks-cli) | ✓ | ✓ |
| [KMS](https://aws.amazon.com/releasenotes/AWS-KMS?browse=1) | 2014-11 | General | ✓ | ✓ | ✓ |
| [CodeDeploy](https://aws.amazon.com/releasenotes/AWS-CodeDeploy?browse=1) | 2014-11 | General | ✓ | ✓ | |
| [Kinesis](https://aws.amazon.com/releasenotes/Amazon-Kinesis?browse=1) | 2013-12 | General | ✓ | ✓ | ✓[11](#user-content-pci-kinesis) |
| [CloudTrail](https://aws.amazon.com/releasenotes/AWS-CloudTrail?browse=1) | 2013-11 | General | ✓ | ✓ | ✓ |
| [AppStream](https://aws.amazon.com/releasenotes/Amazon-AppStream?browse=1) | 2013-11 | Preview | | ✓ | |
| [CloudHSM](https://aws.amazon.com/releasenotes/AWS-CloudHSM?browse=1) | 2013-03 | General | ✓ | ✓ | ✓ |
| [Silk](https://aws.amazon.com/releasenotes/Amazon-Silk?browse=1) | 2013-03 | Obsolete? | | | |
| [OpsWorks](https://aws.amazon.com/releasenotes/AWS-OpsWorks?browse=1) | 2013-02 | General | ✓ | ✓ | ✓ |
| [Redshift](https://aws.amazon.com/releasenotes/Amazon-Redshift?browse=1) | 2013-02 | General | ✓ | ✓ | ✓ |
| [Elastic Transcoder](https://aws.amazon.com/releasenotes/Amazon-Elastic-Transcoder?browse=1) | 2013-01 | General | ✓ | | |
| [Glacier](https://aws.amazon.com/releasenotes/Amazon-Glacier?browse=1) | 2012-08 | General | ✓ | ✓ | ✓ |
| [CloudSearch](https://aws.amazon.com/releasenotes/Amazon-CloudSearch?browse=1) | 2012-04 | General | ✓ | | |
| [SWF](https://aws.amazon.com/releasenotes/Amazon-SWF?browse=1) | 2012-02 | General | ✓ | ✓ | ✓ |
| [Storage Gateway](https://aws.amazon.com/releasenotes/AWS-Storage-Gateway?browse=1) | 2012-01 | General | ✓ | ✓ | ✓ |
| [DynamoDB](https://aws.amazon.com/releasenotes/Amazon-DynamoDB?browse=1) | 2012-01 | General | ✓ | ✓ | ✓ |
| [DirectConnect](https://aws.amazon.com/releasenotes/AWS-Direct-Connect?browse=1) | 2011-08 | General | ✓ | ✓ | ✓ |
| [ElastiCache](https://aws.amazon.com/releasenotes/Amazon-ElastiCache?browse=1) | 2011-08 | General | ✓ | ✓[14](#user-content-pci-elasticache) | ✓[14](#user-content-pci-elasticache) |
| [CloudFormation](https://aws.amazon.com/releasenotes/AWS-CloudFormation?browse=1) | 2011-04 | General | ✓ | ✓ | ✓ |
| [SES](https://aws.amazon.com/releasenotes/Amazon-SES?browse=1) | 2011-01 | General | ✓ | ✓ | |
| [Elastic Beanstalk](https://aws.amazon.com/releasenotes/AWS-Elastic-Beanstalk?browse=1) | 2010-12 | General | ✓ | ✓ | ✓ |
| [Route 53](https://aws.amazon.com/releasenotes/Amazon-Route-53?browse=1) | 2010-10 | General | ✓ | ✓ | ✓ |
| [IAM](https://aws.amazon.com/releasenotes/AWS-Identity-and-Access-Management?browse=1) | 2010-09 | General | ✓ | | ✓ |
| [SNS](https://aws.amazon.com/releasenotes/Amazon-SNS?browse=1) | 2010-04 | General | ✓ | ✓ | ✓ |
| [EMR](https://aws.amazon.com/releasenotes/Elastic-MapReduce?browse=1) | 2010-04 | General | ✓ | ✓ | ✓ |
| [RDS](https://aws.amazon.com/releasenotes/Amazon-RDS?browse=1) | 2009-12 | General | ✓ | ✓[2](#user-content-hipaa-rds) | ✓[9](#user-content-pci-rds) |
| [VPC](https://aws.amazon.com/releasenotes/Amazon-VPC?browse=1) | 2009-08 | General | ✓ | ✓ | ✓ |
| [Snowball](https://aws.amazon.com/releasenotes/AWS-ImportExport?browse=1) | 2015-10 | General | ✓ | ✓ | ✓[15](#user-content-pci-snowball) |
| [Snowmobile](https://aws.amazon.com/snowmobile/) | 2016-11 | General | | ✓ | ✓ |
| [CloudWatch](https://aws.amazon.com/releasenotes/CloudWatch?browse=1) | 2009-05 | General | ✓ | ✓ | ✓ |
| [CloudFront](https://aws.amazon.com/releasenotes/CloudFront?browse=1) | 2008-11 | General | ✓ | ✓[4](#user-content-hipaa-cloudfront) | ✓ |
| [Fulfillment Web Service](https://aws.amazon.com/releasenotes/Amazon-FWS?browse=1) | 2008-03 | Obsolete? | | | |
| [SimpleDB](https://aws.amazon.com/releasenotes/Amazon-SimpleDB?browse=1) | 2007-12 | ❗[Nearly obsolete](https://forums.aws.amazon.com/thread.jspa?threadID=121711) | ✓ | | ✓ |
| [DevPay](https://aws.amazon.com/releasenotes/DevPay?browse=1) | 2007-12 | General | | | |
| [Flexible Payments Service](https://aws.amazon.com/releasenotes/Amazon-FPS?browse=1) | 2007-08 | Retired | | | |
| [EC2](https://aws.amazon.com/releasenotes/Amazon-EC2?browse=1) | 2006-08 | General | ✓ | ✓[5](#user-content-hipaa-ec2sysmgr),[6](#user-content-hipaa-ec2ebs),[7](#user-content-hipaa-ec2elb) | ✓[6](#user-content-hipaa-ec2ebs),[7](#user-content-hipaa-ec2elb),[10](#user-content-pci-asg) |
| [SQS](https://aws.amazon.com/releasenotes/Amazon-SQS?browse=1) | 2006-07 | General | ✓ | ✓ | ✓ |
| [S3](https://aws.amazon.com/releasenotes/Amazon-S3?browse=1) | 2006-03 | General | ✓ | ✓[8](#user-content-hipaa-s3) | ✓ |
| [Alexa Top Sites](https://aws.amazon.com/alexa-top-sites/) | 2006-01 | General ❗HTTP-only | | | |
| [Alexa Web Information Service](https://aws.amazon.com/awis/) | 2005-10 | General ❗HTTP-only | | | |
[Back to top :arrow_up:](#table-of-contents)
##### Footnotes
**1**: Excludes use of Amazon API Gateway caching
**2**: RDS MySQL, Oracle, and PostgreSQL engines only
**3**: MySQL-compatible Aurora edition only
**4**: Excludes Lambda@Edge
**5**: Includes EC2 Systems Manager
**6**: Includes Elastic Block Storage (EBS)
**7**: Includes Elastic Load Balancing
**8**: Includes S3 Transfer Acceleration
**9**: Includes RDS MySQL, Oracle, PostgreSQL, SQL Server, and MariaDB
**10**: Includes Auto-Scaling
**11**: Data Analytics, Streams, Video Streams and Firehose
**12**: Kubernetes uses a custom CLI for Pod/Service management called kubectl. AWS CLI only handles Kubernetes Master concerns
**13**: IoT Core (includes Device Management) and Greengrass
**14**: ElastiCache for Redis only
**15**: Snowball and Snowball Edge
### Compliance
- Many applications have strict requirements around reliability, security, or data privacy. The [AWS Compliance](https://aws.amazon.com/compliance/) page has details about AWS’s certifications, which include **PCI DSS Level 1**, **SOC 1, 2, and 3**, **HIPAA**, and **ISO 9001**.
- Security in the cloud is a complex topic, based on a [shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/), where some elements of compliance are provided by AWS, and some are provided by your company.
- Several third-party vendors offer assistance with compliance, security, and auditing on AWS. If you have substantial needs in these areas, assistance is a good idea.
- From inside **China**, AWS services outside China [are generally accessible](https://en.greatfire.org/aws.amazon.com), though there are at times breakages in service. There are also AWS services [inside China](https://www.amazonaws.cn/en/).
### Getting Help and Support
- **Forums:** For many problems, it’s worth searching or asking for help in the [discussion forums](https://forums.aws.amazon.com/index.jspa) to see if it’s a known issue.
- **Premium support:** AWS offers several levels of [premium support](https://aws.amazon.com/premiumsupport/).
- The first tier, called "Developer support", lets you file support tickets with a 12 to 24 hour turnaround time. It starts at $29 per month, but once your monthly spend reaches around $1000 it changes to a 3% surcharge on your bill.
- The higher-level support services are quite expensive and can increase your bill by up to 10%. Many large and effective companies never pay for this level of support. These tiers are usually more helpful for midsize or larger companies needing rapid turnaround on deeper or more perplexing problems.
- Keep in mind that a flexible architecture can reduce the need for support. You shouldn’t be relying on AWS to solve your problems often. For example, if you can easily re-provision a new server, it may not be urgent to solve a rare kernel-level issue unique to one EC2 instance. If your EBS volumes have recent snapshots, you may be able to restore a volume before support can rectify the issue with the old volume. If your services have an issue in one availability zone, you should in any case be able to rely on a redundant zone or migrate services to another zone.
- Larger customers also get access to AWS Enterprise support, with dedicated technical account managers (TAMs) and shorter response time SLAs.
- There is definitely some controversy about how useful the paid support is. The support staff don’t always seem to have the information and authority to solve the problems that are brought to their attention. Often your ability to have a problem solved may depend on your relationship with your account rep.
- **Account manager:** If you are at significant levels of spend (thousands of US dollars plus per month), you may be assigned (or may wish to ask for) a dedicated account manager.
- These are a great resource, even if you’re not paying for premium support. Build a good relationship with them and make use of them, for questions, problems, and guidance.
- Assign a single point of contact on your company’s side, to avoid confusing or overwhelming them.
- **Contact:** The main web contact point for AWS is [here](https://aws.amazon.com/contact-us/). Many technical requests can be made via these channels.
- **Consulting and managed services:** For more hands-on assistance, AWS has established relationships with many [consulting partners](https://aws.amazon.com/partners/consulting/) and [managed service partners](https://aws.amazon.com/partners/msp/). The big consultants won’t be cheap, but depending on your needs, may save you costs long term by helping you set up your architecture more effectively, or offering specific expertise, e.g. security. Managed service providers provide longer-term full-service management of cloud resources.
- **AWS Professional Services:** AWS provides [consulting services](https://aws.amazon.com/professional-services/) alone or in combination with partners.
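As a rough illustration of the Developer support pricing described above, the sketch below models the fee as the larger of a flat $29 minimum and 3% of monthly spend. That "whichever is greater" model is an assumption that matches the figures quoted in this section; the function names are hypothetical, and current AWS pricing may differ.

```python
def developer_support_fee(monthly_spend: float,
                          minimum: float = 29.0,
                          rate: float = 0.03) -> float:
    """Estimate the monthly fee as max(flat minimum, percentage of spend).

    The default figures come from the text above and are not authoritative.
    """
    if monthly_spend < 0:
        raise ValueError("monthly_spend must be non-negative")
    return max(minimum, rate * monthly_spend)


def break_even_spend(minimum: float = 29.0, rate: float = 0.03) -> float:
    """Monthly spend above which the percentage exceeds the flat minimum."""
    return minimum / rate


if __name__ == "__main__":
    for spend in (200, 1000, 5000):
        print(spend, round(developer_support_fee(spend), 2))
    print("break-even spend:", round(break_even_spend(), 2))
```

Under these assumptions the break-even point is 29 / 0.03, a little under $1000 of monthly spend, which is consistent with the "around $1000" figure above.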
### Restrictions and Other Notes
- 🔸Lots of resources in Amazon have [**limits**](http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html) on them. This is actually helpful, so you don’t incur large costs accidentally. You have to request that quotas be increased by opening support tickets. Some limits are easy to raise, and some are not. (Some of these are noted in sections below.) Additionally, not all service limits are published.
- **Obtaining Current Limits and Usage:** Limit information for a service may be available from the service API, Trusted Advisor, both or neither (in which case you'll need to contact Support). [This page](http://awslimitchecker.readthedocs.io/en/latest/limits.html) from the awslimitchecker tool's documentation provides a nice summary of available retrieval options for each limit. The [tool](https://github.com/jantman/awslimitchecker) itself is also valuable for automating limit checks.
- 🔸[**AWS terms of service**](https://aws.amazon.com/service-terms/) are extensive. Much is expected boilerplate, but it does contain important notes and restrictions on each service. In particular, there are restrictions against using many AWS services in **safety-critical systems**. (Those appreciative of legal humor may wish to review [clause 42.10](https://www.theguardian.com/technology/2016/feb/11/amazon-terms-of-service-zombie-apocalypse).)
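Once you have collected limit and usage numbers by whatever route is available, automating the check is simple. The following Python sketch flags limits that are close to exhaustion; the limit names, the numbers, and the `limits_near_exhaustion` helper are hypothetical and not taken from awslimitchecker or any AWS API.

```python
def limits_near_exhaustion(usage, limits, threshold=0.8):
    """Return (name, used, limit) for each limit at or above `threshold` utilization."""
    flagged = []
    for name, limit in sorted(limits.items()):
        used = usage.get(name, 0)  # treat missing usage data as zero
        if limit > 0 and used / limit >= threshold:
            flagged.append((name, used, limit))
    return flagged


# Hypothetical numbers for illustration only; real values would come from
# the service API, Trusted Advisor, or Support.
example_limits = {"ec2/running-instances": 20, "vpc/vpcs-per-region": 5, "s3/buckets": 100}
example_usage = {"ec2/running-instances": 18, "vpc/vpcs-per-region": 2, "s3/buckets": 85}

if __name__ == "__main__":
    for name, used, limit in limits_near_exhaustion(example_usage, example_limits):
        print(f"{name}: {used}/{limit} used, consider requesting an increase")
```

Running a check like this on a schedule gives you time to open a support ticket before a limit blocks a deployment.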
### Related Topics
- [OpenStack](https://www.openstack.org/) is a private cloud alternative to AWS used by large companies that wish to avoid public cloud offerings.
Learning and Career Development
-------------------------------
### Certifications
- **Certifications:** AWS offers [**certifications**](https://aws.amazon.com/certification/) for IT professionals who want to demonstrate their knowledge.
- [Certified Cloud Practitioner](https://aws.amazon.com/certification/certified-cloud-practitioner/)
- [Certified Solutions Architect Associate](https://aws.amazon.com/certification/certified-solutions-architect-associate/)
- [Certified Developer Associate](https://aws.amazon.com/certification/certified-developer-associate/)
- [Certified SysOps Administrator Associate](https://aws.amazon.com/certification/certified-sysops-admin-associate/)
- [Certified Solutions Architect Professional](https://aws.amazon.com/certification/certified-solutions-architect-professional/)
- [Certified DevOps Engineer Professional](https://aws.amazon.com/certification/certified-devops-engineer-professional/)
- [Certified Security - Specialty](https://aws.amazon.com/certification/certified-security-specialty/)
- [Certified Advanced Networking - Specialty](https://aws.amazon.com/certification/certified-advanced-networking-specialty/)
- [Certified Machine Learning - Specialty](https://aws.amazon.com/certification/certified-machine-learning-specialty/)
- [Certified Data Analytics - Specialty](https://aws.amazon.com/certification/certified-data-analytics-specialty/)
- [Certified Database - Specialty](https://aws.amazon.com/certification/certified-database-specialty/)
Associate level certifications were once required as pre-requisites to taking the Professional examinations - this is no longer the case.
- **Getting certified:** If you’re interested in studying for and getting certifications, [this practical overview](https://gist.github.com/leonardofed/bbf6459ad154ad5215d354f3825435dc) tells you a lot of what you need to know. The official page is [here](https://aws.amazon.com/training/) and there is an [FAQ](https://aws.amazon.com/certification/faqs/).
- **Training for certifications:** Training is offered by AWS themselves (mainly instructor-led and on-site) and various third-party companies (usually as video-based training) such as [A Cloud Guru](https://acloud.guru/aws-cloud-training), [CloudAcademy](https://cloudacademy.com/library/amazon-web-services/) and [Linux Academy](https://linuxacademy.com/library/topics/AWS/type/Course/).
- **Do you need a certification?** Especially in consulting companies or when working in key tech roles in large non-tech companies, certifications are important credentials. In others, including in many tech companies and startups, certifications are not common or considered necessary. (In fact, fairly or not, some Silicon Valley hiring managers and engineers see them as a “negative” signal on a resume.)
Certifications are required to access certification lounges at official AWS events such as [Summits](https://aws.amazon.com/events/summits/) and [re:Invent](https://reinvent.awsevents.com). Lounges typically provide power charging points, seats and relatively better coffee.
Managing AWS
------------
### Managing Infrastructure State and Change
A great challenge in using AWS to build complex systems (and with DevOps in general) is to manage infrastructure state effectively over time. In general, this boils down to three broad goals for the state of your infrastructure:
- **Visibility**: Do you know the state of your infrastructure (what services you are using, and exactly how)? Do you also know when you (and anyone on your team) make changes? Can you detect misconfigurations, problems, and incidents with your service?
- **Automation**: Can you reconfigure your infrastructure to reproduce past configurations or scale up existing ones without a lot of extra manual work, or requiring knowledge that’s only in someone’s head? Can you respond to incidents easily or automatically?
- **Flexibility**: Can you improve your configurations and scale up in new ways without significant effort? Can you add more complexity using the same tools? Do you share, review, and improve your configurations within your team?
Much of what we discuss below is really about how to improve the answers to these questions.
There are several approaches to deploying infrastructure with AWS, from the console to complex automation tools, to third-party services, all of which attempt to help achieve visibility, automation, and flexibility.
### AWS Configuration Management
The first way most people experiment with AWS is via its web interface, the AWS Console. But using the Console is a highly manual process, and often works against automation or flexibility.
So if you’re not going to manage your AWS configurations manually, what should you do? Sadly, there are no simple, universal answers. Each approach has pros and cons, and the approaches taken by different companies vary widely; they include directly using APIs (and building tooling on top yourself), using command-line tools, and using third-party tools and services.
### AWS Console
- The [AWS Console](https://aws.amazon.com/console/) lets you control much (but not all) functionality of AWS via a web interface.
- Ideally, you should only use the AWS Console in a few specific situations:
- It’s great for read-only usage. If you’re trying to understand the state of your system, logging in and browsing it is very helpful.
- It is also reasonably workable for very small systems and teams (for example, one engineer setting up one server that doesn’t change often).
- It can be useful for operations you’re only going to do rarely, like less than once a month (for example, a one-time VPC setup you probably won’t revisit for a year). In this case using the console can be the simplest approach.
- ❗**Think before you use the console:** The AWS Console is convenient, but also the enemy of automation, reproducibility, and team communication. If you’re likely to be making the same change multiple times, avoid the console. Favor some sort of automation, or at least have a path toward automation, as discussed next. Not only does using the console preclude automation, which wastes time later, but it prevents documentation, clarity, and standardization around processes for yourself and your team.
### Command-Line tools
- The [**aws command-line interface**](https://aws.amazon.com/cli/) (CLI), used via the **aws** command, is the most basic way to save and automate AWS operations.
- Don't underestimate its power. It also has the advantage of being well maintained: it covers a large proportion of all AWS services, and is up to date.
- In general, whenever you can, prefer the command line to the AWS Console for performing operations.
- 🔹Even in the absence of fancier tools, you can **write simple Bash scripts** that invoke *aws* with specific arguments, and check these into Git. This is a primitive but effective way to document operations you've performed. It improves automation, allows code review and sharing on a team, and gives others a starting point for future work.
- 🔹For use that is primarily interactive (not scripted), consider instead using the [**aws-shell**](https://github.com/awslabs/aws-shell) tool from AWS. It is easier to use, with auto-completion and a colorful UI, but still works on the command line. If you're using [SAWS](https://github.com/donnemartin/saws), a previous version of the program, [you should migrate to aws-shell](https://github.com/donnemartin/saws/issues/68#issuecomment-240067034).
### APIs and SDKs
- **SDKs** for using AWS APIs are available in most major languages, with [Go](https://github.com/aws/aws-sdk-go), [iOS](https://github.com/aws/aws-sdk-ios), [Java](https://github.com/aws/aws-sdk-java), [JavaScript](https://github.com/aws/aws-sdk-js), [Python](https://github.com/boto/boto3), [Ruby](https://github.com/aws/aws-sdk-ruby), and [PHP](https://github.com/aws/aws-sdk-php) being most heavily used. AWS maintains [a short list](https://aws.amazon.com/tools/#sdk), but the [awesome-aws list](https://github.com/donnemartin/awesome-aws#sdks-and-samples) is the most comprehensive and current. Note [support for C++](https://github.com/donnemartin/awesome-aws#c-sdk) is [still new](https://aws.amazon.com/blogs/aws/introducing-the-aws-sdk-for-c/).
- **Retry logic:** An important aspect to consider whenever using SDKs is error handling; under heavy use, a wide variety of failures, from programming errors to throttling to AWS-related outages or failures, can be expected to occur. SDKs typically implement [**exponential backoff**](https://docs.aws.amazon.com/general/latest/gr/api-retries.html) to address this, but this may need to be understood and adjusted over time for some applications. For example, it is often helpful to alert on some error codes and not on others.
- ❗Don't use APIs directly. Although AWS documentation includes lots of API details, it's better to use the SDKs for your preferred language to access APIs. SDKs are more mature, robust, and well-maintained than something you'd write yourself.
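
As a rough illustration of the retry logic described above, here is a minimal sketch of exponential backoff with jitter. `ThrottledError` and `call_with_backoff` are hypothetical names for this example, not part of any AWS SDK; the SDKs ship their own retry handling, which is usually what you should configure rather than replace.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a retryable failure such as API throttling."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying ThrottledError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Sleep for a random fraction of the ceiling so that many clients
            # retrying at once do not all hit the service at the same moment.
            sleep(ceiling * rng())
```

The `sleep` and `rng` parameters exist only so the behavior can be exercised without real waiting. With Boto3, a similar effect can typically be tuned through `botocore.config.Config(retries={"max_attempts": 10, "mode": "standard"})` instead of hand-written loops.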
### Boto
- A good way to automate operations in a custom way is [**Boto3**](https://github.com/boto/boto3), also known as the [Amazon SDK for Python](http://aws.amazon.com/sdk-for-python/). [**Boto2**](https://github.com/boto/boto), the previous version of this library, has been in wide use for years, but now there is a newer version with official support from Amazon, so prefer Boto3 for new projects.
- Boto3 offers APIs at both a low level and a high level:
- The low-level APIs (Client APIs) map to AWS service-specific APIs, and all service operations are supported by clients. Clients are generated from a JSON service definition file.
- The high-level option, Resource APIs, lets you avoid making low-level network calls yourself and instead provides an object-oriented way to interact with AWS services.
- Boto3 has a lot of helpful [**features**](https://boto3.readthedocs.io/en/latest/guide/index.html#general-feature-guides) like *waiters*, which provide a structure that allows code to wait for changes to occur in the cloud. For example, when you create an EC2 instance, you may need to wait until the instance is running before performing another task.
- If you find yourself writing a Bash script with more than one or two CLI commands, you're probably doing it wrong. Stop, and consider writing a Boto script instead. This has the advantages that you can:
- Check return codes easily so success of each step depends on success of past steps.
- Grab interesting bits of data from responses, like instance ids or DNS names.
- Add useful environment information (for example, tag your instances with git revisions, or inject the latest build identifier into your initialization script).
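
The points above (waiters, pulling data out of responses, tagging with build information) can be sketched with the Boto3 EC2 client. This is only an illustration: `launch_and_tag` and the `git-revision` tag key are hypothetical names, and the function takes the client as a parameter so it can be tried against a stub.

```python
def launch_and_tag(ec2_client, image_id, instance_type, git_revision):
    """Launch one instance, wait until it is running, then tag it with a Git revision."""
    response = ec2_client.run_instances(
        ImageId=image_id, InstanceType=instance_type, MinCount=1, MaxCount=1
    )
    # Grab the interesting bit of the response: the new instance's ID.
    instance_id = response["Instances"][0]["InstanceId"]

    # The waiter polls until the instance reaches the "running" state and
    # raises an exception if it gives up, so later steps only run after success.
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # Record which revision this instance was launched for.
    ec2_client.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": "git-revision", "Value": git_revision}],
    )
    return instance_id
```

In real use you would pass `boto3.client("ec2")` as `ec2_client`, with an AMI ID and instance type of your own choosing.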
[Back to top :arrow_up:](#table-of-contents)
### General Visibility
- 🔹[**Tagging resources**](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html) is an essential practice, especially as organizations grow, to better understand your resource usage. For example, through automation or convention, you can add tags:
- For the org or developer that “owns” that resource
- For the product that resource supports
- To label lifecycles, such as temporary resources or ones that should be deprovisioned in the future
- To distinguish production-critical infrastructure (e.g. serving systems vs backend pipelines)
- To distinguish resources with special security or compliance requirements
- To (once enabled) [allocate cost](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html). Note that cost allocation tags only apply on a forward-looking basis; you can't retroactively apply them to items already billed.
- For many years, there was a notorious 10 tag limit per resource, which could not be raised and caused many companies significant pain. As of 2016, this was [raised](https://aws.amazon.com/blogs/security/now-organize-your-aws-resources-by-using-up-to-50-tags-per-resource/) to 50 tags per resource.
- 🔹In 2017, AWS introduced the ability to [enforce tagging](https://aws.amazon.com/blogs/aws/new-tag-ec2-instances-ebs-volumes-on-creation/) on instance and volume creation, deprecating portions of third party tools such as [Cloud Custodian](https://github.com/capitalone/cloud-custodian).
- 🔸 Tags are case sensitive; 'environment' and 'Environment' are two different tags. Automation in setting tags is likely the only sensible option at significant scale.
- 🔸 There is a bug in the ASG console where spaces after tag names are preserved. So if you type "Name " with a space at the end you will not get the expected behavior. This is probably true in other locations and SDKs also. Be sure you do not add trailing spaces to tag keys unless you really mean it. (As of Jul 2018)
- 🔸 When resources are shared across the org, their tags are not shared along with them. For example, a shared Transit Gateway or AMI will show the correct tags in the account that created it, but not in the accounts it was shared with.
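
Since tag keys are case sensitive and stray whitespace is preserved, it can help to check a proposed tag set before applying it. The following is a minimal sketch under those assumptions; `find_tag_problems` is a hypothetical helper, not an AWS API, and it covers only the gotchas listed above.

```python
MAX_TAGS_PER_RESOURCE = 50  # the per-resource limit mentioned above


def find_tag_problems(tags):
    """Return a list of human-readable problems for a dict of tag key -> value."""
    problems = []
    if len(tags) > MAX_TAGS_PER_RESOURCE:
        problems.append(
            f"{len(tags)} tags exceeds the limit of {MAX_TAGS_PER_RESOURCE}"
        )
    seen = {}  # normalized key -> first original key seen
    for key in tags:
        if key != key.strip():
            problems.append(f"tag key {key!r} has leading or trailing whitespace")
        folded = key.strip().lower()
        if folded in seen:
            problems.append(
                f"tag keys {seen[folded]!r} and {key!r} differ only by case or whitespace"
            )
        else:
            seen[folded] = key
    return problems
```

A check like this could run in code review or in a deployment script, before the tags ever reach AWS.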
Managing Servers and Applications
---------------------------------
[Back to top :arrow_up:](#table-of-contents)
### AWS vs Server Configuration
This guide is about AWS, not DevOps or server configuration management in general. But before getting into AWS in detail, it's worth noting that in addition to the configuration management for your AWS resources, there is the long-standing problem of configuration management for servers themselves.
[Back to top :arrow_up:](#table-of-contents)
### Philosophy
- Heroku's [**Twelve-Factor App**](http://12factor.net/) principles list some established general best practices for deploying applications.
- **Pets vs cattle:** Treat servers [like cattle, not pets](https://www.engineyard.com/blog/pets-vs-cattle). That is, design systems so infrastructure is disposable. It should be minimally worrisome if a server is unexpectedly destroyed.
- The concept of [**immutable infrastructure**](http://radar.oreilly.com/2015/06/an-introduction-to-immutable-infrastructure.html) is an extension of this idea.
- Minimize application state on EC2 instances. In general, instances should be able to be killed or die unexpectedly with minimal impact. State that is in your application should quickly move to RDS, S3, DynamoDB, EFS, or other data stores not on that instance. EBS is also an option, though it generally should not be the bootable volume, and EBS will require manual or automated re-mounting.
[Back to top :arrow_up:](#table-of-contents)
### Server Configuration Management
- There is a [large set](https://en.wikipedia.org/wiki/Comparison_of_open-source_configuration_management_software) of open source tools for managing configuration of server instances.
- These are generally not dependent on any particular cloud infrastructure, and work with any variety of Linux (or in many cases, a variety of operating systems).
- Leading configuration management tools are [Puppet](https://github.com/puppetlabs/puppet), [Chef](https://github.com/chef/chef), [Ansible](https://github.com/ansible/ansible), and [Saltstack](https://github.com/saltstack/salt). These aren't the focus of this guide, but we may mention them as they relate to AWS.
[Back to top :arrow_up:](#table-of-contents)
### Containers and AWS
- [Docker](http://blog.scottlowe.org/2014/03/11/a-quick-introduction-to-docker/) and the containerization trend are changing the way many servers and services are deployed in general.
- Containers are designed as a way to package up your application(s) and all of their dependencies in a known way. When you build a container, you are including every library or binary your application needs, outside of the kernel. A big advantage of this approach is that it's easy to test and validate a container locally without worrying about some difference between your computer and the servers you deploy on.
- A consequence of this is that you need fewer AMIs and boot scripts; for most deployments, the only boot script you need is a template that fetches an exported docker image and runs it.
- Companies that are embracing [microservice architectures](http://martinfowler.com/articles/microservices.html) will often turn to container-based deployments.
- AWS launched [ECS](https://aws.amazon.com/ecs/) as a service to manage clusters via Docker in late 2014, though many people still deploy Docker directly themselves. See the [ECS section](#ecs) for more details.
- AWS launched [EKS](https://aws.amazon.com/eks/) as a service to manage Kubernetes Clusters mid 2018, though many people still deploy ECS or use Docker directly themselves. See the [EKS section](#eks) for more details.
[Back to top :arrow_up:](#table-of-contents)
### Visibility
- Store and track instance metadata (such as instance id, availability zone, etc.) and deployment info (application build id, Git revision, etc.) in your logs or reports. The [**instance metadata service**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) can help collect some of the AWS data you'll need.
- **Use log management services:** Be sure to set up a way to view and manage logs externally from servers.
- Cloud-based services such as [Sumo Logic](https://www.sumologic.com/), [Splunk Cloud](http://www.splunk.com/en_us/cloud.html), [Scalyr](https://www.scalyr.com/), [LogDNA](https://www.logdna.com/), and [Loggly](https://www.loggly.com/) are the easiest to set up and use (and also the most expensive, which may be a factor depending on how much log data you have).
- Major open source alternatives include [Elasticsearch](https://github.com/elastic/elasticsearch), [Logstash](https://github.com/elastic/logstash), and [Kibana](https://github.com/elastic/kibana) (the “[Elastic Stack](https://www.elastic.co/webinars/introduction-elk-stack)”) and [Graylog](https://www.graylog.org/).
- If you can afford it (you have little data or lots of money) and don't have special needs, it makes sense to use hosted services whenever possible, since setting up your own scalable log processing systems is notoriously time consuming.
- **Track and graph metrics:** The AWS Console can show you simple graphs from CloudWatch, but you typically will want to track and graph many kinds of metrics, from CloudWatch and your applications. Collect and export helpful metrics everywhere you can, as long as the volume stays manageable and affordable.
- Services like [Librato](https://www.librato.com/), [KeenIO](https://keen.io/), and [Datadog](https://www.datadoghq.com/) have fancier features or better user interfaces that can save a lot of time. (A more detailed comparison is [here](http://blog.takipi.com/production-tools-guide/visualization-and-metrics/).)
- Use [Prometheus](https://prometheus.io) or [Graphite](https://github.com/graphite-project/graphite-web) as timeseries databases for your metrics (both are open source).
- [Grafana](https://github.com/grafana/grafana) can visualize with dashboards the stored metrics of both timeseries databases (also open source).
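
To illustrate collecting instance metadata for logs, here is a minimal sketch that queries the instance metadata service using the token-based (IMDSv2) flow. It only works from inside an EC2 instance. The function names and the `opener` parameter are choices made for this example, and the field names in the combined record are arbitrary.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254/latest"


def fetch_instance_identity(opener=urllib.request.urlopen, timeout=2):
    """Return instance ID, Availability Zone, and region from the metadata service."""
    # IMDSv2 first requires a short-lived session token, obtained with a PUT.
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with opener(token_request, timeout=timeout) as response:
        token = response.read().decode()

    # The identity document is a JSON blob describing this instance.
    document_request = urllib.request.Request(
        f"{IMDS}/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with opener(document_request, timeout=timeout) as response:
        document = json.loads(response.read())

    return {
        "instance_id": document["instanceId"],
        "availability_zone": document["availabilityZone"],
        "region": document["region"],
    }


def log_context(identity, build_id, git_revision):
    """Combine AWS identity with deployment info for inclusion in every log record."""
    return {**identity, "build_id": build_id, "git_revision": git_revision}
```

Fetching this once at startup and attaching the result to each log line or metric makes it much easier to trace a problem back to a specific instance and build.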
[Back to top :arrow_up:](#table-of-contents)
### Tips for Managing Servers
- ❗**Timezone settings on servers**: unless *absolutely necessary*, always **set the timezone on servers to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time)** (see instructions for your distribution, such as [Ubuntu](https://www.digitalocean.com/community/tutorials/how-to-set-up-timezone-and-ntp-synchronization-on-ubuntu-14-04-quickstart), [CentOS](https://www.vultr.com/docs/setup-timezone-and-ntp-on-centos-6) or [Amazon](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html) Linux). Numerous distributed systems rely on time for synchronization and coordination, and UTC [provides](https://blog.serverdensity.com/set-your-server-timezone-to-utc/) the universal reference plane: it is not subject to daylight saving changes and adjustments in local time. It will also save you a lot of headache debugging [elusive timezone issues](http://yellerapp.com/posts/2015-01-12-the-worst-server-setup-you-can-make.html) and provide a coherent timeline of events in your logging and audit systems.
- **NTP and accurate time:** If you are not using Amazon Linux (which comes preconfigured), you should confirm your servers [configure NTP correctly](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#configure_ntp), to avoid insidious time drift (which can then cause all sorts of issues, from breaking API calls to misleading logs). This should be part of your automatic configuration for every server. If time has already drifted substantially (generally >1000 seconds), remember NTP won't shift it back, so you may need to remediate manually (for example, [like this](http://askubuntu.com/questions/254826/how-to-force-a-clock-update-using-ntp) on Ubuntu).
- **Testing immutable infrastructure:** If you want to be proactive about testing your service's ability to cope with instance termination or failure, it can be helpful to introduce random instance termination during business hours, which will expose any such issues at a time when engineers are available to identify and fix them. Netflix's [Simian Army](https://github.com/Netflix/SimianArmy) (specifically, [Chaos Monkey](https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey)) is a popular tool for this. Alternatively, [chaos-lambda](https://github.com/bbc/chaos-lambda) by the BBC is a lightweight option which runs on AWS [Lambda](#lambda).
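
The idea of random termination during business hours can be sketched in a few lines. This is not how Chaos Monkey or chaos-lambda is implemented; `pick_instance_to_terminate` and its default hours are assumptions for illustration, and the times are interpreted in UTC to match the timezone advice above.

```python
import random
from datetime import datetime, timezone


def pick_instance_to_terminate(instance_ids, now, rng=random,
                               start_hour=9, end_hour=17):
    """Return one instance ID to terminate, or None outside weekday business hours."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = start_hour <= now.hour < end_hour
    if not (is_weekday and in_hours) or not instance_ids:
        return None
    # Sort first so the choice depends only on the random source,
    # not on the order in which the IDs happened to be listed.
    return rng.choice(sorted(instance_ids))


if __name__ == "__main__":
    candidates = ["i-aaa", "i-bbb", "i-ccc"]  # hypothetical instance IDs
    print(pick_instance_to_terminate(candidates, datetime.now(timezone.utc)))
```

A scheduled job could pass the selected ID to the EC2 client's `terminate_instances(InstanceIds=[...])` call. Restricting candidates to instances carrying an opt-in tag is a sensible safeguard before trying anything like this in production.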
Security and IAM
----------------
We cover security basics first, since configuring user accounts is something you usually have to do early on when setting up your system.
### Security and IAM Basics
- 📒 IAM [Homepage](https://aws.amazon.com/iam/) ∙ [User guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) ∙ [FAQ](https://aws.amazon.com/iam/faqs/)
- The [AWS Security Blog](https://blogs.aws.amazon.com/security) is one of the best sources of news and information on AWS security.
- **IAM** is the service you use to manage accounts and permissioning for AWS.
- Managing security and access control with AWS is critical, so every AWS administrator needs to use and understand IAM, at least at a basic level.
- [IAM identities](https://docs.aws.amazon.com/IAM/latest/UserGuide/id.html) include users (people or services that are using AWS), groups (containers for sets of users and their permissions), and roles (containers for permissions assigned to AWS service instances). [Permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_permissions.html) for these identities are governed by [policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html). You can use AWS pre-defined policies or custom policies that you create.
- IAM manages various kinds of authentication, for both users and for software services that may need to authenticate with AWS, including:
- [**Passwords**](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_passwords.html) to log into the console. These are a username and password for real users.
- [**Access keys**](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html), which you may use with command-line tools. These are two strings: the “id”, which is an upper-case alphanumeric string of the form 'AXXXXXXXXXXXXXXXXXXX', and the secret, which is a 40-character mixed-case base64-style string. These are often set up for services, not just users.
- π Access keys that start with AKIA are normal keys. Access keys that start with ASIA are session/temporary keys from STS, and will require an additional "SessionToken" parameter to be sent along with the id and secret. See the documentation for [a complete list of access key prefixes](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-prefixes).
- [**Multi-factor authentication (MFA)**](https://aws.amazon.com/iam/details/mfa/), which is the highly recommended practice of using a keychain fob or smartphone app as a second layer of protection for user authentication.
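
The access key prefixes mentioned above can be checked mechanically, for example when validating configuration. A minimal sketch, with hypothetical function names and only the two prefixes discussed here:

```python
def access_key_kind(access_key_id):
    """Classify an access key ID by its four-character prefix."""
    prefix = access_key_id[:4]
    if prefix == "AKIA":
        return "long-term"   # normal access key
    if prefix == "ASIA":
        return "temporary"   # issued by STS
    return "other"           # see the AWS list of identifier prefixes


def needs_session_token(access_key_id):
    """Temporary keys must be sent together with a session token."""
    return access_key_kind(access_key_id) == "temporary"
```

For instance, `access_key_kind("AKIAIOSFODNN7EXAMPLE")` (the placeholder key used throughout AWS documentation) returns `"long-term"`.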
- IAM allows complex and fine-grained control of permissions, dividing users into groups, assigning permissions to roles, and so on. There is a [policy language](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) that can be used to customize security policies in a fine-grained way.
- An excellent high level overview of IAM policy concepts lives at [IAM Policies In A Nutshell](http://start.jcolemorrison.com/aws-iam-policies-in-a-nutshell/).
- 🔸The policy language has a complex and error-prone JSON syntax that's quite confusing, so unless you are an expert, it is wise to base yours off trusted examples or AWS' own pre-defined [managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html).
- At the beginning, IAM policy may be very simple, but for large systems, it will grow in complexity, and need to be managed with care.
- 🔹Make sure one person (perhaps with a backup) in your organization is formally assigned ownership of managing IAM policies, and make sure every administrator works with that person to have changes reviewed. This goes a long way to avoiding accidental and serious misconfigurations.
- It is best to give each user or service the minimum privileges needed to perform their duties. This is the [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege), one of the foundations of good security. Organize all IAM users and groups according to levels of access they need.
- IAM has the [permission hierarchy](http://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html) of:
1. Explicit deny: The most restrictive policy wins.
2. Explicit allow: Access permissions to any resource have to be explicitly given.
3. Implicit deny: All permissions are implicitly denied by default.
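
The three-step hierarchy above can be expressed as a small function. This is a deliberately simplified model for illustration, not the real IAM evaluation engine: it matches actions and resources exactly or by a bare `*`, and it ignores conditions, principals, partial wildcards, and the other policy types that take part in real evaluation.

```python
def _matches(patterns, value):
    """True if value equals one of the patterns, or a pattern is the wildcard '*'."""
    return any(pattern == "*" or pattern == value for pattern in patterns)


def evaluate(statements, action, resource):
    """Return 'ExplicitDeny', 'Allow', or 'ImplicitDeny' for a single request."""
    matching = [
        s for s in statements
        if _matches(s["Action"], action) and _matches(s["Resource"], resource)
    ]
    # 1. Any matching Deny statement wins, regardless of any Allow.
    if any(s["Effect"] == "Deny" for s in matching):
        return "ExplicitDeny"
    # 2. Otherwise at least one matching Allow is required.
    if any(s["Effect"] == "Allow" for s in matching):
        return "Allow"
    # 3. Nothing matched: denied by default.
    return "ImplicitDeny"


if __name__ == "__main__":
    statements = [
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": ["*"]},
        {"Effect": "Deny", "Action": ["s3:GetObject"],
         "Resource": ["arn:aws:s3:::secret-bucket/key"]},
    ]
    print(evaluate(statements, "s3:GetObject", "arn:aws:s3:::secret-bucket/key"))
```

The bucket name and statements in the usage block are made up. The point is that the order in which statements are written does not matter: a single matching deny overrides any number of allows.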
- You can test policy permissions via the AWS IAM [policy simulator tool](https://po