https://github.com/carlos19960601/awesome-scalability-zh

Chinese translation of awesome-scalability

# awesome-scalability (Chinese Edition)
[![logo.png](/logo.png)](http://awesome-scalability.com/)

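A reading list that illustrates patterns of scalable, reliable, and high-performance large-scale systems. The case studies are drawn from production systems that serve millions to hundreds of millions of users.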

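#### If your system is slow :traffic_light:

> First pin down the problem: is it a scalability problem (requests are fast for a single user but slow under heavy load) or a performance problem (requests are slow even for a single user)? Then see how tech companies have solved such problems in the [Design Principles](#设计原则), [Scalability](#伸缩性), and [Performance](#性能) sections.
>
> The [Intelligence](#智能) section is for people working with data, machine learning, and deep learning.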

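#### If your system is down :construction:

> "Even if you lose everything one day, you can rebuild it all if you stay calm." - Thuan Pham, CTO of Uber. So don't panic, and keep the importance of [Availability](#可用性) and [Stability](#稳定性) in mind.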

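#### If you are facing a system design interview :ocean:
> Before designing an application on the whiteboard, read the [interview notes](#面试) and the [real-world architectures with complete diagrams](#架构) to get the big picture. You can also watch [talks](#演讲) by top engineers to learn how they build, scale, and optimize their systems. Finally, here are some recommended [books](#推荐书籍) (most of them free)! Good luck :four_leaf_clover: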

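#### If you are building your dream team :ferris_wheel:

> The goal of scaling a team is not to grow its headcount but to increase its output and value. The [Organization](#组织) section shows how tech companies pursue this goal in every aspect: hiring, management, organization, culture, and communication.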

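#### The power of community :mountain_cableway::aerial_tramway::mountain_cableway:

> Contributions are welcome! Please read the [contribution guidelines](CONTRIBUTING.md). If you find a broken or incorrect link, please open a PR.

> Curating this project took a lot of time. If you find it helpful, please share it on Facebook, Twitter, and Weibo, or in your chat groups! Knowledge is power, and sharing it doubles that power. Thank you.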

## Content
- [Design Principles](#设计原则)
- [Scalability](#伸缩性)
- [Availability](#可用性)
- [Stability](#稳定性)
- [Performance](#性能)
- [Intelligence](#智能)
- [Architecture](#架构)
- [Interview](#面试)
- [Organization](#组织)
- [Talks](#演讲)
- [Books](#推荐书籍)

## <a name="设计原则"></a>Design Principles
* [Lessons Learned from Giant-Scale Services - Eric Brewer, UC Berkeley & Google](https://people.eecs.berkeley.edu/~brewer/papers/GiantScale-IEEE.pdf)
* [Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean, Google](https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
* [How to Design a Good API & Why It Matters - Joshua Bloch, CMU & Google](https://www.infoq.com/presentations/effective-api-design)
* [On Efficiency, Reliability, Scaling - James Hamilton, VP at AWS](http://mvdirona.com/jrh/work/)
* [Things to Keep in Mind When Building a Platform for the Enterprise - Heidi Williams, VP Platform at Box](https://blog.box.com/blog/4-things-to-keep-in-mind-when-building-a-platform-for-the-enterprise/)
* [Principles of Chaos Engineering](https://www.usenix.org/conference/srecon17americas/program/presentation/rosenthal)
* [Finding the Order in Chaos](https://www.usenix.org/conference/srecon16/program/presentation/lueder)
* [The Twelve-Factor App](https://12factor.net/) | [Original](./原文/The_Twelve-Factor_APP.md) | [Translation](./译文/12-factor应用.md)
* [Clean Architecture](https://8thlight.com/blog/uncle-bob/2012/08/13/the-clean-architecture.html)
* [High Cohesion and Low Coupling](http://www.math-cs.gordon.edu/courses/cs211/lectures-2009/Cohesion,Coupling,MVC.pdf)
* [Monoliths and Microservices](https://medium.com/@SkyscannerEng/monoliths-and-microservices-8c65708c3dbf)
* [CAP Theorem and Trade-offs](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
* [CP Databases and AP Databases](https://blog.andyet.com/2014/10/01/right-database)
* [Stateless vs Stateful Scalability](http://ithare.com/scaling-stateful-objects/)
* [Scale Up vs Scale Out](https://www.brianjgraf.com/2013/05/17/scalability-scale-up-scale-out-care/)
* [Scale Up vs Scale Out: Hidden Costs](https://blog.codinghorror.com/scaling-up-vs-scaling-out-hidden-costs/)
* [Best Practices for Scaling Out](https://blog.openshift.com/best-practices-for-horizontal-application-scaling/)
* [Best Practices for Continuous Delivery](https://techblog.rakuten.co.jp/2018/02/06/cd-the-best-practice/)
* [ACID and BASE](https://neo4j.com/blog/acid-vs-base-consistency-models-explained/)
* [Blocking/Non-Blocking and Sync/Async](https://blogs.msdn.microsoft.com/csliu/2009/08/27/io-concept-blockingnon-blocking-vs-syncasync/)
* [Performance and Scalability of Databases](https://use-the-index-luke.com/sql/testing-scalability)
* [Database Isolation Levels and Their Effects on Performance and Scalability](http://highscalability.com/blog/2011/2/10/database-isolation-levels-and-their-effects-on-performance-a.html)
* [The Probability of Data Loss in Large Clusters](https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html)
* [Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence](https://docs.microsoft.com/en-us/previous-versions/msp-n-p/dn271399(v=pandp.10))
* [SQL vs NoSQL](https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/)
* [SQL vs NoSQL - Lessons Learned at Salesforce](https://engineering.salesforce.com/sql-or-nosql-9eaf1d92545b)
* [NoSQL Databases: A Survey and Decision Guidance](https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d)
* [How Sharding Works](https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6) | [Original](./原文/how_sharding_works.md) | [Translation](./译文/how_sharding_works.md)
* [Consistent Hashing](http://www.tom-e-white.com/2007/11/consistent-hashing.html)
* [Consistent Hashing: Algorithmic Tradeoffs](https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8)
* [Don't Be Tricked by the Hashing Trick](https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087)
* [Uniform Consistent Hashing at Netflix](https://medium.com/netflix-techblog/distributing-content-to-open-connect-3e3e391d4dc9)
* [Eventually Consistent - Werner Vogels, CTO at Amazon](https://www.allthingsdistributed.com/2008/12/eventually_consistent.html)
* [Cache is King](https://www.stevesouders.com/blog/2012/10/11/cache-is-king/)
* [Anti-Caching](https://www.the-paper-trail.org/post/2014-06-06-paper-notes-anti-caching/)
* [Understand Latency](http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it)
* [Latency Numbers Every Programmer Should Know](http://norvig.com/21-days.html#answers)
* [The Calculus of Service Availability](https://queue.acm.org/detail.cfm?id=3096459&__s=dnkxuaws9pogqdnxmx8i)
* [Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO](http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html)
* [Common Bottlenecks](http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html)
* [Life beyond Distributed Transactions](https://queue.acm.org/detail.cfm?id=3025012)
* [Relying on Software to Redirect Traffic Reliably at Various Layers](https://www.usenix.org/conference/srecon15/program/presentation/taveira)
* [Breaking Things on Purpose](https://www.usenix.org/conference/srecon17americas/program/presentation/andrus)
* [Avoid Over-Engineering](https://medium.com/@rdsubhas/10-modern-software-engineering-mistakes-bc67fbef4fc8)
* [Scalability Worst Practices](https://www.infoq.com/articles/scalability-worst-practices)
* [Use Solid Technologies - Don't Reinvent the Wheel - Keep It Simple!](https://medium.com/@DataStax/instagram-engineerings-3-rules-to-a-scalable-cloud-application-architecture-c44afed31406)
* [Simplicity by Distributing Complexity](https://jobs.zalando.com/tech/blog/simplicity-by-distributing-complexity/)
* [Why Over-Reusing is Bad](http://tech.transferwise.com/why-over-reusing-is-bad/)
* [Performance is a Feature](https://blog.codinghorror.com/performance-is-a-feature/)
* [Make Performance Part of Your Workflow](https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/)
* [The Benefits of Server-Side Rendering over Client-Side Rendering](https://medium.com/walmartlabs/the-benefits-of-server-side-rendering-over-client-side-rendering-5d07ff2cefe8)
* [Writing Code that Scales](https://blog.rackspace.com/writing-code-that-scales)
* [Automate and Abstract: Lessons at Facebook](https://architecht.io/lessons-from-facebook-on-engineering-for-scale-f5716f0afc7a)
* [AWS Do's and Don'ts](https://8thlight.com/blog/sarah-sunday/2017/09/15/aws-dos-and-donts.html)
* [(UI) Design Doesn't Scale - Stanley Wood, Design Director at Spotify](https://medium.com/@hellostanley/design-doesnt-scale-4d81e12cbc3e)
* [Linux Performance](http://www.brendangregg.com/linuxperf.html)
* [Building Fast and Resilient Web Applications - Ilya Grigorik](https://www.igvita.com/2016/05/20/building-fast-and-resilient-web-applications/)
* [Accept Partial Failures, Minimize Service Loss](https://www.usenix.org/conference/srecon17asia/program/presentation/wang_daxin)
* [Design for Loose Coupling](http://bulgerpartners.com/how-loosely-coupled-architectures-are-helping-the-modernization-of-legacy-software/)
* [Design for Resiliency](http://highscalability.com/blog/2012/12/31/designing-for-resiliency-will-be-so-2013.html)
* [Design for Self-Healing](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/self-healing)
* [Design for Scaling Out](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/scale-out)
* [Design for Evolution](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-evolution)
* [Learn from Mistakes](http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html)
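
Several entries above (consistent hashing, its algorithmic trade-offs, and Netflix's uniform consistent hashing) build on one idea: place nodes and keys on a hash ring and assign each key to the next node clockwise, so adding or removing a node only remaps the keys in that node's arcs. The following is a minimal Python sketch of that idea, assuming MD5 as the hash function and 100 virtual nodes per physical node. Both are illustrative choices, not details taken from the linked articles, and the node names are hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes: each key maps to the first point clockwise."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node (illustrative default)
        self._ring = []       # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread values uniformly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node owns several points, which evens out arc sizes.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        # Filtering preserves the sorted order of the remaining points.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise KeyError("ring is empty")
        h = self._hash(key)
        # First point whose hash is >= h; past the last point, wrap to index 0.
        idx = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.get_node(k) for k in keys}
    ring.add_node("cache-d")
    moved = sum(before[k] != ring.get_node(k) for k in keys)
    # Expected to be roughly 0.25: about the share of the ring cache-d takes over.
    print(moved / len(keys))
```

When a node is added, every key that moves lands on the new node, and all other keys stay where they were. With plain `hash(key) % N` sharding, by contrast, most keys would move. Virtual nodes matter because with a single point per node, arc sizes (and therefore load) can be very uneven.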

## <a name="伸缩性"></a>Scalability
* [Microservices and Orchestration](https://martinfowler.com/microservices/)
* [Containers (8 parts) at Riot Games](https://engineering.riotgames.com/news/thinking-inside-container)
* [Containerization at Pinterest](https://medium.com/@Pinterest_Engineering/containerization-at-pinterest-92295347f2f3)
* [The Evolution of Container Usage at Netflix](https://medium.com/netflix-techblog/the-evolution-of-container-usage-at-netflix-3abfc096781b)
* [Dockerizing MySQL at Uber](https://eng.uber.com/dockerizing-mysql/)
* [Testing of Microservices at Spotify](https://labs.spotify.com/2018/01/11/testing-of-microservices/)
* [Docker in Production at Treehouse](https://medium.com/treehouse-engineering/lessons-learned-running-docker-in-production-5dce99ece770)
* [Microservices at SoundCloud](https://developers.soundcloud.com/blog/inside-a-soundcloud-microservice)
* [Operating Kubernetes Reliably at Stripe](https://stripe.com/blog/operating-kubernetes)
* [Kubernetes Traffic Routing (2 parts) at Rakuten](https://techblog.rakuten.co.jp/2017/09/28/k8s-routing2/)
* [Agrarian-Scale Kubernetes (3 parts) at New York Times](https://open.nytimes.com/agrarian-scale-kubernetes-part-3-ee459887ed7e)
* [Nanoservices at BBC](https://medium.com/bbc-design-engineering/powering-bbc-online-with-nanoservices-727840ba015b)
* [PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg](https://www.techatbloomberg.com/blog/powerfulseal-testing-tool-kubernetes-clusters/)
* [Conductor: Microservices Orchestrator at Netflix](https://medium.com/netflix-techblog/netflix-conductor-a-microservices-orchestrator-2e8d4771bf40)
* [Docker Containers that Power Over 100,000 Online Shops at Shopify](https://shopifyengineering.myshopify.com/blogs/engineering/docker-at-shopify-how-we-built-containers-that-power-over-100-000-online-shops)
* [Microservice Architecture at Medium](https://medium.engineering/microservice-architecture-at-medium-9c33805eb74f)
* [From Bare-Metal to Kubernetes at Betabrand](https://boxunix.com/post/bare_metal_to_kube/)
* [Kubernetes at Tinder](https://medium.com/tinder-engineering/tinders-move-to-kubernetes-cda2a6372f44)
* [Kubernetes Platform at Pinterest](https://medium.com/pinterest-engineering/building-a-kubernetes-platform-at-pinterest-fb3d9571c948)
* [Microservices at Nubank](https://medium.com/building-nubank/microservices-at-nubank-an-overview-2ebcb336c64d)
* [Distributed Caching](https://www.wix.engineering/single-post/scaling-to-100m-to-cache-or-not-to-cache)
* [EVCache: Distributed In-memory Caching at Netflix](https://medium.com/netflix-techblog/caching-for-a-global-netflix-7bcc457012f1)
* [EVCache Cache Warmer Infrastructure at Netflix](https://medium.com/netflix-techblog/cache-warming-agility-for-a-stateful-service-2d3b1da82642)
* [Memsniff: Robust Memcache Traffic Analyzer at Box](https://blog.box.com/blog/introducing-memsniff-robust-memcache-traffic-analyzer/)
* [Caching with Consistent Hashing and Cache Smearing at Etsy](https://codeascraft.com/2017/11/30/how-etsy-caches/)
* [Analysis of Photo Caching at Facebook](https://code.facebook.com/posts/220956754772273/an-analysis-of-facebook-photo-caching/)
* [Cache Efficiency Exercise at Facebook](https://code.facebook.com/posts/964122680272229/web-performance-cache-efficiency-exercise/)
* [tCache: Scalable Data-aware Java Caching at Trivago](http://tech.trivago.com/2015/10/15/tcache/)
* [Reducing Memcached Memory Usage by 50% at Trivago](http://tech.trivago.com/2017/12/19/how-trivago-reduced-memcached-memory-usage-by-50/)
* [Caching Internal Service Calls at Yelp](https://engineeringblog.yelp.com/2018/03/caching-internal-service-calls-at-yelp.html)
* [Estimating Cache Efficiency Using Big Data at Allegro](https://allegro.tech/2017/01/estimating-the-cache-efficiency-using-big-data.html)
* [Distributed Cache at Zalando](https://jobs.zalando.com/tech/blog/distributed-cache-akka-kubernetes/)
* [Application Data Caching from RAM to SSD at Netflix](https://medium.com/netflix-techblog/evolution-of-application-data-caching-from-ram-to-ssd-a33d6fa7a690)
* [Tradeoffs of a Replicated Cache at Skyscanner](https://medium.com/@SkyscannerEng/the-tradeoffs-of-a-replicated-cache-b6680c722f58)
* [Avoiding Cache Stampede at DoorDash](https://blog.doordash.com/avoiding-cache-stampede-at-doordash-55bbf596d94b)
* [Location Caching with Quadtrees at Yext](http://engblog.yext.com/post/geolocation-caching)
* [Pycache: In-process Caching at Quora](https://engineering.quora.com/Pycache-lightning-fast-in-process-caching)
* [Scaling Redis at Twitter](http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html)
* [Scaling the Job Queue with Redis at Slack](https://slack.engineering/scaling-slacks-job-queue-687222e9d100)
* [Moving Persistent Data out of Redis at Github](https://githubengineering.com/moving-persistent-data-out-of-redis/)
* [Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram](https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c) | [Original](./原文/Storing_hundreds_of_millions_of_simple_key-value_pairs_in_Redis/Storing_hundreds_of_millions_of_simple_key-value_pairs_in_Redis.md) | [Translation](./译文/在redis中存储成千上万个简单的kv.md)
* [Redis at Trivago](http://tech.trivago.com/2017/01/25/learn-redis-the-hard-way-in-production/)
* [Optimizing Redis Storage at Deliveroo](https://deliveroo.engineering/2017/01/19/optimising-membership-queries.html)
* [Memory Optimization in Redis at Wattpad](http://engineering.wattpad.com/post/23244724794/store-more-stuff-memory-optimization-in-redis)
* [Redis Fleet at Heroku](https://blog.heroku.com/rolling-redis-fleet)
* [HTTP Caching and CDN](https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)
* [Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga](https://www.zynga.com/blogs/engineering/zynga-geo-proxy-reducing-mobile-game-latency)
* [Google AMP at Condé Nast](https://technology.condenast.com/story/the-why-and-how-of-google-amp-at-conde-nast)
* [A/B Testing of Hosting Infrastructure (CDNs) at Deliveroo](https://deliveroo.engineering/2016/09/19/ab-testing-cdns.html)
* [HAProxy with Kubernetes for User-facing Traffic at SoundCloud](https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic)
* [Bandaid: Service Proxy at Dropbox](https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy/)
* [CDN in the Encoder Layer at LINE LIVE](https://engineering.linecorp.com/en/blog/detail/230)
* [Distributed Locking](https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)
* [Chubby: Lock Service for Loosely Coupled Distributed Systems at Google](https://blog.acolyer.org/2015/02/13/the-chubby-lock-service-for-loosely-coupled-distributed-systems/)
* [Distributed Locking at Uber](https://www.youtube.com/watch?v=MDuagr729aU)
* [Distributed Locks Using Redis at GoSquared](https://engineering.gosquared.com/distributed-locks-using-redis)
* [ZooKeeper at Twitter](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/zookeeper-at-twitter.html)
* [Eliminating Duplicate Queries Using Distributed Locking at Chartio](https://blog.chartio.com/posts/eliminating-duplicate-queries-using-distributed-locking)
* [Distributed Tracing](https://www.oreilly.com/ideas/understanding-the-value-of-distributed-tracing)
* [Zipkin: Distributed Systems Tracing at Twitter](https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html)
* [Improving Zipkin Traces Using Kubernetes Pod Metadata at SoundCloud](https://developers.soundcloud.com/blog/using-kubernetes-pod-metadata-to-improve-zipkin-traces)
* [Canopy: Scalable Distributed Tracing and Analysis at Facebook](https://www.infoq.com/presentations/canopy-scalable-tracing-analytics-facebook)
* [Pintrace: Distributed Tracing at Pinterest](https://medium.com/@Pinterest_Engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b)
* [Real-time Distributed Tracing at LinkedIn](https://engineering.linkedin.com/distributed-service-call-graph/real-time-distributed-tracing-website-performance-and-efficiency)
* [Tracing Services Infrastructure at Scale at Shopify](https://www.usenix.org/conference/srecon17americas/program/presentation/arthorne)
* [Distributed Tracing at HelloFresh](https://engineering.hellofresh.com/scaling-hellofresh-distributed-tracing-7b182928247d)
* [Analyzing Distributed Trace Data at Pinterest](https://medium.com/@Pinterest_Engineering/analyzing-distributed-trace-data-6aae58919949)
* [Distributed Tracing at Uber](https://eng.uber.com/distributed-tracing/)
* [JVM Profiler: Tracing Distributed JVM Applications at Uber](https://eng.uber.com/jvm-profiler/)
* [Data Checking at Dropbox](https://www.usenix.org/conference/srecon17asia/program/presentation/mah)
* [Tracing Distributed Systems at Showmax](https://tech.showmax.com/2016/10/tracing-distributed-systems-at-showmax/)
* [osquery Across the Enterprise at Palantir](https://medium.com/@palantir/osquery-across-the-enterprise-3c3c9d13ec55)
* [StatsD at Etsy](https://codeascraft.com/2011/02/15/measure-anything-measure-everything/)
* [StatsD at DoorDash](https://blog.doordash.com/scaling-statsd-84d456a7cc2a)
* [Distributed Scheduling](https://www.csee.umbc.edu/courses/graduate/CMSC621/fall02/lectures/ch11.pdf)
* [Building Distributed Cron at Google](https://landing.google.com/sre/sre-book/chapters/distributed-periodic-scheduling/)
* [Distributed Cron Architecture at Quora](https://engineering.quora.com/Quoras-Distributed-Cron-Architecture)
* [Chronos: A Replacement for Cron at Airbnb](https://medium.com/airbnb-engineering/chronos-a-replacement-for-cron-f05d7d986a9d)
* [Scheduler at Nextdoor](https://engblog.nextdoor.com/we-don-t-run-cron-jobs-at-nextdoor-6f7f9cc62040)
* [Peloton: Unified Resource Scheduler for Diverse Cluster Workloads at Uber](https://eng.uber.com/peloton/)
* [Fenzo: OSS Scheduler for Apache Mesos Frameworks at Netflix](https://medium.com/netflix-techblog/fenzo-oss-scheduler-for-apache-mesos-frameworks-5c340e77e543)
* [Airflow](https://airflow.apache.org/)
* [Airflow at Airbnb](https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8)
* [Airflow at Pandora](https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee)
* [Airflow at Robinhood](https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8)
* [Airflow at Lyft](https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff)
* [Airflow at Drivy](https://drivy.engineering/airflow-architecture/)
* [Airflow at Grab](https://engineering.grab.com/experimentation-platform-data-pipeline)
* [Airflow at Adobe](https://medium.com/adobetech/adobe-experience-platform-orchestration-service-with-apache-airflow-952203723c0b)
* [Auditing Airflow Job Runs at Walmart](https://medium.com/walmartlabs/auditing-airflow-batch-jobs-73b45100045)
* [MaaT: DAG-based Distributed Task Scheduler at Alibaba](https://hackernoon.com/meet-maat-alibabas-dag-based-distributed-task-scheduler-7c9cf0c83438)
* [boundary-layer: Declarative Airflow Workflows at Etsy](https://codeascraft.com/2018/11/14/boundary-layer%e2%80%89-declarative-airflow-workflows/)
* [Distributed Monitoring and Alerting](https://www.oreilly.com/ideas/monitoring-distributed-systems)
* [Monitoring System at Alibaba](https://www.usenix.org/conference/srecon18asia/presentation/xinchi)
* [Real User Monitoring at Dailymotion](https://medium.com/dailymotion/real-user-monitoring-1948375f8be5)
* [Alerting System at Uber](https://eng.uber.com/observability-at-scale/)
* [Alerting on Service-Level Objectives (SLOs) at SoundCloud](https://developers.soundcloud.com/blog/alerting-on-slos)
* [Job-based Forecasting Workflow for Observability Anomaly Detection at Uber](https://eng.uber.com/observability-anomaly-detection/)
* [Monitoring and Alert System Using Graphite and Cabot at HackerEarth](http://engineering.hackerearth.com/2017/03/21/monitoring-and-alert-system-using-graphite-and-cabot/)
* [Securitybot: Distributed Alerting Bot at Dropbox](https://blogs.dropbox.com/tech/2017/02/meet-securitybot-open-sourcing-automated-security-at-scale/)
* [Observability (2 parts) at Twitter](https://blog.twitter.com/engineering/en_us/a/2016/observability-at-twitter-technical-overview-part-ii.html)
* [Distributed Security Alerting at Slack](https://slack.engineering/distributed-security-alerting-c89414c992d6)
* [Real-time News Alerting at Bloomberg](https://www.infoq.com/presentations/news-alerting-bloomberg)
* [Unicorn: Remediation System at eBay](https://www.ebayinc.com/stories/blogs/tech/unicorn-rheos-remediation-center/)
* [M3: Metrics and Monitoring Platform at Uber](https://eng.uber.com/optimizing-m3/)
* [Athena: Automated Build Health Management System at Dropbox](https://blogs.dropbox.com/tech/2019/05/athena-our-automated-build-health-management-system/)
* [Nuage: Cloud Management Service at LinkedIn](https://engineering.linkedin.com/blog/2019/solving-manageability-challenges-with-nuage)
* [ThirdEye: Monitoring Platform at LinkedIn](https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor)
* [Distributed Security](https://msdn.microsoft.com/en-us/library/cc767123.aspx)
* [Approach to Security at Scale at Dropbox](https://blogs.dropbox.com/tech/2018/02/security-at-scale-the-dropbox-approach/)
* [Aardvark and Repokid: AWS Least Privilege for Distributed, High-Velocity Development at Netflix](https://medium.com/netflix-techblog/introducing-aardvark-and-repokid-53b081bf3a7e)
* [LISA: Distributed Firewall at LinkedIn](https://www.slideshare.net/MikeSvoboda/2017-lisa-linkedins-distributed-firewall-dfw)
* [Secure Infrastructure to Store Bitcoin in the Cloud at Coinbase](https://engineering.coinbase.com/how-coinbase-builds-secure-infrastructure-to-store-bitcoin-in-the-cloud-30a6504e40ba)
* [BinaryAlert: Real-time Serverless Malware Detection at Airbnb](https://medium.com/airbnb-engineering/binaryalert-real-time-serverless-malware-detection-ca44370c1b90)
* [Scalable IAM Architecture to Secure Access to 100 AWS Accounts at Segment](https://segment.com/blog/secure-access-to-100-aws-accounts/)
* [OAuth Audit Toolbox at Indeed](http://engineering.indeedblog.com/blog/2018/04/oaudit-toolbox/)
* [Active Directory Password Blacklisting at Yelp](https://engineeringblog.yelp.com/2018/04/ad-password-blacklisting.html)
* [Syscall Auditing at Scale at Slack](https://slack.engineering/syscall-auditing-at-scale-e6a3ca8ac1b8)
* [Athenz: Fine-Grained, Role-Based Access Control at Yahoo](https://yahooeng.tumblr.com/post/160481899076/open-sourcing-athenz-fine-grained-role-based)
* [WebAuthn Support for Secure Sign-In at Dropbox](https://blogs.dropbox.com/tech/2018/05/introducing-webauthn-support-for-secure-dropbox-sign-in/)
* [Security Development Lifecycle (SDL) at Slack](https://slack.engineering/moving-fast-and-securing-things-540e6c5ae58a)
* [Unprivileged Container Builds at Kinvolk](https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/)
* [Diffy: Differencing Engine for Digital Forensics in the Cloud at Netflix](https://medium.com/netflix-techblog/netflix-sirt-releases-diffy-a-differencing-engine-for-digital-forensics-in-the-cloud-37b71abd2698)
* [Detecting Credential Compromise in AWS at Netflix](https://medium.com/netflix-techblog/netflix-cloud-security-detecting-credential-compromise-in-aws-9493d6fd373a)
* [Scalable User Privacy at Spotify](https://labs.spotify.com/2018/09/18/scalable-user-privacy/)
* [AVA: Auditing Web Applications at Indeed](https://engineering.indeedblog.com/blog/2018/09/application-scanning/)
* [TTL as a Service: Automatic Revocation of Stale Privileges at Yelp](https://engineeringblog.yelp.com/2018/11/ttl-as-a-service.html)
* [Enterprise Key Management at Slack](https://slack.engineering/engineering-dive-into-slack-enterprise-key-management-1fce471b178c)
* [Distributed Messaging](https://arxiv.org/pdf/1704.00411.pdf)
* [Cape: Event Stream Processing Framework at Dropbox](https://blogs.dropbox.com/tech/2017/05/introducing-cape/)
* [Brooklin: Distributed Service for Near Real-Time Data Streaming at LinkedIn](https://engineering.linkedin.com/blog/2019/brooklin-open-source)
* [Samza: Stream Processing System for Latency Insights at LinkedIn](https://engineering.linkedin.com/blog/2018/04/samza-aeon--latency-insights-for-asynchronous-one-way-flows)
* [Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo](https://yahooeng.tumblr.com/post/161855616651/open-sourcing-bullet-yahoos-forward-looking)
* [EventHorizon: Tool for Watching Event Streams at Etsy](https://codeascraft.com/2018/05/29/the-eventhorizon-saga/)
* [Qmessage: Distributed Asynchronous Task Queue at Quora](https://engineering.quora.com/Qmessage-Handling-Billions-of-Tasks-Per-Day)
* [Cherami: Message Queue System for Transporting Asynchronous Tasks at Uber](https://eng.uber.com/cherami/)
* [Messaging Service at Riot Games](https://engineering.riotgames.com/news/riot-messaging-service)
* [Debugging Production with Event Logging at Zillow](https://www.zillow.com/engineering/debugging-production-event-logging/)
* [Cross-platform In-app Messaging Orchestration Service at Netflix](https://medium.com/netflix-techblog/building-a-cross-platform-in-app-messaging-orchestration-service-86ba614f92d8)
* [Video Gatekeeper at Netflix](https://medium.com/netflix-techblog/re-architecting-the-video-gatekeeper-f7b0ac2f6b00)
* [Scaling Push Messaging for Millions of Devices at Netflix](https://www.infoq.com/presentations/neflix-push-messaging-scale)
* [Delaying Asynchronous Message Processing with RabbitMQ at Indeed](http://engineering.indeedblog.com/blog/2017/06/delaying-messages/)
* [Benchmarking Streaming Computation Engines at Yahoo](https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at)
* [Improving Stream Data Quality with Protobuf Schema Validation at Deliveroo](https://deliveroo.engineering/2019/02/05/improving-stream-data-quality-with-protobuf-schema-validation.html)
* [Event-Driven Messaging](https://martinfowler.com/articles/201701-event-driven.html)
* [Domain-Driven Design at Alibaba](https://medium.com/swlh/creating-coding-excellence-with-domain-driven-design-88f73d2232c3)
* [Domain-Driven Design at Weebly](https://medium.com/weebly-engineering/how-to-organize-your-monolith-before-breaking-it-into-services-69cbdb9248b0)
* [Domain-Driven Design at Moonpig](https://engineering.moonpig.com/development/modelling-for-domain-driven-design)
* [Scaling Event Sourcing for Netflix Downloads](https://www.infoq.com/presentations/netflix-scale-event-sourcing)
* [Scaling Event Sourcing at Jet.com](https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8)
* [Event Sourcing (2 parts) at eBay](https://www.ebayinc.com/stories/blogs/tech/event-sourcing-in-action-with-ebays-continuous-delivery-team/)
* [Event Sourcing at mytaxi](https://inside.mytaxi.com/event-sourcing-an-evolutionary-perspective-31e7387aa6f1)
* [Scalable Content Feed Using Event Sourcing and CQRS Patterns at Brainly](https://medium.com/engineering-brainly/scalable-content-feed-using-event-sourcing-and-cqrs-patterns-e09df98bf977)
* [Publish-Subscribe Messaging](https://aws.amazon.com/pub-sub-messaging/)
* [Pulsar: Pub-Sub Messaging at Scale at Yahoo](https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale)
* [Wormhole: Pub-Sub System at Facebook](https://code.facebook.com/posts/188966771280871/wormhole-pub-sub-system-moving-data-through-space-and-time/)
* [Pub-Sub in Chat Architecture at LINE](https://engineering.linecorp.com/en/blog/detail/85)
* [Kafka the Message Broker](https://martin.kleppmann.com/papers/kafka-debull15.pdf)
* [Kafka at LinkedIn](https://engineering.linkedin.com/kafka/running-kafka-scale)
* [Kafka at Pinterest](https://medium.com/pinterest-engineering/how-pinterest-runs-kafka-at-scale-ff9c6f735be)
* [Kafka at Trello](https://tech.trello.com/why-we-chose-kafka/)
* [Kafka at Salesforce](https://engineering.salesforce.com/how-apache-kafka-inspired-our-platform-events-architecture-2f351fe4cf63)
* [Kafka at Rakuten](https://techblog.rakuten.co.jp/2016/01/28/rakuten-paas-kafka/)
* [Kafka at The New York Times](https://open.nytimes.com/publishing-with-apache-kafka-at-the-new-york-times-7f0e3b7d2077)
* [Kafka at Yelp](https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html)
* [Kafka on Kubernetes at Shopify](https://shopifyengineering.myshopify.com/blogs/engineering/running-apache-kafka-on-kubernetes-at-shopify)
* [Migrating Kafka's Zookeeper with No Downtime at Yelp](https://engineeringblog.yelp.com/2019/01/migrating-kafkas-zookeeper-with-no-downtime.html)
* [Reprocessing and Dead Letter Queues with Kafka at Uber](https://eng.uber.com/reliable-reprocessing/)
* [Chaperone: Auditing Kafka End-to-End at Uber](https://eng.uber.com/chaperone/)
* [Finding Kafka's Throughput Limit in Dropbox Infrastructure](https://blogs.dropbox.com/tech/2019/01/finding-kafkas-throughput-limit-in-dropbox-infrastructure/)
* [Cost Orchestration at Walmart](https://medium.com/walmartlabs/cost-orchestration-at-walmart-f34918af67c4)
* [Scaling to Over 1 Million Metrics per Second with InfluxDB and Kafka at Hulu](https://medium.com/hulu-tech-blog/how-hulu-uses-influxdb-and-kafka-to-scale-to-over-1-million-metrics-a-second-1721476aaff5)
* [Stream Data Deduplication](https://en.wikipedia.org/wiki/Data_deduplication)
* [Exactly-once Semantics with Kafka](https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/)
* [Real-time Deduplication at Scale at Tapjoy](http://eng.tapjoy.com/blog-list/real-time-deduping-at-scale)
* [Deduplication at Segment](https://segment.com/blog/exactly-once-delivery/)
* [Deduplication at Mail.Ru](https://medium.com/@andrewsumin/efficient-storage-how-we-went-down-from-50-pb-to-32-pb-99f9c61bf6b4)
* [Distributed Logging](https://blog.codinghorror.com/the-problem-with-logging/)
* [Logging at LinkedIn](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
* [Scalable and Reliable Log Ingestion at Pinterest](https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
* [High-performance Replicated Log Service at Twitter](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2015/building-distributedlog-twitter-s-high-performance-replicated-log-servic.html)
* [Logging Service with Spark at CERN Accelerator](https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html)
* [Logging and Aggregation at Quora](https://engineering.quora.com/Logging-and-Aggregation-at-Quora)
* [BookKeeper: Distributed Log Storage at Yahoo](https://yahooeng.tumblr.com/post/109908973316/bookkeeper-yahoos-distributed-log-storage-is)
* [LogDevice: Distributed Data Store for Logs at Facebook](https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/)
* [LogFeeder: Log Collection System at Yelp](https://engineeringblog.yelp.com/2018/03/introducing-logfeeder.html)
* [Collection and Analysis of Daemon Logs at Badoo](https://badoo.com/techblog/blog/2016/06/06/collection-and-analysis-of-daemon-logs-at-badoo/)
* [Improving Log Parsing with Static Code Analysis at Palantir](https://medium.com/palantir/using-static-code-analysis-to-improve-log-parsing-18f0d1843965)
* [Distributed Searching](http://nwds.cs.washington.edu/files/nwds/pdf/Distributed-WR.pdf)
* [Search Architecture at Instagram](https://instagram-engineering.com/search-architecture-eeb34a936d3a)
* [Search Architecture at eBay](http://www.cs.otago.ac.nz/homepages/andrew/papers/2017-8.pdf)
* [Search Architecture at Box](https://medium.com/box-tech-blog/scaling-box-search-using-lumos-22d9e0cb4175)
* [Universal Search System at Pinterest](https://medium.com/pinterest-engineering/building-a-universal-search-system-for-pinterest-e4cb03a898d4)
* [Improving Search Engine Efficiency by over 25% at eBay](https://www.ebayinc.com/stories/blogs/tech/making-e-commerce-search-faster/)
* [Indexing and Querying Telemetry Logs with Lucene at Palantir](https://medium.com/palantir/indexing-and-querying-telemetry-logs-with-lucene-234c5ce3e5f3)
* [Search Federation Architecture at LinkedIn (2018)](https://engineering.linkedin.com/blog/2018/03/search-federation-architecture-at-linkedin)
* [Search at Slack](https://slack.engineering/search-at-slack-431f8c80619e)
* [Search and Recommendations at DoorDash](https://blog.doordash.com/powering-search-recommendations-at-doordash-8310c5cfd88c)
* [Search Service at Twitter (2014)](https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html)
* [Autocomplete Search (2 parts) at Traveloka](https://medium.com/traveloka-engineering/high-quality-autocomplete-search-part-2-d5b15bb0dadf)
* [Data-Driven Autocorrection System at Canva](https://product.canva.com/building-a-data-driven-autocorrection-system/)
* [Nautilus: Search Engine at Dropbox](https://blogs.dropbox.com/tech/2018/09/architecture-of-nautilus-the-new-dropbox-search-engine/)
* [Galene: Search Architecture at LinkedIn](https://engineering.linkedin.com/search/did-you-mean-galene)
* [Manas: High-Performing Customized Search System at Pinterest](https://medium.com/@Pinterest_Engineering/manas-a-high-performing-customized-search-system-cf189f6ca40f)
* [Sherlock: Near Real-Time Search Indexing at Flipkart](https://tech.flipkart.com/sherlock-near-real-time-search-indexing-95519783859d)
* [Nebula: Storage Platform for Building Search Backends at Airbnb](https://medium.com/airbnb-engineering/nebula-as-a-storage-platform-to-build-airbnbs-search-backends-ecc577b05f06)
* [ELK (Elasticsearch, Logstash, Kibana) Stack](https://logz.io/blog/15-tech-companies-chose-elk-stack/)
* [Real-time Predictions with ELK at Uber](https://eng.uber.com/elk/)
* [Building a Scalable ELK Stack at Envato](https://webuild.envato.com/blog/building-a-scalable-elk-stack/)
* [ELK at Robinhood](https://robinhood.engineering/taming-elk-4e1349f077c3)
* [Elasticsearch Clusters at Uber](https://www.infoq.com/presentations/uber-elasticsearch-clusters?utm_source=presentations_about_Case_Study&utm_medium=link&utm_campaign=Case_Study)
* [Elasticsearch Performance Tuning Practice at eBay](https://www.ebayinc.com/stories/blogs/tech/elasticsearch-performance-tuning-practice-at-ebay/)
* [Elasticsearch at Kickstarter](https://kickstarter.engineering/elasticsearch-at-kickstarter-db3c487887fc)
* [Elasticsearch at Target](https://tech.target.com/2017/05/25/elasticsearch-cloud.html)
* [Log Parsing with Logstash and Google Protocol Buffers at Trivago](https://tech.trivago.com/2016/01/19/logstash_protobuf_codec/)
* [Fast Order Search Using a Data Pipeline and Elasticsearch at Yelp](https://engineeringblog.yelp.com/2018/06/fast-order-search.html)
* [Moving Core Business Search to Elasticsearch at Yelp](https://engineeringblog.yelp.com/2017/06/moving-yelps-core-business-search-to-elasticsearch.html)
* [Sharding out Elasticsearch at Vinted](http://engineering.vinted.com/2017/06/05/sharding-out-elasticsearch/)
* [Self-Ranking Search with Elasticsearch at Wattpad](http://engineering.wattpad.com/post/146216619727/self-ranking-search-with-elasticsearch-at-wattpad)
* [Upgrading Elasticsearch (3 parts) at Redmart](http://geeks.redmart.com/2018/12/11/upgrading-elasticsearch-at-redmart-pt-3-testing-customer-reactions/)
* [Vulcanizer: A Library for Operating Elasticsearch at Github](https://github.blog/2019-03-05-vulcanizer-a-library-for-operating-elasticsearch/)
* [分布式存储](http://highscalability.com/blog/2011/11/1/finding-the-right-data-solution-for-your-application-in-the.html)
* [内存存储](https://medium.com/@denisanikin/what-an-in-memory-database-is-and-how-it-persists-data-efficiently-f43868cff4c1)
* [MemSQL 架构 - 快速(MVCC、InMem、LockFree、CodeGen)和熟悉的 (SQL)](http://highscalability.com/blog/2012/8/14/memsql-architecture-the-fast-mvcc-inmem-lockfree-codegen-and.html)
* [在 Quora 优化 Memcached 效率](https://engineering.quora.com/Optimizing-Memcached-Efficiency)
* [在 Cisco UCS 上使用 MemSQL 的实时数据仓库](https://blogs.cisco.com/datacenter/memsql)
* [Tapjoy 迁移到 MemSQL](http://eng.tapjoy.com/blog-list/moving-to-memsql)
* [Disney 用于实时洞察的 MemSQL 和 Kinesis](https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/68131)
* [Pandora MemSQL 在仪表板中查询数千亿行](https://engineering.pandora.com/using-memsql-at-pandora-79a86cb09b57)
* [对象存储](http://www.datacenterknowledge.com/archives/2013/10/04/object-storage-the-future-of-scale-out)
* [Uber 扩展 HDFS](https://eng.uber.com/scaling-hdfs/)
* [Reasons for Choosing S3 over HDFS at Databricks](https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html)
* [Quantcast基于S3的文件系统](https://www.quantcast.com/blog/quantcast-file-system-on-amazon-s3/)
* [Image Recovery at Scale Using S3 Versioning at Trivago](https://tech.trivago.com/2018/09/03/efficient-image-recovery-at-scale-using-amazon-s3-versioning/)
* [Yahoo 云对象存储](https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at)
* [LinkedIn 分布式不可变对象存储: Ambry](https://www.usenix.org/conference/srecon17americas/program/presentation/shenoy)
* [LinkedIn 在最小的硬件上对HDFS进行规模化测试,实现最大的保真度: Dynamometer](https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum)
* [Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb](https://medium.com/airbnb-engineering/hammerspace-persistent-concurrent-off-heap-storage-3db39bb04472)
* [MezzFS: Mounting Object Storage in Media Processing Platform at Netflix](https://medium.com/netflix-techblog/mezzfs-mounting-object-storage-in-netflixs-media-processing-platform-cda01c446ba)
* [Magic Pocket: In-house Multi-exabyte Storage System at Dropbox](https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pocket/)
* [关系数据库](https://www.mysql.com/products/cluster/scalability.html)
* [MySQL for Schema-less Data at FriendFeed](https://backchannel.org/blog/friendfeed-schemaless-mysql)
* [Pinterest的MySQL应用](https://medium.com/@Pinterest_Engineering/learn-to-stop-using-shiny-new-things-and-love-mysql-3e1613c2ce14)
* [Twitch的PostgreSQL应用](https://blog.twitch.tv/how-twitch-uses-postgresql-c34aa9e56f58)
* [Scaling MySQL-based Financial Reporting System at Airbnb](https://medium.com/airbnb-engineering/tracking-the-money-scaling-financial-reporting-at-airbnb-6d742b80f040)
* [Scaling MySQL at Wix](https://www.wix.engineering/single-post/scaling-to-100m-mysql-is-a-better-nosql)
* [MaxScale (MySQL) Database Proxy at Airbnb](https://medium.com/airbnb-engineering/unlocking-horizontal-scalability-in-our-web-serving-tier-d907449cdbcf)
* [Uber 从Postgres到MySQL](https://eng.uber.com/mysql-migration/)
* [Handling Growth with Postgres at Instagram](https://engineering.instagram.com/handling-growth-with-postgres-5-tips-from-instagram-d5d7e7ffdfcb)
* [Scaling the Analytics Database (Postgres) at TransferWise](http://tech.transferwise.com/scaling-our-analytics-database/)
* [Updating a 50 Terabyte PostgreSQL Database at Adyen](https://medium.com/adyen/updating-a-50-terabyte-postgresql-database-f64384b799e7)
* [Scaling Database Access for 100s of Billions of Queries per Day at PayPal](https://medium.com/paypal-engineering/scaling-database-access-for-100s-of-billions-of-queries-per-day-paypal-introducing-hera-e192adacda54)
* [数据库复制](https://m.alphasights.com/a-primer-on-database-replication-381b319cd032)
* [MySQL Parallel Replication (4 parts) at Booking.com](https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-annex-under-the-hood-eb456cf8b2fb)
* [Mitigating MySQL Replication Lag and Reducing Read Load at Github](https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/)
* [Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift at Yelp](https://engineeringblog.yelp.com/2018/04/black-box-auditing.html)
* [Monitoring MySQL Delayed Replication at IMVU](https://engineering.imvu.com/2013/01/09/monitoring-delayed-replication-with-a-focus-on-mysql/)
* [Partitioning Main MySQL Database at Airbnb](https://medium.com/airbnb-engineering/how-we-partitioned-airbnb-s-main-database-in-two-weeks-55f7e006ff21)
* [Herb: Multi-DC Replication Engine for Schemaless Datastore at Uber](https://eng.uber.com/herb-datacenter-replication/)
* [分片](https://quabase.sei.cmu.edu/mediawiki/index.php/Shard_data_set_across_multiple_servers_(Range-based))
* [Pinterest MySQL分片](https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f)
* [Twilio MySQL分片](https://www.twilio.com/engineering/2014/06/26/how-we-replaced-our-data-pipeline-with-zero-downtime)
* [Square MySQL分片](https://medium.com/square-corner-blog/sharding-cash-10280fa3ef3b)
* [Sharding Layer of Schemaless Datastore at Uber](https://eng.uber.com/schemaless-rewrite/)
* [Sharding & IDs at Instagram](https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c)
* [Box 使用 Solr 提高批量索引性能](https://blog.box.com/blog/solr-improving-performance-batch-indexing/)
* [Geosharded Recommendations (3 parts) at Tinder](https://medium.com/tinder-engineering/geosharded-recommendations-part-3-consistency-2d2cb2f0594b)
* [Presto分布式SQL查询引擎](https://research.fb.com/wp-content/uploads/2019/03/Presto-SQL-on-Everything.pdf?)
* [Presto在Pinterest](https://medium.com/@Pinterest_Engineering/presto-at-pinterest-a8bda7515e52)
* [Lyft的Presto基础架构](https://eng.lyft.com/presto-infrastructure-at-lyft-b10adb9db01)
* [Presto在Grab](https://engineering.grab.com/scaling-like-a-boss-with-presto)
* [Uber 使用Presto和Apache Parquet进行工程数据分析](https://eng.uber.com/presto/)
* [Slack 数据整理](https://slack.engineering/data-wrangling-at-slack-f2e0ff633b69)
* [Netflix 在 AWS 大数据平台中使用 Presto](https://medium.com/netflix-techblog/using-presto-in-our-big-data-platform-on-aws-938035909fd4)
* [非关系数据库](https://www.thoughtworks.com/insights/blog/nosql-databases-overview)
* [KV数据库](http://www.cs.ucsb.edu/~agrawal/fall2009/dynamo.pdf)
* [DynamoDB在Nike](https://medium.com/nikeengineering/becoming-a-nimble-giant-how-dynamo-db-serves-nike-at-scale-4cc375dbb18e)
* [DynamoDB在Segment](https://segment.com/blog/the-million-dollar-eng-problem/)
* [DynamoDB在Mapbox](https://blog.mapbox.com/scaling-mapbox-infrastructure-with-dynamodb-streams-d53eabc5e972)
* [Manhattan: Twitter分布式KV数据库](https://blog.twitter.com/engineering/en_us/a/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale.html)
* [Sherpa: Distributed NoSQL Key-Value Store at Yahoo](https://yahooeng.tumblr.com/post/120730204806/sherpa-scales-new-heights)
* [HaloDB: Embedded Key-Value Storage Engine at Yahoo](https://yahooeng.tumblr.com/post/178262468576/introducing-halodb-a-fast-embedded-key-value)
* [MPH: Fast and Compact Immutable Key-Value Stores at Indeed](http://engineering.indeedblog.com/blog/2018/02/indeed-mph/)
* [zBase: High Performance, Elastic, Distributed Key-Value Store at Zynga](https://www.zynga.com/blogs/engineering/zbase-high-performance-elastic-distributed-key-value-store-2)
* [Venice: Distributed Key-Value Database at Linkedin](https://engineering.linkedin.com/blog/2017/02/building-venice-with-apache-helix)
* [列式数据库](https://aws.amazon.com/nosql/columnar/)
* [Cassandra](http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf)
* [Cassandra在Instagram](https://www.slideshare.net/DataStax/cassandra-at-instagram-2016)
* [Walmart 使用Cassandra存储图片](https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593)
* [Storing Messages with Cassandra at Discord](https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7)
* [Scaling Cassandra Cluster at Walmart](https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04)
* [Scaling Ad Analytics with Cassandra at Yelp](https://engineeringblog.yelp.com/2016/08/how-we-scaled-our-ad-analytics-with-cassandra.html)
* [Scaling to 100+ Million Reads/Writes using Spark and Cassandra at Dream11](https://medium.com/dream11-tech-blog/leaderboard-dream11-4efc6f93c23e)
* [Moving Food Feed from Redis to Cassandra at Zomato](https://www.zomato.com/blog/how-we-moved-our-food-feed-from-redis-to-cassandra)
* [Benchmarking Cassandra Scalability on AWS at Netflix](https://medium.com/netflix-techblog/benchmarking-cassandra-scalability-on-aws-over-a-million-writes-per-second-39f45f066c9e)
* [Service Decomposition at Scale with Cassandra at Intuit QuickBooks](https://quickbooks-engineering.intuit.com/service-decomposition-at-scale-70405ac2f637)
* [Cassandra for Keeping Counts In Sync at SoundCloud](https://developers.soundcloud.com/blog/keeping-counts-in-sync)
* [cstar: Cassandra Orchestration Tool at Spotify](https://labs.spotify.com/2018/09/04/introducing-cstar-the-spotify-cassandra-orchestration-tool-now-open-source/)
* [HBase](https://hbase.apache.org/)
* [HBase在Salesforce](https://engineering.salesforce.com/investing-in-big-data-apache-hbase-b9d98661a66b)
* [HBase在Facebook Messages](https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919/)
* [HBase在Imgur通知系统](https://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/)
* [Improving HBase Backup Efficiency at Pinterest](https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954)
* [HBase在小米](https://www.slideshare.net/HBaseCon/hbase-practice-at-xiaomi)
* [Redshift](https://www.allthingsdistributed.com/2018/11/amazon-redshift-performance-optimization.html)
* [Redshift at GIPHY](https://engineering.giphy.com/scaling-redshift-without-scaling-costs/)
* [Redshift at Hudl](https://www.hudl.com/bits/the-low-hanging-fruit-of-redshift-performance)
* [Redshift at Drivy](https://drivy.engineering/redshift_tips_ticks_part_1/)
* [文档数据库](https://msdn.microsoft.com/en-us/magazine/hh547103.aspx)
* [eBay: 使用MongoDB构建关键任务型多数据中心应用程序](https://www.mongodb.com/blog/post/ebay-building-mission-critical-multi-data-center-applications-with-mongodb)
* [MongoDB在Baidu: 多租户集群在160个分片上存储2000多亿文档](https://www.mongodb.com/blog/post/mongodb-at-baidu-powering-100-apps-across-600-nodes-at-pb-scale)
* [Addepar 迁移Mongo数据](https://medium.com/build-addepar/migrating-mountains-of-mongo-data-63e530539952)
* [The AWS and MongoDB Infrastructure of Parse (acquired by Facebook)](https://medium.baqend.com/parse-is-gone-a-few-secrets-about-their-infrastructure-91b3ab2fcf71)
* [LinkedIn Couchbase生态](https://engineering.linkedin.com/blog/2017/12/couchbase-ecosystem-at-linkedin)
* [SimpleDB在Zendesk](https://medium.com/zendesk-engineering/resurrecting-amazon-simpledb-9404034ec506)
* [LinkedIn 分布式文档存储: Espresso](https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store)
* [图数据库](https://www.eecs.harvard.edu/margo/papers/systor13-bench/)
* [Twitter 分布式图数据库: FlockDB](https://blog.twitter.com/engineering/en_us/a/2010/introducing-flockdb.html)
* [Facebook 社交图谱的分布式数据存储: TAO](https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/11730-atc13-bronson.pdf)
* [eBay 分布式知识图谱存储: Akutan](https://tech.ebayinc.com/engineering/akutan-a-distributed-knowledge-graph-store/)
* [时间序列数据库](https://www.influxdata.com/time-series-database/)
* [Facebook 高性能的时间序列存储引擎: Beringei](https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/)
* [Twitter 用于存储指标的时间序列数据库: MetricsDB](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/metricsdb.html)
* [Netflix 内存中的时间序列数据库: Atlas](https://medium.com/netflix-techblog/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a)
* [Spotify 时间序列数据库: Heroic](https://labs.spotify.com/2015/11/17/monitoring-at-spotify-introducing-heroic/)
* [SoundCloud 时间序列事件的分布式存储系统:Roshi](https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-timestamped-events)
* [Pinterest 时间序列数据库: Goku](https://medium.com/@Pinterest_Engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181)
* [Netflix 可扩展时间序列数据存储(共 2 部分)](https://medium.com/netflix-techblog/scaling-time-series-data-storage-part-ii-d67939655586)
* [实时分析数据库 (Druid)](https://druid.apache.org/)
* [Druid在 Airbnb](https://medium.com/airbnb-engineering/druid-airbnb-data-platform-601c312f2a4c)
* [Druid在Walmart](https://medium.com/walmartlabs/event-stream-analytics-at-walmart-with-druid-dcf1a37ceda7)
* [Druid在eBay](https://tech.ebayinc.com/engineering/monitoring-at-ebay-with-druid/)
* [分布式代码仓库、依赖和配置管理](https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/)
* [Github 分布式Git: DGit](https://githubengineering.com/introducing-dgit/)
* [Palantir 分布式Git服务: Stemma](https://medium.com/@palantir/stemma-distributed-git-server-70afbca0fc29)
* [Flickr 分布式系统的配置管理](https://code.flickr.net/2016/03/24/configuration-management-for-distributed-systems-using-github-and-cfg4j/)
* [Microsoft 全球最大的 Git 仓库](https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-repo-on-the-planet/)
* [Microsoft 解决大型仓库的 Git 问题](https://www.infoq.com/news/2017/02/GVFS)
* [Google 单一存储库](https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext)
* [Scaling Infrastructure and (Git) Workflow at Adyen](https://medium.com/adyen/from-0-100-billion-scaling-infrastructure-and-workflow-at-adyen-7b63b690dfb6)
* [Dotfiles Distribution at Booking.com](https://medium.com/booking-com-infrastructure/dotfiles-distribution-dedb69c66a75)
* [Secret Detector: Preventing Secrets in Source Code at Yelp](https://engineeringblog.yelp.com/2018/06/yelps-secret-detector.html)
* [Managing Software Dependency at Scale at LinkedIn](https://engineering.linkedin.com/blog/2018/09/managing-software-dependency-at-scale)
* [Twitter的动态配置](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/dynamic-configuration-at-twitter.html)
* [扩展持续集成和持续交付](https://www.synopsys.com/blogs/software-security/agile-cicd-devops-glossary/)
* [Facebook 持续集成技术栈](https://code.fb.com/web/rapid-release-at-massive-scale/)
* [Continuous Integration with Distributed Repositories and Dependencies at Netflix](https://medium.com/netflix-techblog/towards-true-continuous-integration-distributed-repositories-and-dependencies-2a2e3108c051)
* [Screwdriver: Continuous Delivery Build System for Dynamic Infrastructure at Yahoo](https://yahooeng.tumblr.com/post/155765242061/open-sourcing-screwdriver-yahoos-continuous)
* [Betterment的CI/CD](https://www.betterment.com/resources/ci-cd-shortening-the-feedback-loop/)
* [Brainly的CI/CD](https://medium.com/engineering-brainly/ci-cd-at-scale-fdfb0f49e031)
* [Scaling iOS CI with Anka at Shopify](https://engineering.shopify.com/blogs/engineering/scaling-ios-ci-with-anka)
* [Scaling Jira Server at Yelp](https://engineeringblog.yelp.com/2019/04/Scaling-Jira-Server-Administration-For-The-Enterprise.html)
* [Auto-scaling CI/CD cluster at Flexport](https://flexport.engineering/how-flexport-halved-testing-costs-with-an-auto-scaling-ci-cd-cluster-8304297222f)
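
上面「分片」条目下的多篇文章都需要把 key 稳定地路由到某个节点。下面是带虚拟节点的一致性哈希环的最小 Python 示意,仅作说明:哈希函数选用 MD5、每个物理节点 100 个虚拟节点、类名 `ConsistentHashRing` 都是本示例的假设,并非任何一篇文章中的实际实现。

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # 取 MD5 摘要的前 8 字节作为环上的位置(哈希函数的选择是本示例的假设)
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """带虚拟节点的一致性哈希环(最小示意)。"""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._positions = []  # 有序的虚拟节点位置
        self._owners = {}     # 位置 -> 物理节点名
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            if pos in self._owners:
                continue  # 极小概率的位置冲突:保留先加入的节点
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                self._positions.pop(bisect.bisect_left(self._positions, pos))

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # 顺时针找到第一个位置大于 key 哈希值的虚拟节点,越过末尾则回绕到开头
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]
```

用法示意:`ConsistentHashRing(["db1", "db2", "db3"]).get_node("user:42")` 返回负责该 key 的节点名。与 `hash(key) % N` 相比,增删一个节点时只有原本落在该节点虚拟节点上的 key 需要迁移,其余 key 的归属不变。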

## 可用性
* [Resilience Engineering: Learning to Embrace Failure](https://queue.acm.org/detail.cfm?id=2371297)
* [Resilience Engineering with Project Waterbear at LinkedIn](https://engineering.linkedin.com/blog/2017/11/resilience-engineering-at-linkedin-with-project-waterbear)
* [Resiliency against Traffic Oversaturation at iHeartRadio](https://tech.iheart.com/resiliency-against-traffic-oversaturation-77c5ed92a5fb)
* [Resiliency in Distributed Systems at GO-JEK](https://blog.gojekengineering.com/resiliency-in-distributed-systems-efd30f74baf4)
* [Practical NoSQL Resilience Design Pattern for the Enterprise at eBay](https://www.ebayinc.com/stories/blogs/tech/practical-nosql-resilience-design-pattern-for-the-enterprise/)
* [Ensuring Resilience to Disaster at Quora](https://engineering.quora.com/Ensuring-Quoras-Resilience-to-Disaster)
* [Shopify的弹性(Resilience)](https://scaleyourcode.com/blog/article/23)
* [Site Resiliency at Expedia](https://www.infoq.com/presentations/expedia-website-resiliency?utm_source=presentations_about_Case_Study&utm_medium=link&utm_campaign=Case_Study)
* [故障转移](http://cloudpatterns.org/mechanisms/failover_system)
* [The Evolution of Global Traffic Routing and Failover](https://www.usenix.org/conference/srecon16/program/presentation/heady)
* [Testing for Disaster Recovery Failover Testing](https://www.usenix.org/conference/srecon17asia/program/presentation/liu_zehua)
* [Designing a Microservices Architecture for Failure](https://blog.risingstack.com/designing-microservices-architecture-for-failure/)
* [ELB for Automatic Failover at GoSquared](https://engineering.gosquared.com/use-elb-automatic-failover)
* [Eliminate the Database for Higher Availability at American Express](http://americanexpress.io/eliminate-the-database-for-higher-availability/)
* [Failover with Redis Sentinel at Vinted](http://engineering.vinted.com/2015/09/03/failover-with-redis-sentinel/)
* [High-availability SaaS Infrastructure at FreeAgent](http://engineering.freeagent.com/2017/02/06/ha-infrastructure-without-breaking-the-bank/)
* [MySQL High Availability at GitHub](https://github.blog/2018-06-20-mysql-high-availability-at-github/)
* [负载均衡](https://blog.vivekpanyam.com/scaling-a-web-service-load-balancing/)
* [Introduction to Modern Network Load Balancing and Proxying](https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236)
* [Top Five (Load Balancing) Scalability Patterns](https://www.f5.com/company/blog/top-five-scalability-patterns)
* [Load Balancing infrastructure to support more than 1.3 billion users at Facebook](https://www.usenix.org/conference/srecon15europe/program/presentation/shuff)
* [DHCPLB: DHCP Load Balancer at Facebook](https://code.facebook.com/posts/1734309626831603/dhcplb-an-open-source-load-balancer/)
* [Katran: Scalable Network Load Balancer at Facebook](https://code.facebook.com/posts/1906146702752923/open-sourcing-katran-a-scalable-network-load-balancer/)
* [Load Balancing with Eureka at Netflix](https://medium.com/netflix-techblog/netflix-shares-cloud-load-balancing-and-failover-tool-eureka-c10647ef95e5)
* [Edge Load Balancing at Netflix](https://medium.com/netflix-techblog/netflix-edge-load-balancing-695308b5548c)
* [Zuul 2: Cloud Gateway at Netflix](https://medium.com/netflix-techblog/open-sourcing-zuul-2-82ea476cb2b3)
* [Yelp的负载均衡](https://engineeringblog.yelp.com/2017/05/taking-zero-downtime-load-balancing-even-further.html)
* [Github的负载均衡](https://githubengineering.com/introducing-glb/)
* [Vimeo 使用新的一致性哈希算法改进负载均衡](https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed)
* [500px 基于 Keepalived 的 UDP 负载均衡](https://developers.500px.com/udp-load-balancing-with-keepalived-167382d7ad08)
* [QALM: QoS Load Management Framework at Uber](https://eng.uber.com/qalm/)
* [Traffic Steering using Rum DNS at LinkedIn](https://www.usenix.org/conference/srecon17europe/program/presentation/rastogi)
* [Traffic Infrastructure (Edge Network) at Dropbox](https://blogs.dropbox.com/tech/2018/10/dropbox-traffic-infrastructure-edge-network/)
* [Monitor DNS systems at Stripe](https://stripe.com/en-sg/blog/secret-life-of-dns)
* [限流](https://www.keycdn.com/support/rate-limiting/)
* [Rate Limiting for Scaling to Millions of Domains at Cloudflare](https://blog.cloudflare.com/counting-things-a-lot-of-different-things/)
* [Cloud Bouncer: Distributed Rate Limiting at Yahoo](https://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate-limiting-at-yahoo)
* [Scaling API with Rate Limiters at Stripe](https://stripe.com/blog/rate-limiters)
* [Rate Limiting at Etsy](https://www.sans.org/summit-archives/file/summit-archive-1509593697.pdf)
* [Distributed Rate Limiting at Allegro](https://allegro.tech/2017/04/hermes-max-rate.html)
* [Ratequeue: Core Queueing-And-Rate-Limiting System at Twilio](https://www.twilio.com/blog/2017/11/chaos-engineering-ratequeue-ha.html)
* [Quotas Service at Grab](https://engineering.grab.com/quotas-service)
* [自动扩容](https://medium.com/@BotmetricHQ/top-11-hard-won-lessons-learned-about-aws-auto-scaling-5bfe56da755f)
* [Pinterest 自动扩容](https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64)
* [Autoscaling Based on Request Queuing at Square](https://medium.com/square-corner-blog/autoscaling-based-on-request-queuing-c4c0f57f860f)
* [Autoscaling Jenkins at Trivago](http://tech.trivago.com/2017/02/17/your-definite-guide-for-autoscaling-jenkins/)
* [Autoscaling Pub-Sub Consumers at Spotify](https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/)
* [Autoscaling Bigtable Clusters based on CPU Load at Spotify](https://labs.spotify.com/2018/12/18/bigtable-autoscaler-saving-money-and-time-using-managed-storage/)
* [Autoscaling AWS Step Functions Activities at Yelp](https://engineeringblog.yelp.com/2019/06/autoscaling-aws-step-functions-activities.html)
* [Scryer: Predictive Auto Scaling Engine at Netflix](https://medium.com/netflix-techblog/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270)
* [Bouncer: Simple AWS Auto Scaling Rollovers at Palantir](https://medium.com/palantir/bouncer-simple-aws-auto-scaling-rollovers-c5af601d65d4)
* [Clusterman: Autoscaling Mesos Clusters at Yelp](https://engineeringblog.yelp.com/2019/02/autoscaling-mesos-clusters-with-clusterman.html)
* [Google 全球分布式存储系统的可用性](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36737.pdf)
* [Yahoo高可用NodeJS](https://yahooeng.tumblr.com/post/68823943185/nodejs-high-availability)
* [Operations (11 parts) at LinkedIn](https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason)
* [Monitoring Powers High Availability for LinkedIn Feed](https://www.usenix.org/conference/srecon17americas/program/presentation/barot)
* [Supporting Global Events at Facebook](https://code.facebook.com/posts/166966743929963/how-production-engineers-support-global-events-on-facebook/)
* [BlaBlaCar的高可用](https://medium.com/blablacar-tech/the-expendables-backends-high-availability-at-blablacar-8cea3b95b26b)
* [Netflix的高可用](https://medium.com/@NetflixTechBlog/tips-for-high-availability-be0472f2599c)
* [High Availability Cloud Infrastructure at Twilio](https://www.twilio.com/engineering/2011/12/12/scaling-high-availablity-infrastructure-in-cloud)
* [Dropbox 自动化数据中心运营](https://blogs.dropbox.com/tech/2019/01/automating-datacenter-operations-at-dropbox/)
* [Riot Games 全球化玩家账户](https://technology.riotgames.com/news/globalizing-player-accounts)
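
上面「限流」条目下的文章介绍了多种限流方式,令牌桶(token bucket)是其中常见的一种。下面是一个单进程令牌桶的最小 Python 示意:类名 `TokenBucket`、容量 `capacity`、每秒补充速率 `rate` 和可注入的 `clock` 参数都是本示例的假设;它不是线程安全的,也不处理 Cloud Bouncer 这类分布式限流所需的共享状态。

```python
import time


class TokenBucket:
    """单进程令牌桶限流器(最小示意,非线程安全)。"""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        if capacity <= 0 or rate <= 0:
            raise ValueError("capacity and rate must be positive")
        self.capacity = capacity  # 桶容量,即允许的最大突发请求数
        self.rate = rate          # 每秒补充的令牌数
        self._clock = clock
        self._tokens = capacity   # 初始时桶是满的
        self._last = clock()

    def _refill(self) -> None:
        # 按距上次补充经过的时间补充令牌,且不超过桶容量
        now = self._clock()
        elapsed = max(0.0, now - self._last)
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def allow(self, cost: float = 1.0) -> bool:
        # 令牌足够则扣除并放行,否则拒绝(调用方可返回 HTTP 429 之类的响应)
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

桶满时最多允许 `capacity` 个突发请求,之后平均速率被限制在每秒 `rate` 个;`clock` 参数便于在测试中用假时钟替代 `time.monotonic`。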

## 稳定性
* [熔断](https://martinfowler.com/bliki/CircuitBreaker.html)
* [分布式系统的熔断](https://www.infoq.com/presentations/circuit-breaking-distributed-systems)
* [LINE分布式服务的熔断](https://engineering.linecorp.com/en/blog/detail/76)
* [Applying Circuit Breaker to Channel Gateway at LINE](https://engineering.linecorp.com/en/blog/detail/78)
* [Lessons in Resilience at SoundCloud](https://developers.soundcloud.com/blog/lessons-in-resilience-at-SoundCloud)
* [Circuit Breaker for Scaling Containers](https://f5.com/about-us/blog/articles/the-art-of-scaling-containers-circuit-breakers-28919)
* [Protector: Circuit Breaker for Time Series Databases at Trivago](http://tech.trivago.com/2016/02/23/protector/)
* [Improved Production Stability with Circuit Breakers at Heroku](https://blog.heroku.com/improved-production-stability-with-circuit-breakers)
* [Circuit Breakers at Zendesk](https://medium.com/zendesk-engineering/the-joys-of-circuit-breaking-ee6584acd687)
* [Circuit Breakers at Traveloka](https://medium.com/traveloka-engineering/circuit-breakers-dont-let-your-dependencies-bring-you-down-5ba1c5cf1eec)
* [超时](https://www.javaworld.com/article/2824163/application-performance/stability-patterns-applied-in-a-restful-architecture.html)
* [Netflix 容错(超时和重试、线程隔离、信号量、断路器)](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
* [DoorDash 强制超时:一种可靠性方法](https://doordash.engineering/2018/12/21/enforce-timeout-a-doordash-reliability-methodology/)
* [eBay 对启用 tcp_tw_recycle 的连接超时问题进行故障排除](https://www.ebayinc.com/stories/blogs/tech/a-vip-connection-timeout-issue-caused-by-snat-and-tcp-tw-recycle/)
* [Booking.com MySQL 的崩溃安全复制](https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f)
* [Bulkheads: Partition and Tolerate Failure in One Part](https://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html)
* [Steady State: Always Put Logs on Separate Disk](https://docs.microsoft.com/en-us/sql/relational-databases/policy-based-management/place-data-and-log-files-on-separate-drives)
* [Throttling: Maintain a Steady Pace](http://www.sosp.org/2001/papers/welsh.pdf)
* [Multi-Clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn](https://engineering.linkedin.com/blog/2017/11/improving-resiliency-and-stability-of-a-large-scale-api)
* [Determinism (4 parts) in League of Legends Server](https://engineering.riotgames.com/news/determinism-league-legends-fixing-divergences)
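
上面「熔断」条目中 Martin Fowler 的文章描述了熔断器的 closed、open、half-open 三种状态。下面按这一思路写一个最小 Python 示意:类名 `CircuitBreaker`、`CircuitOpenError`,失败阈值 `failure_threshold`、冷却时间 `reset_timeout` 和可注入的 `clock` 都是本示例的假设;它不是线程安全的,也不代表 Hystrix 或文中任何系统的实现。

```python
import time


class CircuitOpenError(RuntimeError):
    """熔断器处于 open 状态时快速失败所抛出的异常。"""


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, failure_threshold=3, reset_timeout=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # 连续失败多少次后打开
        self.reset_timeout = reset_timeout          # open 状态持续多久后允许试探
        self._clock = clock
        self.state = self.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if self._clock() - self._opened_at >= self.reset_timeout:
                self.state = self.HALF_OPEN  # 冷却结束,放行一次试探调用
            else:
                raise CircuitOpenError("circuit is open")  # 快速失败,不访问下游
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # 试探失败,或连续失败达到阈值:(重新)打开熔断器
        if self.state == self.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = self.OPEN
            self._opened_at = self._clock()

    def _on_success(self):
        # 任意一次成功都清零失败计数并关闭熔断器
        self._failures = 0
        self.state = self.CLOSED
```

连续失败达到阈值后熔断器打开,后续调用直接抛出 `CircuitOpenError` 而不再访问下游;冷却时间过后放行一次试探调用,成功则关闭,失败则重新打开。实际使用时通常还要与「超时」条目中提到的超时配合,否则挂起的调用不会被计为失败。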

## 性能
* [操作系统、存储、数据库、网络的性能优化](https://stackify.com/application-performance-metrics/)
* [Instagram 通过后台数据预取提高性能](https://engineering.instagram.com/improving-performance-with-background-data-prefetching-b191acb39898)
* [eBay 解决网络I/O瓶颈的压缩技术](https://www.ebayinc.com/stories/blogs/tech/how-ebays-shopping-cart-used-compression-techniques-to-solve-network-io-bottlenecks/)
* [Dropbox 优化Web服务器,实现高吞吐量和低延迟](https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/)
* [Netflix 60,000毫秒内的Linux性能分析](https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55)
* [Live Downsizing Google Cloud Persistent Disks (PD-SSD) at Mixpanel](https://engineering.mixpanel.com/2018/07/31/live-downsizing-google-cloud-pds-for-fun-and-profit/)
* [Zapier 使用jemalloc与Python和Celery降低40%的RAM使用率](https://zapier.com/engineering/celery-python-jemalloc/)
* [Reducing Memory Footprint at Slack](https://slack.engineering/reducing-slacks-memory-footprint-4480fec7e8eb)
* [Pinterest的性能提升](https://medium.com/@Pinterest_Engineering/driving-user-growth-with-performance-improvements-cfc50dafadd7)
* [Wix的服务端渲染](https://www.youtube.com/watch?v=f9xI2jR71Ms)
* [Yelp MySQLStreamer的30倍性能提升](https://engineeringblog.yelp.com/2018/02/making-30x-performance-improvements-on-yelps-mysqlstreamer.html)
* [Optimizing APIs through Dynamic Polyglot Runtime, Fully Asynchronous, and Reactive Programming at Netflix](https://medium.com/netflix-techblog/optimizing-the-netflix-api-5c9ac715cf19)
* [Performance Monitoring with Riemann and Clojure at Walmart](https://medium.com/walmartlabs/performance-monitoring-with-riemann-and-clojure-eafc07fcd375)
* [Performance Tracking Dashboard for Live Games at Zynga](https://www.zynga.com/blogs/engineering/live-games-have-evolving-performance)
* [Optimizing CAL Report Hadoop MapReduce Jobs at eBay](https://www.ebayinc.com/stories/blogs/tech/optimization-of-cal-report-hadoop-mapreduce-job/)
* [Performance Tuning on Quartz Scheduler at eBay](https://www.ebayinc.com/stories/blogs/tech/performance-tuning-on-quartz-scheduler/)
* [Profiling C++ (Part 1: Optimization, Part 2: Measurement and Analysis) at Riot Games](https://engineering.riotgames.com/news/profiling-optimisation)
* [HomeAway 剖析React服务器端渲染](https://medium.com/homeaway-tech-blog/profiling-react-server-side-rendering-to-free-the-node-js-event-loop-7f0fe455a901)
* [Mixpanel 诊断Linux内核中的网络问题](https://code.mixpanel.com/2015/03/26/diagnosing-networking-issues-in-the-linux-kernel/)
* [Dailymotion 硬件辅助视频转码](https://medium.com/dailymotion-engineering/hardware-assisted-video-transcoding-at-dailymotion-66cd2db448ae)
* [Cross Shard Transactions at 10 Million RPS at Dropbox](https://blogs.dropbox.com/tech/2018/11/cross-shard-transactions-at-10-million-requests-per-second/)
* [Pinterest API剖析](https://medium.com/@Pinterest_Engineering/api-profiling-at-pinterest-6fa9333b4961)
* [Pagelets Parallelize Server-side Processing at Yelp](https://engineeringblog.yelp.com/2017/07/generating-web-pages-in-parallel-with-pagelets.html)
* [Improving key expiration in Redis at Twitter](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/improving-key-expiration-in-redis.html)
* [Ad Delivery Network Performance Optimization with Flame Graphs at MindGeek](https://medium.com/mindgeek-engineering-blog/ad-delivery-network-performance-optimization-with-flame-graphs-bc550cf59cf7)
* [Predictive CPU isolation of containers at Netflix](https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7)
* [GC性能优化](https://confluence.atlassian.com/enterprise/garbage-collection-gc-tuning-guide-461504616.html)
* [LinkedIn Java应用GC优化](https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications) | [原文](./原文/Garbage_Collection_Optimization_for_High-Throughput_and_Low-Latency_Java_Applications.md) | [译文](./译文/LinkedIn_Java应用GC优化.md)
* [Adobe 高吞吐、低延迟机器学习服务中的垃圾收集](https://medium.com/adobetech/engineering-high-throughput-low-latency-machine-learning-services-7d45edac0271)
* [SoundCloud Redux应用中的垃圾回收](https://developers.soundcloud.com/blog/garbage-collection-in-redux-applications)
* [Twitch Go应用中的垃圾回收](https://blog.twitch.tv/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2)
* [Alibaba 分析V8垃圾收集日志](https://www.linux.com/blog/can-nodejs-scale-ask-team-alibaba)
* [Instagram 用于减少每个请求50%内存增长的Python垃圾回收](https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf)
* [Performance Impact of Removing Out of Band Garbage Collector (OOBGC) at Github](https://githubengineering.com/removing-oobgc/)
* [Allegro 调试Java内存泄漏](https://allegro.tech/2018/05/a-comedy-of-errors-debugging-java-memory-leaks.html)
* [Alibaba JVM优化](https://www.youtube.com/watch?v=X4tmr3nhZRg)
* [图片、视频和页面加载性能优化](https://developers.google.com/web/fundamentals/performance/why-performance-matters/)
* [Optimizing 360 Photos at Scale at Facebook](https://code.facebook.com/posts/129055711052260/optimizing-360-photos-at-scale/)
* [Etsy 减少图片基础架构中的图片文件大小](https://codeascraft.com/2017/05/30/reducing-image-file-size-at-etsy/)
* [Pinterest 提升GIF性能](https://medium.com/@Pinterest_Engineering/improving-gif-performance-on-pinterest-8dad74bf92f1)
* [Pinterest 优化视频回放性能](https://medium.com/@Pinterest_Engineering/optimizing-video-playback-performance-caf55ce310d1)
* [Netflix 使用动态优化器优化低带宽视频流](https://medium.com/netflix-techblog/optimized-shot-based-encodes-now-streaming-4b9464204830)
* [YouTube 自适应视频流媒体](https://youtube-eng.googleblog.com/2018/04/making-high-quality-video-efficient.html)
* [Dailymotion 缩短视频加载时间](https://medium.com/dailymotion/reducing-video-loading-time-fa9c997a2294)
* [LinkedIn 使用 Brotli 压缩提升网站速度](https://engineering.linkedin.com/blog/2017/05/boosting-site-speed-using-brotli-compression)
* [Zillow 提升主页性能](https://www.zillow.com/engineering/improving-homepage-performance/)
* [Expedia 优化客户端性能过程](https://medium.com/expedia-engineering/go-fast-or-go-home-the-process-of-optimizing-for-client-performance-57bb497402e)

## 智能
* [Big Data](https://insights.sei.cmu.edu/sei_blog/2017/05/reference-architectures-for-big-data-systems.html)
* [Data Platform at Uber](https://eng.uber.com/uber-big-data-platform/)
* [Data Platform at BMW](https://www.unibw.de/code/events-u/jt-2018-workshops/ws3_bigdata_vortrag_widmann.pdf)
* [Data Platform at Netflix](https://www.youtube.com/watch?v=CSDIThSwA7s)
* [Data Platform at Flipkart](https://tech.flipkart.com/overview-of-flipkart-data-platform-20c6d3e9a196)
* [Data Platform at Khan Academy](http://engineering.khanacademy.org/posts/khanalytics.htm)
* [Data Infrastructure at Airbnb](https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c)
* [Data Infrastructure at LinkedIn](https://www.infoq.com/presentations/big-data-infrastructure-linkedin)
* [Data Infrastructure at GO-JEK](https://blog.gojekengineering.com/data-infrastructure-at-go-jek-cd4dc8cbd929)
* [Data Ingestion Infrastructure at Pinterest](https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
* [Data Analytics Architecture at Pinterest](https://medium.com/@Pinterest_Engineering/behind-the-pins-building-analytics-f7b508cdacab)
* [Big Data Processing at Spotify](https://labs.spotify.com/2017/10/23/big-data-processing-at-spotify-the-road-to-scio-part-2/)
* [Big Data Processing at Uber](https://cdn.oreillystatic.com/en/assets/1/event/160/Big%20data%20processing%20with%20Hadoop%20and%20Spark%2C%20the%20Uber%20way%20Presentation.pdf)
* [Analytics Pipeline at Lyft](https://cdn.oreillystatic.com/en/assets/1/event/269/Lyft_s%20analytics%20pipeline_%20From%20Redshift%20to%20Apache%20Hive%20and%20Presto%20Presentation.pdf)
* [Analytics Pipeline at Grammarly](https://tech.grammarly.com/blog/building-a-versatile-analytics-pipeline-on-top-of-apache-spark)
* [Analytics Pipeline at Teads](https://medium.com/teads-engineering/give-meaning-to-100-billion-analytics-events-a-day-d6ba09aa8f44)
* [ML Data Pipelines for Real-Time Fraud Prevention at PayPal](https://www.infoq.com/presentations/paypal-ml-fraud-prevention-2018)
* [Big Data Analytics and Machine Learning Techniques at LinkedIn](https://cdn.oreillystatic.com/en/assets/1/event/269/Big%20data%20analytics%20and%20machine%20learning%20techniques%20to%20drive%20and%20grow%20business%20Presentation%201.pdf)
* [Self-Serve Reporting Platform on Hadoop at LinkedIn](https://cdn.oreillystatic.com/en/assets/1/event/137/Building%20a%20self-serve%20real-time%20reporting%20platform%20at%20LinkedIn%20Presentation%201.pdf)
* [Privacy-Preserving Analytics and Reporting at LinkedIn](https://engineering.linkedin.com/blog/2019/04/privacy-preserving-analytics-and-reporting-at-linkedin)
* [Analytics Platform for Tracking Item Availability at Walmart](https://medium.com/walmartlabs/how-we-build-a-robust-analytics-platform-using-spark-kafka-and-cassandra-lambda-architecture-70c2d1bc8981)
* [HALO: Hardware Analytics and Lifecycle Optimization at Facebook](https://code.fb.com/data-center-engineering/hardware-analytics-and-lifecycle-optimization-halo-at-facebook/)
* [RBEA: Real-Time Analytics Platform at King](https://techblog.king.com/rbea-scalable-real-time-analytics-king/)
* [AresDB: GPU-Powered Real-Time Analytics Engine at Uber](https://eng.uber.com/aresdb/)
* [AthenaX: Streaming Analytics Platform at Uber](https://eng.uber.com/athenax/)
* [Keystone: Real-Time Stream Processing Platform at Netflix](https://medium.com/netflix-techblog/keystone-real-time-stream-processing-platform-a3ee651812a)
* [Databook: Turning Big Data into Knowledge with Metadata at Uber](https://eng.uber.com/databook/)
* [Amundsen: Data Discovery & Metadata Engine at Lyft](https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9)
* [Maze: Funnel Visualization Platform at Uber](https://eng.uber.com/maze/)
* [Metacat: Making Big Data Discoverable and Meaningful at Netflix](https://medium.com/netflix-techblog/metacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520)
* [SpinalTap: Change Data Capture System at Airbnb](https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f)
* [Accelerator: Fast Data Processing Framework at eBay](https://www.ebayinc.com/stories/blogs/tech/announcing-the-accelerator-processing-1-000-000-000-lines-per-second-on-a-single-computer/)
* [Omid: Transaction Processing Platform at Yahoo](https://yahooeng.tumblr.com/post/180867271141/a-new-chapter-for-omid)
* [TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo](https://yahooeng.tumblr.com/post/157196488076/open-sourcing-tensorflowonspark-distributed-deep)
* [CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo](https://yahooeng.tumblr.com/post/139916828451/caffeonspark-open-sourced-for-distributed-deep)
* [Spark on Scala: Analytics Reference Architecture at Adobe](https://medium.com/adobetech/spark-on-scala-adobe-analytics-reference-architecture-7457f5614b4c)
* [Experimentation Platform at Airbnb](https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166)
* [Smart Product Platform at Zalando](https://jobs.zalando.com/tech/blog/zalando-smart-product-platform/?gh_src=4n3gxh1)
* [Log Analysis Platform at LINE](https://www.slideshare.net/wyukawa/strata2017-sg)
* [Data Visualization Platform at Myntra](https://medium.com/myntra-engineering/universal-dashboarding-platform-udp-data-visualisation-platform-at-myntra-5f2522fcf72d)
* [Building and Scaling Data Lineage at Netflix](https://medium.com/netflix-techblog/building-and-scaling-data-lineage-at-netflix-to-improve-data-infrastructure-reliability-and-1a52526a7977)
* [Building a scalable data management system for computer vision tasks at Pinterest](https://medium.com/@Pinterest_Engineering/building-a-scalable-data-management-system-for-computer-vision-tasks-a6dee8f1c580)
* [Structured Data at Etsy](https://codeascraft.com/2019/07/31/an-introduction-to-structured-data-at-etsy/)
* [Distributed Machine Learning](https://www.csie.ntu.edu.tw/~cjlin/talks/bigdata-bilbao.pdf)
* [Aroma: Using ML for Code Recommendation at Facebook](https://code.fb.com/developer-tools/aroma/)
* [Michelangelo: Machine Learning Platform at Uber](https://eng.uber.com/michelangelo/)
* [Scaling Michelangelo](https://eng.uber.com/scaling-michelangelo/)
* [Horovod: Open Source Distributed Deep Learning Framework for TensorFlow at Uber](https://eng.uber.com/horovod/)
* [COTA: Improving Customer Care with NLP & Machine Learning at Uber](https://eng.uber.com/cota/)
* [Manifold: Model-Agnostic Visual Debugging Tool for Machine Learning at Uber](https://eng.uber.com/manifold/)
* [Repo-Topix: Topic Extraction Framework at GitHub](https://githubengineering.com/topics/)
* [Concourse: Generating Personalized Content Notifications in Near-Real-Time at LinkedIn](https://engineering.linkedin.com/blog/2018/05/concourse--generating-personalized-content-notifications-in-near)
* [Altus Care: Applying a Chatbot to Platform Engineering at eBay](https://www.ebayinc.com/stories/blogs/tech/altus-care-apply-chatbot-to-ebay-platform-engineering/)
* [Box Graph: Spontaneous Social Network at Box](https://blog.box.com/blog/box-graph-how-we-built-spontaneous-social-network/)
* [PricingNet: Pricing Modelling with Neural Networks at Skyscanner](https://hackernoon.com/pricingnet-modelling-the-global-airline-industry-with-neural-networks-833844d20ea6)
* [PinText: Multitask Text Embedding System at Pinterest](https://medium.com/pinterest-engineering/pintext-a-multitask-text-embedding-system-in-pinterest-b80ece364555)
* [Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp](https://engineeringblog.yelp.com/2018/01/building-a-distributed-ml-pipeline-part1.html)
* [Learning with Privacy at Scale at Apple](https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html)
* [Deep Learning for Image Classification Experiment at Mercari](https://medium.com/mercari-engineering/mercaris-image-classification-experiment-using-deep-learning-9b4e994a18ec)
* [Deep Learning for Frame Detection in Product Images at Allegro](https://allegro.tech/2016/12/deep-learning-for-frame-detection.html)
* [Content-based Video Relevance Prediction at Hulu](https://medium.com/hulu-tech-blog/content-based-video-relevance-prediction-b2c448e14752)
* [Improving Photo Selection With Deep Learning at TripAdvisor](http://engineering.tripadvisor.com/improving-tripadvisor-photo-selection-deep-learning/)
* [Personalized Recommendations for Experiences Using Deep Learning at TripAdvisor](https://www.tripadvisor.com/engineering/personalized-recommendations-for-experiences-using-deep-learning/)
* [Personalised Recommender Systems at BBC](https://medium.com/bbc-design-engineering/developing-personalised-recommender-systems-at-the-bbc-e26c5e0c4216)
* [Machine Learning at Condé Nast](https://technology.condenast.com/story/handbag-brand-and-color-detection)
* [Natural Language Processing and Content Analysis at Condé Nast](https://technology.condenast.com/story/natural-language-processing-and-content-analysis-at-conde-nast-part-2-system-architecture)
* [Machine Learning Applications In The E-commerce Domain (4 parts) at Rakuten](https://techblog.rakuten.co.jp/2017/07/12/machine-learning-applications-in-the-e-commerce-domain-4/)
* [Mapping the World of Music Using Machine Learning (2 parts) at iHeartRadio](https://tech.iheart.com/mapping-the-world-of-music-using-machine-learning-part-2-aa50b6a0304c)
* [Machine Learning to Improve Streaming Quality at Netflix](https://medium.com/netflix-techblog/using-machine-learning-to-improve-streaming-quality-at-netflix-9651263ef09f)
* [Machine Learning to Match Drivers & Riders at GO-JEK](https://blog.gojekengineering.com/how-we-use-machine-learning-to-match-drivers-riders-b06d617b9e5)
* [Improving Video Thumbnails with Deep Neural Nets at YouTube](https://youtube-eng.googleblog.com/2015/10/improving-youtube-video-thumbnails-with_8.html)
* [Quantile Regression for Delivering On Time at Instacart](https://tech.instacart.com/how-instacart-delivers-on-time-using-quantile-regression-2383e2e03edb)
* [Cross-Lingual End-to-End Product Search with Deep Learning at Zalando](https://jobs.zalando.com/tech/blog/search-deep-neural-network/)
* [Machine Learning at Jane Street](https://blog.janestreet.com/real-world-machine-learning-part-1/)
* [Machine Learning for Ranking Answers End-to-End at Quora](https://engineering.quora.com/A-Machine-Learning-Approach-to-Ranking-Answers-on-Quora)
* [Clustering Similar Stories Using LDA at Flipboard](http://engineering.flipboard.com/2017/02/storyclustering)
* [Similarity Search at Flickr](https://code.flickr.net/2017/03/07/introducing-similarity-search-at-flickr/)
* [Large-Scale Machine Learning Pipeline for Job Recommendations at Indeed](http://engineering.indeedblog.com/blog/2016/04/building-a-large-scale-machine-learning-pipeline-for-job-recommendations/)
* [Deep Learning from Prototype to Production at Taboola](http://engineering.taboola.com/deep-learning-from-prototype-to-production/)
* [Atom Smashing using Machine Learning at CERN](https://cdn.oreillystatic.com/en/assets/1/event/144/Atom%20smashing%20using%20machine%20learning%20at%20CERN%20Presentation.pdf)
* [Mapping Tags at Medium](https://medium.engineering/mapping-mediums-tags-1b9a78d77cf0)
* [Clustering with the Dirichlet Process Mixture Model in Scala at Monsanto](http://engineering.monsanto.com/2015/11/23/chinese-restaurant-process/)
* [Map Pins with DBSCAN & Random Forests at Foursquare](https://engineering.foursquare.com/you-are-probably-here-better-map-pins-with-dbscan-random-forests-9d51e8c1964d)
* [Detecting and Preventing Fraud at Uber](https://eng.uber.com/advanced-technologies-detecting-preventing-fraud-uber/)
* [Forecasting at Uber](https://eng.uber.com/forecasting-introduction/)
* [Financial Forecasting at Uber](https://eng.uber.com/transforming-financial-forecasting-machine-learning/)
* [Productionizing ML with Workflows at Twitter](https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html)
* [GUI Testing Powered by Deep Learning at eBay](https://www.ebayinc.com/stories/blogs/tech/gui-testing-powered-by-deep-learning/)
* [Scaling Machine Learning to Recommend Driving Routes at Pivotal](http://engineering.pivotal.io/post/scaling-machine-learning-to-recommend-driving-routes/)
* [Real-Time Predictions at DoorDash](https://www.infoq.com/presentations/doordash-real-time-predictions)
* [Machine Intelligence at Dropbox](https://blogs.dropbox.com/tech/2018/09/machine-intelligence-at-dropbox-an-update-from-our-dbxi-team/)
* [Machine Learning for Indexing Text from Billions of Images at Dropbox](https://blogs.dropbox.com/tech/2018/10/using-machine-learning-to-index-text-from-billions-of-images/)
* [Modeling User Journeys via Semantic Embeddings at Etsy](https://codeascraft.com/2018/07/12/modeling-user-journey-via-semantic-embeddings/)
* [Automated Fake Account Detection at LinkedIn](https://engineering.linkedin.com/blog/2018/09/automated-fake-account-detection-at-linkedin)
* [Building a Knowledge Graph at Airbnb](https://medium.com/airbnb-engineering/contextualizing-airbnb-by-building-knowledge-graph-b7077e268d5a)
* [Core Modeling at Instagram](https://instagram-engineering.com/core-modeling-at-instagram-a51e0158aa48)
* [Neural Architecture Search (NAS) for Prohibited Item Detection at Mercari](https://tech.mercari.com/entry/2019/04/26/163000)
* [Computer Vision at Airbnb](https://medium.com/airbnb-engineering/amenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e)
* [Backend Algorithms Behind Zillow 3D Home at Zillow](https://www.zillow.com/engineering/behind-zillow-3d-home-backend-algorithms/)
* [Long-Term Forecasting at Lyft](https://eng.lyft.com/making-long-term-forecasts-at-lyft-fac475b3ba52)

## 架构
* [Systems We Make](https://systemswemake.com/)
* [Tech Stack (2 parts) at Uber](https://eng.uber.com/tech-stack-part-two/)
* [Tech Stack at Medium](https://medium.engineering/the-stack-that-helped-medium-drive-2-6-millennia-of-reading-time-e56801f7c492)
* [Tech Stack at Shopify](https://engineering.shopify.com/blogs/engineering/e-commerce-at-scale-inside-shopifys-tech-stack)
* [Services (2 parts) at Airbnb](https://medium.com/airbnb-engineering/building-services-at-airbnb-part-2-142be1c5d506)
* [Architecture at Evernote](https://evernote.com/blog/a-digest-of-evernotes-architecture/)
* [Chat Service Architecture (3 parts) at Riot Games](https://engineering.riotgames.com/news/chat-service-architecture-persistence)
* [League of Legends Client Update Architecture at Riot Games](https://technology.riotgames.com/news/architecture-league-client-update)
* [Infrastructure at Slack](https://slack.engineering/how-slack-built-shared-channels-8d42c895b19f)
* [Backend at LinkedIn](https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin)
* [Backend at Flickr](https://yahooeng.tumblr.com/post/157200523046/introducing-tripod-flickrs-backend-refactored)
* [Infrastructure (3 parts) at Zendesk](https://medium.com/zendesk-engineering/the-history-of-infrastructure-at-zendesk-part-3-foundation-team-forming-and-evolving-9859e40f5390)
* [Cloud Infrastructure at Grubhub](https://bytes.grubhub.com/cloud-infrastructure-at-grubhub-94db998a898a)
* [Real-Time Presence Platform at LinkedIn](https://engineering.linkedin.com/blog/2018/01/now-you-see-me--now-you-dont--linkedins-real-time-presence-platf)
* [Settings Platform at LinkedIn](https://engineering.linkedin.com/blog/2019/05/building-member-trust-through-a-centralized-and-scalable-setting)
* [Real-Time User Action Counting System for Ads at Pinterest](https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a)
* [API Platform at Riot Games](https://engineering.riotgames.com/news/riot-games-api-deep-dive)
* [Games Platform at The New York Times](https://open.nytimes.com/play-by-play-moving-the-nyt-games-platform-to-gcp-with-zero-downtime-cf425898d569)
* [Kabootar: Communication Platform at Swiggy](https://bytes.swiggy.com/kabootar-swiggys-communication-platform-e5a43cc25629)
* [Simone: Distributed Simulation Service at Netflix](https://medium.com/netflix-techblog/https-medium-com-netflix-techblog-simone-a-distributed-simulation-service-b2c85131ca1b)
* [Seagull: A Distributed System That Helps Run More Than 20 Million Tests per Day at Yelp](https://engineeringblog.yelp.com/2017/04/how-yelp-runs-millions-of-tests-every-day.html)
* [Play API Service Architecture at Netflix](https://qconsf.com/system/files/presentation-slides/qcon_netflix_play_api.pdf)
* [Sticker Services Architecture at LINE](https://www.slideshare.net/linecorp/architecture-sustaining-line-sticker-services)
* [Stack Overflow Enterprise at Palantir](https://medium.com/@palantir/terraforming-stack-overflow-enterprise-in-aws-47ee431e6be7)
* [Following Feed, Interest Feed, and Picked For You Architecture at Pinterest](https://medium.com/@Pinterest_Engineering/building-a-dynamic-and-responsive-pinterest-7d410e99f0a9)
* [API Specification Workflow at WeWork](https://engineering.wework.com/our-api-specification-workflow-9337448d6ee6)
* [Media Database at Netflix](https://medium.com/netflix-techblog/implementing-the-netflix-media-database-53b5a840b42a)
* [Member Transaction History Architecture at Walmart](https://medium.com/walmartlabs/member-transaction-history-architecture-8b6e34b87c21)
* [Financial and Banking System Architecture](https://www.sesameindia.com/images/core-banking-system-architecture)
* [Reference Architecture for the Open Banking Standard](https://hortonworks.com/blog/reference-architecture-open-banking-standard/)
* [Modern Bank Backend at Monzo](https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/)
* [Scaling the Trading Platform at Wealthsimple](https://medium.com/@Wealthsimple/engineering-at-wealthsimple-reinventing-our-trading-platform-for-scale-17e332241b6c)
* [Core Banking System at Margo Bank](https://medium.com/margobank/choosing-an-architecture-85750e1e5a03)
* [Architecture at Nubank](https://www.infoq.com/presentations/nubank-architecture)
* [Tech Stack at TransferWise](http://tech.transferwise.com/the-transferwise-stack-heartbeat-of-our-little-revolution/)
* [Tech Stack at Addepar](https://medium.com/build-addepar/our-tech-stack-a4f55dab4b0d)
* [Avoiding Double Payments in a Distributed Payments System at Airbnb](https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb)

## 面试
* [Designing Large-Scale Systems](https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/)
* [My Scaling Hero - Jeff Atwood (a dose of endorphins before your interview, JK)](https://blog.codinghorror.com/my-scaling-hero/)
* [Software Engineering Advice from Building Large-Scale Distributed Systems - Jeff Dean](https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf)
* [Introduction to Architecting Systems for Scale](https://lethain.com/introduction-to-architecting-systems-for-scale/)
* [Anatomy of a System Design Interview](https://hackernoon.com/anatomy-of-a-system-design-interview-4cb57d75a53f) | [Original](./原文/Anatomy%20of%20a%20System%20Design%20Interview.md)
* [8 Things You Need to Know Before a System Design Interview](http://blog.gainlo.co/index.php/2015/10/22/8-things-you-need-to-know-before-system-design-interviews/)
* [Top 10 System Design Interview Questions](https://hackernoon.com/top-10-system-design-interview-questions-for-software-engineers-8561290f0444)
* [10 Common Large-Scale Software Architectural Patterns in a Nutshell](https://towardsdatascience.com/10-common-software-architectural-patterns-in-a-nutshell-a0b47a1e9013)
* [Big Data Design Patterns in the Cloud - Lynn Langit](https://lynnlangit.com/2017/03/14/beyond-relational/)
* [How NOT to Design Netflix in Your 45-Minute System Design Interview?](https://hackernoon.com/how-not-to-design-netflix-in-your-45-minute-system-design-interview-64953391a054)
* [API Best Practices: Webhooks, Deprecation, and Design](https://zapier.com/engineering/api-best-practices/)
* [Explaining Low-Level Systems (Operating Systems, Networking/Protocols, Databases, Storage)](https://www.palantir.com/how-to-ace-a-systems-design-interview/)
* [OSI & TCP/IP Cheat Sheet](http://jaredheinrichs.com/mastering-the-osi-tcpip-models.html)
* [The Precise Meaning of I/O Wait Time in Linux](http://veithen.github.io/2013/11/18/iowait-linux.html)
* [Paxos Made Live - An Engineering Perspective](https://research.google.com/archive/paxos_made_live.html)
* [How to Do Distributed Locking](https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)
* [SQL Transaction Isolation Levels](http://elliot.land/post/sql-transaction-isolation-levels-explained) | [Original](./原文/SQL_Transaction_Isolation_Levels_Explained.md) | [Translation](./译文/SQL事务隔离级别.md)
* ["What Happens When... and How" Questions](https://www.glassdoor.com/Interview/What-happens-when-you-type-www-google-com-in-your-browser-QTN_56396.htm)
* [Netflix: What Happens When You Press Play?](http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html)
* [Monzo: How Peer-to-Peer Payments Work](https://monzo.com/blog/2018/04/05/how-monzo-to-monzo-payments-work/)
* [Transit and Peering: How Your Requests Reach GitHub](https://githubengineering.com/transit-and-peering-how-your-requests-reach-github/)
* [How Spotify Streams Music](https://labs.spotify.com/2018/08/31/smoother-streaming-with-bbr/)

## 组织
* [Engineering Levels at SoundCloud](https://developers.soundcloud.com/blog/engineering-levels)
* [Engineering Roles at Palantir](https://medium.com/palantir/dev-versus-delta-demystifying-engineering-roles-at-palantir-ad44c2a6e87)
* [Scaling Engineering Teams at Twitter](https://www.youtube.com/watch?v=-PXi_7Ld5kU)
* [Scaling Decision-Making Across Teams at LinkedIn](https://engineering.linkedin.com/blog/2018/03/scaling-decision-making-across-teams-within-linkedin-engineering)
* [Scaling Data Science Team at GOJEK](https://blog.gojekengineering.com/the-dynamics-of-scaling-an-organisation-cb96dbe8aecd)
* [Scaling Agile at Zalando](https://jobs.zalando.com/tech/blog/scaling-agile-zalando/?gh_src=4n3gxh1)
* [Scaling Agile at bol.com](https://hackernoon.com/how-we-run-bol-com-with-60-autonomous-teams-fe7a98c0759)
* [Lessons Learned from Scaling a Product Team at Intercom](https://blog.intercom.com/how-we-build-software/)
* [Hiring, Managing, and Scaling Engineering Teams at Typeform](https://medium.com/@eleonorazucconi/toby-oliver-cto-typeform-on-hiring-managing-and-scaling-engineering-teams-86bef9e5a708)
* [Scaling the Datagram Team at Instagram](https://instagram-engineering.com/scaling-the-datagram-team-fc67bcf9b721)
* [Scaling the Design Team at Flexport](https://medium.com/flexport-design/designing-a-design-team-a9a066bc48a5)
* [Team Model for Scaling a Design System at Salesforce](https://medium.com/salesforce-ux/the-salesforce-team-model-for-scaling-a-design-system-d89c2a2d404b)
* [Building Analytics Team (4 parts) at Wish](https://medium.com/wish-engineering/scaling-the-analytics-team-at-wish-part-4-recruiting-2a9823b9f5a)
* [From 2 Founders to 1000 Employees at Transferwise](https://medium.com/transferwise-ideas/from-2-founders-to-1000-employees-how-a-small-scale-startup-grew-into-a-global-community-9f26371a551b)
* [Lessons Learned Growing a UX Team from 10 to 170 at Adobe](https://medium.com/thinking-design/lessons-learned-growing-a-ux-team-from-10-to-170-f7b47be02262)
* [Five Lessons from Scaling at Pinterest](https://medium.com/@sarahtavel/five-lessons-from-scaling-pinterest-6a699a889b08)
* [Approach Engineering at Vinted](http://engineering.vinted.com/2018/09/04/how-we-approach-engineering-at-vinted/)
* [Using Metrics to Improve the Development Process (and Coach People) at Indeed](https://engineering.indeedblog.com/blog/2018/10/using-metrics-to-improve-the-development-process-and-coach-people/)
* [Mistakes to Avoid while Creating an Internal Product at Skyscanner](https://medium.com/@SkyscannerEng/9-mistakes-to-avoid-while-creating-an-internal-product-63d579b00b1a)
* [RACI (Responsible, Accountable, Consulted, Informed) at Etsy](https://codeascraft.com/2018/01/04/selecting-a-cloud-provider/)
* [Four Pillars of Leading People (Empathy, Inspiration, Trust, Honesty) at Zalando](https://jobs.zalando.com/tech/blog/four-pillars-leadership/)
* [Pair Programming at Shopify](https://engineering.shopify.com/blogs/engineering/pair-programming-explained)
* [Distributed Responsibility at Asana](https://blog.asana.com/2017/12/distributed-responsibility-engineering-manager/)
* [Rotating Engineers at Zalando](https://jobs.zalando.com/tech/blog/rotating-engineers-at-zalando/)
* [Code Review at Google](https://ai.google/research/pubs/pub47025)
* [Palantir Code Review](https://medium.com/@palantir/code-review-best-practices-19e02780015f)
* [LINE Code Review](https://engineering.linecorp.com/en/blog/effective-code-review/)
* [Medium Code Reviews](https://medium.engineering/code-reviews-at-medium-bed2c0dce13a)
* [LinkedIn Code Review](https://engineering.linkedin.com/blog/2018/06/scaling-collective-code-ownership-with-code-reviews)
* [Disney Code Review](https://medium.com/disney-streaming/the-secret-to-better-code-reviews-c14c7884b9ac)

## 演讲
* [Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent](https://www.youtube.com/watch?v=Y6Ev8GIlbxc)
* [Building Real-Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineers at Facebook](https://www.usenix.org/conference/srecon17americas/program/presentation/erlich)
* [Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google](https://www.usenix.org/conference/srecon16/program/presentation/alvidrez)
* [Building a Distributed Build System at Google Scale - Aysylu Greenberg, SDE at Google](https://www.youtube.com/watch?v=K8YuavUy6Qc)
* [Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox](https://www.youtube.com/watch?v=ggizCjUCCqE)
* [How Google Does Planet-Scale for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform](https://www.youtube.com/watch?v=H4vMcD7zKM0)
* [A Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix](https://www.youtube.com/watch?v=CZ3wIuvmHeM&t=2837s)
* [Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow](https://www.youtube.com/watch?v=1-3Ahy7Fxsc)
* [Architecture for Handling 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify](https://www.youtube.com/watch?v=N8NWDHgWA28)
* [Lessons from Scaling Facebook - Bobby Johnson, Director of Engineering at Facebook](https://www.youtube.com/watch?v=QCHiNEw73AU)
* [Performance Optimization in Greater China at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce](https://www.salesforce.com/video/1757880/)
* [How GIPHY Delivers GIFs to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY](https://vimeo.com/252367076)
* [High-Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba](https://www.youtube.com/watch?v=wzsxJqeVIhY&list=PLMu8-hpCxIVENuAue7bd0eCAglLGY_8AW&index=7)
* [Solving Large-Scale Data Center and Cloud Interconnection Problems - Ihab Tarazi, CTO at Equinix](https://atscaleconference.com/videos/solving-large-scale-data-center-and-cloud-interconnection-problems/)
* [Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox](https://www.youtube.com/watch?v=PE4gwstWhmc)
* [Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox](https://www.youtube.com/watch?v=IhGWOaD5BYQ)
* [Performance at Scale at Facebook - Bill Jia, VP of Infrastructure at Facebook](https://atscaleconference.com/videos/performance-scale-2018-opening-remarks/)
* [Scaling Live Video to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook](https://www.youtube.com/watch?v=IO4teCbHvZw)
* [Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering](https://www.youtube.com/watch?v=hnpzNAPiC0E)
* [Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter](https://www.youtube.com/watch?v=6OvrFkLSoZ0)
* [Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy](https://www.youtube.com/watch?v=LfqyhM1LeIU)
* [Scaling Real-Time Infrastructure for the Global Shopping Holiday at Alibaba - Xiaowei Jiang, Senior Director at Alibaba](https://atscaleconference.com/videos/scaling-alibabas-real-time-infrastructure-for-global-shopping-holiday/)
* [Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify](https://www.youtube.com/watch?v=cdsfRXr9pJU)
* [Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer](https://www.youtube.com/watch?v=jQNCuD_hxdQ&list=RDhnpzNAPiC0E&index=11)
* [Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack](https://www.infoq.com/presentations/slack-scalability)
* [Scaling the Backend at YouTube - Sugu Sougoumarane, SDE at YouTube](https://www.youtube.com/watch?v=5yDO-tmIoXY&feature=youtu.be)
* [Scaling the Backend at Uber - Matt Ranney, Chief Systems Architect at Uber](https://www.youtube.com/watch?v=nuiLcWE8sPA)
* [Scaling the Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix](https://www.youtube.com/watch?v=tbqcsHg-Q_o)
* [Scaling Load Balancing Infrastructure to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook](https://www.youtube.com/watch?v=bxhYNfFeVF4)
* [Scaling (an NSFW Site) to More Than 200 Million Views a Day - Eric Pickup, Lead Platform Developer at MindGeek](https://www.youtube.com/watch?v=RlkCdM_f3p4)
* [Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Gar, SEs at Quora](https://www.infoq.com/presentations/quora-analytics)
* [Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft](https://www.youtube.com/watch?v=g_MPGU_m01s)
* [Scaling a Multi-Tenant Architecture Across Multiple Data Centers at Shopify - Weingarten, Engineering Lead at Shopify](https://www.youtube.com/watch?v=F-f0-k46WVk)

## 推荐书籍
* [Big Data, Web Ops & DevOps Ebooks - O'Reilly (Online - Free)](http://www.oreilly.com/webops/free/)
* [Google Site Reliability Engineering (Online - Free)](https://landing.google.com/sre/book.html)
* [Distributed Systems for Fun and Profit (Online - Free)](http://book.mixu.net/distsys/)
* [What Every Developer Should Know About SQL Performance (Online - Free)](https://use-the-index-luke.com/sql/table-of-contents)
* [Beyond the Twelve-Factor App - Exploring the DNA of Highly Scalable, Resilient Cloud Applications (Free)](http://www.oreilly.com/webops-perf/free/beyond-the-twelve-factor-app.csp)
* [Chaos Engineering - Building Confidence in System Behavior through Experiments (Free)](http://www.oreilly.com/webops-perf/free/chaos-engineering.csp?intcmp=il-webops-free-product-na_new_site_chaos_engineering_text_cta)
* [The Art of Scalability](http://theartofscalability.com/)
* [Designing Data-Intensive Applications](https://dataintensive.net/)
* [Web Scalability for Startup Engineers](https://www.goodreads.com/book/show/23615147-web-scalability-for-startup-engineers)
* [Scalability Rules: 50 Principles for Scaling Web Sites](http://scalabilityrules.com/)

## License

This project was created by [Nguyen Quoc Binh](https://www.linkedin.com/in/binhnguyennus/) on [Christmas Eve 2017](https://github.com/binhnguyennus/awesome-scalability/graphs/contributors) and is dedicated to the late-night programmers who sacrifice their personal lives for their work.

## Donation

Buy me a coffee? Thank you! It means a lot to me :heart:

[![](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://paypal.me/binhnguyennus)