awesome-distributed-systems
A curated list to learn about distributed systems
https://github.com/theanalyst/awesome-distributed-systems
Books
- Scalable Web Architecture and Distributed Systems
- Distributed Systems for fun and profit
- Principles of Distributed Systems
- Making reliable distributed systems in the presence of software errors
- Distributed Computing, Hagit Attiya and Jennifer Welch
- Impossibility Results for Distributed Computing
- Akka in Action, Second Edition
- Systemantics: how systems work and especially how they fail
- Designing Data-Intensive Applications
- Think Distributed Systems
- Designing Distributed Systems, Brendan Burns
- Distributed Machine Learning Patterns, Yuan Tang
- Distributed Systems: Concepts and Design, George Coulouris
- Distributed Systems: Principles and Paradigms, Andrew Tanenbaum
- Distributed Algorithms, Nancy Lynch
Bootcamp
- CAP Theorem - a plain-English explanation
- An Introduction to Distributed Systems
- FLP Impossibility Result (paper) - a brief tour of FLP impossibility (paper-trail.org/blog/a-brief-tour-of-flp-impossibility/) to follow along
Papers
Storage & Databases
- blog
- The Google File System
- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data - rados-pdsw07.pdf
- Bigtable: A Distributed Storage System for Structured Data
- Cassandra: A Decentralized Structured Storage System
- Dynamo: Amazon's Highly Available Key-value Store
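Dynamo lets operators tune consistency with N replicas per key, a read quorum R and a write quorum W; when R + W > N, every read quorum overlaps every write quorum (pigeonhole). A minimal brute-force check of that argument, with hypothetical function names; it models the strict-quorum case only, whereas Dynamo itself uses a "sloppy" quorum with hinted handoff:

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-node read set share a node with every W-node write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


def rule_says_overlap(n: int, r: int, w: int) -> bool:
    """The closed-form quorum condition: R + W > N."""
    return r + w > n


# The brute-force check and the closed-form rule agree for all small configurations.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_overlap(n, r, w) == rule_says_overlap(n, r, w)

print(quorums_overlap(3, 2, 2))  # True: N=3, R=2, W=2
print(quorums_overlap(3, 1, 2))  # False: a read can miss the latest write
```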
Distributed Consensus and Fault-Tolerance
- Practical Byzantine Fault Tolerance
- The Byzantine Generals Problem
- The Part-Time Parliament
- Paxos Made Simple
- The Chubby Lock Service for loosely coupled distributed systems
- Impossibility of Distributed Consensus with One Faulty Process
- Conflict-free Replicated Data Types - used in kv/, [Redis](https://redis.io/) and [Akka](https://akka.io/). A great talk on the subject by Martin Kleppmann can be found [here](https://www.youtube.com/watch?v=B5NULPSiOGw).
- Azos.Sky.Server.Locking - based consensus. The approach avoids distributed state machine and phase synchronization, and is simple to understand and implement.
- Paxos Made Live - An Engineering Perspective
- Raft Consensus Algorithm
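The idea behind the Conflict-free Replicated Data Types entry can be illustrated with the simplest state-based CRDT, a grow-only counter (G-Counter). A minimal sketch; the `GCounter` class and its method names are hypothetical and are not the API of Redis, Akka or any other system listed above:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """State-based grow-only counter: one slot per replica, merged by element-wise max."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Each replica only ever bumps its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Element-wise max is commutative, associative and idempotent, so replicas
        # converge no matter how often or in what order states are exchanged.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
print(a.value(), b.value())  # 5 5
```

Because merge never loses an increment and tolerates duplicates, replicas can exchange state over an unreliable network without coordination.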
Programming Models
Messaging systems
Verification of Distributed Systems
Testing, monitoring and tracing
- Dapper - Google's large-scale distributed systems tracing infrastructure; it was also the basis for the design of open source projects such as [Zipkin](http://zipkin.io/), [Apache SkyWalking](https://github.com/apache/incubator-skywalking), [Pinpoint](https://github.com/naver/pinpoint) and [HTrace](http://htrace.incubator.apache.org/).
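Dapper models each request as a trace tree of spans: every span carries the shared trace id, its own span id and its parent's span id, so a collector can rebuild the tree from spans that arrive in any order. A minimal sketch of that data model; `Span`, `start_trace`, `build_tree` and `render` are hypothetical names, not the API of Dapper or of the open source tracers above, and random 64-bit ids are an assumption of this sketch:

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def new_id() -> int:
    # Random 64-bit ids: collisions are possible in principle but vanishingly rare.
    return random.getrandbits(64)


@dataclass
class Span:
    name: str
    trace_id: int
    span_id: int
    parent_id: Optional[int] = None  # None marks the root span
    annotations: List[str] = field(default_factory=list)

    def child(self, name: str) -> "Span":
        # Same trace id; this span becomes the child's parent.
        return Span(name, self.trace_id, new_id(), parent_id=self.span_id)


def start_trace(name: str) -> Span:
    return Span(name, trace_id=new_id(), span_id=new_id())


def build_tree(spans: List[Span]):
    """Rebuild the trace tree from a flat, unordered collection of spans."""
    root: Optional[Span] = None
    children: Dict[int, List[Span]] = {}
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children.setdefault(span.parent_id, []).append(span)
    return root, children


def render(span: Span, children: Dict[int, List[Span]], depth: int = 0) -> List[str]:
    lines = ["  " * depth + span.name]
    for child in children.get(span.span_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


root = start_trace("frontend /search")
index = root.child("rpc: index server")
ads = root.child("rpc: ads server")
storage = index.child("rpc: storage")
index.annotations.append("cache miss")

# Spans reach the collector out of order; the ids are enough to reassemble them.
tree_root, kids = build_tree([storage, ads, root, index])
print("\n".join(render(tree_root, kids)))
```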
Videos
Courses
- CMU: Distributed Systems
- Software Defined Networking
- ETH Zurich: Distributed Systems
- ETH Zurich: Distributed Systems Part 2 - tolerance among other things. In particular, fault tolerance issues (models, consensus, agreement) and replication issues (2PC, 3PC, Paxos), which are critical to understanding distributed systems, are explained in great detail.
- Distributed Systems Course
- Distributed Systems - [playlist](https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB). An introductory computer science course covering basic models and algorithms in distributed systems; it also discusses CRDTs, collaboration software and Google's Spanner.
- Reliable Distributed Algorithms, Part 1
- Reliable Distributed Algorithms, Part 2
- Cloud Computing Concepts
- MIT 6.824 - [playlist](https://www.youtube.com/playlist?list=PLrw6a1wE39_tb2fErI4-WkMbsvGQk9_UB). MIT distributed systems lectures; each video discusses papers such as GFS, ZooKeeper, Raft and Spanner.
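The two-phase commit (2PC) protocol listed among the ETH Zurich course topics can be sketched as a coordinator that first collects prepare votes and then commits only if every participant voted yes. A minimal in-memory sketch, assuming hypothetical `Participant` and `two_phase_commit` names and modelling no crashes, timeouts or durable logging:

```python
from enum import Enum
from typing import List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory resource manager; `can_commit` fixes its vote."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> Vote:
        # Phase 1: a real participant would durably log its staged work before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only on a unanimous yes, otherwise abort everywhere.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


ok = two_phase_commit([Participant("a"), Participant("b")])
bad_group = [Participant("a"), Participant("b", can_commit=False)]
failed = two_phase_commit(bad_group)
print(ok, failed, [p.state for p in bad_group])  # True False ['aborted', 'aborted']
```

What this sketch hides is 2PC's blocking weakness: if the coordinator crashes after participants have voted yes, they cannot safely decide on their own, which is the problem 3PC tries to address.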
Blogs and other reading links
- How we implemented consistent hashing efficiently
- There is No Now
- The C10K problem
- On Designing and Deploying Internet-Scale Services
- Files are hard
- Distributed Systems Testing: The Lost World
- Distributed Systems: Take Responsibility for Failover
- SWIM Protocol explained
- Turing Lecture: The Computer Science of Concurrency: The Early Years
- Notes on Distributed Systems for Young Bloods
- Amazon Builder's Library
- The Paper Trail
- aphyr
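Consistent hashing, the subject of "How we implemented consistent hashing efficiently", places nodes and keys on the same hash ring, so adding or removing a node only moves the keys adjacent to that node's positions. A minimal sketch with virtual nodes, assuming BLAKE2b for ring placement and a hypothetical `HashRing` class; it is not necessarily the approach described in that article:

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class HashRing:
    """Consistent-hash ring; each physical node owns `vnodes` points on the ring."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> physical node
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # 64-bit ring position from BLAKE2b (used for spread, not security).
        return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            self._owner.pop(self._hash(f"{node}#{i}"), None)
        self._points = sorted(self._owner)

    def lookup(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash (wrapping around).
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["a", "b", "c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that now land on "d" move; every other key keeps its owner.
print(len(moved), all(after[k] == "d" for k in moved))
```

Virtual nodes are the design choice that keeps load roughly even: with a single point per node, the arc each node owns (and so its share of keys) can be very uneven.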
Meta Lists
- Readings in distributed systems
- List of required readings for Distributed Systems
- Distributed Systems meta list
- The Distributed Reader
- Awesome Distributed Consensus
- Beginner's Guide to Distributed Systems
- A Distributed Systems Reading List
- Distributed Systems Readings