awesome-distributed-systems
A curated list to learn about distributed systems
https://github.com/theanalyst/awesome-distributed-systems
-
Books
- Scalable Web Architecture and Distributed Systems
- Distributed Systems for fun and profit
- Distributed Systems Principles and Paradigms, Andrew Tanenbaum
- Principles of Distributed Systems
- Making reliable distributed systems in the presence of software errors
- Designing Data-Intensive Applications
- Distributed Computing, Hagit Attiya and Jennifer Welch
- Distributed Algorithms, Nancy Lynch
- Impossibility Results for Distributed Computing
- Designing Distributed Systems, Brendan Burns
- Distributed Systems: Concepts and Design, George Coulouris
- Akka in Action, Second Edition
- Systemantics: how systems work and especially how they fail
- Think Distributed Systems
-
Bootcamp
- CAP Theorem - with a plain-English explanation
- Fallacies of Distributed Computing
- Distributed systems theory for the distributed engineer
- FLP Impossibility Result (paper) - with a companion blog post (paper-trail.org/blog/a-brief-tour-of-flp-impossibility/) to follow along
-
Papers
-
Storage & Databases
-
Messaging systems
-
Distributed Consensus and Fault-Tolerance
- Practical Byzantine Fault Tolerance
- The Byzantine Generals Problem
- Impossibility of Distributed Consensus with One Faulty Process
- The Part Time Parliament
- Paxos Made Simple
- The Chubby Lock Service for loosely coupled distributed systems
- Paxos Made Live - An Engineering Perspective
- Raft Consensus Algorithm
- Conflict-free Replicated Data Types - used in systems such as [Redis](https://redis.io/) and [Akka](https://akka.io/). A great talk on the subject by Martin Kleppmann can be found [here](https://www.youtube.com/watch?v=B5NULPSiOGw).
- Azos.Sky.Server.Locking - based consensus. The approach avoids distributed state machine/phase synchronization and is very simple to understand and implement
-
Testing, monitoring and tracing
- Dapper - Google's distributed-systems tracing infrastructure. It was also the basis for the design of open source projects such as [Zipkin](http://zipkin.io/), [Apache SkyWalking](https://github.com/apache/incubator-skywalking), [Pinpoint](https://github.com/naver/pinpoint) and [HTrace](http://htrace.incubator.apache.org/).
-
Programming Models
-
Verification of Distributed Systems
-
Videos
-
Courses
- Reliable Distributed Algorithms, Part 1
- Reliable Distributed Algorithms, Part 2
- Cloud Computing Concepts
- CMU: Distributed Systems
- Software Defined Networking
- ETH Zurich: Distributed Systems
- ETH Zurich: Distributed Systems Part 2 - covers fault tolerance, among other things. In particular, fault-tolerance issues (models, consensus, agreement) and replication issues (2PC, 3PC, Paxos), which are critical to understanding distributed systems, are explained in great detail.
- Distributed Systems Course
- MIT 6.824 - [playlist](https://www.youtube.com/playlist?list=PLrw6a1wE39_tb2fErI4-WkMbsvGQk9_UB) of MIT distributed systems lectures; each video discusses papers such as GFS, ZooKeeper, Raft, and Spanner.
- Distributed Systems - [playlist](https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB). An introductory computer science course covering basic models and algorithms in distributed systems, as well as CRDTs, collaboration software, and Google's Spanner.
-
Blogs and other reading links
- Amazon Builder's Library
- How we implemented consistent hashing efficiently
- Notes on Distributed Systems for Young Bloods
- There is No Now
- The Paper Trail
- aphyr
- All Things Distributed - Werner Vogels' (Amazon CTO) blog on distributed systems
- Distributed Systems: Take Responsibility for Failover
- The C10K problem
- On Designing and Deploying Internet-Scale Services
- Files are hard
- Distributed Systems Testing: The Lost World
- SWIM Protocol explained
- Turing Lecture: The Computer Science of Concurrency: The Early Years
-
Research
-
Meta Lists