https://github.com/ycd/distributed-systems-papers
An opinionated list of must-read papers on distributed systems
- Host: GitHub
- URL: https://github.com/ycd/distributed-systems-papers
- Owner: ycd
- License: mit
- Created: 2021-02-01T14:04:32.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2022-10-11T18:28:54.000Z (over 3 years ago)
- Last Synced: 2026-01-16T05:23:13.467Z (2 months ago)
- Topics: cloud-computing, distributed, distributed-computing, distributed-systems, papers, parallel-computing
- Homepage:
- Size: 102 KB
- Stars: 30
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
* **Cluster Computing: High-Performance, High-Availability, and High-Throughput Processing on a Network of Computers** (2006) _Chee Shin Yeo, Rajkumar Buyya, Hossein Pourreza, Rasit Eskicioglu, Peter Graham, Frank Sommers_ [[PDF](http://www.cloudbus.org/papers/ic_cluster.pdf)]
* **Zanzibar: Google’s Consistent, Global Authorization System** (2019) _Ruoming Pang, Ramon Caceres, Mike Burrows, Zhifeng Chen, Pratik Dave, Nathan Germer, Alexander Golynski, Kevin Graney, and Nina Kang, Google; Lea Kissner, Humu, Inc.; Jeffrey L. Korn, Google; Abhishek Parmar, Carbon, Inc.; Christopher D. Richards and Mengzhi Wang, Google_ [[PDF](https://www.usenix.org/system/files/atc19-pang.pdf)]
* **Bipartisan Paxos: A Family of Fast, Leaderless, Modular State Machine Replication Protocols** (2020) _Michael Whittaker, Neil Giridharan, Adriana Szekeres, Joseph M. Hellerstein, Ion Stoica_ [[PDF](https://mwhittaker.github.io/publications/bipartisan_paxos.pdf)]
* **In Search of an Understandable Consensus Algorithm** (2014) _Diego Ongaro and John Ousterhout_ [[PDF](https://raft.github.io/raft.pdf)]
* **eXtreme Modelling in Practice** (2020) _A. Jesse Jiryu Davis, Max Hirschhorn, Judah Schvimer_ [[PDF](https://arxiv.org/pdf/2006.00915.pdf)]
* **Starling: A Scalable Query Engine on Cloud Function Services** (2019) _Matthew Perron, Raul Castro Fernandez, David DeWitt, Samuel Madden_ [[PDF](https://arxiv.org/pdf/1911.11727.pdf)]
* **Lambada: Interactive Data Analytics on Cold Data using Serverless Cloud Infrastructure** (2019) _Ingo Müller, Renato Marroquín, Gustavo Alonso_ [[PDF](https://arxiv.org/pdf/1912.00937.pdf)]
* **The Google File System** (2003) _Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung_ [[PDF](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)]
* **Millions of Tiny Databases** (2020) _Marc Brooker, Tao Chen, and Fan Ping, Amazon Web Services_ [[PDF](https://www.usenix.org/system/files/nsdi20-paper-brooker.pdf)]
* **Tiered Replication: A Cost-effective Alternative to Full Cluster Geo-replication** (2015) _Asaf Cidon, Stanford University; Robert Escriva, Cornell University; Sachin Katti and Mendel Rosenblum, Stanford University; Emin Gün Sirer, Cornell University_ [[PDF](https://www.usenix.org/system/files/conference/atc15/atc15-paper-cidon.pdf)]
* **Scalable State-Machine Replication** (2014) _Carlos Eduardo Bezerra, Fernando Pedone, Robbert van Renesse_ [[PDF](https://www.inf.usi.ch/faculty/pedone/Paper/2014/2014DSNa.pdf)]
* **Designing Distributed Systems Using Approximate Synchrony in Data Center Networks** (2015) _Dan R. K. Ports, Jialin Li, Vincent Liu, Naveen Kr. Sharma, and Arvind Krishnamurthy_ [[PDF](https://homes.cs.washington.edu/~arvind/papers/specpaxos.pdf)]
* **Armada: Low-Effort Verification of High-Performance Concurrent Programs** (2020) _Yixuan Chen, Manos Kapritsos, Bryan Parno, Shaz Qadeer, Upamanyu Sharma, James R. Wilcox, Xueyuan Zhao_ [[PDF](http://jamesrwilcox.com/armada.pdf)]
* **Ocean Vista: Gossip-Based Visibility Control for Speedy Geo-Distributed Transactions** (2019) _Hua Fan, Wojciech Golab_ [[PDF](http://www.vldb.org/pvldb/vol12/p1471-fan.pdf)]
* **Consolidating Concurrency Control and Consensus for Commits under Conflicts** (2016) _Shuai Mu, Lamont Nelson, Wyatt Lloyd, Jinyang Li_ [[PDF](http://mpaxos.com/pub/janus-osdi16.pdf)]
* **Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency** (2014) _Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports, and Steven D. Gribble_ [[PDF](https://syslab.cs.washington.edu/papers/latency-socc14.pdf)]
* **Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed Storage** (2020) _Muhammed Uluyol, Anthony Huang, Ayush Goel, Mosharaf Chowdhury, Harsha V. Madhyastha_ [[PDF](https://www.usenix.org/system/files/nsdi20-paper-uluyol.pdf)]
* **Scaling symbolic evaluation for automated verification of systems code with Serval** (2019) _Luke Nelson, James Bornholt, Ronghui Gu, Andrew Baumann, Emina Torlak, Xi Wang_ [[PDF](https://dl.acm.org/doi/10.1145/3341301.3359641)]
* **Incremental Inference of Inductive Invariants for Verification of Distributed Protocols** (2019) _Haojun Ma, Aman Goel, Jean-Baptiste Jeannin, Manos Kapritsos, Baris Kasikci, Karem A. Sakallah_ [[PDF](https://6826.csail.mit.edu/2020/papers/i4.pdf)]
* **Incremental Consistency Guarantees for Replicated Objects** (2016) _Rachid Guerraoui, Matej Pavlovic, and Dragos-Adrian Seredinschi_ [[PDF](https://www.usenix.org/system/files/conference/osdi16/osdi16-guerraoui.pdf)]
* **Canopus: A Scalable and Massively Parallel Consensus Protocol** (2017) _Sajjad Rizvi, Bernard Wong, Srinivasan Keshav_ [[PDF](https://cs.uwaterloo.ca/~bernard/Canopus.pdf)]
* **Consus: Taming the Paxi** (2016) _Robert Escriva, Robbert van Renesse_ [[PDF](https://arxiv.org/pdf/1612.03457.pdf)]
* **Stable and Consistent Membership at Scale with Rapid** (2018) _Lalith Suresh, Dahlia Malkhi, Parikshit Gopalan, Ivan Porto Carreiro, Zeeshan Lokhandwala_ [[PDF](https://www.usenix.org/system/files/conference/atc18/atc18-suresh.pdf)]
* **Unifying Consensus and Atomic Commitment for Effective Cloud Data Management** (2019) _Sujaya Maiyya, Faisal Nawab, Divyakant Agrawal, Amr El Abbadi_ [[PDF](http://www.vldb.org/pvldb/vol12/p611-maiyya.pdf)]
* **WormSpace: A Modular Foundation for Simple, Verifiable Distributed Systems** (2019) _Ji-Yong Shin, Jieung Kim, Wolf Honore, Hernán Vanzetto, Srihari Radhakrishnan, Mahesh Balakrishnan, Zhong Shao_ [[PDF](https://dl.acm.org/doi/pdf/10.1145/3357223.3362739)]
* **Dynamic atomic storage without consensus** (2011) _Marcos Kawazoe Aguilera, Idit Keidar, Dahlia Malkhi, Alexander Shraer_ [[PDF](http://www.cs.technion.ac.il/~shralex/DynaStore-PODC09.pdf)]
* **PaxosStore: High-availability Storage Made Practical in WeChat** (2017) _Jianjun Zheng, Qian Lin, Jiatao Xu, Cheng Wei, Chuwei Zeng, Pingan Yang, Yunfan Zhang_ [[PDF](https://www.vldb.org/pvldb/vol10/p1730-lin.pdf)]
* **Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering** (2016) _Jialin Li, Ellis Michael, Naveen Kr. Sharma, Adriana Szekeres, Dan R. K. Ports_ [[PDF](https://www.usenix.org/sites/default/files/conference/protected-files/osdi16_slides_li_jialin.pdf)]
* **Paxos Made Moderately Complex** (2015) _Robbert van Renesse, Deniz Altinbuken_ [[PDF](https://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf)]
* **Paxos Made Live - An Engineering Perspective** (2007) _Tushar Chandra, Robert Griesemer, Joshua Redstone_ [[PDF](https://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf)]
* **Exploiting Commutativity For Practical Fast Replication** (2019) _Seo Jin Park, John Ousterhout_ [[PDF](https://www.usenix.org/system/files/nsdi19-park.pdf)]
* **Mergeable Replicated Data Types** (2019) _Gowtham Kaki, Swarn Priya, KC Sivaramakrishnan, Suresh Jagannathan_ [[PDF](https://kcsrk.info/papers/oopsla19-mrdt.pdf)]
* **Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes** (2018) _Alexandre Verbitski, Anurag Gupta, Debanjan Saha, James Corey, Kamal Gupta, Murali Brahmadesam, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, Xiaofeng Bao_ [[PDF](https://dl.acm.org/doi/pdf/10.1145/3037697.3037722)]
* **SLOG: Serializable, Low-latency, Geo-replicated Transactions** (2019) _Kun Ren, Dennis Li, Daniel Abadi_ [[PDF](http://www.vldb.org/pvldb/vol12/p1747-ren.pdf)]
* **Enabling Lightweight Transactions with Precision Time** (2017) _Pulkit A. Misra, Jeffrey S. Chase, Johannes Gehrke, Alvin R. Lebeck_ [[PDF](https://users.cs.duke.edu/~alvy/papers/milana_semel_asplos2017.pdf)]
* **Interactive Checks for Coordination Avoidance** (2018) _Michael Whittaker, Joseph M. Hellerstein_ [[PDF](http://www.vldb.org/pvldb/vol12/p14-whittaker.pdf)]
* **Slicer: Auto-Sharding for Datacenter Applications** (2016) _Atul Adya, Daniel Myers, Jon Howell, Jeremy Elson, Colin Meek, Vishesh Khemani, Stefan Fulger, Pan Gu, Lakshminath Bhuvanagiri, Jason Hunter, Roberto Peon, Larry Kai, Alexander Shraer, Arif Merchant_ [[PDF](https://www.usenix.org/system/files/conference/osdi16/osdi16-adya.pdf)]
* **Apache Hadoop YARN: yet another resource negotiator** (2013) _Vinod Kumar Vavilapalli, Arun C Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O’Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler_ [[PDF](http://www1.ece.neu.edu/~ningfang/SimPaper/YARN-SOCC2013.pdf)]
* **Firmament: Fast, Centralized Cluster Scheduling at Scale** (2016) _Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, Steven Hand_ [[PDF](https://www.usenix.org/system/files/conference/osdi16/osdi16-gog.pdf)]
* **Large-scale cluster management at Google with Borg** (2015) _Abhishek Verma, Luis Pedrosa, Madhukar R. Korupolu, David Oppenheimer, Eric Tune, John Wilkes_ [[PDF](https://dl.acm.org/doi/pdf/10.1145/2741948.2741964)]
* **The many faces of consistency** (2016) _Marcos K. Aguilera, D. Terry_ [[PDF](http://sites.computer.org/debull/A16mar/p3.pdf)]
* **The SNOW Theorem and Latency-Optimal Read-Only Transactions** (2016) _Haonan Lu, Christopher Hodsdon, Khiem Ngo, Shuai Mu, Wyatt Lloyd_ [[PDF](https://www.usenix.org/sites/default/files/conference/protected-files/osdi16_slides_lu_haonan.pdf)]
* **FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs** (2016) _Anuj Kalia, Carnegie Mellon University; Michael Kaminsky, Intel Labs; David G. Andersen_ [[PDF](https://www.usenix.org/system/files/conference/osdi16/osdi16-kalia.pdf)]
* **No compromises: distributed transactions with consistency, availability, and performance** (2015) _Aleksandar Dragojevic, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, Miguel Castro_ [[PDF](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/SOSP15-paper227-alternate-final-version.pdf)]
* **Arabesque: a system for distributed graph mining** (2015) _Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, Ashraf Aboulnaga_ [[PDF](https://marcoserafini.github.io/papers/arabesque.pdf)]
* **Petuum: A New Platform for Distributed Machine Learning on Big Data** (2015) _Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu_ [[PDF](https://www.cs.cmu.edu/~./seunghak/petuum_kdd15.pdf)]
* **Twitter Heron: Stream Processing at Scale** (2015) _Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, Siddarth Taneja_ [[PDF](https://dl.acm.org/doi/pdf/10.1145/2723372.2742788)]
* **Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing** (2014) _Chong Sun, Narasimhan Rampalli, Frank Yang, AnHai Doan_ [[PDF](http://pages.cs.wisc.edu/~anhai/papers/chimera-vldb14.pdf)]
* **Holistic Configuration Management at Facebook** (2015) _Chunqiang Tang, Thawan Kooburat, Pradeep Venkatachalam, Akshay Chander, Zhe Wen, Aravind Narayanan, Patrick Dowell, and Robert Karl_ [[PDF](http://sigops.org/s/conferences/sosp/2015/current/2015-Monterey/008-tang-online.pdf)]
* **Building Consistent Transactions with Inconsistent Replication** (2015) _Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports_ [[PDF](http://sigops.org/s/conferences/sosp/2015/current/2015-Monterey/048-zhang-online.pdf)]
* **High-Performance ACID via Modular Concurrency Control** (2015) _Chao Xie, Chunzhi Su, Cody Littley, Lorenzo Alvisi, Manos Kapritsos, Yang Wang_ [[PDF](https://www.cs.utexas.edu/~lorenzo/papers/Chao15Callas.pdf)]
* **ZooKeeper: Wait-free coordination for Internet-scale systems** (2010) _Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, Benjamin Reed_ [[PDF](https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf)]
* **The Chubby lock service for loosely-coupled distributed systems** (2006) _Mike Burrows_ [[PDF](https://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf)]
* **In Search of an Understandable Consensus Algorithm** (2014) _Diego Ongaro, John Ousterhout_ [[PDF](https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf)]
* **WPaxos: Wide Area Network Flexible Consensus** (2017) _Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas and Tevfik Kosar_ [[PDF](https://arxiv.org/pdf/1703.08905.pdf)]
* **Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems** (2014) _Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Renna Rodrigues, Xu Zhao, Yongle Zhang, Pranay U. Jain, and Michael Stumm_ [[PDF](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf)]
* **Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages** (2016) _Haryadi S. Gunawi, Mingzhe Hao, Riza O. Suminto, Agung Laksono, Anang D. Satria, Jeffry Adityatama, Kurnia J. Eliazar_ [[PDF](https://ucare.cs.uchicago.edu/pdf/socc16-cos.pdf)]
* **Serverless Computing: One Step Forward, Two Steps Back** (2018) _Joseph M. Hellerstein, Jose Faleiro, Joseph E. Gonzalez, Johann Schleier-Smith, Vikram Sreekanti, Alexey Tumanov, Chenggang Wu_ [[PDF](https://arxiv.org/pdf/1812.03651.pdf)]
* **Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing** (2012) _Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica_ [[PDF](https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf)]
* **PNUTS: Yahoo!'s hosted data serving platform** (2009) _Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni_ [[PDF](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.685.4771)]
* **CAP Twelve years later: How the "Rules" have Changed** (2012) _Eric Brewer_ [[PDF](https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/brewer-cap.pdf)]
* **Life beyond Distributed Transactions: an Apostate’s Opinion** (2007) _Pat Helland_ [[PDF](https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf)]
* **Spanner: Google’s Globally-Distributed Database** (2012) _James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford_ [[PDF](https://www.usenix.org/system/files/conference/osdi12/osdi12-final-16.pdf)]
* **The Tail at Scale** (2013) _Jeffrey Dean and Luiz André Barroso_ [[PDF](https://www.barroso.org/publications/TheTailAtScale.pdf)]
* **Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases** (2014) _Sandeep Kulkarni, Murat Demirbas, Deepak Madeppa, Bharadwaj Avva, and Marcelo Leone_ [[PDF](https://cse.buffalo.edu/tech-reports/2014-04.pdf)]
* **TaxDC: A Taxonomy of nondeterministic concurrency bugs in datacenter distributed systems** (2016) _Tanakorn Leesatapornwongsa, Jeffrey F Lukman, Shan Lu, Haryadi S Gunawi_ [[PDF](https://ucare.cs.uchicago.edu/pdf/asplos16-TaxDC.pdf)]
* **Spark: Cluster Computing with Working Sets** (2010) _Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica_ [[PDF](https://www.usenix.org/legacy/event/hotcloud10/tech/full_papers/Zaharia.pdf)]
* **Silent Data Corruptions at Scale** (2021) _Harish Dattatraya Dixit, Sneha Pendharkar, Matt Beadon, Chris Mason, Tejasvi Chakravarthy, Bharath Muthiah, Sriram Sankar_ [[PDF](https://arxiv.org/pdf/2102.11245.pdf)]
* **FoundationDB: A Distributed Unbundled Transactional Key Value Store** (2021) _Jingyu Zhou, Meng Xu, Alexander Shraer, Bala Namasivayam, Alex Miller, Evan Tschannen, Steve Atherton, Andrew J. Beamon, Rusty Sears, John Leach, Dave Rosenthal, Xin Dong, Will Wilson, Ben Collins, David Scherer, Alec Grieser, Young Liu, Alvin Moore, Bhaskar Muppana, Xiaoge Su, Vishesh Yadav_ [[PDF](https://www.foundationdb.org/files/fdb-paper.pdf)]
* **Sundial: Fault-tolerant Clock Synchronization for Datacenters** (2020) _Yuliang Li, Gautam Kumar, Hema Hariharan, Hassan Wassel, Peter Hochschild, Dave Platt, Simon Sabato, Minlan Yu, Nandita Dukkipati, Prashant Chandra, Amin Vahdat_ [[PDF](https://www.usenix.org/system/files/osdi20-li_yuliang.pdf)]
* **Running BGP in Data Centers at Scale** (2021) _Anubhavnidhi Abhashkumar, Kausik Subramanian, Alexey Andreyev, Hyojeong Kim, Nanda Kishore Salem, Jingyi Yang, Petr Lapukhov, Aditya Akella, Hongyi Zeng_ [[PDF](https://research.fb.com/wp-content/uploads/2021/03/Running-BGP-in-Data-Centers-at-Scale_final.pdf)]
* **Amazon Redshift Re-invented** (2022) _Nikos Armenatzoglou, Sanuj Basu, Naga Bhanoori, Mengchu Cai, Naresh Chainani, Kiran Chinta, Venkatraman Govindaraju, Todd J. Green, Monish Gupta, Sebastian Hillig, Eric Hotinger, Yan Leshinsky, Jintian Liang, Michael McCreedy, Fabian Nagel, Ippokratis Pandis, Panos Parchas, Rahul Pathak, Orestis Polychroniou, Foyzur Rahman, Gaurav Saxena, Gokul Soundararajan, Sriram Subramanian, Doug Terry_ [[PDF](https://assets.amazon.science/4b/37/223ac61e450898244a31bed53734/amazon-redshift-re-invented.pdf)]