https://github.com/arpit20adlakha/computer-science-papers-for-system-design

Host: GitHub
URL: https://github.com/arpit20adlakha/computer-science-papers-for-system-design
Owner: arpit20adlakha
License: mit
Created: 2022-01-25T19:46:42.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2025-02-15T07:19:03.000Z (8 months ago)
Last Synced: 2025-02-15T08:25:22.166Z (8 months ago)
Size: 35.9 MB
Stars: 622
Watchers: 8
Forks: 119
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Computer-Science-Papers

## Storage systems

- Haystack (https://lnkd.in/gSZYcmmB)

- f4: Facebook’s Warm BLOB Storage System (https://lnkd.in/gMEfTpAh)

- The Hadoop Distributed File System (https://lnkd.in/gSUqafDg)

- The Google File System (https://lnkd.in/giUResea)

- Facebook's Tectonic Filesystem: Efficiency from Exascale (https://lnkd.in/geg7-ub9)

- Pelican: A Building Block for Exascale Cold Data Storage (https://lnkd.in/gSse26YK)

- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data (https://lnkd.in/gUbnK4rH)

- RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters (https://lnkd.in/gKwbmzTx)

- Megastore: Providing Scalable, Highly Available Storage for Interactive Services (https://lnkd.in/gT7mSDQN)

- The Design and Implementation of a Log-Structured File System (https://lnkd.in/gVuka_Ym)

- The RAMCloud Storage System (https://lnkd.in/gC3SQccF)

## Analytics

- Monarch: Google's Planet-Scale In-Memory Time Series Database (https://lnkd.in/gbqa7HNa)

- Gorilla: A Fast, Scalable, In-Memory Time Series Database (https://lnkd.in/gd_nUJbu)

- Scuba: Diving into Data at Facebook (https://lnkd.in/gfBrJcge)

- The Unified Logging Infrastructure for Data Analytics at Twitter (https://lnkd.in/gwhNUMnF)

- Cubrick: Indexing Millions of Records per Second for Interactive Analytics (https://lnkd.in/g-n9GUMD)

- Shark: SQL and Rich Analytics at Scale (https://lnkd.in/gqXHq5BG)

- Realtime Data Processing at Facebook (https://lnkd.in/gQdMN4kP)

## Cluster management and scheduling

- Large-scale cluster management at Google with Borg (https://lnkd.in/gT7bG2SF)

- Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing (https://lnkd.in/gEEdRmcD)

- Apache Hadoop YARN: Yet Another Resource Negotiator (https://lnkd.in/g9SVx_Ft)

- Twine: A Unified Cluster Management System for Shared Infrastructure (https://lnkd.in/gbnuqutm)

## Stream processing

- MillWheel: Fault-Tolerant Stream Processing at Internet Scale (https://lnkd.in/gC7VjCfG)

- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (https://lnkd.in/g-PyJUPa)

- Apache Flink™: Stream and Batch Processing in a Single Engine (https://lnkd.in/gpzRA6v3)

- Drizzle: Fast and Adaptable Stream Processing at Scale (https://lnkd.in/g9Hbnvp7)

- Kafka, Samza and the Unix Philosophy of Distributed Data (https://lnkd.in/grtHkFWN)

- Discretized Streams: Fault-Tolerant Streaming Computation at Scale (https://lnkd.in/gbzc3_Ke)

- Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark (https://lnkd.in/gnQQP2UY)

- Noria: dynamic, partially-stateful data-flow for high-performance web applications (https://lnkd.in/gYtpef34)

## Pub/sub

- Kafka: a Distributed Messaging System for Log Processing (https://lnkd.in/dkfPsFwH)

- Scribe: Transporting petabytes per hour via a distributed, buffered queueing system (https://lnkd.in/dTyTBE_t)

- LogDevice: a distributed data store for logs (https://lnkd.in/dvVTBz46)

- Scalog: Seamless Reconfiguration and Total Order in a Scalable Shared Log (https://lnkd.in/d7xmexrQ)

- CORFU: A Shared Log Design for Flash Clusters (https://lnkd.in/dxiquk5h)

- The FuzzyLog: A Partially Ordered Shared Log (https://lnkd.in/da4ikmEa)

- Ubiq: A Scalable and Fault-tolerant Log Processing Infrastructure (https://lnkd.in/dQTfCDwH)

## Graph processing in a distributed setting

- Pregel: A System for Large-Scale Graph Processing (https://lnkd.in/ggpew7yq)

- PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs (https://lnkd.in/g6f9Mjzk)

- GraphX: Graph Processing in a Distributed Dataflow Framework (https://lnkd.in/gixUZP46)

- Gemini: A Computation-Centric Distributed Graph Processing System (https://lnkd.in/gCs2R5EJ)

- TAO: Facebook’s Distributed Data Store for the Social Graph (https://lnkd.in/gfesm_Hn)

## Consensus and replicated state machines

- Paxos Made Simple (https://lnkd.in/gk6nxyVj)

- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial (https://lnkd.in/gPwNde-i)

- The Chubby lock service for loosely-coupled distributed systems (https://lnkd.in/gFXKTrXR)

- ZooKeeper: Wait-free coordination for Internet-scale systems (https://lnkd.in/gWTYBxQN)

- In Search of an Understandable Consensus Algorithm (https://lnkd.in/gqrKhvsK)

- Virtual Consensus in Delos (https://lnkd.in/g5bitkdM)

## Peer-to-peer systems and information dissemination

- Gossip-Based Broadcast (https://lnkd.in/gT74Zb8Z)

- Gossiping in Distributed Systems (https://lnkd.in/g55DFbuP)

- Peer-to-peer membership management for gossip-based protocols (https://lnkd.in/g_XE4TiE)

- Gossip-based Peer Sampling (https://lnkd.in/gSPwEkaW)

- SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol (https://lnkd.in/gxZtR3Nh)

- Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems (https://lnkd.in/gyURBizm)

- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications (https://lnkd.in/grVF9crk)

Additional articles, some of which may repeat entries above. These will be categorized later.

| # | Short Name | Title | Link | Extra links |
|---| -------- | ---------- |-----|------------|
| 1 | Apache Kafka | Kafka: A Distributed Messaging System for Log Processing | https://notes.stephenholiday.com/Kafka.pdf | |
| 2 | Apache Cassandra | Cassandra - A Decentralized Structured Storage System | https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf | |
| 3 | Apache Flink | Apache Flink: Stream and Batch Processing in a Single Engine | https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf | |
| 4 | Apache Spark | Spark: Cluster Computing with Working Sets | https://www.usenix.org/legacy/event/hotcloud10/tech/full_papers/Zaharia.pdf | |
| 5 | Apache ZooKeeper | ZooKeeper: Wait-free coordination for Internet-scale systems | https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf | |
| 6 | Bigtable | Bigtable: A Distributed Storage System for Structured Data | https://research.google.com/archive/bigtable-osdi06.pdf | |
| 8 | Apache Impala | Apache Impala: A Modern, Open-Source SQL Engine for Hadoop | https://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf | |
| 9 | Apache Druid | Druid: A Real-time Analytical Data Store | http://static.druid.io/docs/druid.pdf | |
| 10 | Timer Wheel | Hashed and Hierarchical Timing Wheels | http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing-wheels.pdf | |
| 11 | MillWheel | MillWheel: Fault-Tolerant Stream Processing at Internet Scale | https://research.google.com/pubs/archive/41378.pdf | |
| 12 | Dynamo | Dynamo: Amazon’s Highly Available Key-value Store | https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf | |
| 13 | Google File System | The Google File System | https://research.google.com/archive/gfs-sosp2003.pdf | |
| 14 | MapReduce | MapReduce: Simplified Data Processing on Large Clusters | https://research.google.com/archive/mapreduce-osdi04.pdf | |
| 15 | Spanner | Spanner: Google’s Globally-Distributed Database | https://research.google.com/archive/spanner-osdi2012.pdf | |
| 16 | Zab | Zab: High-performance broadcast for primary-backup systems | http://www.cs.cornell.edu/courses/cs6452/2012sp/papers/zab-ieee.pdf | |
| 17 | Paxos | Paxos Made Simple | https://lamport.azurewebsites.net/pubs/paxos-simple.pdf | |
| 18 | Chubby | The Chubby lock service for loosely-coupled distributed systems | https://research.google.com/archive/chubby-osdi06.pdf | |
| 19 | Dremel | Dremel: Interactive Analysis of Web-Scale Datasets | https://research.google/pubs/pub36632/ | |
| 20 | Megastore | Megastore: Providing Scalable, Highly Available Storage for Interactive Services | https://research.google/pubs/pub36971.pdf | |
| 21 | Raft | In Search of an Understandable Consensus Algorithm (Extended Version) | https://raft.github.io/raft.pdf | |
| 22 | Flexible Paxos | Flexible Paxos: Quorum Intersection Revisited | https://arxiv.org/abs/1608.06696 | |
| 23 | Thrift | Thrift: Scalable Cross-Language Services Implementation | https://thrift.apache.org/static/files/thrift-20070401.pdf | |
| 24 | Maglev | Maglev: A Fast and Reliable Software Network Load Balancer | https://research.google.com/pubs/archive/44824.pdf | |
| 25 | LSM | The Log-Structured Merge-Tree (LSM-Tree) | https://www.cs.umb.edu/~poneil/lsmtree.pdf | |
| 26 | Chord | Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications | https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | |
| 27 | Kademlia | Kademlia: A Peer-to-peer Information System Based on the XOR Metric | https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf | |
| 28 | Mesa | Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing | https://research.google/pubs/pub42851/ | |
| 29 | SCRIBE | SCRIBE: A large-scale and decentralized application-level multicast infrastructure | https://rowstron.azurewebsites.net/PAST/jsac.pdf | |
| 30 | PAST | Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility | https://people.mpi-sws.org/~druschel/publications/PAST-hotos.pdf | |
| 31 | Pastry | Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems | https://www.cs.cornell.edu/people/egs/615/pastry.pdf | |
| 32 | Linearizability | Linearizability: A Correctness Condition for Concurrent Objects | http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf | |
| 33 | Time and Clocks | Time, Clocks, and the Ordering of Events in a Distributed System | http://lamport.azurewebsites.net/pubs/time-clocks.pdf | |
| 34 | CRDTs | CRDTs: Consistency without concurrency control | http://hal.archives-ouvertes.fr/docs/00/39/79/81/PDF/RR-6956.pdf | |
| 35 | Photon | Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams | https://research.google/pubs/pub41318/ | |
| 36 | TAO | TAO: Facebook’s Distributed Data Store for the Social Graph | https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf | |
| 37 | Pregel | Pregel: A System for Large-Scale Graph Processing | https://15799.courses.cs.cmu.edu/fall2013/static/papers/p135-malewicz.pdf | |
| 38 | Dapper | Dapper, a Large-Scale Distributed Systems Tracing Infrastructure | https://research.google/pubs/pub36356.pdf | |
| 39 | Raft Refloated | Raft Refloated: Do We Have Consensus? | https://www.cl.cam.ac.uk/~ms705/pub/papers/2015-osr-raft.pdf | |
| 40 | Percolator | Large-scale Incremental Processing Using Distributed Transactions and Notifications | https://research.google/pubs/pub36726.pdf | |
| 41 | Monarch | Monarch: Google’s Planet-Scale In-Memory Time Series Database | https://research.google/pubs/pub50652/ | |
| 42 | Borg | Large-scale cluster management at Google with Borg | https://research.google/pubs/pub43438.pdf | |
| 43 | Borg - Next | Borg: the Next Generation | https://research.google/pubs/pub49065.pdf | |
| 44 | Amazon Aurora | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases | https://web.stanford.edu/class/cs245/readings/aurora.pdf | |
| 45 | Gorilla | Gorilla: A Fast, Scalable, In-Memory Time Series Database | http://www.vldb.org/pvldb/vol8/p1816-teller.pdf | |
| 46 | HDFS | The Hadoop Distributed File System | https://storageconference.us/2010/Papers/MSST/Shvachko.pdf | |
| 47 | Autopilot | Autopilot: workload autoscaling at Google | https://dl.acm.org/doi/10.1145/3342195.3387524 | |
| 48 | Consistent hashing | Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web | https://dl.acm.org/doi/pdf/10.1145/258533.258660 | |
| 49 | SEDA | SEDA: An Architecture for Well-Conditioned, Scalable Internet Services | http://www.sosp.org/2001/papers/welsh.pdf | |
| 50 | Bitcask | Bitcask: A Log-Structured Hash Table for Fast Key/Value Data | https://riak.com/assets/bitcask-intro.pdf | |
| 51 | DynamoDB | Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service | https://www.usenix.org/system/files/atc22-elhemali.pdf | |
| 52 | Isolation levels | A critique of ANSI SQL isolation levels | https://dl.acm.org/doi/pdf/10.1145/223784.223785 | |
| 54 | Deletable Bloom Filter | The deletable bloom filter | https://arxiv.org/pdf/1005.0352 | |
| 55 | Hash Coding | Space/Time Trade-offs in Hash Coding with Allowable Errors | https://dl.acm.org/doi/pdf/10.1145/362686.362692 | |
| 56 | Expedite Byzantine | Shifting Gears: Changing Algorithms on the Fly to Expedite Byzantine Agreement | https://www.sciencedirect.com/science/article/pii/089054019290035E | |
| 57 | Scalability cost | Scalability! But at what COST? | https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf | |
| 58 | FoundationDB | FoundationDB: A Distributed Unbundled Transactional Key Value Store | https://www.foundationdb.org/files/fdb-paper.pdf | |
| 59 | Monolith | Monolith: Real Time Recommendation System With Collisionless Embedding Table | https://arxiv.org/pdf/2209.07663 | |
| 60 | Memcache at Facebook | Scaling Memcache at Facebook | https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf | |
| 61 | MilliSampler | A microscopic view of bursts, buffer contention, and loss in data centers | https://dl.acm.org/doi/pdf/10.1145/3517745.3561430 | https://engineering.fb.com/2023/04/17/networking-traffic/millisampler-network-traffic-analysis/ |
| 62 | FlexiRaft | FlexiRaft: Flexible Quorums with Raft | https://www.cidrdb.org/cidr2023/papers/p83-yadav.pdf | |
| 63 | Minesweeper | Scalable Statistical Root Cause Analysis on App Telemetry | https://arxiv.org/abs/2010.09974 | |
| 64 | Shard Manager | Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications | | |
| 65 | FlumeJava | FlumeJava: Easy, Efficient Data-Parallel Pipelines | https://research.google/pubs/pub35650.pdf | |
| 66 | Heron | Twitter Heron: Stream Processing at Scale | https://dl.acm.org/doi/pdf/10.1145/2723372.2742788 | |
| 67 | Dataflow | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | https://research.google/pubs/pub43864.pdf | |
| 68 | Flink | State Management in Apache Flink | http://www.vldb.org/pvldb/vol10/p1718-carbone.pdf | |
| 69 | Dgraph | Dgraph: Synchronously Replicated, Transactional and Distributed Graph Database | | |