{"id":16863556,"url":"https://github.com/vbrazo/system-design-archives","last_synced_at":"2025-03-18T16:37:28.367Z","repository":{"id":114253673,"uuid":"332594503","full_name":"vbrazo/system-design-archives","owner":"vbrazo","description":"My system design archives for an engineering management journey","archived":false,"fork":false,"pushed_at":"2021-03-29T22:28:36.000Z","size":82,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-24T20:29:57.320Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vbrazo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-25T01:07:58.000Z","updated_at":"2023-03-08T00:54:15.000Z","dependencies_parsed_at":"2024-01-09T15:04:13.619Z","dependency_job_id":null,"html_url":"https://github.com/vbrazo/system-design-archives","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fsystem-design-archives","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fsystem-design-archives/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fsystem-design-archives/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vbrazo%2Fsystem-design-archives/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vbrazo","download_url"
:"https://codeload.github.com/vbrazo/system-design-archives/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244261136,"owners_count":20424880,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T14:38:59.975Z","updated_at":"2025-03-18T16:37:28.347Z","avatar_url":"https://github.com/vbrazo.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# System Design Archives\n\nThis is my personal system design archive. It's where I store my system design research, which aims to provide resources for interviewing developers more effectively on my engineering management journey.\n\n- [System Design](#system-design)\n  - [Client-Server Model](#client-server-model)\n    - [Client](#client)\n    - [Server](#server)\n    - [Client-Server](#client-server)\n    - [IP Address](#ip-address)\n    - [Port](#port)\n    - [DNS](#dns)\n  - [Network Protocols](#network-protocols)\n    - [IP](#ip)\n    - [TCP](#tcp)\n    - [HTTP](#http)\n    - [IP Packet](#ip-packet)\n  - [Storage](#storage)\n    - [Databases](#databases)\n    - [Disk](#disk)\n    - [Memory](#memory)\n    - [Persistence Storage](#persistence-storage)\n  - [Latency and Throughput](#latency-and-throughput)\n    - [Latency](#latency)\n    - [Throughput](#throughput)\n  - [Availability](#availability)\n    - [High availability](#high-availability)\n    - [Nines](#nines)\n    - [Redundancy](#redundancy)\n    - [SLA](#sla)\n    - [SLO](#slo)\n  - [Caching](#caching)\n    - [Cache](#cache)\n    - [Cache Hit](#cache-hit)\n    - [Cache 
Miss](#cache-miss)\n    - [Cache Eviction Policy](#cache-eviction-policy)\n    - [Content Delivery Network](#content-delivery-network)\n  - [Proxies](#proxies)\n    - [Forward Proxy](#forward-proxy)\n    - [Reverse Proxy](#reverse-proxy)\n    - [Nginx](#nginx)\n  - [Load Balancers](#load-balancers)\n    - [Load balancer](#load-balancer)\n    - [Server-selection strategy](#server-selection-strategy)\n    - [Hot spot](#hot-spot)\n  - [Hashing](#hashing)\n    - [Consistent hashing](#consistent-hashing)\n    - [Rendezvous Hashing](#rendezvous-hashing)\n    - [SHA](#sha)\n  - [Relational Databases](#relational-databases)\n    - [Relational database](#relational-database)\n    - [Non-relational database](#non-relational-database)\n    - [SQL](#sql)\n    - [SQL database](#sql-database)\n    - [NoSQL database](#nosql-database)\n    - [ACID Transaction](#acid-transaction)\n    - [Database index](#database-index)\n    - [Strong consistency](#strong-consistency)\n    - [Eventual consistency](#eventual-consistency)\n    - [Postgres](#postgres)\n  - [Key-Value Stores](#key-value-stores)\n    - [Key-value store](#key-value-store)\n    - [Etcd](#etcd)\n    - [Redis](#redis)\n    - [ZooKeeper](#zookeeper)\n  - [Specialized Storage Paradigms](#specialized-storage-paradigms)\n    - [Blob storage](#blob-storage)\n    - [Time Series Database](#time-series-database)\n    - [Graph database](#graph-database)\n    - [Cypher](#cypher)\n    - [Spatial Database](#spatial-database)\n    - [Quadtree](#quadtree)\n    - [Google Storage](#google-storage)\n    - [S3](#s3)\n    - [InfluxDB](#influxdb)\n    - [Prometheus](#prometheus)\n    - [Neo4j](#neo4j)\n  - [Replication And Sharding](#replication-and-sharding)\n    - [Replication](#replication)\n    - [Sharding](#sharding)\n    - [Hot spot](#hot-spot)\n  - [Leader Election](#leader-election)\n    - [Leader election](#leader-election)\n    - [Consensus algorithm](#consensus-algorithm)\n    - 
[Paxos and Raft](#paxos-and-raft)\n    - [Etcd](#etcd)\n    - [ZooKeeper](#zookeeper)\n  - [Peer-To-Peer Networks](#peer-to-peer-networks)\n    - [Peer-to-peer network](#peer-to-peer-network)\n    - [Gossip protocol](#gossip-protocol)\n  - [Polling And Streaming](#polling-and-streaming)\n    - [Polling](#polling)\n    - [Streaming](#streaming)\n  - [Configuration](#configuration)\n    - [Configuration](#configuration)\n  - [Rate Limiting](#rate-limiting)\n    - [DoS attack](#dos-attack)\n    - [DDoS attack](#ddos-attack)\n    - [Redis](#redis)\n  - [Logging And Monitoring](#logging-and-monitoring)\n    - [Logging](#logging)\n    - [Monitoring](#monitoring)\n    - [Alerting](#alerting)\n  - [Publish-Subscribe Pattern](#publish-subscribe-pattern)\n    - [Pub-Sub Pattern](#pub-sub-pattern)\n    - [Idempotent operation](#idempotent-operation)\n    - [Apache Kafka](#apache-kafka)\n    - [Cloud Pub/Sub](#cloud-pub-sub)\n  - [MapReduce](#mapreduce)\n    - [Distributed File System](#distributed-file-system)\n    - [Hadoop](#hadoop)\n  - [Security And HTTPS](#security-and-https)\n    - [Man-in-the-middle attack](#main-in-the-middle-attack)\n    - [Symmetric encryption](#symmetric-encryption)\n    - [Asymmetric encryption](#asymmetric-encryption)\n    - [AES](#aes)\n    - [HTTPS](#https)\n    - [TLS](#tls)\n    - [SSL Certificate](#ssl-certificate)\n    - [Certificate Authority](#certificate-authority)\n    - [TLS Handshake](#tls-handshake)\n  - [API Design](#api-design)\n    - [Pagination](#pagination)\n    - [CRUD Operations](#crud-operations)\n\n# System Design\n\n## Client-Server Model\n\nA client is a thing that talks to servers. A server is a thing that talks to clients. 
The client-server model is a thing made up of a bunch of clients and servers talking to one another.\n\nAnd that, kids, is how the Internet works!\n\n### Client\n\nA machine or process that requests data or service from a server.\n\nNote that a single machine or piece of software can be both a client and a\nserver at the same time. For instance, a single machine could act as a server\nfor end users and as a client for a database.\n\n### Server\n\nA machine or process that provides data or service for a client, usually by\nlistening for incoming network calls.\n\nNote that a single machine or piece of software can be both a client and a\nserver at the same time. For instance, a single machine could act as a server\nfor end users and as a client for a database.\n\n### Client-Server\n\nThe paradigm by which modern systems are designed, which consists of clients\nrequesting data or service from servers and servers providing data or service\nto clients.\n\n### IP Address\n\nAn address given to each machine connected to the public internet. IPv4\naddresses consist of four numbers separated by dots: `a.b.c.d` where all\nfour numbers are between 0 and 255. Special values include:\n\n- 127.0.0.1: Your own local machine. Also referred to as `localhost`.\n- 192.168.x.y: Your private network. For instance, your machine and all\nmachines on your private Wi-Fi network will usually have the `192.168` prefix.\n\n### Port\n\nIn order for multiple programs to listen for new network connections on the\nsame machine without colliding, they pick a `port` to listen on. A port\nis an integer between 0 and 65,535 (2^16 ports total).\n\nTypically, ports 0-1023 are reserved for `system ports` (also called `well-known`\nports) and shouldn't be used by user-level processes.\nCertain ports have pre-defined uses, and although you usually won't be\nrequired to have them memorized, they can sometimes come in handy. 
Below are\nsome examples:\n\n- 22: Secure Shell\n- 53: DNS lookup\n- 80: HTTP\n- 443: HTTPS\n\n### DNS\n\nShort for Domain Name System, it describes the entities and protocols involved in the\ntranslation from domain names to IP addresses. Typically, machines make a DNS query to\na well-known entity that is responsible for returning the IP address (or multiple ones)\nof the requested domain name in the response.\n\n## Network Protocols\n\nIP packets. TCP headers. HTTP requests.\n\nAs daunting as they may seem, these low-level networking concepts are essential\nto understanding how machines in a system communicate with one another. And as\nwe all know, proper communication is key for thriving relationships!\n\n### IP\n\nStands for `Internet Protocol`. This network protocol outlines how almost\nall machine-to-machine communications should happen in the world. Other\nprotocols like `TCP`, `UDP`, and `HTTP` are built on top of IP.\n\n### TCP\n\nNetwork protocol built on top of the Internet Protocol (IP). Allows for\nordered, reliable data delivery between machines over the public internet by\ncreating a `connection`.\n\nTCP is usually implemented in the kernel, which exposes `sockets` to\napplications that they can use to stream data through an open connection.\n\n### HTTP\n\nThe `H`yper`T`ext `T`ransfer `P`rotocol is a very common network protocol\nimplemented on top of TCP. 
Clients make HTTP requests, and servers respond with\nHTTP responses.\n\nRequests typically have the following schema:\n\n```\nhost: string (example: domaintest.io)\nport: integer (example: 80 or 443)\nmethod: string (example: GET, PUT, POST, DELETE, OPTIONS or PATCH)\nheaders: pair list (example: \"Content-Type\" =\u003e \"application/json\")\nbody: opaque sequence of bytes\n```\n\nResponses typically have the following schema:\n\n```\nstatus code: integer (example: 200, 401)\nheaders: pair list (example: \"Content-Length\" =\u003e 1238)\nbody: opaque sequence of bytes\n```\n\n### IP Packet\n\nSometimes more broadly referred to as just a (network) `packet`, an IP\npacket is effectively the smallest unit used to describe data being sent over\n`IP`, aside from bytes. An IP packet consists of:\n\n- an `IP header`, which contains the source and destination `IP addresses` as well\nas other information related to the network\n- a `payload`, which is just the data being sent over the network.\n\n## Storage\n\nAs it turns out, information storage is an incredibly complex topic that is of\nvital importance to systems design.\n\n### Databases\n\nDatabases are programs that either use disk or memory to do two core things:\n`record` data and `query` data. In general, they are themselves\nservers that are long-lived and interact with the rest of your application\nthrough network calls, with protocols on top of TCP or even HTTP.\n\nSome databases only keep records in memory, and the users of such databases\nare aware of the fact that those records may be lost forever if the machine or\nprocess dies.\n\nFor the most part though, databases need persistence of those records, and\nthus cannot rely on memory alone. 
This means that you have to write your data to disk.\nAnything written to disk will remain through power loss or network partitions,\nso that’s what is used to keep permanent records.\n\nSince machines die often in a large-scale system, special disk partitions or\nvolumes are used by the database processes, and those volumes can get\nrecovered even if the machine were to go down permanently.\n\n### Disk\n\nUsually refers to either `HDD (hard-disk drive)` or `SSD (solid-state drive)`.\nData written to disk will persist through power failures and general machine\ncrashes. Disk is also referred to as `non-volatile storage`.\n\nSSD is far faster than HDD (see the latencies of reading data from SSD and HDD\nin the Latency section) but also far more expensive. Because of that,\nHDD will typically be used for data that's rarely accessed or updated, but\nthat's stored for a long time, and SSD will be used for data that's frequently\naccessed and updated.\n\n### Memory\n\nShort for `Random Access Memory (RAM)`. Data stored in memory will be\n`lost` when the process that has written that data dies.\n\n### Persistence Storage\n\nUsually refers to disk, but in general it is any form of storage that persists\nif the process in charge of managing it dies.\n\n## Latency and Throughput\n\nIf you've ever experienced lag in a video game, it was most likely due to a\ncombination of high latency and low throughput. And lag sucks.\n\nIt is therefore your Call of Duty to master these two concepts and to join the\ncrusade against high ping.\n\n### Latency\n\nThe time it takes for a certain operation to complete in a system. Most often\nthis measure is a time duration, like milliseconds or seconds. You should know
You should know\nthese orders of magnitude:\n\n```\nReading 1 MB from RAM: 250 μs (0.25 ms)\nReading 1 MB from SSD: 1,000 μs (1 ms)\nTransfer 1 MB over Network: 10,000 μs (10 ms)\nReading 1MB from HDD: 20,000 μs (20 ms)\nInter-Continental Round Trip: 150,000 μs (150 ms)\n```\n\n### Throughput\n\nThe number of operations that a system can handle properly per time unit. For\ninstance the throughput of a server can often be measured in requests per\nsecond (RPS or QPS).\n\n## Availability\n\nThe odds of a particular server or service being up and running at any point\nin time, usually measured in percentages. A server that has 99% availability\nwill be operational 99% of the time (this would be described as having two\n`nines` of availability).\n\n### High availability\n\nUsed to describe systems that have particularly high levels of availability,\ntypically 5 nines or more; sometimes abbreviated \"HA\".\n\n### Nines\n\nTypically refers to percentages of uptime. For example, 5 nines of\navailability means an uptime of 99.999% of the time. Below are the downtimes\nexpected per year depending on those 9s:\n\n```\n- 99% (two 9s): 87.7 hours\n- 99.9% (three 9s): 8.8 hours\n- 99.99%: 52.6 minutes\n- 99.999%: 5.3 minutes\n```\n\n### Redundancy\n\nThe process of replicating parts of a system in an effort to make it more\nreliable.\n\n### SLA\n\nShort for \"service-level agreement\", an SLA is a collection of guarantees\ngiven to a customer by a service provider. SLAs typically make guarantees on a\nsystem's availability, amongst other things. SLAs are made up of one or\nmultiple SLOs.\n\n### SLO\n\nShort for \"service-level objective\", an SLO is a guarantee given to a customer\nby a service provider. SLOs typically make guarantees on a system's\navailability, amongst other things. 
SLOs constitute an SLA.\n\n## Caching\n\n### Cache\n\nA piece of hardware or software that stores data, typically meant to retrieve\nthat data faster than otherwise.\n\nCaches are often used to store responses to network requests as well as\nresults of computationally expensive operations.\n\nNote that data in a cache can become `stale` if the main source of truth\nfor that data (i.e., the main database behind the cache) gets updated and the\ncache doesn't.\n\n### Cache Hit\n\nWhen requested data is found in a cache.\n\n### Cache Miss\n\nWhen requested data could have been found in a cache but isn't. Cache misses are\noften a negative consequence of a system failure or of a poor design choice.\nFor example:\n\nIf a server goes down, our load balancer will have to forward requests to a\nnew server, which will result in cache misses.\n\n### Cache Eviction Policy\n\nThe policy by which values get evicted or removed from a cache. Popular cache\neviction policies include `LRU` (least-recently used), `FIFO` (first\nin, first out), and `LFU` (least-frequently used).\n\n### Content Delivery Network\n\nA `CDN` is a third-party service that acts like a cache for your servers.\nSometimes, web applications can be slow for users in a particular region if\nyour servers are located only in another region. A CDN has servers all around\nthe world, meaning that the latency to a CDN's servers will almost always be\nfar better than the latency to your servers. A CDN's servers are often referred\nto as `PoPs` (Points of Presence). Two of the most popular CDNs are `Cloudflare`\nand `Google Cloud CDN`.\n\n## Proxies\n\n### Forward Proxy\n\nA server that sits between a client and servers and acts on behalf of the\nclient, typically used to mask the client's identity (IP address). 
Note that\nforward proxies are often referred to as just proxies.\n\n### Reverse Proxy\n\nA server that sits between clients and servers and acts on behalf of the\nservers, typically used for logging, load balancing, or caching.\n\n### Nginx\n\nPronounced \"engine X\" (not \"N jinx\"), Nginx is a very popular web server that's\noften used as a `reverse proxy` and `load balancer`.\n\n## Load Balancers\n\nRelentlessly distributing network requests across multiple servers, these digital\ntraffic cops act as watchful guardians for your system, ensuring that it operates\nat peak performance day and night.\n\n### Load balancer\n\nA type of `reverse proxy` that distributes traffic across servers. Load\nbalancers can be found in many parts of a system, from the DNS layer all the\nway to the database layer.\n\n### Server-selection strategy\n\nHow a `load balancer` chooses servers when distributing traffic amongst\nmultiple servers. Commonly used strategies include round-robin, random\nselection, performance-based selection (choosing the server with the best\nperformance metrics, like the fastest response time or the least amount of\ntraffic), and IP-based routing.\n\n### Hot Spot\n\nWhen distributing a workload across a set of servers, that workload might be\nspread unevenly. This can happen if your `sharding key` or your `hashing function`\nis suboptimal, or if your workload is naturally skewed: some servers will\nreceive a lot more traffic than others, thus creating a \"hot spot\".\n\n## Hashing\n\nHashing? Like from hash tables? Should be simple enough, right?\n\nThe good news is that, yes, it's hashing like from hash tables.\n\nThe bad news is that, no, it's not simple enough.\n\n### Consistent hashing\n\nA type of hashing that minimizes the number of keys that need to be remapped\nwhen a hash table gets resized. 
It's often used by load balancers to\ndistribute traffic to servers; it minimizes the number of requests that get\nforwarded to different servers when new servers are added or when existing\nservers are brought down.\n\n### Rendezvous Hashing\n\nA type of hashing also called `highest random weight` hashing. It allows for\nminimal redistribution of mappings when a server goes down.\n\n### SHA\n\nShort for \"Secure Hash Algorithms\", SHA is a family of cryptographic\nhash functions used in the industry. These days, SHA-3 is a popular choice to\nuse in a system.\n\n## Relational Databases\n\nTables and ACID.\n\nNo, we're not describing a drug lord's desk, but rather referring to key properties of relational databases. There's a lot of material to cover here, so kick back and get ready to store tons of knowledge in the biggest database of them all: your brain.\n\n### Relational database\n\nA type of structured database in which data is stored following a tabular\nformat; often supports powerful querying using SQL.\n\n### Non-relational database\n\nIn contrast with relational databases (SQL databases), a type of database that\nis free of imposed, tabular-like structure. Non-relational databases are often\nreferred to as NoSQL databases.\n\n### SQL\n\nStructured Query Language. Relational databases are typically queried using SQL\nor a dialect of it, such as PostgreSQL in the case of Postgres.\n\n### SQL database\n\nAny database that supports SQL. This term is often used synonymously with\n\"Relational Database\", though in practice, not `every` relational\ndatabase supports SQL.\n\n### NoSQL database\n\nAny database that is not SQL-compatible is called NoSQL.\n\n### ACID Transaction\n\nA type of database transaction that has four important properties:\n\n- Atomicity: The operations that constitute the transaction will either\nall succeed or all fail. 
There is no in-between state.\n\n- Consistency: The transaction cannot bring the database to an invalid\nstate. After the transaction is committed or rolled back, the rules for each\nrecord will still apply, and all future transactions will see the effect of\nthe transaction. Also named `Strong Consistency`.\n\n- Isolation: The execution of multiple transactions concurrently will\nhave the same effect as if they had been executed sequentially.\n\n- Durability: Any committed transaction is written to non-volatile\nstorage. It will not be undone by a crash, power loss, or network partition.\n\n### Database index\n\nA special auxiliary data structure that allows your database to perform\ncertain queries much faster. Indexes can typically only exist to reference\nstructured data, like data stored in relational databases. In practice, you\ncreate an index on one or multiple columns in your database to greatly speed\nup `read` queries that you run very often, with the downside of slightly\nlonger `writes` to your database, since writes also have to update\nthe relevant index.\n\n### Strong consistency\n\nStrong Consistency usually refers to the consistency of ACID transactions,\nas opposed to `Eventual Consistency`.\n\n### Eventual consistency\n\nA consistency model in which, unlike with `Strong Consistency`,\nreads might return a view of the system that is stale. An eventually\nconsistent datastore guarantees that the state of the database will\neventually reflect writes within a time period (which could be 10 seconds or\nseveral minutes).\n\n### Postgres\n\nA relational database that uses a dialect of SQL called PostgreSQL. Provides\nACID transactions.\n\n## Key-Value Stores\n\nOne of the most commonly used NoSQL paradigms today, the key-value store bases its data model on the associative array data type.\n\nThe result? A fast, flexible storage machine that resembles a hash table. 
That's right, folks, our favorite friendly neighborhood data structure strikes again!\n\n### Key-value store\n\nA Key-Value Store is a flexible NoSQL database that's often used for caching\nand dynamic configuration. Popular options include DynamoDB, Etcd, Redis, and\nZooKeeper.\n\n### Etcd\n\nEtcd is a strongly consistent and highly available key-value store that's\noften used to implement leader election in a system.\n\n### Redis\n\nAn in-memory key-value store. It does offer some persistent storage options but is\ntypically used as a really fast, best-effort caching solution. Redis is also often\nused to implement `rate limiting`.\n\n### ZooKeeper\n\nZooKeeper is a strongly consistent, highly available key-value store. It's\noften used to store important configuration or to perform leader election.\n\n## Specialized Storage Paradigms\n\n### Blob storage\n\nA widely used kind of storage in both small- and large-scale systems. Blob stores don’t\nreally count as databases per se, partially because they only allow the user\nto store and retrieve data based on the name of the blob. This is sort of like\na key-value store, but blob stores usually have different guarantees. They\nmight be slower than KV stores, but values can be megabytes large (or sometimes\ngigabytes large). People usually use blob storage to store things like\n`large binaries, database snapshots, or images` and other static assets\nthat a website might have.\n\nBlob storage is rather complicated to have on-premises, and only giant\ncompanies like Google and Amazon have infrastructure that supports it. So\nusually in the context of System Design interviews you can assume that you\nwill be able to use `GCS` or `S3`. 
These are blob storage services\nhosted by Google and Amazon, respectively, and they cost money depending on how\nmuch storage you use and how often you store and retrieve blobs from that\nstorage.\n\n### Time Series Database\n\nA `TSDB` is a special kind of database optimized for storing and\nanalyzing time-indexed data: data points that specifically occur at a given\nmoment in time. Examples of TSDBs are InfluxDB, Prometheus, and Graphite.\n\n### Graph database\n\nA type of database that stores data following the graph data model. Data\nentries in a graph database can have explicitly defined relationships, much\nlike nodes in a graph can have edges.\n\nGraph databases take advantage of their underlying graph structure to perform\ncomplex queries on deeply connected data very fast.\n\nGraph databases are thus often preferred to relational databases when dealing\nwith systems where data points naturally form a graph and have multiple levels\nof relationships, for example, social networks.\n\n### Cypher\n\nA `graph query language` that was originally developed for the Neo4j\ngraph database, but that has since been standardized to be used with other\ngraph databases in an effort to make it the \"SQL for graphs.\"\n\nCypher queries are often much simpler than their SQL counterparts. Here is an example\nCypher query to find data in `Neo4j`, a popular graph database:\n\n```\nMATCH (some_node:SomeLabel)-[:SOME_RELATIONSHIP]-\u003e(some_other_node:SomeLabel {some_property:'value'})\nRETURN some_other_node\n```\n\n### Spatial Database\n\nA type of database optimized for storing and querying spatial data like\nlocations on a map. Spatial databases rely on spatial indexes like\n`quadtrees` to quickly perform spatial queries like finding all\nlocations in the vicinity of a region.\n\n### Quadtree\n\nA tree data structure most commonly used to index two-dimensional spatial\ndata. 
Each node in a quadtree has either zero child nodes (and is therefore\na leaf node) or exactly four child nodes.\n\nA quadtree lends itself well to storing spatial data because it can be\nrepresented as a grid filled with rectangles that are recursively subdivided\ninto four sub-rectangles, where each quadtree node is represented by a\nrectangle and each rectangle represents a spatial region. Assuming we're\nstoring locations in the world, we can imagine a quadtree with a maximum\nnode-capacity `n` as follows:\n\n- The root node, which represents the entire world, is the outermost\nrectangle.\n- If the entire world has more than `n` locations, the outermost\nrectangle is divided into four quadrants, each representing a region of the\nworld.\n- So long as a region has more than `n` locations, its corresponding\nrectangle is subdivided into four quadrants (the corresponding node in the\nquadtree is given four child nodes).\n- Regions that have `n` or fewer locations are undivided rectangles\n(leaf nodes).\n\nThe parts of the grid that have many subdivided rectangles represent densely\npopulated areas (like cities), while the parts of the grid that have few\nsubdivided rectangles represent sparsely populated areas (like rural areas).\n\nFinding a given location in a perfect quadtree is an extremely fast operation\nthat runs in `O(log4(x))` time (where `x` is the total\nnumber of locations), since each internal quadtree node has four child nodes.\n\n### Google Storage\n\nGoogle Cloud Storage (GCS) is a blob storage service provided by Google.\n\n### S3\n\nS3 is a blob storage service provided by Amazon through `Amazon Web Services (AWS)`.\n\n### InfluxDB\n\nA popular open-source time series database.\n\n### Prometheus\n\nA popular open-source time series database, typically used for monitoring purposes.\n\n### Neo4j\n\nA popular graph database whose data model consists of `nodes`, `relationships`,\n`properties`, and `labels`.\n\n## Replication And Sharding\n\nA system's performance is often only as good as 
its database; optimize the\nlatter, and watch as the former improves in tandem!\n\nOn that note, we'll examine how data redundancy and data partitioning\ntechniques can be used to enhance a system's fault tolerance, throughput, and\noverall reliability.\n\n### Replication\n\nThe act of duplicating the data from one database server to others. This\nis sometimes used to increase the redundancy of your system and\nto tolerate regional failures, for instance. Other times you can use\nreplication to move data closer to your clients, thus decreasing\nthe latency of accessing specific data.\n\n### Sharding\n\nSometimes called `data partitioning`, sharding is the\nact of splitting a database into two or more pieces called\n`shards` and is typically done to increase the throughput\nof your database. Popular sharding strategies include:\n\n- Sharding based on a client's region.\n- Sharding based on the type of data being stored (e.g., user data gets\nstored in one shard, payments data gets stored in another shard).\n- Sharding based on the hash of a column (only for structured data).\n\n### Hot Spot\n\nWhen distributing a workload across a set of servers, that workload might be\nspread unevenly. This can happen if your `sharding key` or your `hashing function`\nis suboptimal, or if your workload is naturally skewed: some servers will\nreceive a lot more traffic than others, thus creating a \"hot spot\".\n\n## Leader Election\n\nCitizens in a society typically elect a leader by voting for their preferred\ncandidate. 
But how do servers in a distributed system choose a master node?\nVia algorithms, of course!\n\nThis form of algorithmic democracy is known as \"leader election\", though we\npersonally think \"algorithmocracy\" sounds way cooler.\n\n### Leader election\n\nThe process by which nodes in a cluster (for instance, servers in a set of\nservers) elect a so-called \"leader\" amongst them, responsible for the primary\noperations of the service that these nodes support. When correctly\nimplemented, leader election guarantees that all nodes in the cluster know\nwhich one is the leader at any given time and can elect a new leader if the\nleader dies for whatever reason.\n\n### Consensus algorithm\n\nA type of complex algorithm used to have multiple entities agree on a single\ndata value, like who the \"leader\" is amongst a group of machines. Two popular\nconsensus algorithms are `Paxos` and `Raft`.\n\n### Paxos and Raft\n\nTwo consensus algorithms that, when implemented correctly, allow for the\nsynchronization of certain operations, even in a distributed setting.\n\n### Etcd\n\nEtcd is a strongly consistent and highly available key-value store that's\noften used to implement leader election in a system.\n\n### ZooKeeper\n\nZooKeeper is a strongly consistent, highly available key-value store. It's\noften used to store important configuration or to perform leader election.\n\n## Peer-To-Peer Networks\n\nEquality for all.\nSharing is caring.\nUnity makes strength.\nThe more the merrier.\nTeamwork makes the dream work.\nWelcome to peer-to-peer networks!\n\n### Peer-to-peer network\n\nA collection of machines, referred to as peers, that divide a workload between\nthemselves to presumably complete the workload faster than would otherwise be\npossible. 
Peer-to-peer networks are often used in file-distribution systems.\n\n### Gossip protocol\n\nA protocol in which the machines in a cluster talk to each other in an\nuncoordinated manner to spread information through the system without requiring\na central source of data.\n\n## Polling And Streaming\n\nYou can think of polling and streaming kind of like a classroom; sometimes\nstudents ask the teacher lots of questions, and other times they quiet down and\nlisten attentively to the teacher's lecture.\n\nNow fire up the video and get ready to stream; you won't be able to poll here.\n\n### Polling\n\nThe act of fetching a resource or piece of data regularly at an interval to\nmake sure your data is not too stale.\n\n### Streaming\n\nIn networking, streaming usually refers to the act of continuously getting a feed of\ninformation from a server by keeping an open connection between the two\nmachines or processes.\n\n## Configuration\n\nThe config file is like the genome of a computer application; it stores parameters\nthat define your system's critical settings, much like your DNA stores the genes\nthat define your physical characteristics.\n\nUnlike its biological counterpart though, the config file is easily editable.\nNo gene therapy needed!\n\n### Configuration\n\nA set of parameters or constants that are critical to a system. Configuration\nis typically written in `JSON` or `YAML` and can be either `static`, meaning\nthat it's hard-coded into and shipped with your system's application code (like\nfrontend code, for instance), or `dynamic`, meaning that it lives outside\nof your system's application code.\n\n## Rate Limiting\n\nThe act of limiting the number of requests sent to or from a system. Rate\nlimiting is most often used to limit the number of incoming requests in order\nto prevent `DoS attacks` and can be enforced at the IP-address level, at the\nuser-account level, or at the region level, for example. 
Rate limiting can\nalso be implemented in tiers; for instance, a type of network request could be\nlimited to 1 per second, 5 per 10 seconds, and 10 per minute.\n\n### DoS attack\n\nShort for \"denial-of-service attack\", a DoS attack is an attack in which a\nmalicious user tries to bring down or damage a system in order to render it\nunavailable to users. Much of the time, the attack consists of flooding the system with\ntraffic. Some DoS attacks are easily preventable with rate limiting, while\nothers can be far trickier to defend against.\n\n### DDoS attack\n\nShort for \"distributed denial-of-service attack\", a DDoS attack is a DoS\nattack in which the traffic flooding the target system comes from many\ndifferent sources (like thousands of machines), making it much harder to\ndefend against.\n\n### Redis\n\nAn in-memory key-value store. It does offer some persistent storage options but is\ntypically used as a really fast, best-effort caching solution. Redis is also often\nused to implement `rate limiting`.\n\n## Logging And Monitoring\n\nIn order to properly understand and diagnose issues that crop up within a system,\nit’s critical to have mechanisms in place that create audit trails of various events\nthat occur within said system.\n\nSo go ahead, unleash your inner Orwell and go full Big Brother on your application.\n\n### Logging\n\nThe act of collecting and storing logs, i.e., useful information about events in\nyour system. Typically, your programs will output log messages to their STDOUT\nor STDERR streams, from which they are collected and aggregated into a `centralized\nlogging solution`.\n\n### Monitoring\n\nThe process of having visibility into a system's key metrics, monitoring is\ntypically implemented by collecting important events in a system and\naggregating them in human-readable charts.\n\n### Alerting\n\nThe process through which system administrators get notified when critical\nsystem issues occur. 
Alerting can be set up by defining specific thresholds\non monitoring charts, past which alerts are sent to a communication channel\nlike Slack.\n\n## Publish-Subscribe Pattern\n\nPublish/Subscribe. Press/Tug. Produce/Consume. Push/Pull. Send/Receive. Throw/Catch. Thrust/Retrieve.\n\nThree of these can be used interchangeably in the context of systems design. The others cannot.\n\n### Pub-Sub Pattern\n\nOften shortened to `Pub/Sub`, the Publish/Subscribe pattern is a popular\nmessaging model that consists of `publishers` and `subscribers`.\nPublishers publish messages to special `topics` (sometimes called\n`channels`) without caring about or even knowing who will read those\nmessages, and subscribers subscribe to topics and read messages coming through\nthose topics.\n\nPub/Sub systems often come with very powerful guarantees like\n`at-least-once delivery`, `persistent storage`,\n`ordering` of messages, and `replayability` of messages.\n\n### Idempotent operation\n\nAn operation that has the same ultimate outcome regardless of how many times\nit's performed. If an operation can be performed multiple times without\nchanging its overall effect, it's idempotent. Operations performed through a\n`Pub/Sub` messaging system typically have to be idempotent, since Pub/Sub\nsystems tend to allow the same messages to be consumed multiple times.\n\nFor example, increasing an integer value in a database is `not` an\nidempotent operation, since repeating this operation will not have the same\neffect as if it had been performed only once. Conversely, setting a value to\n\"COMPLETE\" `is` an idempotent operation, since repeating this operation\nwill always yield the same result: the value will be \"COMPLETE\".\n\n### Apache Kafka\n\nA distributed messaging system created by LinkedIn. It is very useful\nwhen using the `streaming` paradigm as opposed to `polling`.\n\n### Cloud Pub/Sub\n\nA highly scalable Pub/Sub messaging service created by Google. 
It guarantees\n`at-least-once delivery` of messages and supports \"rewinding\" in order to\nreprocess messages.\n\n## MapReduce\n\nA popular framework for processing very large datasets in a distributed\nsetting efficiently, quickly, and in a fault-tolerant manner. A MapReduce job\nconsists of three main steps:\n\n- the `Map` step, which runs a `map function` on the various chunks\nof the dataset and transforms these chunks into intermediate `key-value pairs`.\n\n- the `Shuffle` step, which reorganizes the intermediate\n`key-value pairs` such that pairs with the same key are routed\nto the same machine in the final step.\n\n- the `Reduce` step, which runs a `reduce function` on the newly\nshuffled `key-value pairs` and transforms them into more meaningful\ndata.\n\nThe canonical example of a MapReduce use case is counting the number of\noccurrences of words in a large text file.\n\nWhen dealing with a MapReduce library, engineers and/or systems administrators\nonly need to worry about the map and reduce functions, as well as their inputs\nand outputs. All other concerns, including the parallelization of tasks and\nthe fault tolerance of the MapReduce job, are abstracted away and taken care\nof by the MapReduce implementation.\n\n### Distributed File System\n\nA Distributed File System is an abstraction over a (usually large) cluster of\nmachines that allows them to act like one large file system. The two most\npopular implementations of a DFS are the `Google File System` (GFS) and\nthe `Hadoop Distributed File System` (HDFS).\n\nTypically, DFSs take care of the classic `availability` and\n`replication` guarantees that can be tricky to obtain in a\ndistributed-system setting. The overarching idea is that files are split into\nchunks of a certain size (4MB or 64MB, for instance), and those chunks are\nsharded across a large cluster of machines. 
A central control plane is in\ncharge of deciding where each chunk resides, routing reads to the right nodes,\nand handling communication between machines.\n\nDifferent DFS implementations have slightly different APIs and semantics, but\nthey achieve the same common goal: extremely large-scale persistent storage.\n\n### Hadoop\n\nA popular, open-source framework that supports MapReduce jobs and many\nother kinds of data-processing pipelines. Its central component is `HDFS`\n(Hadoop Distributed File System), on top of which other technologies have\nbeen developed.\n\n## Security And HTTPS\n\nWhile network security is of critical importance to virtually any system,\nit's beyond the scope of most system design interviews.\n\nThat being said, having even a cursory understanding of a few key concepts\ncould very well materialize into the edge you need to ace your interview and\nsecure (pun perhaps intended) a job offer.\n\n### Man-in-the-middle attack\n\nAn attack in which the attacker intercepts a line of communication that is\nthought to be private by its two communicating parties.\n\nIf a malicious actor intercepted and mutated an IP packet on its way from a\nclient to a server, that would be a man-in-the-middle (MITM) attack.\n\nMITM attacks are the primary threat that encryption and `HTTPS` aim to\ndefend against.\n\n### Symmetric encryption\n\nA type of encryption that relies on only a single key to both encrypt and\ndecrypt data. The key must be known to all parties involved in communication\nand must therefore typically be shared between the parties at one point or\nanother.\n\nSymmetric-key algorithms tend to be faster than their asymmetric counterparts.\n\nThe most widely used symmetric-key algorithms are part of the Advanced\nEncryption Standard (`AES`).\n\n### Asymmetric encryption\n\nAlso known as public-key encryption, asymmetric encryption relies on two\nkeys, a public key and a private key, to encrypt and decrypt data. 
The keys are\ngenerated using cryptographic algorithms and are mathematically connected such\nthat data encrypted with the public key can only be decrypted with the private\nkey.\n\nWhile the private key must be kept secret to maintain the security of this\nencryption paradigm, the public key can be openly shared.\n\nAsymmetric-key algorithms tend to be slower than their symmetric counterparts.\n\n### AES\n\nStands for `Advanced Encryption Standard`. AES is a widely used\nsymmetric-key encryption standard that comes in three variants, distinguished by\nkey length (AES-128, AES-192, and AES-256).\n\nOf note, AES is considered to be the \"gold standard\" in encryption and is even\nused by the U.S. National Security Agency to encrypt top secret information.\n\n### HTTPS\n\nThe `H`yper`T`ext `T`ransfer `P`rotocol `S`ecure is\nan extension of `HTTP` that's used for secure communication online. It\nrequires servers to have trusted certificates (usually\n`SSL certificates`) and uses Transport Layer Security (`TLS`), a\nsecurity protocol built on top of `TCP`, to encrypt data communicated\nbetween a client and a server.\n\n### TLS\n\n`T`ransport `L`ayer `S`ecurity is a security protocol over\nwhich `HTTP` runs in order to achieve secure communication online. \"HTTP\nover TLS\" is also known as `HTTPS`.\n\n### SSL Certificate\n\nA digital certificate granted to a server by a `certificate authority`.\nIt contains the server's public key, to be used as part of the\n`TLS handshake` process in an `HTTPS` connection.\n\nAn SSL certificate effectively confirms that a public key belongs to the\nserver that claims it. 
SSL certificates are a crucial defense\nagainst `man-in-the-middle attacks`.\n\n### Certificate Authority\n\nA trusted entity that signs digital certificates, namely the SSL certificates that\nare relied on in `HTTPS` connections.\n\n### TLS Handshake\n\nThe process through which a client and a server communicating over\n`HTTPS` exchange encryption-related information and establish secure\ncommunication. The typical steps in a TLS handshake are roughly as follows:\n\n- The client sends a `client hello` message, which includes a string of random\nbytes, to the server.\n\n- The server responds with a `server hello` message, which includes another\nstring of random bytes, as well as its `SSL certificate`, which contains its\n`public key`.\n\n- The client verifies that the certificate was issued by a trusted\n`certificate authority` and sends a `premaster secret` (yet another\nstring of random bytes, this time encrypted with the server's public key) to\nthe server.\n\n- The client and the server use the client hello, the server hello, and the\npremaster secret to then generate the same `symmetric-encryption` session keys,\nto be used to encrypt and decrypt all data communicated during the remainder\nof the connection.\n\n## API Design\n\nCould you pass an API design interview?\n\nIf you're sweating bullets, then sweat no more. This final video is the last\npiece of the puzzle you need to become a true Systems Expert.\n\n### Pagination\n\nWhen a network request potentially warrants a really large response, the\nrelevant API might be designed to return only a single `page`\nof that response (i.e., a limited portion of the response), accompanied by an\nidentifier or token for the client to request the next page if desired.\n\nPagination is often used when designing `List` endpoints. For instance,\nan endpoint to list videos on the YouTube Trending page could return a huge\nlist of videos. 
Returning the entire list wouldn't perform very well on mobile devices, given their\ntypically lower network speeds, and would be wasteful anyway, since most users will\nonly ever scroll through the first ten or twenty videos. So, the API could be\ndesigned to respond with only the first few videos of that list; in this case,\nwe would say that the API response is `paginated`.\n\n### CRUD Operations\n\nStands for `Create`, `Read`, `Update`, and `Delete` operations. These four operations\noften serve as the bedrock of a functioning system and therefore find themselves\nat the core of many APIs. The term `CRUD` is very likely to come up during an\nAPI-design interview.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvbrazo%2Fsystem-design-archives","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvbrazo%2Fsystem-design-archives","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvbrazo%2Fsystem-design-archives/lists"}