{"id":30752626,"url":"https://github.com/harshpreet931/systemdesignnotes","last_synced_at":"2026-02-11T20:31:32.875Z","repository":{"id":312687866,"uuid":"1048351520","full_name":"harshpreet931/SystemDesignNotes","owner":"harshpreet931","description":"a comprehension course about system design accompanied by a youtube series - https://www.youtube.com/channel/UCAkpuFPaycl9WflYoyiTpUQ/","archived":false,"fork":false,"pushed_at":"2026-01-12T16:03:29.000Z","size":52783,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-12T18:24:07.173Z","etag":null,"topics":["educational-content","learning-resources","open-source-education","software-architecture","system-design","tech-interview-prep","tutorial"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/harshpreet931.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-01T10:04:04.000Z","updated_at":"2026-01-12T16:27:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"8d05d009-1153-4c42-90dc-02a9e98418d6","html_url":"https://github.com/harshpreet931/SystemDesignNotes","commit_stats":null,"previous_names":["harshpreet931/systemdesignnotes"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/harshpreet931/SystemDesignNotes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harshpreet93
1%2FSystemDesignNotes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harshpreet931%2FSystemDesignNotes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harshpreet931%2FSystemDesignNotes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harshpreet931%2FSystemDesignNotes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/harshpreet931","download_url":"https://codeload.github.com/harshpreet931/SystemDesignNotes/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harshpreet931%2FSystemDesignNotes/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29343987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T20:11:40.865Z","status":"ssl_error","status_checked_at":"2026-02-11T20:10:41.637Z","response_time":97,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["educational-content","learning-resources","open-source-education","software-architecture","system-design","tech-interview-prep","tutorial"],"created_at":"2025-09-04T08:35:20.628Z","updated_at":"2026-02-11T20:31:32.868Z","avatar_url":"https://github.com/harshpreet931.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# System Design 
Course\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n![SD_README_LOGO](./assets/sd_readme_logo.png)\n\n\u003e **\"What separates your weekend project from Netflix or Uber? It's not just more servers or code, it's the blueprint. It's system design. : )\"**\n\nWelcome to the most comprehensive, hands-on system design course that takes you from zero to hero!\n\n## **Episodes**\n- **Episode 1**: [System Design Fundamentals](./episodes/01-fundamentals/) ✓\n- **Episode 2**: [Monolith vs Microservices](./episodes/02-monolith-microservices/) ✓\n- **Episode 3**: [Functional vs Non-Functional Requirements](./episodes/03-functional-nonfunctional-requirements/) ✓\n- **Episode 4**: [Horizontal vs Vertical Scaling](./episodes/04-horizontal-vertical-scaling/) ✓\n- **Episode 5**: [Stateless vs Stateful Systems](./episodes/05-stateless-stateful-systems/) ✓\n- **Episode 6**: [Load Balancing](./episodes/06-load-balancing/) ✓\n- **Episode 7**: [Caching](./episodes/07-caching/) ✓\n- **Episode 8**: [CDNs Explained](./episodes/08-cdns/) ✓\n- **Episode 9**: [Databases Guide](./episodes/09-databases/) ✓\n- **Episode 10**: [Vector Databases](./episodes/10-vector-databases/) ✓\n- **Episode 11**: [Keys in DBMS](./episodes/11-keys-in-dbms/) ✓\n- **Episode 12**: [Normalization vs Denormalization](./episodes/12-normalization-denormalization/) ✓\n- **Episode 13**: [Database Indexing Mastery](./episodes/13-indexing/) ✓\n- **Episode 14**: [Sharding \u0026 Partitioning](./episodes/14-sharding-partitioning/) ✓\n- **Episode 15**: [Database Transactions \u0026 ACID](./episodes/15-database-transactions/) ✓\n- **Episode 16**: [CAP Theorem is Not Enough](./episodes/16-cap-theorem/) ✓\n- **Episode 17**: [OSI Model Deep Dive](./episodes/17-osi-model/) ✓\n- **Episode 18**: [TCP vs UDP - The Complete Engineering Deep Dive](./episodes/18-tcp-udp/) ✓\n- **Episode 19**: [REST vs gRPC - The Complete Architecture 
Masterclass](./episodes/19-rest-grpc/) ✓\n- So on...\n\n## Episode 1: System Design Fundamentals\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/01-fundamentals/) | [View Presentation](./episodes/01-fundamentals/presentation/)**\n\n### What You'll Learn:\n- What is System Design and why it matters\n- High-Level Design (HLD) vs Low-Level Design (LLD)\n- Real example: Designing a URL Shortener\n- Hands-on: Build your first system architecture\n\n### Key Concepts Covered:\n```\nSystem Design = Software Architecture Blueprint\n├── High-Level Design (HLD) - The Big Picture\n│   ├── Major Components \u0026 Services\n│   ├── Technology Stack Decisions\n│   ├── Data Flow Architecture\n│   └── Third-party Integrations\n└── Low-Level Design (LLD) - The Details\n    ├── Classes, Methods \u0026 Data Structures\n    ├── Database Schemas \u0026 Relationships\n    ├── Algorithms \u0026 Implementation Logic\n    └── Error Handling \u0026 Edge Cases\n```\n\n## Episode 2: Monoliths vs Microservices\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/02-monolith-microservices/)**\n\n### What You'll Learn:\n- What monolithic architecture is and when to use it\n- What microservices architecture is and its benefits\n- Real-world examples: Netflix's evolution and Uber's architecture\n- Practical decision framework for choosing between approaches\n\n### Key Concepts Covered:\n```\nArchitecture Patterns Comparison\n├── Monolithic Architecture\n│   ├── Single Codebase \u0026 Deployable Unit\n│   ├── Shared Resources \u0026 Database\n│   ├── Advantages: Simple, Fast, Easy to Debug\n│   └── Challenges: Scalability, Technology Lock-in\n└── Microservices Architecture\n    ├── Independent Services \u0026 Databases\n    ├── Distributed System Architecture\n    ├── Advantages: Scalability, Flexibility, Fault Isolation\n    └── Challenges: Complexity, Operational Overhead\n```\n\n## Episode 3: Functional vs Non-Functional 
Requirements\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/03-functional-nonfunctional-requirements/)**\n\n### What You'll Learn:\n- What requirements are and why they're critical to system design\n- The difference between functional and non-functional requirements\n- How to identify and document both requirement types\n- Real-world example: Online bookstore requirements breakdown\n- The requirements elicitation process\n\n### Key Concepts Covered:\n```\nRequirements = Foundation of System Design\n├── Functional Requirements (WHAT the system does)\n│   ├── User Actions \u0026 Features\n│   ├── System Operations \u0026 Business Logic\n│   ├── Data Processing \u0026 Integrations\n│   └── Example: User can create account, add to cart, checkout\n└── Non-Functional Requirements (HOW WELL it performs)\n    ├── Performance: Response time, load time\n    ├── Scalability: Concurrent users, data growth\n    ├── Availability: Uptime (99.9%, 99.99%)\n    ├── Security: Encryption, authentication, compliance\n    ├── Usability: User experience, accessibility\n    ├── Maintainability: Code quality, integration time\n    └── Portability: Cross-platform, deployment flexibility\n```\n\n## Episode 4: Horizontal vs Vertical Scaling\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/04-horizontal-vertical-scaling/)**\n\n### What You'll Learn:\n- What scalability means across three dimensions (load, data, compute)\n- Vertical scaling: Making one machine more powerful\n- Horizontal scaling: Distributed systems engineering\n- Real-world examples: Netflix's evolution and AWS instances\n- Decision matrix and practical frameworks for choosing the right approach\n- Monitoring, metrics, and autoscaling strategies\n\n### Key Concepts Covered:\n```\nScaling Strategies Comparison\n├── Vertical Scaling (Scale Up)\n│   ├── Upgrade CPU, RAM, Storage on single machine\n│   ├── AWS Example: r6i.large → r6i.24xlarge (48x 
power)\n│   ├── Advantages: Simple, no code changes, ACID consistency\n│   └── Challenges: Physical limits, single point of failure, cost\n├── Horizontal Scaling (Scale Out)\n│   ├── Add more servers, distribute load\n│   ├── Requires: Stateless architecture, load balancers\n│   ├── Advantages: Unlimited scale, fault tolerance, flexibility\n│   └── Challenges: CAP theorem, network latency, complexity\n└── Hybrid Approach (Best of Both)\n    ├── Vertical for databases, horizontal for app servers\n    ├── Netflix: 1000+ microservices, 300M+ users\n    └── Autoscaling: Reactive, predictive, serverless\n```\n\n## Episode 5: Stateless vs Stateful Systems\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/05-stateless-stateful-systems/)**\n\n### What You'll Learn:\n- What \"state\" means in software systems (memory and session data)\n- Stateless systems: Vending machine analogy and REST APIs\n- Stateful systems: Bank teller analogy and session management\n- Hybrid architecture: Stateless app tier + external state stores\n- Real-world examples: Netflix, Amazon, and WhatsApp architectures\n- Decision framework for choosing the right approach\n\n### Key Concepts Covered:\n```\nState Management Strategies\n├── Stateless Systems (Amnesia Design)\n│   ├── No server memory between requests\n│   ├── Every request includes full context (tokens, auth)\n│   ├── Advantages: Perfect clones, easy scaling, fault tolerance\n│   └── Challenges: Chattier requests, external state needed\n├── Stateful Systems (Memory Design)\n│   ├── Server remembers session context\n│   ├── Requires: Sticky sessions, session storage\n│   ├── Advantages: Efficient, fast (in-memory), simple client\n│   └── Challenges: Sticky sessions, fragile, scaling hard\n└── Hybrid Architecture (Modern Approach)\n    ├── Stateless application servers\n    ├── Centralized state in Redis/DynamoDB/Cassandra\n    ├── Netflix: Stateless microservices + Cassandra\n    ├── Amazon: Stateless 
servers + DynamoDB carts\n    └── WhatsApp: Stateful connections for real-time (2B users)\n```\n\n## Episode 6: Load Balancing\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/06-load-balancing/)**\n\n### What You'll Learn:\n- What load balancing is and its critical role in distributed systems\n- Primary objectives: Scalability and High Availability\n- 9 load balancing algorithms and when to use each\n- Health monitoring: L4 (TCP) vs L7 (HTTP) checks\n- Session persistence strategies (IP Hash vs Cookie-Based)\n- Real-world example: Netflix's multi-layer architecture\n- Load balancer types: Hardware, Software, and Cloud\n- L4 vs L7 load balancing and their trade-offs\n\n### Key Concepts Covered:\n```\nLoad Balancing Strategies\n├── Primary Objectives\n│   ├── Scalability: Horizontal scaling with commodity servers\n│   └── High Availability: 99.99% uptime, health checks, failover\n├── Load Balancing Algorithms (9 total)\n│   ├── Round Robin: Simple, zero overhead, default\n│   ├── Weighted Round Robin: Heterogeneous hardware capacity\n│   ├── Least Connections: Dynamic, state-aware\n│   ├── Weighted Least Connections: Best of both worlds\n│   ├── Least Response Time: Latency + connections\n│   ├── Resource-Based: CPU/memory monitoring with agents\n│   ├── Geographic (GSLB): DNS-based, multi-region\n│   ├── IP Hash: Sticky sessions (L4)\n│   └── Cookie-Based: Sticky sessions (L7)\n├── Session Persistence\n│   ├── IP Hash: Simple but NAT/proxy issues\n│   └── Cookie-Based: Robust L7 solution\n├── Real-World Architecture\n│   ├── Netflix: GSLB → AWS ELB → Zuul → Microservices\n│   ├── 300M subscribers, 1000+ microservices\n│   └── Path-based routing (/play, /browse)\n├── Load Balancer Types\n│   ├── Hardware: F5 BIG-IP, specialized silicon\n│   ├── Software: HAProxy, NGINX (flexible, cheap)\n│   └── Cloud: AWS ALB/NLB (managed, auto-scaling)\n└── L4 vs L7 Load Balancing\n    ├── L4: IP/port level, fast (\u003c1ms), simple\n    └── 
L7: Content-based routing, SSL termination, microservices\n```\n\n## Episode 7: Caching\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/07-caching/) | [View Presentation](./episodes/07-caching/presentation/)**\n\n### What You'll Learn:\n- What caching is and why it's fundamental to high-performance systems\n- The benefits: reduced latency, increased throughput, lower costs\n- Client-side vs. server-side caching strategies\n- CDN caching for global content delivery\n- Application-level caching with Redis and Memcached\n- Database caching mechanisms\n- Cache eviction policies: LRU, LFU, FIFO\n- Cache invalidation strategies: TTL, Write-Through, Write-Behind\n- The Thundering Herd problem and mitigation strategies\n- Real-world case studies: Facebook, Netflix, and Amazon\n\n### Key Concepts Covered:\n```\nCaching = Smart Memory for Faster Systems\n├── Core Benefits\n│   ├── 10-100x faster response times\n│   ├── 50-90% reduction in database queries\n│   └── Lower infrastructure costs\n├── Caching Locations\n│   ├── Client-Side: Browser cache, DNS cache, mobile storage\n│   ├── CDN Caching: Edge locations globally\n│   ├── Application Cache: Redis, Memcached, local memory\n│   └── Database Cache: Buffer pool, query result cache\n├── Eviction Policies (decide what to remove)\n│   ├── LRU: Least Recently Used (most common)\n│   ├── LFU: Least Frequently Used\n│   └── FIFO: First In, First Out\n├── Invalidation Strategies (keep data fresh)\n│   ├── Write-Through: Write both cache and DB\n│   ├── Write-Behind: Async writes to DB\n│   ├── TTL: Time-based automatic expiration\n│   └── Active: Explicit invalidation on updates\n├── Design Patterns\n│   ├── Cache-Aside: Application manages cache\n│   ├── Read-Through: Cache loads itself on miss\n│   └── Refresh-Ahead: Proactive refresh\n└── Real-World Case Studies\n    ├── Facebook: Multi-level caching with Memcached + Tao\n    ├── Netflix: Cache everything, cache everywhere (EVCache)\n    
└── Amazon: DAX for microsecond DynamoDB reads\n```\n\n## Episode 8: CDNs Explained\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/08-cdns/) | [View Presentation](./episodes/08-cdns/presentation/)**\n\n### What You'll Learn:\n- What a Content Delivery Network (CDN) is and its core purpose\n- How CDNs solve the latency problem caused by geographic distance\n- The CDN workflow: Cache hits, cache misses, and origin fetching\n- The key benefits: Performance, availability, security, and cost savings\n- What content is ideal for CDN caching (static vs dynamic)\n- Cache control mechanisms: Headers, purging, and versioning\n- How to choose a CDN provider based on features and pricing\n- Real-world case studies: Netflix Open Connect, Facebook's infrastructure\n- Edge computing concepts and modern CDN capabilities\n\n### Key Concepts Covered:\n```\nCDN = Global Network for Fast Content Delivery\n├── Core Concept\n│   ├── Edge Servers / PoPs (Points of Presence)\n│   ├── Geographic Distribution (global network)\n│   └── Content Caching at the Edge\n├── The Distance Problem\n│   ├── Latency increases with distance\n│   ├── Tokyo user from NYC: ~180ms without CDN\n│   └── Tokyo user with CDN: ~5ms (36x faster!)\n├── How CDNs Work\n│   ├── DNS-based routing to nearest edge\n│   ├── Cache HIT: Serve from edge (1-5ms)\n│   ├── Cache MISS: Fetch from origin, cache, serve\n│   └── TTL-based expiration with revalidation\n├── Key Benefits\n│   ├── Performance: 50-90% latency reduction\n│   ├── Availability: Redundancy, fail-over\n│   ├── Security: DDoS protection, WAF\n│   └── Cost: 80% bandwidth savings\n├── What to Cache\n│   ├── Static Assets (Images, CSS, JS, Fonts)\n│   ├── Videos (MP4, HLS, DASH streaming)\n│   └── Dynamic Content (with care!)\n├── Cache Control\n│   ├── Cache-Control headers (max-age, public/private)\n│   ├── ETags for conditional requests (304 Not Modified)\n│   ├── Manual Purging (API or dashboard)\n│   └── Filename 
Versioning (best practice)\n└── Popular Providers\n    ├── Cloudflare: Free tier, security included\n    ├── AWS CloudFront: Deep AWS integration\n    ├── Akamai: Enterprise scale (365K+ servers)\n    ├── Fastly: Real-time purging, developer focus\n    └── Google Cloud CDN: GCP ecosystem\n```\n\n## Episode 9: The Ultimate Guide to Databases\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/09-databases/) | [View Presentation](./episodes/09-databases/presentation/)**\n\n### What You'll Learn:\n- What a database and DBMS are\n- Database design fundamentals: entities, attributes, relationships\n- SQL (Relational) databases and ACID compliance\n- Four core types of NoSQL databases\n- Two emerging types: Time-Series and Vector databases\n- How to choose the right database for your use case\n- NewSQL: Bridging SQL and NoSQL\n- Polyglot persistence in modern applications\n\n### Key Concepts Covered:\n```\nDatabase Types Overview\n├── SQL (Relational Databases)\n│   ├── Data Model: Tables with strict schema\n│   ├── Key Feature: ACID Compliance (Atomicity, Consistency, Isolation, Durability)\n│   ├── Examples: PostgreSQL, MySQL, SQL Server\n│   └── Best For: Transactions, financial data, structured data\n├── NoSQL (Non-Relational) - Core Types\n│   ├── Key-Value Stores: Simplest model (Redis, DynamoDB)\n│   │   └── Best For: Caching, sessions, simple lookups\n│   ├── Document Databases: Flexible JSON documents (MongoDB)\n│   │   └── Best For: User profiles, product catalogs\n│   ├── Column-Family Stores: Columnar storage (Cassandra)\n│   │   └── Best For: Big data analytics, time-series, logging\n│   └── Graph Databases: Nodes and relationships (Neo4j)\n│       └── Best For: Social networks, recommendations, fraud detection\n├── NoSQL - Emerging Types\n│   ├── Time-Series Databases: Optimized for timestamps (InfluxDB)\n│   │   └── Best For: IoT sensors, DevOps metrics, stock data\n│   └── Vector Databases: Store embeddings for AI 
(Pinecone, Weaviate)\n│       └── Best For: AI recommendations, semantic search, chatbots\n├── NewSQL: Bridging SQL and NoSQL\n│   ├── Concept: ACID guarantees + horizontal scaling\n│   ├── Examples: CockroachDB, YugabyteDB, TiDB\n│   └── Best For: Cloud-native apps needing SQL at scale\n└── Polyglot Persistence\n    ├── Use multiple databases for different needs\n    ├── Example: PostgreSQL (orders) + MongoDB (catalog) + Redis (cache)\n    └── Modern approach: Choose right tool for each job\n```\n\n## Episode 10: Vector Databases\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/10-vector-databases/) | [View Presentation](./episodes/10-vector-databases/presentation/)**\n\n### What You'll Learn:\n- What vector embeddings are and why they matter\n- How vector databases differ from traditional databases\n- Approximate Nearest Neighbor (ANN) search algorithms\n- Core use cases: semantic search, recommendations, RAG\n- Popular vector databases: Pinecone, Weaviate, Milvus, Qdrant\n- Embedding generation and model selection\n- Hybrid search: combining vectors with metadata\n- Performance considerations: HNSW, quantization, filtering\n\n### Key Concepts Covered:\n```\nVector Databases Fundamentals\n├── Vector Embeddings\n│   ├── Numerical representations of meaning\n│   ├── 768-4096 dimensions (model-dependent)\n│   └── Similar meanings → similar vectors\n├── ANN Search Algorithms\n│   ├── HNSW: Fastest for online queries\n│   ├── IVF: Fast build, moderate memory\n│   └── PQ: Compressed storage (80% smaller)\n├── Use Cases\n│   ├── Semantic Search: Meaning-based, not keyword\n│   ├── RAG: Context for LLMs (most common!)\n│   └── Recommendations: Similar items/users\n├── Popular Vector DBs\n│   ├── Pinecone: Managed, easy scaling\n│   ├── Weaviate: Open source, multimodal\n│   ├── Milvus: High scale, flexible\n│   ├── Qdrant: Rust-based, fast, simple\n│   └── Chroma: Lightweight, Python-native\n└── Performance Optimization\n    ├── 
Index tuning (HNSW m, efConstruction)\n    ├── Quantization (Binary, Scalar, Product)\n    └── Hybrid search (Vector + Metadata filtering)\n```\n\n## Episode 11: Keys in DBMS\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/11-keys-in-dbms/) | [View Presentation](./episodes/11-keys-in-dbms/presentation/)**\n\n### What You'll Learn:\n- The purpose of database keys in ensuring data integrity\n- The difference between Super Keys, Candidate Keys, and Primary Keys\n- How Primary Keys differ from Unique Keys (null handling)\n- The role of Alternate Keys and Composite Keys\n- Foreign Keys and referential integrity\n- Surrogate vs Natural Keys and when to use each\n- Performance implications: indexing Foreign Keys\n- Modern distributed keys: ULIDs and Snowflake IDs\n- Temporal keys for auditing and compliance\n\n### Key Concepts Covered:\n```\nDatabase Keys Fundamentals\n├── Key Hierarchy\n│   ├── Super Key: Any unique set (may be redundant)\n│   ├── Candidate Key: Minimal unique set (potential PKs)\n│   ├── Primary Key: The chosen one (unique + NOT NULL)\n│   ├── Alternate Key: Backup candidate keys\n│   └── Foreign Key: Links tables, enforces referential integrity\n├── Key Comparison\n│   ├── Primary Key: 1 per table, NO nulls, auto-indexed\n│   ├── Unique Key: Multiple allowed, 1 null typically OK\n│   ├── Foreign Key: Can repeat, nullable, MUST index\n│   └── Composite Key: 2+ columns form unique identifier\n├── Key Types\n│   ├── Natural Key: From business data (SSN, Email)\n│   ├── Surrogate Key: System-generated (Auto-increment, UUID)\n│   └── Temporal Key: Tracks when data was valid (auditing)\n├── Modern Distributed Keys\n│   ├── Snowflake ID: 64-bit, time-sortable, distributed\n│   ├── ULID: Lexicographically sortable, 128-bit\n│   └── Problem: UUIDv4 causes index fragmentation\n└── Best Practices\n    ├── Index all Foreign Keys (prevents slow JOINs)\n    ├── Use Surrogate Keys for volatile data\n    ├── Keep Natural Keys as 
Unique/Alternate Keys\n    └── Every table needs a Primary Key\n```\n\n## Episode 12: Normalization vs Denormalization\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/12-normalization-denormalization/) | [View Presentation](./episodes/12-normalization-denormalization/presentation/)**\n\n### What You'll Learn:\n- What normalization is and why it ensures data integrity\n- The three data anomalies: Insertion, Update, and Deletion\n- Normal Forms progression: 1NF through 5NF\n- Functional, partial, and transitive dependencies\n- Boyce-Codd Normal Form (BCNF) and when to use it\n- What denormalization is and why it improves read performance\n- Denormalization techniques: redundant columns, table splitting, materialized views\n- When to choose normalization (OLTP) vs denormalization (OLAP)\n- The hybrid approach in modern system architecture\n\n### Key Concepts Covered:\n```\nDatabase Design: Structure vs Performance\n├── Data Anomalies (Why Normalize?)\n│   ├── Insertion Anomaly: Cannot add data without other data\n│   ├── Update Anomaly: Inconsistency from multiple copies\n│   └── Deletion Anomaly: Losing data when deleting unrelated record\n├── Normal Forms (Normalization)\n│   ├── 1NF: Atomic values, no repeating groups\n│   ├── 2NF: No partial dependency (full key required)\n│   ├── 3NF: No transitive dependency\n│   ├── BCNF: Every determinant is a superkey\n│   ├── 4NF: No multi-valued dependency\n│   └── 5NF: No join dependency (rare)\n├── Denormalization Techniques\n│   ├── Redundant Columns: Duplicate data to avoid JOINs\n│   ├── Derived Columns: Pre-computed values\n│   ├── Table Splitting: Horizontal/Vertical partitioning\n│   └── Materialized Views: Pre-computed query results\n└── Design Decisions\n    ├── Normalize for: OLTP, data integrity, write-heavy\n    ├── Denormalize for: OLAP, read-heavy, analytics\n    └── Hybrid: Normalized source + denormalized views\n```\n\n## Episode 13: Database Indexing 
Mastery\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/13-indexing/) | [View Presentation](./episodes/13-indexing/presentation/)**\n\n### What You'll Learn:\n- What database indexes are and why they are critical for performance\n- B-Tree and B+Tree data structures that power most database indexes\n- The difference between clustered and non-clustered indexes\n- How composite indexes work and the leftmost prefix rule\n- What covering indexes are and when to use them\n- Different index types: Hash, Bitmap, Full-Text, and PostgreSQL-specific indexes\n- How to read and interpret execution plans with EXPLAIN\n- Index maintenance: fragmentation, VACUUM, and fill factors\n- Best practices for index design and avoiding common anti-patterns\n\n### Key Concepts Covered:\n```\nDatabase Indexing Fundamentals\n├── Index Basics\n│   ├── B-Tree/B+Tree: Self-balancing tree data structure\n│   ├── Trade-offs: Speed up reads, slow down writes\n│   └── Cost: 20-30% storage overhead per index\n├── Index Types\n│   ├── Clustered: Data physically sorted, one per table\n│   ├── Non-Clustered: Separate structure with row pointers\n│   ├── Composite: Multiple columns, leftmost prefix rule\n│   └── Covering: Query satisfied entirely from index\n├── Advanced Index Types\n│   ├── Hash: O(1) equality lookups, no range support\n│   ├── Bitmap: Low-cardinality, data warehousing\n│   ├── Full-Text: Inverted index for text search\n│   └── PostgreSQL: GiST, GIN, BRIN specialized indexes\n├── Index Design\n│   ├── Selectivity: Higher is better (\u003e 20% ideal)\n│   ├── Column Order: Equality first, range last\n│   ├── Functional Indexes: Index on expressions\n│   └── Partial Indexes: Filtered conditions\n├── Execution Plans\n│   ├── EXPLAIN: Read query execution strategy\n│   ├── Seq Scan: Full table scan (warning sign!)\n│   ├── Index Scan: Index used + table lookup\n│   └── Index Only Scan: Fastest (covering index)\n└── Maintenance\n    ├── Fragmentation: 
Internal and external\n    ├── VACUUM: Reclaim space, update statistics\n    ├── FILLFACTOR: Balance storage vs updates\n    └── Monitoring: pg_stat_user_indexes usage stats\n```\n\n## Episode 14: Sharding \u0026 Partitioning\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/14-sharding-partitioning/) | [View Presentation](./episodes/14-sharding-partitioning/presentation/)**\n\n### What You'll Learn:\n- The critical distinction between partitioning and sharding\n- Types of partitions: Range, List, Hash, and Composite\n- How to select optimal shard keys (cardinality, uniformity, query alignment)\n- Directory-based sharding and lookup strategies\n- Cross-shard operations: transactions, joins, and aggregations\n- Resharding strategies and zero-downtime migrations\n- Real-world sharding patterns from Vitess, CockroachDB, and MongoDB\n\n### Key Concepts Covered:\n```\nScaling Databases: Distribution Strategies\n├── Sharding vs Partitioning\n│   ├── Partitioning: Splitting within ONE server/cluster\n│   ├── Sharding: Splitting ACROSS multiple servers/nodes\n│   └── Shared-Nothing Architecture\n├── Partition Types\n│   ├── Range: Consecutive ranges (dates, numeric)\n│   ├── List: Specific values (regions, categories)\n│   ├── Hash: Uniform distribution via hash function\n│   └── Composite: Range-Hash, Range-List combinations\n├── Shard Key Selection\n│   ├── Cardinality: High cardinality required\n│   ├── Uniformity: Even data and access distribution\n│   └── Query Alignment: Target single shard when possible\n├── Cross-Shard Operations\n│   ├── Transactions: Two-Phase Commit (2PC)\n│   ├── Joins: Data colocation vs scatter-gather\n│   └── Aggregations: MapReduce-style distributed queries\n├── Resharding\n│   ├── Triggers: Growth, hotspots, capacity planning\n│   ├── Strategies: Split vs Migrate approaches\n│   └── Zero-Downtime: Dual-write, blue-green deployment\n└── Platforms\n    ├── Vitess: MySQL sharding middleware (YouTube)\n    
├── CockroachDB: Automatic distributed SQL\n    └── MongoDB: Document database sharding\n```\n\n## Episode 15: Database Replication \u0026 Leader Election\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/15-replication-leader-election/) | [View Presentation](./episodes/15-replication-leader-election/presentation/)**\n\n### What You'll Learn:\n- Database replication: Master-Slave patterns, binary logs, and GTIDs\n- Synchronous vs asynchronous replication trade-offs\n- Practical MySQL/MariaDB replication setup\n- Leader election: Bully, Ring, Raft, and Paxos algorithms\n- Distributed locking with leases and heartbeats\n- Split brain prevention and fencing tokens\n- Sharding to scale beyond single-master limitations\n\n### Key Concepts Covered:\n```\nDistributed Coordination: Replication and Consensus\n├── Replication Fundamentals\n│   ├── Master-Slave: Single writer, multiple readers\n│   ├── Binary Logs (MySQL) vs WAL (PostgreSQL)\n│   ├── GTIDs for position tracking\n│   └── Cascading replication\n├── Consistency Trade-offs\n│   ├── Synchronous: Strong consistency, high latency\n│   ├── Asynchronous: High performance, eventual consistency\n│   └── Semi-Synchronous: Hybrid approach\n├── Leader Election\n│   ├── Bully Algorithm: Highest ID wins\n│   ├── Ring Algorithm: Ring topology messaging\n│   └── Raft: Leader, Follower, Candidate states\n├── Consensus Protocols\n│   ├── Raft: Terms, heartbeats, log replication\n│   ├── Paxos: Prepare and Accept phases\n│   └── Real-world: etcd, Consul, Spanner\n├── Distributed Coordination\n│   ├── ZooKeeper: ZAB protocol, ephemeral znodes\n│   ├── etcd: Strong consistency, Kubernetes backbone\n│   ├── Consul: Service discovery, KV store\n│   └── Cloud primitives: Azure Blob leases, DynamoDB Lock\n├── Failure Scenarios\n│   ├── Split brain: Dual leader prevention\n│   ├── Fencing tokens for safe leader handoff\n│   └── Idempotency for duplicate handling\n└── Scaling Patterns\n    ├── 
Sharding: Partitioning across masters\n    └── Multi-shard operations and rebalancing\n```\n\n## Episode 16: CAP Theorem is Not Enough - The Truth About Distributed Systems\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/16-cap-theorem/) | [View Presentation](./episodes/16-cap-theorem/presentation/)**\n\n### What You'll Learn:\n- The real meaning of CAP theorem and why \"Pick Two\" is misleading\n- Why CA systems do not actually exist in distributed computing\n- The critical difference between CAP Consistency and ACID Consistency\n- How to use the PACELC framework for real-world system design\n- How to classify databases as PA/EL vs PC/EC\n- Practical decision framework for choosing the right consistency model\n\n### Key Concepts Covered:\n```\nDistributed Consistency Trade-offs\n├── CAP Theorem Fundamentals\n│   ├── Consistency: Linearizability (everyone sees same data)\n│   ├── Availability: Every request gets response\n│   ├── Partition Tolerance: Non-negotiable in distributed systems\n│   └── Reality: CA systems don't exist (P is mandatory)\n├── The CAP Blind Spot\n│   ├── Partitions are rare (1% of time)\n│   ├── CAP says nothing about normal operations\n│   └── Missing metric: Latency\n├── PACELC Framework\n│   ├── PAC: If Partition, choose Availability or Consistency\n│   ├── ELC: Else (no partition), choose Latency or Consistency\n│   └── Daniel Abadi (Yale), 2010\n├── Database Classifications\n│   ├── PA/EL: DynamoDB, Cassandra (Availability + Low Latency)\n│   ├── PC/EC: BigTable, HBase, Spanner (Consistency + Consistency)\n│   └── Tunable: Cosmos DB, MongoDB (adjustable per query)\n└── Practical Decision Framework\n    ├── Money involved? -\u003e PC/EC (accuracy critical)\n    ├── Speed is product? -\u003e PA/EL (availability critical)\n    └── Global distribution? 
-\u003e Expect partitions\n```\n\n## Episode 17: The OSI Model\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/17-osi-model/) | [View Presentation](./episodes/17-osi-model/presentation/)**\n\n### What You'll Learn:\n- The 7 layers of the OSI model and their specific responsibilities\n- How data flows through the network stack (encapsulation and decapsulation)\n- The role of each layer in real-world protocols (Ethernet, IP, TCP, HTTP)\n- How troubleshooting tools like ping, traceroute, and Wireshark map to layers\n- Common misconceptions about the OSI model in modern networking\n- How the TCP/IP model maps to the OSI model\n\n### Key Concepts Covered:\n```\nOSI Model Fundamentals\n├── 7 Layers Overview\n│   ├── Physical (Layer 1): Bits, cables, signals\n│   ├── Data Link (Layer 2): Frames, MAC addresses, switches\n│   ├── Network (Layer 3): Packets, IP addresses, routers\n│   ├── Transport (Layer 4): Segments, TCP/UDP, end-to-end\n│   ├── Session (Layer 5): Connections, auth, sync\n│   ├── Presentation (Layer 6): Format, encryption, compression\n│   └── Application (Layer 7): HTTP, DNS, user protocols\n├── Encapsulation Process\n│   ├── Data: Application message\n│   ├── Segment: Transport header (TCP/UDP)\n│   ├── Packet: Network header (IP)\n│   ├── Frame: Data Link header + trailer (Ethernet)\n│   └── Bits: Physical transmission\n├── Protocol Mapping\n│   ├── Layer 7: HTTP, DNS, SMTP, FTP\n│   ├── Layer 6: SSL/TLS, JPEG, ASCII\n│   ├── Layer 5: RPC, SQL sessions\n│   ├── Layer 4: TCP, UDP\n│   ├── Layer 3: IP, ICMP, ARP\n│   ├── Layer 2: Ethernet, MAC, Switch\n│   └── Layer 1: Cables, Hubs, Signals\n└── Modern Relevance\n    ├── OSI as troubleshooting framework\n    ├── TCP/IP as practical implementation\n    ├── Why layers still matter\n    └── Network virtualization and SDN\n```\n\n## Episode 18: TCP vs UDP\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/18-tcp-udp/) | [View 
Presentation](./episodes/18-tcp-udp/presentation/)**\n\n### What You'll Learn:\n- How TCP and UDP operate at the Transport Layer (Layer 4)\n- The fundamental differences between connection-oriented and connectionless protocols\n- TCP's reliability mechanisms: handshakes, sequencing, acknowledgments, and retransmissions\n- UDP's minimalist design and why it excels in real-time scenarios\n- The Head-of-Line Blocking problem and its impact on performance\n- How QUIC and HTTP/3 leverage UDP for modern web performance\n- Practical decision frameworks for choosing the right protocol\n\n### Key Concepts Covered:\n```\nTransport Layer Protocols: TCP vs UDP\n├── TCP Fundamentals\n│   ├── Connection-oriented: 3-way handshake\n│   ├── Reliable: ACKs, retransmissions\n│   ├── Ordered: Guaranteed packet sequencing\n│   ├── Flow control: Sliding window\n│   └── Congestion control: Adapts to network\n├── UDP Fundamentals\n│   ├── Connectionless: No handshake\n│   ├── Fast: Minimal 8-byte header\n│   ├── Stateless: No connection tracking\n│   └── Best-effort: No guarantees\n├── Key Differences\n│   ├── Overhead: TCP 20-60 bytes vs UDP 8 bytes\n│   ├── Latency: TCP handshake cost vs UDP immediate\n│   ├── Ordering: TCP blocks vs UDP independent\n│   └── Use cases: TCP web/email vs UDP gaming/streaming\n├── Head-of-Line Blocking\n│   ├── TCP: Lost packet 1 blocks 2, 3, 4\n│   ├── Impact: Jitter in real-time apps\n│   └── Solution: QUIC per-stream reliability\n├── Modern Evolution\n│   ├── QUIC: Reliability on UDP (HTTP/3)\n│   ├── Benefits: 0-RTT, no HoL blocking\n│   ├── Migration: Survives WiFi to 5G\n│   └── Adoption: 30%+ of web traffic\n└── Decision Framework\n    ├── TCP: Web, email, files (reliability first)\n    ├── UDP: Gaming, VoIP, streaming (speed first)\n    └── QUIC: Best of both worlds\n```\n\n## Episode 19: REST vs gRPC\n\n**[Watch the Video](http://youtube.com/@ThatNotesGuy) | [Read the Notes](./episodes/19-rest-grpc/) | [View 
Presentation](./episodes/19-rest-grpc/presentation/)**\n\n### What You'll Learn:\n- What RPC (Remote Procedure Call) is and how it enables distributed communication\n- The definition and constraints of REST as an architectural style\n- The Richardson Maturity Model and HATEOAS principles\n- What gRPC is and why Google open-sourced it in 2015\n- Protocol Buffers (Protobuf) as a binary serialization format\n- HTTP/2 vs HTTP/1.1 and the performance implications\n- The four types of gRPC: Unary, Server Streaming, Client Streaming, Bidirectional\n- When to choose REST vs gRPC based on your use case\n- Load balancing complexities with both approaches\n- Real-world code examples for both paradigms\n\n### Key Concepts Covered:\n```\nCommunication Patterns: REST vs gRPC\n├── RPC Foundation\n│   ├── Remote Procedure Call: Execute code on remote machine\n│   ├── Client Stub: Proxy that hides network complexity\n│   ├── Marshalling: Serializing parameters\n│   └── Network Transparency: Looks like local function call\n├── REST Architecture\n│   ├── Roy Fielding (2000), architectural style, NOT protocol\n│   ├── Constraints: Stateless, Cacheable, Uniform Interface\n│   ├── Everything is a Resource (identified by URI)\n│   ├── HTTP Methods: GET, POST, PUT, PATCH, DELETE\n│   └── Richardson Maturity Model: Levels 0-3 (HATEOAS)\n├── gRPC Framework\n│   ├── Google (2015), part of CNCF\n│   ├── HTTP/2 for transport (multiplexing, binary)\n│   ├── Protocol Buffers for serialization (binary, efficient)\n│   ├── Contract-First: Define .proto before coding\n│   └── Four RPC Types: Unary, Server, Client, Bidirectional\n├── Performance Comparison\n│   ├── Payload: Protobuf 50-70% smaller than JSON\n│   ├── Speed: Protobuf 3-10x faster parsing\n│   ├── Latency: HTTP/2 multiplexing reduces overhead\n│   └── Winner: gRPC 7-10x faster for internal services\n├── Protocol Buffers\n│   ├── Binary serialization, not human-readable\n│   ├── Field tags (1, 2, 3...) 
identify fields\n│   ├── Single source of truth in .proto files\n│   └── Schema evolution support\n├── HTTP/2 Advantages\n│   ├── Multiplexing: Multiple streams over a single TCP connection\n│   ├── Header Compression (HPACK)\n│   ├── Binary framing (not text)\n│   └── Bidirectional communication\n├── gRPC RPC Types\n│   ├── Unary: Simple request/response (like REST)\n│   ├── Server Streaming: Request, stream responses\n│   ├── Client Streaming: Stream requests, single response\n│   └── Bidirectional: Both sides stream independently\n├── Decision Framework\n│   ├── REST: Browser clients, public APIs, simple CRUD\n│   ├── gRPC: Internal services, polyglot, streaming, low latency\n│   └── Flowchart: Browser? -\u003e REST | Latency critical? -\u003e gRPC\n└── Challenges\n    ├── REST: Universally supported, easy debugging\n    ├── gRPC: Browser support requires proxy (gRPC-Web)\n    ├── Load Balancing: gRPC needs L7, HTTP/2 persistent connections\n    └── Debugging: gRPC requires deserialization tools\n```\n\n## Contributing\n\nWe love contributions! 
Here's how you can help make this course even better:\n\n- [Report bugs or issues](./CONTRIBUTING.md)\n- [Suggest new topics or improvements](./CONTRIBUTING.md)\n- [Improve documentation](./CONTRIBUTING.md)\n- [Create better diagrams](./CONTRIBUTING.md)\n- [Add more examples](./CONTRIBUTING.md)\n\n## Follow the Journey\n\n- **YouTube**: [Subscribe for new episodes](http://youtube.com/@ThatNotesGuy)\n- **LinkedIn**: [Connect with Harsh](https://www.linkedin.com/in/harshpreet931/)\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n\u003cdiv align=\"center\"\u003eMade with ❤️ by Harshpreet Singh\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharshpreet931%2Fsystemdesignnotes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharshpreet931%2Fsystemdesignnotes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharshpreet931%2Fsystemdesignnotes/lists"}