Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dttung2905/kafka-in-production
:books: Tech blogs & talks by companies that run Kafka in production
https://github.com/dttung2905/kafka-in-production
distributed-systems kafka streaming
Last synced: 3 months ago
JSON representation
:books: Tech blogs & talks by companies that run Kafka in production
- Host: GitHub
- URL: https://github.com/dttung2905/kafka-in-production
- Owner: dttung2905
- License: mit
- Created: 2023-07-01T09:54:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-14T20:20:01.000Z (6 months ago)
- Last Synced: 2024-08-14T22:22:40.150Z (6 months ago)
- Topics: distributed-systems, kafka, streaming
- Homepage:
- Size: 88.9 KB
- Stars: 889
- Watchers: 19
- Forks: 72
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesomes - kafka in production
README
# kafka-in-production
![HitCount](http://hits.dwyl.com/dttung2905/kafka-in-production.svg)
![license](https://img.shields.io/github/license/dttung2905/kafka-in-production)
![stars](https://img.shields.io/github/stars/dttung2905/kafka-in-production)Curious to know how big companies are operating their kafka fleet in production? This might be the repo for you:
- **What** are the issues encountered when running kafka in production? đ
- **How** other organisations attempt to solve the issues? đ ď¸
- **Why** certain approaches are adopted over others? :balance_scale:
- **What** can we learn for our own use case?## Table of Contents
1. [Adobe](#adobe)
1. [Agoda](#agoda)
1. [Airbnb](#airbnb)
1. [Allegro](#allegro)
1. [Apple](#apple)
1. [AppsFlyer](#appsflyer)
1. [BigCommerce](#bigcommerce)
1. [Bitpanda](#bitpanda)
1. [Bloomberg](#bloomberg)
1. [Bolt](#bolt)
1. [Booking.com](#bookingcom)
1. [Brex](#brex)
1. [CERN](#cern)
1. [Cloudflare](#cloudflare)
1. [Cloudera](#cloudera)
1. [Coinbase](#coinbase)
1. [Criteo](#criteo)
1. [Datadog](#datadog)
1. [DoorDash](#doordash)
1. [Decathlon](#decathlon)
1. [Deliveroo](#deliveroo)
1. [GoTo](#goto)
1. [Grab](#grab)
1. [HelloFresh](#hellofresh)
1. [Honeycomb](#honeycomb)
1. [Hubspot](#hubspot)
1. [Indeed](#indeed)
1. [Klarna](#klarna)
1. [LinkedIn](#linkedin)
1. [Lyft](#lyft)
1. [Michelin](#Michelin)
1. [Monzo](#Monzo)
1. [Morgan Stanley](#morgan-stanley)
1. [Netflix](#netflix)
1. [New Relic](#new-relic)
1. [PayPal](#PayPal)
1. [Pinterest](#pinterest)
1. [Platformatory](#platformatory)
1. [Riskified](#riskified)
1. [Robinhood](#robinhood)
1. [Shopify](#shopify)
1. [Slack](#slack)
1. [Stripe](#stripe)
1. [Uber](#uber)
1. [Wise](#wise)
1. [Wix](#wix)
1. [Yelp](#yelp)
1. [Zalando](#zalando)
1. [Zendesk](#zendesk)
1. [Zopa Bank](#zopa-bank)## Adobe
- [How Adobe Experience Platform Pipeline Became the Cornerstone of In-Flight Processing for Adobe](https://blog.developer.adobe.com/how-adobe-experience-platform-pipeline-became-the-cornerstone-of-in-flight-processing-for-adobe-51c0e0a91521) - `2019` - :books:
- [Moving Beyond Newtonian Reductionism in the Management of Large-Scale Distributed Systems, Part 2](https://blog.developer.adobe.com/moving-beyond-newtonian-reductionism-in-the-management-of-large-scale-distributed-systems-part-2-35c3f91f96e3) - `2019` - :books:
- [Adobe Experience Platformâs Streaming Sources and Destinations Overview and Architecture](https://blog.developer.adobe.com/adobe-experience-platforms-streaming-sources-and-destinations-overview-and-architecture-ba0b4d3e7ded) - `2019` - :books:
- [Wins from Effective Kafka Monitoring at Adobe: Stability, Performance, and Cost Savings](https://blog.developer.adobe.com/wins-from-effective-kafka-monitoring-at-adobe-stability-performance-and-cost-savings-a3ecb701ee5b) - `2019` - :books:
- [Creating Adobe Experience Platform Pipeline with Kafka](https://blog.developer.adobe.com/creating-the-adobe-experience-platform-pipeline-with-kafka-4f1057a11ef) - `2018` - :books:## Agoda
- [How We Solve Load Balancing Challenges in Apache Kafka](https://medium.com/agoda-engineering/how-we-solve-load-balancing-challenges-in-apache-kafka-8cd88fdad02b) - `2024` - :books:
- [How Agoda manages 1.5 Trillion Events per day on Kafka](https://medium.com/agoda-engineering/how-agoda-manages-1-5-trillion-events-per-day-on-kafka-f0a27fc32ecb) - `2021` - :books:
- [Adding Time Lag to Monitor Kafka Consumer](https://medium.com/agoda-engineering/adding-time-lag-to-monitor-kafka-consumer-2c626fa61cfc) - `2021` - :books:
- [How our data scientists' petabytes of data is ingested into Hadoop (from Kafka)](https://medium.com/agoda-engineering/ingesting-petabytes-of-data-per-week-into-hadoop-from-kafka-457718cc308c) - `2021` - :books:## Airbnb
- [Migrating Kafka transparently between Zookeeper clusters](https://medium.com/airbnb-engineering/migrating-kafka-transparently-between-zookeeper-clusters-e68a75062f65) - `2021` - :books:
## Allegro
- [Unlocking Kafka's Potential: Tackling Tail Latency with eBPF](https://blog.allegro.tech/2024/03/kafka-performance-analysis.html) - `2024` - :books:
## Apple
- [Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Streaming Applications](https://www.confluent.io/events/kafka-summit-london-2024/leveraging-tiered-storage-in-strimzi-operated-kafka-for-cost-effective/) - `2024` - :studio_microphone:
- [Balance Kafka Cluster with Zero Data Movement](https://www.confluent.io/events/kafka-summit-london-2023/balance-kafka-cluster-with-zero-data-movement/) - `2023` - :studio_microphone:
- [Experiences Operating Apache KafkaÂŽ at Scale](https://www.confluent.io/kafka-summit-ny19/experiences-operating-apache-kafka-at-scale/) - `2019` - :studio_microphone:
- [Kafka as a Service A Tale of Security and Multi Tenancy](https://www.confluent.io/blog/rounding-up-kafka-summit-london-2018/) - `2018` - :studio_microphone:## AppsFlyer
- [Four Crucial Steps to Take Before Changing Kafka Partition Key at Scale](https://medium.com/appsflyerengineering/four-crucial-steps-to-take-before-changing-kafka-partition-key-at-scale-3c2e553c73b2) - `2023` - :books:
- [Kafka Lag Monitoring For Human Beings](https://www.confluent.io/resources/kafka-summit-2020/kafka-lag-monitoring-for-human-beings/) - `2020` - :studio_microphone:
- [Apache Kafka Lag Monitoring at AppsFlyer](https://www.confluent.io/blog/kafka-lag-monitoring-and-metrics-at-appsflyer/) - `2020` - :books:
- [Managing your Kafka in an explosive growth environment](https://www.youtube.com/watch?v=tjjeaCtsw_M) - `2019` - :studio_microphone:## BigCommerce
- [Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond](https://events.bizzabo.com/468544/agenda/session/1136931) - `2023` - :studio_microphone:
## Bitpanda
- [Bitpanda's new trade engine - Part #1 - asynchronous trading leveraging Kafka](https://blog.bitpanda.com/en/bitpandas-new-trade-engine-part-1) - `2023` - :books:
## Bloomberg
- [Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools](https://www.confluent.io/resources/kafka-summit-2020/fully-managed-multi-tenant-kafka-clusters-tips-tricks-and-tools/) - `2022` - :studio_microphone:
## Bolt
- [Using Apache Kafka and ksqlDB for Data Replication at Bolt](https://www.youtube.com/watch?v=ymx55BA8eQU&ab_channel=Confluent) - `2021` - :studio_microphone:
- [How Bolt Has Adopted Change Data Capture with Confluent Platform](https://www.confluent.io/blog/how-bolt-adopted-cdc-with-confluent-for-real-time-data-and-analytics/) - `2020` - :books:
- [Kewei Shang](https://medium.com/bolt-labs/streaming-vitess-at-bolt-f8ea93211c3f) - `2020` - :books:## Booking.com
- [Data Streaming Ecosystem Management at Booking.com](https://www.confluent.io/kafka-summit-sf18/data-streaming-ecosystem-management/) - `2018` - :books:
## Brex
- [Transactional Events Publishing At Brex](https://medium.com/brexeng/transactional-events-publishing-at-brex-66a5984f0726) - `2022` - :books:
## CERN
- [CERN IoT Kafka Pipelines](https://www.confluent.io/events/kafka-summit-london-2024/cern-iot-kafka-pipelines/) - `2024` - :studio_microphone:
## Cloudflare
- [All about Kafka](https://changelog.com/gotime/299) - `2024` - :studio_microphone:
- [Intelligent, automatic restarts for unhealthy Kafka consumers](https://blog.cloudflare.com/intelligent-automatic-restarts-for-unhealthy-kafka-consumers/) - `2023` - :books:
- [Using Apache Kafka to process 1 trillion inter-service messages](https://blog.cloudflare.com/using-apache-kafka-to-process-1-trillion-messages/) - `2022` - :books:## Cloudera
- [Using Streams Replication Manager Prefixless Replication for Kafka Topic Aggregation](https://blog.cloudera.com/using-streams-replication-manager-prefixless-replication-for-kafka-topic-aggregation/) - `2024` - :books:
- [Streams Replication Manager Prefixless Replication](https://blog.cloudera.com/streams-replication-manager-prefixless-replication-part-1/) - `2024` - :books:## Coinbase
- [Kafka infrastructure renovation at Coinbase](https://www.coinbase.com/blog/kafka-infrastructure-renovation) - `2022` - :books:
- [How we scaled data streaming at Coinbase using AWS MSK](https://www.coinbase.com/blog/how-we-scaled-data-streaming-at-coinbase-using-aws-msk) - `2021` - :books:## Criteo
- [Managing Kafka and Data Streams at Criteo](https://medium.com/criteo-engineering/managing-kafka-and-data-streams-at-criteo-566ffbfda6ba) - `2023` - :books:
- [Upgrading Kafka on a large infra, or: when moving at scale requires careful planning](https://medium.com/criteo-engineering/upgrading-kafka-on-a-large-infra-3ee99f56e970) - `2019` - :books:
- [How Criteo is managing one of the largest Kafka Infrastructure in Europe](https://www.slideshare.net/RicardoPaiva17/how-criteo-is-managing-one-of-the-largest-kafka-infrastructure-in-europe) - `2019` - :books:## Crowdstrike
- [Real-time Adaptive Controls for Kafka Consumers](https://current.confluent.io/2024-sessions/real-time-adaptive-controls-for-kafka-consumers) - `2024` - :studio_microphone:
## Datadog
- [Running Production Kafka Clusters in Kubernetes](https://www.confluent.io/kafka-summit-lon19/running-production-kafka-clusters-kubernetes/) - `2019` - :studio_microphone:
## Decathlon
- [Seamless data exchange with Kafka Connect and Strimzi on Kubernetes at Decathlon](https://medium.com/decathlondigital/seamless-data-exchange-with-kafka-connect-and-strimzi-on-kubernetes-at-decathlon-e6f81d034535) - `2024` - :books:
## Deliveroo
- [Improving Stream Data Quality With Protobuf Schema Validation](https://deliveroo.engineering/2019/02/05/improving-stream-data-quality-with-protobuf-schema-validation.html) - `2019` - :books:
## Doordash
- [DoorDash Empowers Engineers with Kafka Self-Serve](https://doordash.engineering/2024/08/13/doordash-engineers-with-kafka-self-serve/) - `2024` - :books:
- [API-First Approach to Kafka Topic Creation](https://doordash.engineering/2023/12/05/api-first-approach-to-kafka-topic-creation/) - `2023` - :books:
- [Building Scalable Real Time Event Processing with Kafka and Flink](https://doordash.engineering/2022/08/02/building-scalable-real-time-event-processing-with-kafka-and-flink/) - `2020` - :books:
- [Eliminating Task Processing Outages by Replacing RabbitMQ with Apache Kafka Without Downtime](https://doordash.engineering/2020/09/03/eliminating-task-processing-outages-with-kafka/) - `2020` - :books:## GoTo
- [Sink Kafka Messages to ClickHouse Using 'ClickHouse Kafka Ingestor'](https://blog.gojek.io/sink-kafka-messages-to-clickhouse-using-clickhouse-kafka-ingestor/) - `2022` - :books:
- [When Kafka Went Offshore](https://blog.gojek.io/when-kafka-went-offshore/) - `2021` - :books:
- [Enhancing Zigguratâ-âThe Backbone Of Gojek's Kafka Ecosystem](https://blog.gojek.io/enhancing-ziggurat-the-backbone-of-gojeks-kafka-ecosystem/) - `2021` - :books:
- [Handling Dead Letters in a Streaming System](https://blog.gojek.io/handling-dead-letters-in-a-streaming-system/) - `2020` - :books:
- [How Kafka Solved a Culture Problem at Gojek](https://blog.gojek.io/how-kafka-solved-a-culture-problem-at-gojek/) - `2019` - :books:
- [Fronting : An Armoured Car for Kafka Ingestion](https://blog.gojek.io/fronting-an-armoured-car-for-kafka-ingestion/) - `2018` - :books:
- [Sakaar: Taking Kafka data to cloud storage at GO-JEK](https://blog.gojek.io/sakaar-taking-kafka-data-to-cloud-storage-at-go-jek/) - `2018` - :books:## Grab
- [Kafka on Kubernetes: Reloaded for fault tolerance](https://engineering.grab.com/kafka-on-kubernetes) - `2023` - :books:
- [Zero trust with Kafka](https://engineering.grab.com/zero-trust-with-kafka) - `2022` - :books:
- [How Kafka Connect helps move data seamlessly](https://engineering.grab.com/kafka-connect) - `2022` - :books:
- [Exposing a Kafka Cluster via a VPC Endpoint Service](https://engineering.grab.com/exposing-kafka-cluster) - `2022` - :books:
- [Detect Fraud Successfully with GrabDefence!](https://www.confluent.io/events/kafka-summit-apac-2021/detect-fraud-successfully-with-grabdefence/) - `2021` - :studio_microphone:
- [Optimally Scaling Kafka Consumer Applications](https://engineering.grab.com/optimally-scaling-kafka-consumer-applications) - `2020` - :books:## HelloFresh
- [ProtoMock: Simple Kafka Testing by Generating Mock Data from Protobuf Schemas](https://engineering.hellofresh.com/simple-kafka-testing-by-generating-mock-data-from-protobuf-schemas-a1702abe1a8c) - `2023` - :books:
- [Renaming a Kafka topic](https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03) - `2023` - :books:## Honeycomb
- [Scaling Telemetry Systems with Streaming](https://www.usenix.org/conference/srecon23americas/presentation/fong-jones) - `2023` - :studio_microphone:
- [Lessons Learned From the Migration to Confluent Kafka](https://www.honeycomb.io/blog/kafka-migration-lessons-learned) - `2021` - :books:
- [Scaling Kafka at Honeycomb](https://www.honeycomb.io/blog/scaling-kafka-observability-pipelines) - `2021` - :books:
- [Bitten by a Kafka Bug - Postmortem](https://www.honeycomb.io/blog/bitten-by-a-kafka-bug-postmortem) - `2019` - :books:## Hubspot
- [Our Journey to Multi-Region: Supporting Cross-Region Kafka Messaging](https://product.hubspot.com/blog/kafka-aggregation) - `2022` - :books:
## Indeed
- [Indeed Flex: The Story of a Revolutionary Recruitment Platform](https://events.bizzabo.com/468544/agenda/session/1136928) - `2023` - :studio_microphone:
## Klarna
- [Evolving a Real-time Fraud Barrier with Kafka](https://www.confluent.io/events/kafka-summit-london-2024/evolving-a-real-time-fraud-barrier-with-kafka/) - `2024` - :studio_microphone:
- [Load-balanced Brooklin Mirror Maker: Replicating large-scale Kafka clusters at LinkedIn](https://engineering.linkedin.com/blog/2022/load-balanced-brooklin-mirror-maker--replicating-large-scale-kaf) - `2022` - :books:
- [TopicGC: How LinkedIn cleans up unused metadata for its Kafka clusters](https://engineering.linkedin.com/blog/2022/topicgc_how-linkedin-cleans-up-unused-metadata-for-its-kafka-clu) - `2022` - :books:
- [How LinkedIn customizes Apache Kafka for 7 trillion messages per day](https://engineering.linkedin.com/blog/2019/apache-kafka-trillion-messages) - `2019` - :books:
- [URP? Excuse You! The Three Metrics You Have to Know](https://www.confluent.io/kafka-summit-london18/urp-excuse-you-the-three-metrics-you-have-to-know/) - `2018` - :studio_microphone:
- [Test Strategy for Samza/Kafka Services](https://engineering.linkedin.com/blog/2017/04/test-strategy-for-samza-kafka-services) - `2017` - :books:
- [Kafka Ecosystem at LinkedIn](https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin) - `2016` - :books:
- [Kafkaesque Days at LinkedIn â Part 1](https://engineering.linkedin.com/blog/2016/05/kafkaesque-days-at-linkedin--part-1) - `2016` - :books:
- [How Weâre Improving and Advancing Kafka at LinkedIn](https://engineering.linkedin.com/apache-kafka/how-we_re-improving-and-advancing-kafka-linkedin) - `2015` - :books:## Lyft
- [Evolution of Streaming Pipeline at Lyft](https://events.bizzabo.com/468544/agenda/session/1136878) - `2023` - :studio_microphone:
- [Building an Adaptive, Multi-Tenant Stream Bus with Kafka and Golang](https://eng.lyft.com/building-an-adaptive-multi-tenant-stream-bus-with-kafka-and-golang-5f1410bf2b40) - `2020` - :books:
- [Can Kafka Handle a Lyft Ride?](https://www.confluent.io/resources/kafka-summit-2020/can-kafka-handle-a-lyft-ride/) - `2020` - :studio_microphone:
- [Operating Apache Kafka Clusters 24/7 Without A Global Ops Team](https://eng.lyft.com/operating-apache-kafka-clusters-24-7-without-a-global-ops-team-417813a5ce70) - `2019` - :books:
- [Bulletproof Apache KafkaÂŽ with Fault Tree Analysis](https://www.confluent.io/kafka-summit-ny19/bulletproof-kafka-with-fault-tree-analysis/) - `2019` - :studio_microphone:
- [Production Ready Kafka on Kubernetes](https://www.confluent.io/kafka-summit-san-francisco-2019/production-ready-kafka-on-kubernetes/) - `2019` - :studio_microphone:## Monzo
- [How we built a queue on top of Kafka](https://monzo.com/blog/how-we-built-a-queue-on-top-of-kafka) - `2024` - :books:
## Michelin
- [Designing Kafka Streams Applications](https://blogit.michelin.io/dkafka-streams/) - `2024` - :books:
- [Contributing to open source software : AKHQ](https://blogit.michelin.io/michelin-loves-open-source-software-and-we-can-prove-it-2/) - `2024` - :books:
- [How to 'Kstreamplify' : your new way to develop Kafka Streams application](https://blogit.michelin.io/kstreamplify/) - `2023` - :books:
- [From Monolithic Orchestrator to Streaming with Microservices](https://www.confluent.io/events/kafka-summit-london-2023/from-monolithic-orchestrator-to-streaming-with-microservices/) - `2023` - :studio_microphone:
- [Migrate Applications from Kafka On-Premise to Confluent Cloud](https://blogit.michelin.io/migrate-your-applications-from-kafka-onprem-to-a-manage-service/) - `2022` - :books:
- [The Michelin Guide: an unexpected event driven use case](https://blogit.michelin.io/the-michelin-guide-an-unexpected-event-driven-use-case/) - `2022` - :books:
- [Moving from orchestration to choregraphy - Part 3](https://blogit.michelin.io/moving-from-orchestration-to-choreography-part-3/) - `2022` - :books:
- [Moving from orchestration to choregraphy - Part 2](https://blogit.michelin.io/moving-from-orchestration-to-choregraphy-part-2/) - `2021` - :books:
- [Moving from orchestration to choregraphy - Part 1](https://blogit.michelin.io/choregraphy-or-orchestration-thats-the-question/) - `2021` - :books:
- [âThe metamorphoseâ of our Information System by Implementing a distributed event streaming platform](https://blogit.michelin.io/the-metamorphose-of-our-information-system-by-implementing-a-distributed-event-streaming-platform/) - `2021` - :books:## Morgan Stanley
- [Consistent, High-throughput, Real-time Calculation Engines Using Kafka Streams](https://www.confluent.io/events/kafka-summit-london-2023/consistent-high-throughput-real-time-calculation-engines-using-kafka-streams/) - `2023` - :studio_microphone:
## Netflix
- [Self-Hosting Kafka at Scale: Netflix's Journey and Challenges](https://current.confluent.io/2024-sessions/self-hosting-kafka-at-scale-netflixs-journey-and-challenges) - `2024` - :studio_microphone:
- [Featuring Apache Kafka in the Netflix Studio and Finance World](https://www.confluent.io/blog/how-kafka-is-used-by-netflix/) - `2020` - :books:
- [Inca â Message Tracing and Loss Detection For Streaming Data @Netflix](https://netflixtechblog.medium.com/inca-message-tracing-and-loss-detection-for-streaming-data-netflix-de4836fc38c9) - `2019` - :books:
- [Evolution of the Netflix Data Pipeline](https://netflixtechblog.com/evolution-of-the-netflix-data-pipeline-da246ca36905) - `2016` - :books:
- [Kafka Inside Keystone Pipeline](https://netflixtechblog.com/kafka-inside-keystone-pipeline-dd5aeabaf6bb) - `2016` - :books:## New Relic
- [Scaling Data Ingestion: Overcoming Challenges with Cell Architecture](https://current.confluent.io/2024-sessions/scaling-data-ingestion-overcoming-challenges-with-cell-architecture) - `2024` - :studio_microphone:
- [Keep Your Kafka Cloud Costs in Check with Showbacks](https://www.confluent.io/events/kafka-summit-london-2024/keep-your-kafka-cloud-costs-in-check-with-showbacks/) - `2024` - :studio_microphone:
- [Tuning Apache Kafka Consumers to maximize throughput and reduce costs](https://newrelic.com/blog/how-to-relic/tuning-apache-kafka-consumers) - `2024` - :books:
- [20 best practices for Apache Kafka at scale](https://newrelic.com/blog/best-practices/kafka-best-practices) - `2018` - :books:
- [Using Apache Kafka for Real-Time Event Processing at New Relic](https://newrelic.com/blog/how-to-relic/apache-kafka-event-processing) - `2018` - :books:
- [Best practices and strategies for Kafka topic partitioning](https://newrelic.com/blog/best-practices/effective-strategies-kafka-topic-partitioning) - `2021` - :books:
- [AWS re:Invent 2020: How New Relic is migrating its Apache Kafka cluster to Amazon MSK](https://www.youtube.com/watch?v=Bod2yn16TXM) - `2021` - :studio_microphone:
- [New Relic case: "Huge scale, small clusters: Using Cells to scale in the Cloud"](https://www.youtube.com/watch?v=eMikCXiBlOA) - `2021` - :studio_microphone:
- [Monitoring Kafka without instrumentation using eBPF](https://archive.fosdem.org/2022/schedule/event/monitoring_kafka_using_ebpf/) - `2022` - :studio_microphone:
- [Key Metrics To Uncover the Root Cause of Kafka Performance Anomalies](https://www.confluent.io/events/current-2022/key-metrics-to-uncover-the-root-cause-of-kafka-performance-anomalies/) - `2022` - :studio_microphone:
- [Reducing Impact of Single Broker Failures in Kafka](https://www.confluent.io/events/kafka-summit-london-2023/reducing-impact-of-single-broker-failures-in-kafka/) - `2023` - :studio_microphone:
- [Go Big or Go Home: Approaching Kafka Replication at Scale](https://www.confluent.io/events/current/2023/go-big-or-go-home-approaching-kafka-replication-at-scale/) - `2023` - :studio_microphone:
- [Mitigating Kafka Broker âGrayâ Failures For Key Based Partitioners With Partition Multihoming](https://events.bizzabo.com/468544/agenda/session/1136888) - `2023` - :studio_microphone:
- [Monitoring Apache Kafka for cloud cost reduction](https://newrelic.com/blog/how-to-relic/monitoring-apache-kafka-for-cloud-cost-reduction) - `2023` - :books:## Paypal
- [Scaling Kafka to Support PayPalâs Data Growth](https://medium.com/paypal-tech/scaling-kafka-to-support-paypals-data-growth-a0b4da420fab) - `2023` - :books:
- [Scaling Kafka Consumer for Billions of Events](https://medium.com/paypal-tech/kafka-consumer-benchmarking-c726fbe4000) - `2021` - :books:
- [Marching Toward a Trillion Kafka Messages per Day: Running Kafka at scale at PayPal](https://www.confluent.io/resources/kafka-summit-2020/marching-toward-a-trillion-kafka-messages-per-day-running-kafka-at-scale-at-paypal/) - `2020` - :studio_microphone:- [Pinterest Tiered Storage for Apache KafkaÂŽď¸: A Broker-Decoupled Approach](https://medium.com/pinterest-engineering/pinterest-tiered-storage-for-apache-kafka-%EF%B8%8F-a-broker-decoupled-approach-c33c69e9958b) - `2024` - :books:
- [Pinterestâs Journey to a Automated, Efficient, and Low-Maintenance Kafka Platform](https://www.confluent.io/events/kafka-summit-london-2024/pinterests-journey-to-a-automated-efficient-and-low-maintenance-kafka/) - `2024` - :studio_microphone:
- [Lessons Learned from Running Apache Kafka at Scale at Pinterest](https://www.confluent.io/blog/running-kafka-at-scale-at-pinterest/) - `2021` - :books:
- [How Pinterest runs Kafka at scale](https://medium.com/pinterest-engineering/how-pinterest-runs-kafka-at-scale-ff9c6f735be) - `2018` - :books:
- [Open sourcing DoctorKafka: Kafka cluster healing and workload balancing](https://medium.com/pinterest-engineering/open-sourcing-doctorkafka-kafka-cluster-healing-and-workload-balancing-e51ad25b6b17) - `2017` - :books:## Platformatory
- [Kafka Latency Analyzer: Get Insights into Per-record, End-to-end Latency](https://events.bizzabo.com/468544/agenda/session/1136921) - `2023` - :studio_microphone:
## Riskified
- [How to Manage Schemas and Handle Standardization](https://medium.com/riskified-technology/how-riskified-manages-schemas-and-handles-standardization-fda9eb236e28) - `2023` - :books:
- [How to Roll Your Kafka Cluster With Zero Downtime and No Data Loss](https://medium.com/riskified-technology/how-to-roll-your-kafka-cluster-with-zero-downtime-and-no-data-loss-770fd0a35971) - `2023` - :books:
- [Know Your Limits: Cluster Benchmarks](https://medium.com/riskified-technology/know-your-limits-cluster-benchmarks-ecc6c3c77574) - `2022` - :books:
- [Letâs Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction](https://www.confluent.io/en-gb/events/kafka-summit-london-2022/lets-make-your-cfo-happy-a-practical-guide-for-kafka-cost-reduction/) - `2022` - :studio_microphone:
- [From AWS CloudFormation to Terraform: Migrating Apache Kafka](https://medium.com/riskified-technology/from-aws-cloudformation-to-terraform-migrating-apache-kafka-32bdabdbaa59) - `2021` - :books:## Robinhood
- [Robinhoodâs Kafka Journey from EC2 to Kubernetes](https://current.confluent.io/2024-sessions/robinhoods-kafka-journey-from-ec2-to-kubernetes) - `2024` - :studio_microphone:
- [Robinhoodâs Kafkaproxy: Decoupling Kafka Consumer Logic from Application Business Logic](https://events.bizzabo.com/468544/agenda/session/1136897) - `2023` - :studio_microphone:
- [Tackling Kafka, with a Small Team](https://www.confluent.io/kafka-summit-san-francisco-2019/tackling-kafka-with-a-small-team/) - `2019` - :studio_microphone:## Shopify
- [Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability](https://events.bizzabo.com/468544/agenda/session/1165883) - `2023` - :studio_microphone:
- [Capturing Every Change From Shopifyâs Sharded Monolith](https://shopify.engineering/capturing-every-change-shopify-sharded-monolith) - `2021` - :books:
- [Running Apache Kafka on Kubernetes at Shopify](https://shopify.engineering/running-apache-kafka-on-kubernetes-at-shopify) - `2018` - :books:
- [Kafka Producer Pipeline for Ruby on Rails](https://shopify.engineering/kafka-producer-pipeline-for-ruby-on-rails) - `2014` - :books:## Slack
- [Building Self-driving Kafka clusters using open source components](https://slack.engineering/building-self-driving-kafka-clusters-using-open-source-components/) - `2022` - :books:
- [Building Self-driving Kafka clusters using open source components](https://slack.engineering/building-self-driving-kafka-clusters-using-open-source-components/) - `2022` - :books:## Stripe
- [Mastering Kafka at Scale: Unleashing the Power of Temporal at Stripe | Replay 2023](https://www.youtube.com/watch?v=aF4SHzsxgSc) - `2023` - :studio_microphone:
- [6 Nines: How Stripe keeps Kafka highly-available across the globe](https://www.confluent.io/events/kafka-summit-london-2022/6-nines-how-stripe-keeps-kafka-highly-available-across-the-globe/) - `2022` - :studio_microphone:## Uber
- [Protobuf Support in Uber's Real-Time Data Stack](https://current.confluent.io/2024-sessions/protobuf-support-in-ubers-real-time-data-stack) - `2024` - :studio_microphone:
- [Topic Federation: Enhance Kafka Availabilty with Sharded Topics Across Clusters](https://current.confluent.io/2024-sessions/topic-federation-enhance-kafka-availabilty-with-sharded-topics-across-clusters) - `2024` - :studio_microphone:
- [Introduction to Kafka Tiered Storage at Uber](https://www.uber.com/en-GB/blog/kafka-tiered-storage/?uclick_id=ad416b56-ad9e-469e-9e47-edd1f5fd3ccd) - `2024` - :books:
- [Exactly-Once Stream Processing at Scale in Uber](https://www.confluent.io/events/kafka-summit-london-2024/exactly-once-stream-processing-at-scale-in-uber/) - `2024` - :studio_microphone:
- [Learnings of Running Kafka Tiered Storage at Scale](https://events.bizzabo.com/468544/agenda/session/1136841) - `2023` - :studio_microphone:
- [Securing KafkaÂŽ Infrastructure at Uber](https://www.uber.com/en-SG/blog/securing-kafka-infrastructure-at-uber/) - `2022` - :books:
- [Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot](https://www.uber.com/en-SG/blog/real-time-exactly-once-ad-event-processing/) - `2021` - :books:
- [Introducing uGroup: Uberâs Consumer Management Framework](https://www.uber.com/en-SG/blog/introducing-ugroup-ubers-consumer-management-framework/) - `2021` - :books:
- [Disaster Recovery for Multi-Region Kafka at Uber](https://www.uber.com/en-SG/blog/kafka/) - `2020` - :books:
- [Kafka Cluster Federation at Uber](https://www.confluent.io/kafka-summit-san-francisco-2019/kafka-cluster-federation-at-uber/) - `2019` - :studio_microphone:
- [Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka](https://www.uber.com/en-SG/blog/reliable-reprocessing/) - `2018` - :books:
- [Introducing Chaperone: How Uber Engineering Audits Apache Kafka End-to-End](https://www.uber.com/en-SG/blog/chaperone-audit-kafka-messages/) - `2016` - :books:
- [uReplicator: Uber Engineeringâs Robust Apache Kafka Replicator](https://www.uber.com/blog/ureplicator-apache-kafka-replicator/) - `2016` - :books:## Wise
- [Streaming Infrastructure at Wise](https://www.confluent.io/events/kafka-summit-london-2023/streaming-infrastructure-at-wise/) - `2023` - :studio_microphone:
- [Rack awareness in Kafka Streams](https://medium.com/wise-engineering/rack-awareness-in-kafka-streams-448d7e5225a3) - `2022` - :books:
- [Teamwork: Implementing a Kafka retry strategy at Wise](https://medium.com/wise-engineering/teamwork-implementing-a-kafka-retry-strategy-at-wise-82e0887e243b) - `2021` - :books:
- [Running Kafka in Kubernetes, Part 1: Why we migrated our Kafka clusters to Kubernetes.](https://medium.com/wise-engineering/running-kafka-in-kubernetes-part-1-why-we-migrated-our-kafka-clusters-to-kubernetes-722101a2e751) - `2021` - :books:
- [Running Kafka in Kubernetes, Part 2: How we migrated our Kafka clusters to Kubernetes.](https://medium.com/wise-engineering/running-kafka-in-kubernetes-part-2-how-we-migrated-our-kafka-clusters-to-kubernetes-69174cea1559) - `2021` - :books:
- [Securing Kafka with SPIFFE at TransferWise - Jonathan Oddy, Levani Kokhreidze](https://www.youtube.com/watch?v=4pfY0uFW7yk&ab_channel=CNCF%5BCloudNativeComputingFoundation%5D) - `2020` - :studio_microphone:
- [Achieving high availability with stateful Kafka Streams applications](https://medium.com/wise-engineering/achieving-high-availability-with-stateful-kafka-streams-applications-cba429ca7238) - `2018` - :books:## Wix
- [4 Steps for Kafka Rebalance - Notes From the Field](https://www.wix.engineering/post/4-steps-for-kafka-rebalance-notes-from-the-field) - `2021` - :books:
- [Wixâs Journey Into Data Streams](https://www.wix.engineering/post/wix-s-journey-into-data-streams) - `2021` - :books:
- [Building a High-level SDK for Kafka: Greyhound Unleashed](https://www.wix.engineering/post/building-a-high-level-sdk-for-kafka-greyhound-unleashed) - `2020` - :books:## Yelp
- [Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (Part 2 - Migration)](https://engineeringblog.yelp.com/2022/03/kafka-on-paasta-part-two.html) - `2022` - :books:
- [Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (Part 1 - Architecture)](https://engineeringblog.yelp.com/2021/12/kafka-on-paasta-part-one.html) - `2021` - :books:
- [Streams and Monk â How Yelp is Approaching Kafka in 2020](https://engineeringblog.yelp.com/2020/01/streams-and-monk-how-yelp-approaches-kafka-in-2020.html) - `2020` - :books:
- [Billions of Messages a Day â Yelpâs Real-time Data Pipeline](https://www.confluent.io/es-es/kafka-summit-nyc17/billions-messages-day-yelps-real-time-data-pipeline/) - `2017` - :studio_microphone:## Zalando
- [Rock Solid Kafka and ZooKeeper Ops on AWS](https://engineering.zalando.com/posts/2018/01/rock-solid-kafka.html) - `2018` - :books:
- [Many-to-Many Relationships Using Kafka](https://engineering.zalando.com/posts/2018/05/many-to-many-using-kafka.html) - `2018` - :books:
- [Event First Development - Moving Towards Kafka Pipeline Applications](https://engineering.zalando.com/posts/2017/10/event-first-development---moving-towards-kafka-pipeline-applications.html) - `2017` - :books:
- [Reattaching Kafka EBS in AWS](https://engineering.zalando.com/posts/2017/10/reattaching-kafka-ebs-in-aws.html) - `2017` - :books:
- [Real-time Ranking with Apache Kafkaâs Streams API](https://engineering.zalando.com/posts/2017/11/real-time-ranking-kafka.html) - `2017` - :books:
- [Running Kafka Streams applications in AWS](https://engineering.zalando.com/posts/2017/11/running-kafka-streams-applications-aws.html) - `2017` - :books:
- [A Recipe for Kafka Lag Monitoring](https://engineering.zalando.com/posts/2017/12/recipe-for-kafka-lag-monitoring.html) - `2017` - :books:
- [Surviving Data Loss](https://engineering.zalando.com/posts/2017/12/backing-up-kafka-zookeeper.html) - `2017` - :books:## Zendesk
- [No Access Denied: Our Transition to Kafka ACLs](https://zendesk.engineering/no-access-denied-our-transition-to-kafka-acls-5905d29fb7cf) - `2024` - :books:
- [Seamless Transition: Migrating Kafka Cluster to Kubernetes](https://medium.com/zendesk-engineering/seamless-transition-migrating-kafka-cluster-to-kubernetes-c8dc66594d1b) - `2024` - :books:
- [Kafka: Automating Root CA rotation with Vault](https://zendesk.engineering/kafka-automating-root-ca-rotation-with-vault-9bbbe07c7c6e) - `2023` - :books:
- [Implementing mTLS and Securing Apache Kafka at Zendesk](https://zendesk.engineering/implementing-mtls-and-securing-apache-kafka-at-zendesk-10f309db208d) - `2021` - :books:
- [An investigation into Kafka Log Compaction](https://zendesk.engineering/an-investigation-into-kafka-log-compaction-5e520f4291f0) - `2020` - :books:
- [Kafka on Ruby](https://zendesk.engineering/kafka-on-ruby-fdab12302146) - `2020` - :books:
- [Create a test data generator using Kafka Connect](https://zendesk.engineering/create-a-test-data-generator-using-kafka-connect-f0a2419af76a) - `2018` - :books:## Zopa Bank
- [Highly Available Kafka Consumers and Kafka Streams on Kubernetes](https://www.confluent.io/events/kafka-summit-london-2023/highly-available-kafka-consumers-and-kafka-streams-on-kubernetes/) - `2023` - :studio_microphone: