https://github.com/shantoroy/site-reliability-engineering-101
This GitHub repository contains a comprehensive tutorial on Site Reliability Engineering (SRE), covering topics such as SLAs, SLOs, SLIs, Chaos Engineering, monitoring, alerting, and much more. It also includes bonus content on SRE best practices. Follow along with the #100daysofSRE challenge and improve your reliability engineering skills.
100daysofcode alerting automation chaos-engineering devops devsecops monitoring reliability-engineering service-level-agreement service-level-indicator service-level-objective site-reliability-engineering sre
- Host: GitHub
- URL: https://github.com/shantoroy/site-reliability-engineering-101
- Owner: shantoroy
- Created: 2023-04-14T17:14:19.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-23T15:25:57.000Z (about 1 year ago)
- Last Synced: 2025-12-30T14:34:33.890Z (6 months ago)
- Topics: 100daysofcode, alerting, automation, chaos-engineering, devops, devsecops, monitoring, reliability-engineering, service-level-agreement, service-level-indicator, service-level-objective, site-reliability-engineering, sre
- Homepage: https://medium.com/@shantoroy/learning-about-site-reliability-engineering-with-the-100daysofsre-challenge-66380323c0d1
- Size: 31.3 KB
- Stars: 11
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# #100daysofSRE - Site Reliability Engineering Notes (SRE-101)
I have been working as a Site Reliability Engineer (SRE) at Charles Schwab since 2024. In this repository, I plan to take on the `#100dayschallenge` and document important SRE topics and resources.
I have planned the [contents for the next 100 days](https://medium.com/@shantoroy/learning-about-site-reliability-engineering-with-the-100daysofsre-challenge-66380323c0d1), and I will publish the posts under the hashtag `#100daysofSRE`. ✌️
## Blog Posts
1. [#100daysofSRE (Day 01): Introduction to Site Reliability Engineering](https://shantoroy.com/sre/intro-to-site-reliability-engineering/)
2. [#100daysofSRE (Day 02): History of SRE and its Evolution](https://shantoroy.com/sre/site-reliability-engineering-history-&-evolution/)
3. [#100daysofSRE (Day 03): SLAs, SLOs, and SLIs — understanding the metrics of reliability](https://shantoroy.com/sre/sla-slo-sli-metrics-of-sre/)
4. [#100daysofSRE (Day 04): Chaos Engineering and SRE - Techniques and Tools to Break Things on Purpose](https://shantoroy.com/sre/chaos-engineering-techniques-and-tools-for-sre/)
5. [#100daysofSRE (Day 05): Automation Benefits, Techniques, and Tools in SRE](https://shantoroy.com/sre/automation-benefits-techniques-and-tools-in-SRE/)
6. [#100daysofSRE (Day 06): Incident Management and Response for Site Reliability Engineers](https://shantoroy.com/sre/incident-management-and-response-for-site-reliability-engineers/)
7. [#100daysofSRE (Day 07): Effective Communication during Incidents for Better Incident Response](https://shantoroy.com/sre/effective-communication-for-better-incident-response/)
8. [#100daysofSRE (Day 08): Root Cause Analysis and Post-Incident Reviews for SRE](https://shantoroy.com/sre/root-cause-analysis-and-post-incident-reviews/)
9. [#100daysofSRE (Day 09): Monitoring and Observability in SRE](https://shantoroy.com/sre/monitoring-and-observability-in-sre/)
10. [#100daysofSRE (Day 10): Grafana vs Splunk for Monitoring System and Applications](https://shantoroy.com/sre/grafana-vs-splunk-for-system-and-application-monitoring/)
11. [#100daysofSRE (Day 11): Logging and Log Analysis in Site Reliability Engineering - Techniques, Tools, and Best Practices](https://shantoroy.com/sre/logging-and-log-analysis-for-site-reliability-engineering/)
12. [#100daysofSRE (Day 12): Alerting and Notification Strategies and Best Practices in SRE](https://shantoroy.com/sre/alerting-and-notification-strategies-in-site-reliability-engineering/)
13. [#100daysofSRE (Day 13): Capacity Planning and Management in Site Reliability Engineering](https://shantoroy.com/sre/capacity-planning-and-management-in-sre/)
14. [#100daysofSRE (Day 14): Load Testing and Stress Testing in Site Reliability Engineering](https://shantoroy.com/sre/load-and-stress-testing-in-sre/)
15. [#100daysofSRE (Day 15): Disaster Recovery Planning and Testing in SRE](https://shantoroy.com/sre/disaster-recovery-planning-and-testing-in-sre/)
16. [#100daysofSRE (Day 16): High Availability and Redundancy Strategies for Data](https://shantoroy.com/sre/high-availability-and-redundancy-strategies-in-sre/)
17. [#100daysofSRE (Day 17): Techniques, Tools, and Best Practices for Performance Optimization and Tuning in Site Reliability Engineering](https://shantoroy.com/sre/performance-optimization-and-tuning-in-sre/)
18. [#100daysofSRE (Day 18): 25 Intermediate-level Linux Commands useful for SysAdmin, DevOps, and SRE](https://shantoroy.com/sre/top-25-intermediate-linux-commands-for-sysadmin-devops-sre/)
19. [#100daysofSRE (Day 19): Simplifying Log Analysis with Linux Sed Command: Basic and Templates](https://shantoroy.com/sre/sed-linux-command-for-log-extraction-and-analysis/)
20. [#100daysofSRE (Day 20): Simplifying Log Analysis with Linux awk Command: Basic and Templates](https://shantoroy.com/sre/awk-linux-command-for-log-extraction-and-analysis/)
21. [#100daysofSRE (Day 21): How to use Supervisor to manage a script on Linux](https://shantoroy.com/sre/supervisor-program-running-in-linux/)
22. [#100daysofSRE (Day 22): Essential /var/log Files for SREs and How to Analyze Them](https://shantoroy.com/sre/important-linux-log-files-for-troubleshooting-SRE-issues/)
23. [#100daysofSRE (Day 23): Modernize and Containerize your Applications or Microservices using Docker](https://shantoroy.com/sre/docker-is-gamechanger-write-dockerfile-how-to/)
24. [#100daysofSRE (Day 24): Writing a Dockerfile – Best Practices & Enhancements](https://shantoroy.com/sre/writing-dockerfile-best-practices-and-enhancements/)
25. [#100daysofSRE (Day 25): Writing a Production-Grade Dockerfile for Legacy Applications](https://shantoroy.com/sre/writing-production-grade-dockerfile-for-legacy-applications/)
26. [#100daysofSRE (Day 26): Docker Compose - Simplifying Multi-Container Deployments](https://shantoroy.com/sre/multi-container-deployment-using-docker-compose/)
27. [#100daysofSRE (Day 27): Building a Hacking Lab with Docker Compose](https://shantoroy.com/sre/build-hacking-lab-using-docker-compose/)
28. [#100daysofSRE (Day 28): Deploying an AI Chatbot with Docker Compose](https://shantoroy.com/sre/building-a-genai-chatbot-using-docker-compose/)
29. [#100daysofSRE (Day 29): Kubernetes over Docker-compose – Why It’s Better for Production](https://shantoroy.com/sre/kubernetes-for-production-grade-applications/)
30. [#100daysofSRE (Day 30): Learn Kubernetes Commands and Operations using Minikube](https://shantoroy.com/kubernetes/learn-kubernetes-commands-operations-using-minikube/)
31. [#100DaysOfSRE (Day 31): How to Write Kubernetes Manifest Files: Kubernetes vs Docker-Compose](https://shantoroy.com/kubernetes/how-to-write-kubernetes-manifest-files/)
32. [#100DaysOfSRE (Day 32): Advanced Kubernetes: Ingress, ConfigMaps, Secrets & Helm](https://shantoroy.com/kubernetes/advanced-kubernetes-ingress-configmap-helm/)
33. [#100DaysOfSRE (Day 33): Monitoring Kubernetes Apps with Prometheus & Grafana](https://shantoroy.com/kubernetes/kubernetes-monitoring-with-grafana-prometheus/)
34. [#100DaysOfSRE (Day 34): Automating Kubernetes Deployments with ArgoCD & GitOps](https://shantoroy.com/kubernetes/kubernetes-deployment-with-argocd-gitops/)
35. [#100DaysOfSRE (Day 35): Kubernetes CI/CD Pipeline with GitHub Actions & ArgoCD](https://shantoroy.com/kubernetes/kubernetes-ci-cd-with-github-actions-argocd/)
36. [#100DaysOfSRE (Day 36): Kubernetes Helm Charts – Package & Deploy Applications](https://shantoroy.com/kubernetes/kubernetes-helm-charts-to-package-deploy-app/)
## YouTube Channels for SREs
1. [TechWorld with Nana](https://www.youtube.com/@TechWorldwithNana)
2. [Anton Putra](https://www.youtube.com/@AntonPutra)
3. [freeCodeCamp.org](https://www.youtube.com/@freecodecamp)
4. [Professor Messer](https://www.youtube.com/@professormesser)
5. [Google Cloud Tech](https://www.youtube.com/@googlecloudtech)
6. [IBM Technology](https://www.youtube.com/@IBMTechnology)
7. [ByteByteGo](https://www.youtube.com/@ByteByteGo)
8. [Fireship](https://www.youtube.com/@Fireship)
9. [NetworkChuck](https://www.youtube.com/@NetworkChuck)
10. [Tech With Soleyman](https://www.youtube.com/@techwithsoleyman)
11. [ByteMonk](https://www.youtube.com/@ByteMonk)
12. [Christian Lempa](https://www.youtube.com/@christianlempa)
13. [David Ondrej](https://www.youtube.com/@DavidOndrej)
14. [DevOps Journey](https://www.youtube.com/@DevOpsJourney)