https://github.com/sebastianhaeni/srecon-dublin-2023
Notes taken during SREcon Dublin 2023
- Host: GitHub
- URL: https://github.com/sebastianhaeni/srecon-dublin-2023
- Owner: sebastianhaeni
- Created: 2023-10-15T14:03:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-15T14:05:17.000Z (over 1 year ago)
- Last Synced: 2023-10-16T15:21:41.543Z (about 1 year ago)
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Notes on [SREcon EMEA Dublin 2023](https://www.usenix.org/conference/srecon23emea)
## Key Takeaways
- MLOps: Focus on monitoring outputs and user impact.
- Monitoring/SLOs: Utilize histograms for better analysis.
- SLOs: Use them to balance reliability, on-call health, and productivity.
- Automation as a Team Player:
  - Ensure automation processes are predictable and transparent.
  - Avoid creating surprises with automation and maintain simplicity.
- FinOps: Use tools like OpenCost/Kubecost to analyze and optimize cloud resource usage.
- Profiling: Utilize tools like Parca for continuous profiling and analysis.
## Favorite Quotes
> Looking at your average latency is like looking at the average temperature in the hospital.

> "Percentage availability" is for tourists.

> Instrument first, ask questions later.

> Try to build amplifiers and not prostheses.
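
The quote about average latency can be made concrete with a small sketch. The latency values below are invented for illustration (95 fast requests and 5 very slow ones), and `percentile` is a hypothetical helper using the nearest-rank definition; neither comes from the talks. The point is only that the mean describes no real request, while the median and p99 show the two populations.

```python
import statistics


def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100.

    Returns the smallest sample such that at least q percent of all
    samples are less than or equal to it.
    """
    ordered = sorted(samples)
    # Integer ceiling of q * n / 100, avoiding floating-point rounding.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Assumed data: 95 requests at 20 ms and 5 requests at 2000 ms.
latencies_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

With this data the mean lands far from both the typical request and the slow tail, which is why percentiles or full histograms tend to be more useful for SLOs.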
## [Day 1](day1.md)
- Liz Rice's keynote highlighted the significance of eBPF technology.
- The talk on symptom-based alerting for machine learning emphasized monitoring outputs, user impact, and input data distribution. The notes link a Google research paper and list various metrics for monitoring ML systems.
- Reliability-enhancing procedures included steps such as service definition, SLOs, and operational response testing.
- The difference between QUIC and HTTP/3 and how applications map onto them was also discussed, with a GitHub resource linked for further exploration.
- Deploying and debugging HTTP/3 was briefly touched upon.
## [Day 2](day2.md)
- SRE principles applied to cybersecurity highlighted the importance of SLOs for security.
- The significance of distributed tracing and the importance of statistics for engineers were discussed. Resources included explanations of histograms and SLO measurement tools.
- Incidents as a means to enhance reliability investment were emphasized, focusing on mapping system capabilities and understanding user context.
## [Day 3](day3.md)
- Various talks covered topics such as OTel implementation, open-source observability, continuous profiling, Prometheus histograms, accountability engineering, and FinOps.
- The importance of automation being a good team player was stressed, citing the "Ironies of Automation" paper and the need to avoid creating surprises with automation.
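
As a rough illustration of why bucketed histograms are useful for latency analysis, the sketch below estimates a quantile from cumulative bucket counts using linear interpolation inside the matching bucket, similar in spirit to what Prometheus's `histogram_quantile` does. It is a simplified sketch, not Prometheus's implementation: the function name `estimate_quantile`, the bucket layout, and the counts are all assumptions made for this example, and details such as the `+Inf` bucket are ignored.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound. Observations are assumed to be spread evenly within each
    bucket, and the lowest bucket is assumed to start at 0.
    """
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram has no observations")
    target = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= target:
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound
            # Linear interpolation between the bucket's bounds.
            fraction = (target - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return buckets[-1][0]


# Assumed request-duration buckets in seconds: (upper bound, cumulative count).
duration_buckets = [(0.1, 50), (0.5, 90), (1.0, 100)]

p50 = estimate_quantile(0.50, duration_buckets)
p95 = estimate_quantile(0.95, duration_buckets)
print(f"p50 is about {p50:.2f}s, p95 is about {p95:.2f}s")
```

The estimate is only as precise as the bucket boundaries allow, so bucket bounds are usually chosen close to the latency thresholds an SLO cares about.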