Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

SRE

Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.

https://github.com/bregman-arie/devops-exercises

Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

ansible aws azure coding containers devops docker git interview interview-questions kubernetes linux openstack production-engineer prometheus python sql sre terraform

Last synced: 20 Jan 2025

https://github.com/upgundecha/howtheysre

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

alerting chaos-engineering dev-ops devops hacktoberfest hacktoberfest-accepted incident-management incident-response infrastructure ml-ops monitoring observability on-call post-mortem reliability security site-reliability-engineering software-engineering sre sre-culture

Last synced: 21 Jan 2025

https://github.com/linkedin/school-of-sre

At LinkedIn, we use this curriculum to onboard our entry-level talent into the SRE role.

git hadoop linux mysql networking nosql python security sre system-design

Last synced: 21 Jan 2025

https://github.com/runatlantis/atlantis

Terraform Pull Request Automation

atlantis automation devops go golang hacktoberfest sre tacos terraform

Last synced: 20 Jan 2025

https://github.com/mxssl/sre-interview-prep-guide

Site Reliability Engineer Interview Preparation Guide

interview-preparation preparation site-reliability-engineer sre sre-interview study

Last synced: 05 Dec 2024

https://github.com/isno/thebytebook

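⭐ [Open-source book] An in-depth explanation of kernel networking, Kubernetes, service mesh, containers, and other cloud-native technologies. A practice-tested guide to DevOps and SRE. If you find an error, please open an issue.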

cloud-native container devops distributed-systems finops kubernetes networking paas paxos raft service-mesh sre

Last synced: 21 Jan 2025

https://github.com/hjacobs/kubernetes-failure-stories

Compilation of public failure/horror stories related to Kubernetes

failures incidents kubernetes post-mortem postmortem production-engineering reliability sre

Last synced: 17 Jan 2025

https://github.com/stackstorm/st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html

auto-remediation automation chatops cicd deployment devops ifttt python sre st2 stackstorm workflows

Last synced: 20 Jan 2025

https://github.com/k8sgpt-ai/k8sgpt

Giving Kubernetes Superpowers to everyone

ai devops kubernetes llama openai sre tooling

Last synced: 20 Jan 2025

https://github.com/rundeck/rundeck

Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts

ansible audit automation category-distributed deployment devops devops-team devops-tools hacktoberfest java operations ops orchestration runbook rundeck scheduler sre

Last synced: 21 Jan 2025

https://github.com/jonmosco/kube-ps1

Kubernetes prompt info for bash and zsh

bash containers kubectl kubernetes kubernetes-helper prompts sre zsh

Last synced: 21 Jan 2025

https://github.com/leandromoreira/cdn-up-and-running

CDN Up and Running - Building a CDN from Scratch to Learn about CDN, Nginx, Lua, Prometheus, Grafana, Load balancing, and Containers.

cdn docker-compose grafana load-balancer lua luajit nginx openresty prometheus sre tutorial wrk

Last synced: 17 Jan 2025

https://github.com/bregman-arie/sre-checklist

A checklist for anyone practicing Site Reliability Engineering

automation checklist gitops kubernetes reliability-engineering sre terraform

Last synced: 17 Jan 2025

https://github.com/alibaba/sreworks

Cloud Native DataOps & AIOps Platform

aiops application cloudnative dataops devops engineering flink k8s kubernetes maintenance oam operation ops saas sre

Last synced: 17 Jan 2025

https://github.com/chame1eon/jnitrace

A Frida based tool that traces usage of the JNI API in Android apps.

android frida jni jni-api reverse-engineering sre tracer

Last synced: 17 Jan 2025

https://github.com/google/cloudprober

[Moved to cloudprober/cloudprober] An active monitoring software to detect failures before your customers do.

blackbox cloud cloudprober devops distributed-monitoring gcp golang google grafana k8s kubernetes monitoring observability ping probe prober prometheus sre stackdriver

Last synced: 22 Jan 2025

https://github.com/liquanzhou/ops_doc

A concise, practical operations handbook

ops python shell sre

Last synced: 16 Jan 2025

https://github.com/briefercloud/layerform

Layerform helps engineers create reusable environment stacks using plain .tf files. Ideal for multiple "staging" environments.

dev-environment developer-tools devops platform-engineering sre terraform

Last synced: 19 Jan 2025

https://github.com/idoavrah/terraform-tui

Terraform textual UI

devops iac productivity sre terraform tui

Last synced: 16 Jan 2025

https://github.com/unixorn/git-extra-commands

A collection of git utilities, useful extra git scripts, tutorials and other useful articles.

antigen bash collection devops devops-tools git hacktoberfest oh-my-zsh oh-my-zsh-plugin prezto shell-script shell-scripts sre zgenom zsh-plugin zsh-plugins

Last synced: 19 Jan 2025

https://github.com/azure/caf-terraform-landingzones

This solution, offered by the Open-Source community, will no longer receive contributions from Microsoft. Customers are encouraged to transition to Microsoft Azure Verified Modules for continued support and updates from Microsoft. Please note, this repository is scheduled for decommissioning and will be removed on July 1, 2025.

azure azure-resource-manager devops enterprise platform platform-engineering sre terraform

Last synced: 30 Sep 2024

https://github.com/jetstack/version-checker

Kubernetes utility for exposing image versions in use, compared to latest available upstream, as metrics.

docker gcr go grafana grafana-dashboard image kubernetes prometheus quay sre utility version

Last synced: 16 Jan 2025

https://github.com/upgundecha/howtheyaws

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world use Amazon Web Services (AWS)

amazon-web-services automation aws cloud cloud-computing cloud-native devops hacktoberfest hacktoberfest-accepted hacktoberfest2021 infrastructure-as-code sre

Last synced: 22 Jan 2025

https://github.com/kaytu-io/kaytu

The Kaytu CLI improves the efficiency of cloud workloads by analyzing historical usage and providing tailored recommendations, such as changing instance sizes. This ensures you only pay for the resources you actually need without compromising stability.

cloud-optimization cloud-spend rightsizing sre workload-optimization

Last synced: 04 Nov 2024

https://github.com/ryan4yin/knowledge

(Chinese only) Everything I know: DevOps & CloudNative, Linux, Embedded, Homelab, Music, Blockchain, AI, etc.

container devops devops-notes embedded kubernetes linux music sre

Last synced: 19 Jan 2025

https://github.com/squzy/squzy

Squzy is a high-performance open-source monitoring, incident, and alert system written in Golang with Bazel and love. Welcome to free SRE.

bazel docker golang grpc monitoring opensource opensource-monitoring prometheus sitemap sre zabbix

Last synced: 28 Oct 2024

https://github.com/dingguodong/linuxbashshellscriptforops

Linux Bash Shell Script and Python Script For Ops and Devops

automation bash-script devops linux ops python-script repository-python repository-shell sre

Last synced: 17 Jan 2025

https://github.com/teivah/sre-roadmap

An Opinionated Roadmap to Become an SRE (Concepts > Tools)

roadmap sre

Last synced: 16 Dec 2024

https://github.com/googlecloudplatform/cloud-ops-sandbox

Cloud Operations Sandbox is an open source collection of tools that helps practitioners learn O11y and R9y practices from Google and apply them using the Cloud Operations suite of tools.

cloud cloud-native cloud-operations cloudops devops google-cloud opencensus opentelemetry operations ops-management profiler samples sre stackdriver stackdriver-logs stackdriver-monitoring stackdriver-sandbox stackdriver-trace

Last synced: 19 Jan 2025

https://github.com/rishiloyola/sre-interviews

Curated list of good SRE interview questions.

interview interview-preparation site-reliability-engineering sre

Last synced: 11 Dec 2024

https://github.com/mehrdadrad/tcpprobe

Modern TCP tool and service for network performance observability.

docker http http2 https k8s kubernetes monitoring observability probe socket sre tcp

Last synced: 15 Jan 2025

https://github.com/ozontech/file.d

A blazing fast tool for building data pipelines: read, process and output events. Our community: https://t.me/file_d_community

actions clickhouse elasticsearch events file gelf go http input json kafka logs observability output pipeline processing reading sre throttle tracing

Last synced: 18 Jan 2025

https://github.com/k8sgpt-ai/k8sgpt-operator

Automatic SRE Superpowers within your Kubernetes cluster

devops kubernetes openai sre tooling

Last synced: 19 Jan 2025

https://github.com/waltenne/guiadevopsbrasil

Repository for sharing free content about DevOps

devops devops-tools docker documentation iac linux python sre terraform windows

Last synced: 31 Oct 2024

https://github.com/notharshhaa/into-the-devops

Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

ansible aws azure coding containers devops docker git interview-preparation interview-questions kubernetes linux openstack production-engineer prometheus python sql sre terraform

Last synced: 21 Jan 2025

https://github.com/actionjack/so-you-want-to-onboard-a-devops-engineer

Guidance on how to make your environment easier to onboard for Web Ops Engineers, SREs, and DevOps Practitioners

culture devops devops-practitioners mentoring onboard ops-engineers sre starters

Last synced: 17 Nov 2024

https://github.com/steve-mt/awesome-slo

Curated list of resources on SLOs

awesome awesome-list sli slo sre

Last synced: 23 Nov 2024

https://github.com/chame1eon/jnitrace-engine

Engine used by jnitrace to intercept JNI API calls.

android frida jni jni-api reverse-engineering sre tracer

Last synced: 21 Jan 2025

https://github.com/datadog/chaos-controller

🐒 🔥 Datadog Failure Injection System for Kubernetes

chaos chaos-engineering chaos-monkey k8s kubernetes sre

Last synced: 19 Jan 2025

https://github.com/kgoralski/microservice-production-readiness-checklist

Principles that help you deploy safely to the production environment.

aws checklist cloud kubernetes microservices resiliency sre

Last synced: 01 Nov 2024

https://github.com/seznam/slo-exporter

Slo-exporter computes standardized SLI and SLO metrics based on events coming from various data sources.

alerting exporter grafana monitoring prometheus service-level-indicator service-level-objective sli slo slo-exporter sre sre-workbook

Last synced: 19 Jan 2025

https://github.com/windvalley/gossh

🚀🚀 A high-performance and high-concurrency ssh tool written in Go. It is 10 times faster than Ansible. If you need much more performance and better ease of use, you will love it.

ansible batchssh cli devops gossh high-concurrency multissh ops parallel-ssh sa sre ssh ssh-client sshbatch

Last synced: 18 Jan 2025

https://github.com/last9/slo-computer

SLOs, error windows, and alerts are complicated. Here is an attempt to make them easy.

metrics observability service-level-indicator service-level-objective sla sli slo sre sre-team

Last synced: 16 Nov 2024
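
For readers unfamiliar with the terms in the entry above, the arithmetic behind an availability SLO can be sketched in a few lines. This is a generic illustration, not code from slo-computer: the function names `error_budget` and `burn_rate` are hypothetical, and the 99.9% target and 30-day window are assumed example values.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed unavailability for an availability SLO over a window.

    Hypothetical helper for illustration: budget = (1 - SLO) * window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return window * (1.0 - slo_target)


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value above 1 means the budget is being spent faster than planned.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return (bad_events / total_events) / (1.0 - slo_target)


# Assumed example values: a 99.9% target over a 30-day window.
budget = error_budget(0.999, timedelta(days=30))
rate = burn_rate(bad_events=5, total_events=1000, slo_target=0.999)
print(f"error budget: {budget.total_seconds() / 60:.1f} minutes")
print(f"burn rate: {rate:.1f}x")
```

Alerting on burn rate rather than on raw error counts is one common way to tie alerts to the budget, though the thresholds and windows to use depend on the service.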

https://github.com/apiaryio/s3-streaming-upload

s3-streaming-upload is a Node.js library that listens to your stream and uploads its data to Amazon S3 using the ManagedUpload API.

sre

Last synced: 02 Nov 2024

https://github.com/adhorn/aws-chaos-scripts

DEPRECATED Collection of python scripts to run failure injection on AWS infrastructure

amazon-web-services aws chaos-engineering chaos-monkey deprecated software-engineering sre

Last synced: 13 Nov 2024

https://github.com/getstrake/developer-cost-guide

SQL code for developers to understand AWS cloud costs. Reduce time spent on billing, get back to engineering. Created and maintained by the team at Macroscope.

aws cloud cost-estimation finops sre

Last synced: 24 Nov 2024

https://github.com/thoughtbot/flightdeck

Terraform modules for rapidly building production-grade Kubernetes clusters following SRE practices

kubernetes sre terraform

Last synced: 11 Nov 2024

https://github.com/dentrax/falco-gpt

AI-generated remediations for Falco audit events

audit-log chatgpt devops falco golang kubernetes openai sre sysdig threat-hunting tooling

Last synced: 11 Oct 2024

https://github.com/adhorn/aws-fis-templates-cdk

Collection of AWS Fault Injection Simulator (FIS) experiment templates deployable via the AWS CDK

amazon-web-services automation aws aws-fis cdk-examples cdk-library chaos-engineering chaos-testing devops-tools sre testing

Last synced: 19 Nov 2024

https://github.com/bjarneo/rip

Rest in peace(s) - HTTP/UDP load testing tool

ddos go golang http learning-by-doing load-testing rip security-tools sre sre-infra udp-flood

Last synced: 10 Nov 2024

https://github.com/ari-hacks/command-line-cheat-sheet

📝 A place to quickly look up commands (bash, vim, git, AWS, Docker, Terraform, Ansible, kubectl)

ansible aws bash command-line devops docker git k8s kubectl kubernetes sre terraform vim

Last synced: 15 Jan 2025

https://github.com/tedilabs/terraform-aws-account

🌳 A sustainable Terraform Package which creates Account & IAM resources on AWS

aws aws-iam devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 19 Dec 2024

https://github.com/microsoft/sqlcallstackresolver

A sample tool for users of Microsoft SQL Server to aid in troubleshooting otherwise difficult to diagnose issues. Provided AS-IS - see SUPPORT.md.

azuresql azuresqldb azuresqlmanagedinstance callstack debugging debugging-symbol msdia140 pdb pdb-files sqlserver sqlserver-2017 sqlserver-2019 sqlserver-2022 sre symbols tool xevent xevents

Last synced: 10 Jan 2025

https://github.com/blacklane/kiev

A set of tools to do distributed logging for Ruby web applications

distributed-tracing elk-stack logging ruby sre

Last synced: 16 Jan 2025

https://github.com/loftwah/loftwahs-cheatsheet

My own personal tech cheatsheet. This covers the stuff I use quite regularly.

bash devops hacktoberfest linux nodejs python sre typescript

Last synced: 09 Nov 2024

https://github.com/tedilabs/terraform-aws-container

🌳 A sustainable Terraform Package which creates resources for Container Services on AWS

aws aws-ecr aws-eks devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 08 Nov 2024

https://github.com/ory/jobs

Want to build the next generation identity stack? You've come to the right place!

go hiring jobs kubernetes open-source opensource ory react sre

Last synced: 27 Oct 2024

https://github.com/k8sgpt-ai/docs

Documentation for K8sGPT

ai chatgpt docs kubernetes sre

Last synced: 15 Jan 2025

https://github.com/sitectl/cuttle

Blue Box SRE Operations Platform

ansible bastion bluebox elk operations sensu sre

Last synced: 07 Nov 2024

https://github.com/icco/postmortems

Postmortem metadata from danluu/post-mortems.

hacktoberfest postmortem-metadata sre

Last synced: 28 Oct 2024

https://github.com/ramizpolic/sre-playground

A set of Site Reliability Engineering notes & challenges

cicd cloud guide infrastructure site-reliability-engineer sre tasks

Last synced: 15 Nov 2024

https://github.com/fkie-cad/logprep

Log data preprocessing, generation, and shipping in Python

etl kafka log logdata loggenerator logshipper opensearch preprocessing python soar sre

Last synced: 21 Jan 2025

https://github.com/Excoriate/go-terradagger

TerraDagger is a Go package for managing your infrastructure-as-code through containers.

cli devops ecs example sre tooling

Last synced: 09 Nov 2024

https://github.com/seveas/herd

Massively parallel ssh client

cli orchestration sre ssh sysadmin system-administration

Last synced: 26 Nov 2024

https://github.com/lwindolf/multi-status

Aggregator PWA for status pages of online services. Know which of your 3rd party SaaS/PaaS are having issues right now.

cloud devops monitoring paas pwa saas sre

Last synced: 08 Dec 2024

https://github.com/immobiliare/collectd-haproxy-plugin

Collectd plugin to pull metrics from HAProxy instances

collectd collectd-plugin grafana haproxy metrics monitoring sre

Last synced: 02 Nov 2024

https://github.com/tedilabs/terraform-aws-network

🌳 A sustainable Terraform Package which creates VPC resources (VPC, Subnet, NACL, NAT Gateway, Route Table) on AWS

aws aws-vpc devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 08 Nov 2024