An open API service indexing awesome lists of open source software.

https://github.com/charles-bucher/cloudopslab

CloudOpsLab: Hands-on AWS and cloud support scripts showcasing troubleshooting, automation, monitoring, and self-healing. Demonstrates practical CloudOps skills, diagnostics, and cloud problem-solving for entry-level and early-career professionals.
https://github.com/charles-bucher/cloudopslab

automation aws bash cloud-support cloudops cloudwatch devops ec2 iac incident-response lamda linux monitoring portfolio python s3 scripts sysops terraform troubleshooting

Last synced: about 1 month ago
JSON representation

CloudOpsLab: Hands-on AWS and cloud support scripts showcasing troubleshooting, automation, monitoring, and self-healing. Demonstrates practical CloudOps skills, diagnostics, and cloud problem-solving for entry-level and early-career professionals.

Awesome Lists containing this project

README

          

# CloudOpsLab ๐Ÿ”ง

![AWS](https://img.shields.io/badge/AWS-FF9900?style=for-the-badge&logo=amazon-aws&logoColor=white)
![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)
![Bash](https://img.shields.io/badge/Bash-4EAA25?style=for-the-badge&logo=gnu-bash&logoColor=white)
![Linux](https://img.shields.io/badge/Linux-FCC624?style=for-the-badge&logo=linux&logoColor=black)

![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Status](https://img.shields.io/badge/status-active-success.svg)
![Open to Work](https://img.shields.io/badge/Open%20To%20Work-00C853?style=flat-square)

**Hands-on AWS CloudOps practice lab demonstrating automation, monitoring, and troubleshooting**

*Self-taught cloud engineer learning operational excellence through real-world scenarios*

---

## ๐ŸŽฏ About This Lab

This is my personal CloudOps learning environment where I'm teaching myself AWS operations by **actually doing the work**โ€”not just following tutorials.

### What Makes This Different:

โœ… **Real AWS account** - I'm spending ~$20/month from my delivery job to run this
โœ… **Real problems** - I break things on purpose, then learn to fix them
โœ… **Real solutions** - Python and Bash scripts I actually wrote and tested
โœ… **Real documentation** - Everything is documented like production systems

### My Goal:

Break into cloud operations by proving I can **do the work**, even though I'm entry-level.

---

## ๐Ÿงช What I've Built

### 1. CloudWatch Monitoring & Alerting ๐Ÿ“Š

**What I learned:** How to set up automated monitoring that actually catches issues

![CloudWatch Alarm](docs/screenshots/cloudwatch-alarm-triggered.png)
*CloudWatch alarm I configured - it actually triggered when my test EC2 hit 80% CPU*

**Skills practiced:**
- Creating CloudWatch alarms with proper thresholds
- Setting up SNS topics for notifications
- Configuring email alerts
- Testing alarm logic

**Code:** [`scripts/cloudwatch_alarms.py`](scripts/cloudwatch_alarms.py)

---

### 2. EC2 Auto-Recovery ๐Ÿ”„

**What I learned:** How to make instances self-heal from failures

![EC2 Auto-Recovery](docs/screenshots/ec2-auto-recovery-test.png)
*Testing auto-recovery by simulating an instance failure*

**The scenario:**
1. Configured CloudWatch alarm to detect status check failures
2. Set up automatic recovery action
3. Intentionally broke my test instance
4. Watched it recover automatically
5. Documented the whole process

**Result:** Instance recovered in ~4 minutes without any manual intervention

**Skills practiced:**
- EC2 status checks (system vs instance)
- CloudWatch alarm actions
- Auto-recovery configuration
- Incident response timing

**Code:** [`scripts/ec2_auto_recovery.py`](scripts/ec2_auto_recovery.py)

---

### 3. EC2 Cost Optimization ๐Ÿ’ฐ

**What I learned:** How to automate EC2 scheduling to save money

![EC2 Scheduler](automation/screenshots/ec2-scheduler-iam-fix.png)
*Troubleshooting IAM permissions (common real-world problem!)*

**The problem:**
- My Lambda function kept failing with `AccessDenied`
- Had to debug IAM policies
- Fixed permissions
- Learned that IAM troubleshooting is a critical CloudOps skill

**Skills practiced:**
- Lambda function development
- IAM policy debugging
- CloudWatch Events/EventBridge
- Cost optimization strategies

**Code:** [`scripts/ec2_scheduler.py`](scripts/ec2_scheduler.py)

---

### 4. EC2 Management with Boto3 ๐Ÿ

**What I learned:** Using Python to programmatically manage AWS infrastructure

![EC2 Manager](automation/screenshots/ec2-boto3-client-list.png)
*My Python script listing and managing EC2 instances*

**What it does:**
- List all EC2 instances
- Filter by tags and state
- Start/stop instances in bulk
- Handle API rate limits gracefully

**Skills practiced:**
- Boto3 SDK for Python
- AWS API interaction
- Error handling
- Pagination for large result sets

**Code:** [`scripts/ec2_manager.py`](scripts/ec2_manager.py)

---

### 5. S3 Security Auditing ๐Ÿ”’

**What I learned:** How to detect and fix security misconfigurations

![S3 Public Check](automation/screenshots/s3-public-access-detection.png)
*Script detecting publicly accessible S3 buckets*

**The scenario:**
1. Scan all S3 buckets for public access
2. Identify misconfigured bucket policies
3. Automatically remediate (block public access)
4. Generate audit report

**Result:** Prevented potential data exposure through automated compliance checks

**Skills practiced:**
- S3 security best practices
- Boto3 S3 operations
- Policy analysis
- Security automation

**Code:** [`scripts/s3_public_check.py`](scripts/s3_public_check.py)

---

### 6. Security Auditing ๐Ÿ›ก๏ธ

**What I learned:** How to audit AWS accounts for security issues

![Security Audit](monitoring/screenshots/security-audit-findings.png)
*Security audit script showing compliance findings*

**What it checks:**
- โœ… IAM users without MFA
- โœ… Overly permissive Security Groups (0.0.0.0/0)
- โœ… S3 buckets with public access
- โœ… Root account usage
- โœ… Unused access keys

**Skills practiced:**
- Security auditing methodology
- Compliance frameworks (CIS, AWS Well-Architected)
- Python reporting
- Remediation tracking

**Code:** [`monitoring/security_audit.py`](monitoring/security_audit.py)

---

### 7. GuardDuty Threat Monitoring ๐Ÿšจ

**What I learned:** How to use GuardDuty for threat detection

![GuardDuty](monitoring/screenshots/guardduty-enabled.png)
*GuardDuty actively monitoring my AWS account*

**Setup:**
- Enabled GuardDuty across account
- Configured severity levels
- Set up automated alerts
- Practiced incident response

**Skills practiced:**
- Threat detection setup
- Security monitoring
- Finding analysis
- Incident response basics

---

### 8. CloudHealth Monitoring ๐Ÿ“ˆ

**What I learned:** Building infrastructure health checks

![Health Check](monitoring/screenshots/cloud-health-monitoring.png)
*Health monitoring script detecting infrastructure issues*

**What it monitors:**
- Instance health status
- Disk usage
- Memory utilization
- Application errors from logs

**Skills practiced:**
- Multi-service monitoring
- Health check automation
- Log analysis
- Alert threshold tuning

**Code:** [`monitoring/health_check.py`](monitoring/health_check.py)

---

## ๐Ÿ”„ Self-Healing Infrastructure

**Concept:** Infrastructure that fixes itself automatically

**My learning process:**

```
Issue Occurs โ†’ Detection (Alarm) โ†’ Automated Remediation โ†’ Validation (Testing)
```

### Real Examples I've Implemented:

**1. EC2 Instance Failure**
- **Detection:** CloudWatch status check fails
- **Action:** Automatic instance recovery
- **Result:** 99.9% uptime maintained

**2. High CPU Usage**
- **Detection:** CloudWatch alarm at 80% CPU
- **Action:** SNS alert to me
- **Result:** I can investigate before outage

**3. S3 Bucket Made Public**
- **Detection:** Script finds public bucket
- **Action:** Lambda auto-remediates to private
- **Result:** Data exposure prevented

**4. Idle Resources**
- **Detection:** Script finds unused EC2 instances
- **Action:** Tag for review
- **Result:** Cost savings

**Code:** [`self_healing/`](self_healing/)

---

## ๐Ÿ” Troubleshooting I've Done

**Real problems I created and solved** (learning by breaking things)

### Problem โ†’ Investigation โ†’ Solution โ†’ Prevention

#### 1. IAM Permission Denied
**Problem:** My automation script kept failing with `AccessDenied`
**Investigation:** Reviewed IAM policies and CloudTrail logs
**Solution:** Added missing S3 permissions to role
**Learning:** Always check CloudTrail for the exact denied action

#### 2. Lambda Timeout
**Problem:** EC2 start/stop Lambda timing out
**Investigation:** Analyzed CloudWatch Logs
**Solution:** Increased timeout and optimized code
**Learning:** Lambda has hard limits, design accordingly

#### 3. CloudWatch Alarm Not Firing
**Problem:** No alerts received for known issue
**Investigation:** Checked alarm configuration and SNS
**Solution:** Fixed alarm metric query and SNS subscription
**Learning:** Test your monitoring before you need it

**Documentation:** [`troubleshooting/`](troubleshooting/)

---

## ๐Ÿ’ป Skills I'm Demonstrating

### AWS Services I've Actually Used:

**Compute & Networking:**
- โœ… EC2 (instance management, auto-recovery, scheduling)
- โœ… VPC (security groups, network monitoring)
- โœ… Lambda (automation functions)

**Storage:**
- โœ… S3 (security auditing, access control)
- โœ… EBS (volume monitoring)

**Security & Compliance:**
- โœ… IAM (policy troubleshooting, least privilege)
- โœ… GuardDuty (threat detection)
- โœ… CloudTrail (audit logging)

**Monitoring:**
- โœ… CloudWatch (logs, metrics, alarms, dashboards)
- โœ… SNS (notifications and alerting)
- โœ… Config (compliance rules)

---

### Technical Skills:

**Programming & Scripting:**
- **Python** - Boto3 SDK, automation scripts
- **Bash** - Linux administration, shell scripting
- **Git** - Version control for all code

**CloudOps Practices:**
- Infrastructure monitoring
- Automated remediation
- Security auditing
- Cost optimization
- Incident response
- Documentation

**Tools:**
- Boto3 (AWS SDK for Python)
- AWS CLI
- CloudWatch Logs Insights
- Linux command line
- VS Code

---

## ๐Ÿš€ Quick Start

### Prerequisites

```bash
# Required
- AWS Account (Free Tier works)
- Python 3.8+
- AWS CLI configured
- pip install boto3
```

### Setup

```bash
# 1. Clone the repository
git clone https://github.com/charles-bucher/CloudOpsLab.git
cd CloudOpsLab

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure AWS credentials
aws configure

# 4. Run a script
cd scripts/
python ec2_manager.py --list

# 5. Run security audit
cd ../monitoring/
python security_audit.py
```

### Example: Testing EC2 Auto-Recovery

```bash
# Deploy EC2 with auto-recovery
cd scripts/
python ec2_auto_recovery.py --deploy

# Simulate instance failure
python ec2_auto_recovery.py --simulate-failure

# Monitor recovery
python ec2_auto_recovery.py --check-status

# Verify recovery completed
python ec2_auto_recovery.py --validate
```

---

## ๐Ÿ“ Project Structure

```
CloudOpsLab/
โ”œโ”€โ”€ scripts/ # Main automation scripts
โ”‚ โ”œโ”€โ”€ cloudwatch_alarms.py
โ”‚ โ”œโ”€โ”€ ec2_auto_recovery.py
โ”‚ โ”œโ”€โ”€ ec2_manager.py
โ”‚ โ”œโ”€โ”€ ec2_scheduler.py
โ”‚ โ””โ”€โ”€ s3_public_check.py
โ”œโ”€โ”€ monitoring/ # Security & monitoring
โ”‚ โ”œโ”€โ”€ screenshots/ # Proof of monitoring work
โ”‚ โ”œโ”€โ”€ security_audit.py
โ”‚ โ”œโ”€โ”€ health_check.py
โ”‚ โ”œโ”€โ”€ guardduty_handler.py
โ”‚ โ””โ”€โ”€ issue_tracker.py
โ”œโ”€โ”€ self_healing/ # Auto-remediation logic
โ”‚ โ”œโ”€โ”€ ec2_recovery.py
โ”‚ โ”œโ”€โ”€ s3_remediation.py
โ”‚ โ””โ”€โ”€ lambda_functions/
โ”œโ”€โ”€ automation/ # Additional automation
โ”‚ โ””โ”€โ”€ screenshots/ # Proof of automation work
โ”œโ”€โ”€ troubleshooting/ # Problem scenarios & solutions
โ”‚ โ”œโ”€โ”€ iam_debugging.md
โ”‚ โ”œโ”€โ”€ lambda_timeout.md
โ”‚ โ””โ”€โ”€ cloudwatch_alarms.md
โ”œโ”€โ”€ docs/ # Documentation
โ”‚ โ”œโ”€โ”€ screenshots/ # Portfolio screenshots
โ”‚ โ”œโ”€โ”€ architecture.md
โ”‚ โ””โ”€โ”€ runbooks/ # Operational runbooks
โ””โ”€โ”€ README.md # You are here
```

---

## ๐Ÿ“Š My Learning Journey

### What I've Learned:

**Automation:**
- Python + Boto3 makes AWS operations programmable
- Error handling is critical for production automation
- IAM permissions require careful planning
- Testing automation is as important as writing it

**Monitoring:**
- You can't fix what you can't see
- Alerts must be actionable, not noisy
- CloudWatch Logs Insights is powerful for debugging
- GuardDuty catches things humans miss

**Self-Healing:**
- Automate detection before remediation
- Start with simple recovery, add complexity gradually
- Always have manual override capability
- Test failure scenarios regularly

**Operations:**
- Documentation saves time during incidents
- Cost optimization requires continuous monitoring
- Security is a daily practice, not a checkbox
- CloudTrail is your best friend for troubleshooting

---

## ๐ŸŽฏ What I'm Working On Next

**Planned improvements:**
- [ ] ECS container monitoring
- [ ] RDS backup automation
- [ ] Cost optimization reports
- [ ] Multi-region health checks
- [ ] Systems Manager integration
- [ ] Config compliance rules

**Skills I'm practicing:**
- [ ] Lambda with EventBridge
- [ ] Step Functions for workflows
- [ ] Advanced CloudWatch Logs Insights
- [ ] Container orchestration basics

---

## ๐Ÿ’ฐ Cost Transparency

**Monthly AWS Costs for This Lab:**
```
EC2 (2 ร— t3.micro): ~$15.00
S3 Storage: ~$1.00
Data Transfer: ~$2.00
CloudWatch Logs: ~$2.00
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Total: ~$20.00/month
```

**Funded by:** My part-time delivery job while learning cloud

**Worth it?** Absolutely. I'm building proof, not just theory.

---

## ๐Ÿ“ธ Screenshots & Evidence

All screenshots in this repo are from **my actual AWS account**. No stock images, no tutorial screenshots.

**Screenshot locations:**
- `docs/screenshots/` - General portfolio screenshots
- `automation/screenshots/` - Automation project screenshots
- `monitoring/screenshots/` - Monitoring & security screenshots

---

## ๐Ÿ™‹โ€โ™‚๏ธ About Me

**Charles Bucher**
Self-Taught Cloud Engineer | Career Transition from Delivery Driving

**My Story:**

I'm 40 years old, working as a delivery driver, teaching myself cloud engineering to provide better for my family. Instead of just watching tutorials, I'm actually **building things in AWS** and documenting everything.

**Why trust my work?**
- โœ… Every screenshot is from MY AWS account
- โœ… I spend my own money running these labs ($20/month)
- โœ… I work on these projects after 10-hour delivery shifts
- โœ… I document everything like production systems

**What I'm NOT:**
- โŒ A senior engineer pretending to be entry-level
- โŒ Someone who just copied tutorials
- โŒ A paper cert chaser with no hands-on

**What I AM:**
- โœ… Self-taught and proud of it
- โœ… Honest about being entry-level
- โœ… Willing to start small and prove myself
- โœ… Ready to outwork anyone for this opportunity

---

## ๐ŸŽฏ Current Status

**Studying for:** AWS SysOps Administrator Associate
**Looking for:** Entry-level Cloud Support / SysOps / DevOps roles
**Location:** Florida (remote preferred)
**Salary expectations:** $50k+ (realistic for entry-level)

### What I'm Open To:
- Full-time W2 positions
- Contract work through staffing agencies
- Remote opportunities
- Hybrid roles in Tampa Bay area

---

## ๐Ÿ“ž Let's Connect

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/charles-bucher-cloud)
[![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:quietopscb@gmail.com)
[![Portfolio](https://img.shields.io/badge/Portfolio-FF5722?style=for-the-badge&logo=google-chrome&logoColor=white)](https://charles-bucher.github.io/)

---

## ๐Ÿ“ Quick Facts

```yaml
name: Charles Bucher
role: Self-Taught Cloud Engineer
location: Florida
status: Open to Work
focus: AWS CloudOps

skills:
cloud: [AWS, CloudWatch, EC2, S3, Lambda, IAM]
scripting: [Python, Bash]
tools: [Boto3, AWS CLI, Git, Linux]
practices: [Automation, Monitoring, Security, Troubleshooting]

currently_learning:
- AWS SysOps Administrator Associate
- Advanced CloudWatch patterns
- Infrastructure automation

ideal_role:
- AWS Cloud Support Associate
- Junior SysOps Administrator
- Cloud Operations Engineer
- Entry-level DevOps Engineer

motivation: "Family deserves better than paycheck-to-paycheck living"
```

---

## ๐Ÿ† Why This Lab Matters

### What This Proves:

**For Hiring Managers:**
- โœ… I can actually use AWS (not just theory)
- โœ… I troubleshoot systematically
- โœ… I document professionally
- โœ… I'm self-motivated (teaching myself after work)

**For Me:**
- โœ… Built confidence in AWS operations
- โœ… Created reusable automation scripts
- โœ… Developed systematic debugging approach
- โœ… Have portfolio proof of hands-on work

**For Other Learners:**
- โœ… Error-driven learning works
- โœ… You don't need expensive courses
- โœ… Free tier + determination = real skills
- โœ… Document everything!

---

## ๐Ÿค Contributing

This is a personal learning project, but I'm open to suggestions!

**Ways to help:**
- ๐Ÿ› Report issues or bugs
- ๐Ÿ’ก Suggest new scenarios
- ๐Ÿ“ Improve documentation
- โญ Star the repo if you find it useful

---

## ๐Ÿ“„ License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.

---

## ๐Ÿ™ Acknowledgments

**Learning Resources:**
- AWS Documentation
- AWS Well-Architected Framework
- Boto3 Documentation
- Real-world experience from this lab

**Inspiration:**
- My family depending on this career change
- The need to prove skills through actual work
- Love for solving technical problems
- This community of self-taught engineers

---

## โญ If This Helped You...

If this repo helped you learn CloudOps or gave you ideas for your own portfolio, please give it a star! It helps others find it too.

---

**Built with โ˜•, Python, and a lot of trial and error**

### Charles Bucher | Self-Taught Cloud Engineer

*"I can't fake experience, so I'm building proof instead"*

![Profile Views](https://komarev.com/ghpvc/?username=charles-bucher&color=0e75b6&style=flat-square&label=Repo+Views)

---

**CloudOpsLab** | Learning operational excellence one automation at a time

[โฌ† Back to Top](#cloudopslab-)