https://github.com/charles-bucher/cloudopslab
CloudOpsLab: Hands-on AWS and cloud support scripts showcasing troubleshooting, automation, monitoring, and self-healing. Demonstrates practical CloudOps skills, diagnostics, and cloud problem-solving for entry-level and early-career professionals.
- Host: GitHub
- URL: https://github.com/charles-bucher/cloudopslab
- Owner: charles-bucher
- License: mit
- Created: 2025-12-25T07:20:17.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-01-03T10:52:22.000Z (about 1 month ago)
- Last Synced: 2026-01-03T20:16:52.670Z (about 1 month ago)
- Topics: automation, aws, bash, cloud-support, cloudops, cloudwatch, devops, ec2, iac, incident-response, lamda, linux, monitoring, portfolio, python, s3, scripts, sysops, terraform, troubleshooting
- Language: Python
- Homepage:
- Size: 598 KB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# CloudOpsLab







**Hands-on AWS CloudOps practice lab demonstrating automation, monitoring, and troubleshooting**
*Self-taught cloud engineer learning operational excellence through real-world scenarios*
---
## About This Lab
This is my personal CloudOps learning environment, where I'm teaching myself AWS operations by **actually doing the work**, not just following tutorials.
### What Makes This Different:
- ✅ **Real AWS account** - I'm spending ~$20/month from my delivery job to run this
- ✅ **Real problems** - I break things on purpose, then learn to fix them
- ✅ **Real solutions** - Python and Bash scripts I actually wrote and tested
- ✅ **Real documentation** - Everything is documented like production systems
### My Goal:
Break into cloud operations by proving I can **do the work**, even though I'm entry-level.
---
## What I've Built
### 1. CloudWatch Monitoring & Alerting
**What I learned:** How to set up automated monitoring that actually catches issues

*CloudWatch alarm I configured - it actually triggered when my test EC2 instance hit 80% CPU*
**Skills practiced:**
- Creating CloudWatch alarms with proper thresholds
- Setting up SNS topics for notifications
- Configuring email alerts
- Testing alarm logic
**Code:** [`scripts/cloudwatch_alarms.py`](scripts/cloudwatch_alarms.py)
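A minimal boto3 sketch of the alarm-plus-SNS setup described above. It is not the contents of `scripts/cloudwatch_alarms.py`; the topic name `cloudopslab-alerts`, the alarm-name pattern, the 5-minute period, and the two-evaluation-period rule are illustrative assumptions. The pure `cpu_alarm_params` helper builds the `put_metric_alarm` arguments so they can be checked without calling AWS.

```python
def cpu_alarm_params(instance_id: str, topic_arn: str, threshold: float = 80.0) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm (high CPU on one instance)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # 5-minute datapoints, matching basic monitoring
        "EvaluationPeriods": 2,     # two consecutive breaches before ALARM
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # publish to SNS on entering ALARM
    }


def create_cpu_alarm(instance_id: str, email: str, region: str = "us-east-1") -> str:
    """Create an SNS topic, subscribe an email address, then create the alarm."""
    import boto3  # imported here so the helper above can be tested without boto3

    sns = boto3.client("sns", region_name=region)
    topic_arn = sns.create_topic(Name="cloudopslab-alerts")["TopicArn"]  # assumed name
    # The subscription stays pending until the recipient clicks the confirmation link.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**cpu_alarm_params(instance_id, topic_arn))
    return topic_arn
```

Requiring two consecutive 5-minute breaches trades a few minutes of detection delay for fewer alerts on brief CPU spikes.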
---
### 2. EC2 Auto-Recovery
**What I learned:** How to make instances self-heal from failures

*Testing auto-recovery by simulating an instance failure*
**The scenario:**
1. Configured CloudWatch alarm to detect status check failures
2. Set up automatic recovery action
3. Intentionally broke my test instance
4. Watched it recover automatically
5. Documented the whole process
**Result:** Instance recovered in ~4 minutes without any manual intervention
**Skills practiced:**
- EC2 status checks (system vs instance)
- CloudWatch alarm actions
- Auto-recovery configuration
- Incident response timing
**Code:** [`scripts/ec2_auto_recovery.py`](scripts/ec2_auto_recovery.py)
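A hedged sketch of the kind of alarm this scenario relies on, not the author's script. One detail worth checking against the AWS documentation: the built-in `recover` alarm action is meant for the *system* status check (`StatusCheckFailed_System`, problems with the underlying AWS host), while failures inside the guest OS show up on `StatusCheckFailed_Instance`, where a `reboot` action is the usual choice. The alarm names and the 1-minute, two-period window are assumptions.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """PutMetricAlarm kwargs that trigger EC2's built-in recover action."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        # Host-level (system) status check: the failure class recovery addresses.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,               # status check metrics are published every minute
        "EvaluationPeriods": 2,     # assumed: two failed minutes before acting
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def reboot_alarm_params(instance_id: str, region: str) -> dict:
    """Variant for guest-OS (instance status check) failures: reboot instead."""
    params = recovery_alarm_params(instance_id, region)
    params["AlarmName"] = f"auto-reboot-{instance_id}"
    params["MetricName"] = "StatusCheckFailed_Instance"
    params["AlarmActions"] = [f"arn:aws:automate:{region}:ec2:reboot"]
    return params


def apply_alarm(params: dict, region: str) -> None:
    """Create or update the alarm in CloudWatch."""
    import boto3  # lazy import keeps the builders usable without boto3

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```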
---
### 3. EC2 Cost Optimization
**What I learned:** How to automate EC2 scheduling to save money

*Troubleshooting IAM permissions (common real-world problem!)*
**The problem:**
- My Lambda function kept failing with `AccessDenied`
- Had to debug IAM policies
- Fixed permissions
- Learned that IAM troubleshooting is a critical CloudOps skill
**Skills practiced:**
- Lambda function development
- IAM policy debugging
- CloudWatch Events/EventBridge
- Cost optimization strategies
**Code:** [`scripts/ec2_scheduler.py`](scripts/ec2_scheduler.py)
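A sketch of how a scheduler Lambda like this might be shaped, under stated assumptions: instances opt in with a hypothetical `AutoSchedule=true` tag, and two EventBridge schedule rules invoke the function with a constant input of `{"action": "start"}` or `{"action": "stop"}`. `REQUIRED_ACTIONS` lists the EC2 permissions this code calls; leaving one out of the execution role is the kind of gap that produces the authorization failure described above. The role also needs CloudWatch Logs permissions (for example via the `AWSLambdaBasicExecutionRole` managed policy) for the function's logs to appear.

```python
SCHEDULE_TAG = "AutoSchedule"  # hypothetical opt-in tag key (value must be "true")

# EC2 actions this sketch calls; the Lambda execution role must allow all of them.
REQUIRED_ACTIONS = ["ec2:DescribeInstances", "ec2:StartInstances", "ec2:StopInstances"]


def pick_targets(reservations: list, action: str) -> list:
    """Return IDs of opted-in instances whose state the action would change."""
    wanted_state = {"stop": "running", "start": "stopped"}[action]
    ids = []
    for reservation in reservations:
        for inst in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if tags.get(SCHEDULE_TAG) == "true" and inst["State"]["Name"] == wanted_state:
                ids.append(inst["InstanceId"])
    return ids


def lambda_handler(event, context):
    """Entry point; the event shape {"action": "start" | "stop"} is an assumption."""
    import boto3

    action = event["action"]
    ec2 = boto3.client("ec2")
    reservations = []
    for page in ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": f"tag:{SCHEDULE_TAG}", "Values": ["true"]}]
    ):
        reservations.extend(page["Reservations"])

    ids = pick_targets(reservations, action)
    if ids:
        if action == "stop":
            ec2.stop_instances(InstanceIds=ids)
        else:
            ec2.start_instances(InstanceIds=ids)
    return {"action": action, "instances": ids}
```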
---
### 4. EC2 Management with Boto3
**What I learned:** Using Python to programmatically manage AWS infrastructure

*My Python script listing and managing EC2 instances*
**What it does:**
- List all EC2 instances
- Filter by tags and state
- Start/stop instances in bulk
- Handle API rate limits gracefully
**Skills practiced:**
- Boto3 SDK for Python
- AWS API interaction
- Error handling
- Pagination for large result sets
**Code:** [`scripts/ec2_manager.py`](scripts/ec2_manager.py)
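A sketch of the listing, filtering, and batching pieces, with hypothetical function names (they are not taken from `scripts/ec2_manager.py`). Pagination comes from boto3's `describe_instances` paginator, and throttling is left to botocore's built-in `adaptive` retry mode rather than hand-written backoff; the batch size of 50 is an arbitrary assumption.

```python
def iter_instances(pages, state=None, tag=None):
    """Yield (instance_id, state, name) from DescribeInstances response pages.

    `pages` is any iterable of response dicts, e.g. a boto3 paginator.
    `tag` is an optional (key, value) pair that must match.
    """
    for page in pages:
        for reservation in page.get("Reservations", []):
            for inst in reservation.get("Instances", []):
                tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
                current = inst["State"]["Name"]
                if state and current != state:
                    continue
                if tag and tags.get(tag[0]) != tag[1]:
                    continue
                yield inst["InstanceId"], current, tags.get("Name", "-")


def chunked(items, size=50):
    """Split instance IDs into batches for bulk start/stop calls."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_ec2_client(region="us-east-1"):
    """EC2 client that retries throttled calls using botocore's adaptive mode."""
    import boto3
    from botocore.config import Config

    retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})
    return boto3.client("ec2", region_name=region, config=retry_config)


def list_running(region="us-east-1"):
    ec2 = make_ec2_client(region)
    pages = ec2.get_paginator("describe_instances").paginate()
    return list(iter_instances(pages, state="running"))
```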
---
### 5. S3 Security Auditing
**What I learned:** How to detect and fix security misconfigurations

*Script detecting publicly accessible S3 buckets*
**The scenario:**
1. Scan all S3 buckets for public access
2. Identify misconfigured bucket policies
3. Automatically remediate (block public access)
4. Generate audit report
**Result:** Prevented potential data exposure through automated compliance checks
**Skills practiced:**
- S3 security best practices
- Boto3 S3 operations
- Policy analysis
- Security automation
**Code:** [`scripts/s3_public_check.py`](scripts/s3_public_check.py)
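A sketch of a per-bucket check and remediation, not the author's script. It looks at the bucket-level Block Public Access settings and at `GetBucketPolicyStatus`; it does not inspect ACL grants or the account-level Block Public Access settings, which a fuller audit would also cover. The flagging rule in `needs_remediation` is a conservative assumption: anything short of all four block settings, or a policy AWS evaluates as public, gets flagged.

```python
ALL_BLOCKED = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def needs_remediation(block_config, policy_is_public):
    """True if the bucket lacks a full public access block or its policy is public."""
    if policy_is_public:
        return True
    if block_config is None:  # no bucket-level block configured at all
        return True
    return not all(block_config.get(key, False) for key in ALL_BLOCKED)


def audit_bucket(s3, bucket, fix=False):
    """Check one bucket with a boto3 S3 client; optionally apply the full block."""
    from botocore.exceptions import ClientError

    try:
        config = s3.get_public_access_block(Bucket=bucket)["PublicAccessBlockConfiguration"]
    except ClientError as err:
        if err.response["Error"]["Code"] != "NoSuchPublicAccessBlockConfiguration":
            raise
        config = None

    try:
        is_public = s3.get_bucket_policy_status(Bucket=bucket)["PolicyStatus"]["IsPublic"]
    except ClientError as err:
        if err.response["Error"]["Code"] != "NoSuchBucketPolicy":
            raise
        is_public = False  # no bucket policy at all

    flagged = needs_remediation(config, is_public)
    if flagged and fix:
        s3.put_public_access_block(Bucket=bucket, PublicAccessBlockConfiguration=ALL_BLOCKED)
    return {"bucket": bucket, "flagged": flagged, "remediated": flagged and fix}
```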
---
### 6. Security Auditing
**What I learned:** How to audit AWS accounts for security issues

*Security audit script showing compliance findings*
**What it checks:**
- ✅ IAM users without MFA
- ✅ Overly permissive Security Groups (0.0.0.0/0)
- ✅ S3 buckets with public access
- ✅ Root account usage
- ✅ Unused access keys
**Skills practiced:**
- Security auditing methodology
- Compliance frameworks (CIS, AWS Well-Architected)
- Python reporting
- Remediation tracking
**Code:** [`monitoring/security_audit.py`](monitoring/security_audit.py)
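One of the listed checks, overly permissive security groups, sketched as a pure function over the `SecurityGroups` list returned by EC2 `DescribeSecurityGroups`. It covers both IPv4 `0.0.0.0/0` and IPv6 `::/0`; the finding dictionary layout is an assumption, not the report format used in `monitoring/security_audit.py`.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress(security_groups):
    """Return one finding per ingress rule open to the whole internet."""
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr not in OPEN_CIDRS:
                    continue
                proto = perm.get("IpProtocol")
                # IpProtocol "-1" means all traffic and carries no port range.
                ports = "all" if proto == "-1" else f"{perm.get('FromPort')}-{perm.get('ToPort')}"
                findings.append({"group": sg["GroupId"], "cidr": cidr,
                                 "protocol": proto, "ports": ports})
    return findings


def fetch_groups(region="us-east-1"):
    """Collect every security group in a region via the boto3 paginator."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    groups = []
    for page in ec2.get_paginator("describe_security_groups").paginate():
        groups.extend(page["SecurityGroups"])
    return groups
```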
---
### 7. GuardDuty Threat Monitoring
**What I learned:** How to use GuardDuty for threat detection

*GuardDuty actively monitoring my AWS account*
**Setup:**
- Enabled GuardDuty across account
- Configured severity levels
- Set up automated alerts
- Practiced incident response
**Skills practiced:**
- Threat detection setup
- Security monitoring
- Finding analysis
- Incident response basics
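A sketch of the triage step a handler such as `monitoring/guardduty_handler.py` might perform; the function names and the 7.0 cutoff are assumptions. GuardDuty scores findings on a numeric severity scale where 7.0 and above falls in the High band, and the `FindingCriteria` filter lets the API do a first pass before the local sort.

```python
def high_severity(findings, minimum=7.0):
    """Return (severity, type, title) for findings at or above `minimum`, highest first."""
    hits = [f for f in findings if f.get("Severity", 0) >= minimum]
    hits.sort(key=lambda f: f["Severity"], reverse=True)
    return [(f["Severity"], f["Type"], f.get("Title", "")) for f in hits]


def fetch_findings(region="us-east-1", minimum=7.0):
    """Pull findings from every detector in the region and triage them."""
    import boto3

    gd = boto3.client("guardduty", region_name=region)
    paginator = gd.get_paginator("list_findings")
    results = []
    for detector_id in gd.list_detectors()["DetectorIds"]:
        for page in paginator.paginate(
            DetectorId=detector_id,
            FindingCriteria={"Criterion": {"severity": {"Gte": int(minimum)}}},
        ):
            if page["FindingIds"]:
                details = gd.get_findings(DetectorId=detector_id, FindingIds=page["FindingIds"])
                results.extend(details["Findings"])
    return high_severity(results, minimum)
```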
---
### 8. CloudHealth Monitoring
**What I learned:** Building infrastructure health checks

*Health monitoring script detecting infrastructure issues*
**What it monitors:**
- Instance health status
- Disk usage
- Memory utilization
- Application errors from logs
**Skills practiced:**
- Multi-service monitoring
- Health check automation
- Log analysis
- Alert threshold tuning
**Code:** [`monitoring/health_check.py`](monitoring/health_check.py)
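Worth knowing for this kind of check: EC2's default CloudWatch metrics do not include memory or disk-space utilization, because those are only visible from inside the guest OS; they need either the CloudWatch agent or an on-instance script. A minimal standard-library sketch of the latter for Linux (it parses `/proc/meminfo` text). The thresholds and the `ERROR` marker are placeholder assumptions to tune, not values from `monitoring/health_check.py`.

```python
import shutil


def disk_used_percent(path="/"):
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def memory_used_percent(meminfo_text):
    """Memory use from /proc/meminfo contents, based on MemAvailable (Linux)."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key] = int(rest.split()[0])  # values are reported in kB
    return 100.0 * (1 - values["MemAvailable"] / values["MemTotal"])


def count_errors(log_lines, marker="ERROR"):
    """Count log lines containing the error marker."""
    return sum(1 for line in log_lines if marker in line)


def check(disk_pct, mem_pct, errors, disk_limit=85.0, mem_limit=90.0, error_limit=0):
    """Return human-readable alerts; all limits are placeholders to tune."""
    alerts = []
    if disk_pct >= disk_limit:
        alerts.append(f"disk {disk_pct:.1f}% >= {disk_limit}%")
    if mem_pct >= mem_limit:
        alerts.append(f"memory {mem_pct:.1f}% >= {mem_limit}%")
    if errors > error_limit:
        alerts.append(f"{errors} error lines in log")
    return alerts
```

On an instance, the memory input would come from `open("/proc/meminfo").read()`.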
---
## Self-Healing Infrastructure
**Concept:** Infrastructure that fixes itself automatically
**My learning process:**
```
Issue Occurs → Detection (Alarm) → Automated Remediation → Validation (Testing)
```
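The detect, remediate, validate loop above can be sketched as a small, dependency-free Python function. The callables stand in for real checks (an alarm state, a bucket setting, and so on); the retry count, the wait, and the `enabled` flag, which acts as a manual override, are assumptions rather than the code in `self_healing/`.

```python
import time
from typing import Callable


def self_heal(detect: Callable[[], bool], remediate: Callable[[], None],
              validate: Callable[[], bool], retries: int = 3,
              wait_seconds: float = 0.0, enabled: bool = True) -> str:
    """Run one detect -> remediate -> validate cycle and report the outcome."""
    if not detect():
        return "healthy"
    if not enabled:
        # Manual override: report the issue but leave the fix to a human.
        return "detected (auto-remediation disabled)"
    for _ in range(retries):
        remediate()
        time.sleep(wait_seconds)  # give the fix time to take effect
        if validate():
            return "remediated"
    return "escalate"  # automation did not fix it: page a human
```

Returning `"escalate"` instead of retrying forever keeps a failed fix visible.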
### Real Examples I've Implemented:
**1. EC2 Instance Failure**
- **Detection:** CloudWatch status check fails
- **Action:** Automatic instance recovery
- **Result:** 99.9% uptime maintained
**2. High CPU Usage**
- **Detection:** CloudWatch alarm at 80% CPU
- **Action:** SNS alert to me
- **Result:** I can investigate before it becomes an outage
**3. S3 Bucket Made Public**
- **Detection:** Script finds public bucket
- **Action:** Lambda auto-remediates to private
- **Result:** Data exposure prevented
**4. Idle Resources**
- **Detection:** Script finds unused EC2 instances
- **Action:** Tag for review
- **Result:** Cost savings
**Code:** [`self_healing/`](self_healing/)
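A sketch of the idle-resource example: pull a week of hourly `CPUUtilization` averages with `get_metric_statistics`, treat an instance as idle only if every hourly average stays under a threshold, and tag it rather than stop it. The 5% threshold, the 7-day window, and the `Review=idle` tag are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def find_idle(cpu_by_instance, threshold=5.0):
    """Return IDs whose hourly CPU averages all stay below `threshold` percent.

    Instances with no datapoints are skipped rather than assumed idle.
    """
    return sorted(i for i, points in cpu_by_instance.items()
                  if points and max(points) < threshold)


def hourly_cpu(cloudwatch, instance_id, days=7):
    """Hourly average CPUUtilization for the last `days` days (boto3 client)."""
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,
        Statistics=["Average"],
    )
    return [point["Average"] for point in stats["Datapoints"]]


def tag_for_review(ec2, instance_ids):
    """Tag instead of stopping, so a human makes the final call."""
    if instance_ids:
        ec2.create_tags(Resources=instance_ids, Tags=[{"Key": "Review", "Value": "idle"}])
```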
---
## Troubleshooting I've Done
**Real problems I created and solved** (learning by breaking things)
### Problem → Investigation → Solution → Prevention
#### 1. IAM Permission Denied
**Problem:** My automation script kept failing with `AccessDenied`
**Investigation:** Reviewed IAM policies and CloudTrail logs
**Solution:** Added missing S3 permissions to role
**Learning:** Always check CloudTrail for the exact denied action
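A sketch of the "check CloudTrail for the exact denied action" step using CloudTrail event history (`LookupEvents`); the helper names are hypothetical. Two caveats to verify: event history covers management events only, so denied S3 object-level calls such as `GetObject` appear only in a trail with data events enabled; and the `service:EventName` string built here usually, but not always, matches the IAM action name. Services report denials under different error codes, which is why several are accepted. Once the action is known, IAM's `simulate_principal_policy` can confirm whether a revised policy allows it.

```python
import json

DENIED_CODES = {"AccessDenied", "AccessDeniedException", "Client.UnauthorizedOperation"}


def denied_actions(events):
    """Summarize denied calls as (service:EventName, caller ARN) pairs.

    Each LookupEvents entry carries the full record as a JSON string in "CloudTrailEvent".
    """
    summary = []
    for event in events:
        record = json.loads(event["CloudTrailEvent"])
        if record.get("errorCode") in DENIED_CODES:
            service = record["eventSource"].split(".")[0]  # "s3.amazonaws.com" -> "s3"
            caller = record.get("userIdentity", {}).get("arn", "unknown")
            summary.append((f"{service}:{record['eventName']}", caller))
    return summary


def recent_events(region="us-east-1", hours=1):
    """Fetch the last `hours` of management events from CloudTrail event history."""
    import boto3
    from datetime import datetime, timedelta, timezone

    ct = boto3.client("cloudtrail", region_name=region)
    end = datetime.now(timezone.utc)
    events = []
    for page in ct.get_paginator("lookup_events").paginate(
        StartTime=end - timedelta(hours=hours), EndTime=end
    ):
        events.extend(page["Events"])
    return events
```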
#### 2. Lambda Timeout
**Problem:** EC2 start/stop Lambda timing out
**Investigation:** Analyzed CloudWatch Logs
**Solution:** Increased timeout and optimized code
**Learning:** Lambda has hard limits (the maximum timeout is 15 minutes), so design accordingly
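A minimal sketch of one common way to design around the timeout: check the time left through the runtime's `context.get_remaining_time_in_millis()` before each item, and return the unprocessed remainder (for the caller to re-queue) instead of being cut off mid-item. The 10-second margin and the `handle` callable are assumptions, not the author's fix.

```python
SAFETY_MARGIN_MS = 10_000  # assumed buffer: stop starting new work with <10 s left


def process_batch(items, handle, context, margin_ms=SAFETY_MARGIN_MS):
    """Handle items until the Lambda deadline gets close; return what is left."""
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < margin_ms:
            return items[index:]  # hand leftovers to a follow-up invocation
        handle(item)
    return []
```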
#### 3. CloudWatch Alarm Not Firing
**Problem:** No alerts received for known issue
**Investigation:** Checked alarm configuration and SNS
**Solution:** Fixed alarm metric query and SNS subscription
**Learning:** Test your monitoring before you need it
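A sketch that turns this investigation into a repeatable checklist. It inspects one alarm from `DescribeAlarms` and the topic's subscriptions from SNS `ListSubscriptionsByTopic`; unconfirmed email subscriptions are reported with the placeholder ARN `PendingConfirmation`. The checks are a hypothetical starting set, not the fix recorded in `troubleshooting/cloudwatch_alarms.md`. `set_alarm_state` forces a test notification; CloudWatch re-evaluates the alarm on its next period.

```python
def diagnose_alarm(alarm, subscriptions):
    """List likely reasons an alarm did not notify anyone.

    `alarm` is one entry of DescribeAlarms "MetricAlarms"; `subscriptions` is the
    "Subscriptions" list from SNS ListSubscriptionsByTopic for the alarm's topic.
    """
    problems = []
    if not alarm.get("ActionsEnabled", True):
        problems.append("alarm actions are disabled")
    if not alarm.get("AlarmActions"):
        problems.append("no ALARM action configured")
    if alarm.get("StateValue") == "INSUFFICIENT_DATA":
        problems.append("no datapoints: check namespace, metric name, and dimensions")
    confirmed = [s for s in subscriptions if s["SubscriptionArn"] != "PendingConfirmation"]
    if not confirmed:
        problems.append("no confirmed SNS subscriptions")
    return problems


def fire_test_notification(alarm_name, region="us-east-1"):
    """Temporarily push the alarm into ALARM to exercise its actions end to end."""
    import boto3

    boto3.client("cloudwatch", region_name=region).set_alarm_state(
        AlarmName=alarm_name, StateValue="ALARM", StateReason="Manual notification test"
    )
```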
**Documentation:** [`troubleshooting/`](troubleshooting/)
---
## Skills I'm Demonstrating
### AWS Services I've Actually Used:
**Compute & Networking:**
- ✅ EC2 (instance management, auto-recovery, scheduling)
- ✅ VPC (security groups, network monitoring)
- ✅ Lambda (automation functions)
**Storage:**
- ✅ S3 (security auditing, access control)
- ✅ EBS (volume monitoring)
**Security & Compliance:**
- ✅ IAM (policy troubleshooting, least privilege)
- ✅ GuardDuty (threat detection)
- ✅ CloudTrail (audit logging)
**Monitoring:**
- ✅ CloudWatch (logs, metrics, alarms, dashboards)
- ✅ SNS (notifications and alerting)
- ✅ Config (compliance rules)
---
### Technical Skills:
**Programming & Scripting:**
- **Python** - Boto3 SDK, automation scripts
- **Bash** - Linux administration, shell scripting
- **Git** - Version control for all code
**CloudOps Practices:**
- Infrastructure monitoring
- Automated remediation
- Security auditing
- Cost optimization
- Incident response
- Documentation
**Tools:**
- Boto3 (AWS SDK for Python)
- AWS CLI
- CloudWatch Logs Insights
- Linux command line
- VS Code
---
## Quick Start
### Prerequisites
**Required:**
- AWS account (Free Tier works)
- Python 3.8+
- AWS CLI installed and configured
- Boto3 (`pip install boto3`)
### Setup
```bash
# 1. Clone the repository
git clone https://github.com/charles-bucher/CloudOpsLab.git
cd CloudOpsLab
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure AWS credentials
aws configure
# 4. Run a script
cd scripts/
python ec2_manager.py --list
# 5. Run security audit
cd ../monitoring/
python security_audit.py
```
### Example: Testing EC2 Auto-Recovery
```bash
# Deploy EC2 with auto-recovery
cd scripts/
python ec2_auto_recovery.py --deploy
# Simulate instance failure
python ec2_auto_recovery.py --simulate-failure
# Monitor recovery
python ec2_auto_recovery.py --check-status
# Verify recovery completed
python ec2_auto_recovery.py --validate
```
---
## Project Structure
```
CloudOpsLab/
├── scripts/                  # Main automation scripts
│   ├── cloudwatch_alarms.py
│   ├── ec2_auto_recovery.py
│   ├── ec2_manager.py
│   ├── ec2_scheduler.py
│   └── s3_public_check.py
├── monitoring/               # Security & monitoring
│   ├── screenshots/          # Proof of monitoring work
│   ├── security_audit.py
│   ├── health_check.py
│   ├── guardduty_handler.py
│   └── issue_tracker.py
├── self_healing/             # Auto-remediation logic
│   ├── ec2_recovery.py
│   ├── s3_remediation.py
│   └── lambda_functions/
├── automation/               # Additional automation
│   └── screenshots/          # Proof of automation work
├── troubleshooting/          # Problem scenarios & solutions
│   ├── iam_debugging.md
│   ├── lambda_timeout.md
│   └── cloudwatch_alarms.md
├── docs/                     # Documentation
│   ├── screenshots/          # Portfolio screenshots
│   ├── architecture.md
│   └── runbooks/             # Operational runbooks
└── README.md                 # You are here
```
---
## My Learning Journey
### What I've Learned:
**Automation:**
- Python + Boto3 makes AWS operations programmable
- Error handling is critical for production automation
- IAM permissions require careful planning
- Testing automation is as important as writing it
**Monitoring:**
- You can't fix what you can't see
- Alerts must be actionable, not noisy
- CloudWatch Logs Insights is powerful for debugging
- GuardDuty catches things humans miss
**Self-Healing:**
- Automate detection before remediation
- Start with simple recovery, add complexity gradually
- Always have manual override capability
- Test failure scenarios regularly
**Operations:**
- Documentation saves time during incidents
- Cost optimization requires continuous monitoring
- Security is a daily practice, not a checkbox
- CloudTrail is your best friend for troubleshooting
---
## What I'm Working On Next
**Planned improvements:**
- [ ] ECS container monitoring
- [ ] RDS backup automation
- [ ] Cost optimization reports
- [ ] Multi-region health checks
- [ ] Systems Manager integration
- [ ] Config compliance rules
**Skills I'm practicing:**
- [ ] Lambda with EventBridge
- [ ] Step Functions for workflows
- [ ] Advanced CloudWatch Logs Insights
- [ ] Container orchestration basics
---
## Cost Transparency
**Monthly AWS Costs for This Lab:**
```
EC2 (2 × t3.micro): ~$15.00
S3 Storage: ~$1.00
Data Transfer: ~$2.00
CloudWatch Logs: ~$2.00
─────────────────────────────────
Total: ~$20.00/month
```
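A quick arithmetic check of the EC2 line, assuming an on-demand Linux price for `t3.micro` in us-east-1 of about $0.0104 per hour and AWS's 730-hour billing month. Both numbers are assumptions to verify against current pricing, since rates vary by region and change over time.

```python
HOURS_PER_MONTH = 730      # billing-month convention used on AWS pricing pages
T3_MICRO_HOURLY = 0.0104   # assumed on-demand Linux rate (us-east-1); verify current pricing

ec2 = 2 * T3_MICRO_HOURLY * HOURS_PER_MONTH  # two instances running around the clock
line_items = {"EC2": ec2, "S3": 1.00, "Data transfer": 2.00, "CloudWatch Logs": 2.00}
total = sum(line_items.values())
print(f"EC2: ${ec2:.2f}  total: ${total:.2f}")
```

Stopping the instances outside lab hours (the scheduler in section 3) scales the EC2 line down roughly in proportion to the hours they are off.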
**Funded by:** My part-time delivery job while learning cloud
**Worth it?** Absolutely. I'm building proof, not just theory.
---
## Screenshots & Evidence
All screenshots in this repo are from **my actual AWS account**. No stock images, no tutorial screenshots.
**Screenshot locations:**
- `docs/screenshots/` - General portfolio screenshots
- `automation/screenshots/` - Automation project screenshots
- `monitoring/screenshots/` - Monitoring & security screenshots
---
## About Me
**Charles Bucher**
Self-Taught Cloud Engineer | Career Transition from Delivery Driving
**My Story:**
I'm 40 years old, working as a delivery driver and teaching myself cloud engineering to better provide for my family. Instead of just watching tutorials, I'm actually **building things in AWS** and documenting everything.
**Why trust my work?**
- ✅ Every screenshot is from MY AWS account
- ✅ I spend my own money running these labs ($20/month)
- ✅ I work on these projects after 10-hour delivery shifts
- ✅ I document everything like production systems
**What I'm NOT:**
- ❌ A senior engineer pretending to be entry-level
- ❌ Someone who just copied tutorials
- ❌ A paper cert chaser with no hands-on experience
**What I AM:**
- ✅ Self-taught and proud of it
- ✅ Honest about being entry-level
- ✅ Willing to start small and prove myself
- ✅ Ready to outwork anyone for this opportunity
---
## Current Status
**Studying for:** AWS SysOps Administrator Associate
**Looking for:** Entry-level Cloud Support / SysOps / DevOps roles
**Location:** Florida (remote preferred)
**Salary expectations:** $50k+ (realistic for entry-level)
### What I'm Open To:
- Full-time W2 positions
- Contract work through staffing agencies
- Remote opportunities
- Hybrid roles in Tampa Bay area
---
## Let's Connect
[LinkedIn](https://www.linkedin.com/in/charles-bucher-cloud)
[Email](mailto:quietopscb@gmail.com)
[Portfolio](https://charles-bucher.github.io/)
---
## Quick Facts
```yaml
name: Charles Bucher
role: Self-Taught Cloud Engineer
location: Florida
status: Open to Work
focus: AWS CloudOps
skills:
  cloud: [AWS, CloudWatch, EC2, S3, Lambda, IAM]
  scripting: [Python, Bash]
  tools: [Boto3, AWS CLI, Git, Linux]
  practices: [Automation, Monitoring, Security, Troubleshooting]
currently_learning:
  - AWS SysOps Administrator Associate
  - Advanced CloudWatch patterns
  - Infrastructure automation
ideal_role:
  - AWS Cloud Support Associate
  - Junior SysOps Administrator
  - Cloud Operations Engineer
  - Entry-level DevOps Engineer
motivation: "Family deserves better than paycheck-to-paycheck living"
```
---
## Why This Lab Matters
### What This Proves:
**For Hiring Managers:**
- ✅ I can actually use AWS (not just theory)
- ✅ I troubleshoot systematically
- ✅ I document professionally
- ✅ I'm self-motivated (teaching myself after work)
**For Me:**
- ✅ Built confidence in AWS operations
- ✅ Created reusable automation scripts
- ✅ Developed systematic debugging approach
- ✅ Have portfolio proof of hands-on work
**For Other Learners:**
- ✅ Error-driven learning works
- ✅ You don't need expensive courses
- ✅ Free tier + determination = real skills
- ✅ Document everything!
---
## Contributing
This is a personal learning project, but I'm open to suggestions!
**Ways to help:**
- Report issues or bugs
- Suggest new scenarios
- Improve documentation
- Star the repo if you find it useful
---
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## Acknowledgments
**Learning Resources:**
- AWS Documentation
- AWS Well-Architected Framework
- Boto3 Documentation
- Real-world experience from this lab
**Inspiration:**
- My family depending on this career change
- The need to prove skills through actual work
- Love for solving technical problems
- This community of self-taught engineers
---
## If This Helped You...
If this repo helped you learn CloudOps or gave you ideas for your own portfolio, please give it a star! It helps others find it too.
---
**Built with Python and a lot of trial and error**
### Charles Bucher | Self-Taught Cloud Engineer
*"I can't fake experience, so I'm building proof instead"*

---
**CloudOpsLab** | Learning operational excellence one automation at a time
[Back to Top](#cloudopslab)