{"id":35142174,"url":"https://github.com/tysker/cloud_devops_lab","last_synced_at":"2026-04-07T08:01:33.730Z","repository":{"id":327227513,"uuid":"1108409219","full_name":"tysker/cloud_devops_lab","owner":"tysker","description":"This repository is a complete end-to-end DevOps learning project built around a small Python Flask application. The goal is to gradually build a realistic production-like environment .","archived":false,"fork":false,"pushed_at":"2026-01-17T07:19:52.000Z","size":88,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-17T18:19:10.689Z","etag":null,"topics":["ansible","api","cloudflare","devops","dns","docker","dockerfile","github-actions","grafana","linode","prometheus","python","terraform"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tysker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-02T12:11:21.000Z","updated_at":"2026-01-17T07:19:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/tysker/cloud_devops_lab","commit_stats":null,"previous_names":["tysker/cloud_devops_lab"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tysker/cloud_devops_lab","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tysker%2Fcloud_devops_lab","tags_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/tysker%2Fcloud_devops_lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tysker%2Fcloud_devops_lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tysker%2Fcloud_devops_lab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tysker","download_url":"https://codeload.github.com/tysker/cloud_devops_lab/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tysker%2Fcloud_devops_lab/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31504897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","api","cloudflare","devops","dns","docker","dockerfile","github-actions","grafana","linode","prometheus","python","terraform"],"created_at":"2025-12-28T12:03:22.234Z","updated_at":"2026-04-07T08:01:33.722Z","avatar_url":"https://github.com/tysker.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DevOps Project\n\n## Project Description\n\nThis repository is a complete end-to-end DevOps learning project built around a small Python 
Flask\napplication. All access follows a bastion-based, non-root security model. \n\nThe goal is to gradually build a realistic production-like environment that includes:\n\n- containerization with Docker\n- CI/CD pipelines (GitHub Actions)\n- artifact registries (Docker Registry \u0026 GitHub Packages)\n- infrastructure provisioning (Terraform)\n- configuration management (Ansible roles)\n- monitoring and visualization (Prometheus \u0026 Grafana)\n- security best practices (jump host, SSH hardening, TLS certificates)\n\nThe project grows in clear stages. Each stage is documented with **what was done**, **why it matters**,\nand **how it was implemented**, so it becomes both a learning journal and a portfolio project.\n\n**Current status:** Stages 1–11 completed. Application is deployed and monitored (Prometheus + Grafana), and served via HTTPS using Caddy + Let’s Encrypt. SSH access is restricted to a bastion host and allow listed source IPs.\n\n## Structure\n\nCurrent project layout:\n\n```\ncloud_devops_lab/\n    ├── ansible\n    │   ├── ansible.cfg\n    │   ├── ansible.log\n    │   ├── group_vars\n    │   │   ├── all\n    │   │   │   └── vars.yml\n    │   │   ├── app\n    │   │   │   └── vars.yml\n    │   │   └── monitoring\n    │   │       ├── vars.yml\n    │   │       └── vault.yml\n    │   ├── hosts.ini\n    │   ├── playbooks\n    │   │   ├── bootstrap_1.yml\n    │   │   ├── bootstrap_2.yml\n    │   │   ├── caddy.yml\n    │   │   ├── deploy_app.yml\n    │   │   ├── monitoring_grafana.yml\n    │   │   ├── monitoring_node_exporter.yml\n    │   │   ├── monitoring_prometheus.yml\n    │   │   ├── security_fail2ban.yml\n    │   │   └── unattended_upgrades.yml\n    │   ├── README.md\n    │   └── roles\n    │       ├── bootstrap_user\n    │       ├── caddy\n    │       ├── common\n    │       ├── deploy_app\n    │       ├── docker\n    │       ├── fail2ban\n    │       ├── grafana\n    │       ├── node_exporter\n    │       ├── prometheus\n    │       ├── 
ssh_hardening\n    │       └── unattended_upgrades\n    ├── app\n    │   ├── Dockerfile\n    │   ├── gunicorn.conf.py\n    │   ├── requirements.txt\n    │   ├── src\n    │   │   ├── app.py\n    │   │   ├── routes\n    │   │   │   ├── health.py\n    │   │   │   ├── metrics.py\n    │   │   │   └── root.py\n    │   │   └── utils\n    │   │       ├── counters.py\n    │   └── venv\n    ├── docs\n    │   └── project-checklist.md\n    ├── IAAS.md\n    ├── infrastructure\n    │   └── terraform\n    │       ├── main.tf\n    │       ├── modules\n    │       │   └── compute\n    │       ├── outputs.tf\n    │       ├── providers.tf\n    │       ├── terraform.tfstate\n    │       ├── terraform.tfstate.backup\n    │       ├── terraform.tfvars\n    │       └── variables.tf\n    ├── LICENSE\n    └── README.md\n\n```\n\n## Requirements (current)\n\n- Python 3.12+\n- pip / venv\n- Git\n- Ansible\n- Terraform\n- Linode for server hosting\n- Cloudflare (DNS)\n- Domain registrar\n- Grafana\n- Prometheus\n\n## Running the Application Locally\n\n```\npython -m venv venv\nsource venv/bin/activate\npip install -r app/requirements.txt\npython -m app.src.app\n```\n\n```\nApplication runs at:\nhttp://localhost:5000/\n```\n\n## Stages\n\nThe project is built in incremental stages. 
Each stage adds a new DevOps capability on top of the existing system.\n\n### Stages Overview\n\n- \u003cs\u003eStage 1: Flask application\u003c/s\u003e\n- \u003cs\u003eStage 2: Containerization with Docker\u003c/s\u003e\n- \u003cs\u003eStage 3: CI/CD pipeline (GitHub Actions \u0026 GHCR)\u003c/s\u003e\n- \u003cs\u003eStage 4: Infrastructure (Terraform – servers, networking, firewalls)\u003c/s\u003e\n- \u003cs\u003eStage 5: DNS \u0026 domain management (Cloudflare)\u003c/s\u003e\n- \u003cs\u003eStage 6: Ansible bootstrap \u0026 access control\u003c/s\u003e\n- \u003cs\u003eStage 7: SSH hardening\u003c/s\u003e\n- \u003cs\u003eStage 8: Docker installation (via Ansible)\u003c/s\u003e\n- \u003cs\u003eStage 9: Application deployment\u003c/s\u003e\n- \u003cs\u003eStage 10: Monitoring stack (Prometheus \u0026 Grafana)\u003c/s\u003e\n- \u003cs\u003eStage 11: TLS certificates \u0026 reverse proxy (Caddy)\u003c/s\u003e\n\n### Stage 1 — Flask Application\n\n**What:** Implemented a minimal Flask API with initial routing.  \n**Why:** A simple application is required before adding Docker, CI/CD, infrastructure and monitoring.  
\n**How:** Created project folder structure, used Blueprints, tested locally with Python.\n\n- Basic Flask application runs locally.\n- Endpoints:\n  - `/` – root\n  - `/health`\n- Foundation for Dockerization, CI/CD, monitoring and future infrastructure work.\n\n### Stage 2 — Containerization with Docker\n\n**What:**  \nCreated a production-ready Dockerfile for the Flask application using a multi-stage build.\n\n**Why:**  \nContainerizing the application allows consistent deployment across environments and provides the\nfoundation for CI/CD pipelines, registries, deployment automation, and infrastructure scaling.\n\n**How:**\n\n- Implemented a two-stage Dockerfile (builder + runtime).\n- Installed dependencies in an isolated build layer.\n- Copied only necessary runtime dependencies into a slim final image.\n- Added a non-root application user for security.\n- Added a Docker HEALTHCHECK hitting `/health`.\n- Exposed port 5000 and used Gunicorn as the production WSGI (Web Server Gateway Interface) server.\n- Built and ran the image locally to verify functionality.\n\n**How to build and run**\n\n1. Build image: `docker build -t cloud-devops-app:0.1 .`\n2. Run container: `docker run -p 5000:5000 cloud-devops-app:0.1`\n3. 
Test health endpoint: `curl http://localhost:5000/health`\n\n### Stage 3 — CI/CD Pipeline (GHCR Integration)\n\n**What:**  \nExtended the GitHub Actions workflow to build Docker images with tags and push them to\nGitHub Container Registry (GHCR).\n\n**Why:**  \nA registry is required for deployment automation and ensures versioned, reproducible artifacts\nthat can be pulled by servers during deployment.\n\n**How:**\n\n- Added permissions for GitHub Actions to write to GHCR.\n- Logged in to GHCR using `GITHUB_TOKEN`.\n- Created two image tags (`latest` and short commit SHA).\n- Pushed images automatically on changes to `develop` and `main`.\n\n### Stage 4 - Infrastructure (Terraform – servers, networking, firewalls)\n\nInfrastructure is provisioned using Terraform on Linode (Akamai).\n\n#### Architecture Overview\n\n- **Jump Server**\n  - Public + private IP\n  - SSH entry point (bastion host)\n\n- **Application Server**\n  - Private network only\n  - Runs application containers\n\n- **Monitoring Server**\n  - Private network only\n  - Runs Prometheus and Grafana\n\nAll servers share a private network.  
\nOnly the jump server is reachable from the public internet.\n\n#### Security Model\n\n- Bastion (jump server) pattern\n- SSH key authentication only\n- No private keys stored on servers\n- App and monitoring servers accessible only via private network\n- Network access enforced using Linode Firewalls\n- SSH agent forwarding used for hop-based access\n\n#### Terraform Structure\n\n```\ninfrastructure\n└── terraform\n    ├── main.tf\n    ├── modules\n    │   └── compute\n    │       ├── main.tf\n    │       ├── outputs.tf\n    │       ├── providers.tf\n    │       └── variables.tf\n    ├── outputs.tf\n    ├── providers.tf\n    ├── terraform.tfstate\n    ├── terraform.tfstate.backup\n    ├── terraform.tfvars\n    └── variables.tf\n```\n\nThis stage establishes the baseline infrastructure but does not yet deploy applications.\n\n### Stage 5 - DNS \u0026 Domain Management\n\nThe domain `clouddevopslab.eu` is registered at simply.com and delegated to Cloudflare\nfor DNS management and security features.\n\n### Stage 6 — Ansible Bootstrap \u0026 Access Control\n\n**What:**  \nIntroduced Ansible to centrally manage all servers using a bastion (jump host) model.\nBootstrapped a non-root `devops` user with SSH key access and sudo privileges.\n\n**Why:**  \nManual server configuration does not scale and is error-prone.\nAnsible provides reproducible, auditable configuration management and enforces\nleast-privilege access by avoiding root logins.\n\n**How:**\n\n- Configured Ansible inventory with a jump host (bastion pattern).\n- Enabled SSH agent forwarding for secure multi-hop access.\n- Created a reusable `common` role for connectivity checks.\n- Added a `bootstrap_users` role to:\n  - create a `devops` user\n  - configure passwordless sudo\n  - install SSH public keys\n- Switched Ansible to run as `devops` with privilege escalation (`become`).\n\n### Stage 7 — SSH Hardening\n\n**What:**  \nHardened SSH access across all servers by disabling insecure 
authentication\nmethods and enforcing least-privilege access.\n\n**Why:**  \nSSH is the primary attack surface on servers. Hardening reduces the risk of\nbrute-force attacks, credential abuse, and privilege escalation.\n\n**How:**\n\n- Disabled password-based SSH authentication.\n- Disabled challenge-response authentication.\n- Restricted SSH access using an explicit `AllowUsers` list.\n- Disabled root SSH login entirely.\n- Enforced bastion-only access using a jump host.\n- Ensured Ansible operates as a non-root user with controlled privilege escalation.\n\n### Stage 8 — Docker Installation\n\n**What:**  \nInstalled Docker Engine and Docker Compose plugin on application and monitoring servers.\n\n**Why:**  \nContainers provide consistent, reproducible runtime environments and are the foundation\nfor application deployment and monitoring.\n\n**How:**\n\n- Installed Docker from the official Docker APT repository.\n- Enabled and started the Docker service.\n- Added the non-root `devops` user to the `docker` group.\n- Verified operation using `docker run hello-world`.\n\nDocker is intentionally not installed on the jump server.\n\n### Stage 9 — Application Deployment (Docker + Ansible)\n\n**What:**  \nDeployed the Flask application container to the application server using Ansible.\n\n**Why:**  \nA repeatable deployment reduces manual steps and ensures consistent environments.\n\n**How:**  \n- Pulled a pinned image tag from GHCR (`ghcr.io/tysker/cloud_devops_app:77ecd38`).\n- Ran the container with `restart: unless-stopped`.\n- Exposed HTTP on port 80 mapped to container port 5000.\n- Added an Ansible health check against `/health`.\n\n### Stage 10 — Monitoring stack (Prometheus \u0026 Grafana)\n\nThis stage introduces full observability for both the infrastructure and the application.\n\n#### Part 1 — Node Exporter\n\n**What:**  \nDeployed Node Exporter on the application and monitoring servers.\n\n**Why:**  \nHost-level metrics (CPU, memory, disk, network) are 
essential for understanding system health and capacity.\n\n**How:**  \n- Installed Node Exporter via Docker using Ansible.\n- Metrics exposed on port `9100`.\n- Targets scraped via private IPs.\n\n---\n\n#### Part 2 — Prometheus\n\n**What:**  \nDeployed Prometheus on the monitoring server.\n\n**Why:**  \nPrometheus acts as the central metrics collection and storage system.\n\n**How:**  \n- Prometheus deployed via Docker using Ansible.\n- Configuration rendered from a template (`prometheus.yml`).\n- Scrapes:\n  - Node Exporter on app + monitoring servers\n  - Flask application metrics\n- Persistent data directory mounted on the host.\n\n---\n\n#### Part 3 — Grafana\n\n**What:**  \nDeployed Grafana for metrics visualization.\n\n**Why:**  \nMetrics are only useful if they can be explored and visualized effectively.\n\n**How:**  \n- Grafana deployed via Docker using Ansible.\n- Prometheus configured as a data source.\n- Access restricted to SSH port forwarding (no public exposure).\n- Imported **Node Exporter Full** dashboard (ID 1860).\n\n---\n\n#### Part 4 — Flask application metrics\n\n**What:**  \nExposed application metrics in Prometheus format.\n\n**Why:**  \nApplication-level observability enables insight into runtime behavior, performance, and stability.\n\n**How:**  \n- Added `/metrics` endpoint using `prometheus_client`.\n- Removed the earlier JSON-based metrics endpoint.\n- Prometheus scrapes the app at:\n  - `http://\u003capp_private_ip\u003e:80/metrics`\n- Metrics verified in Prometheus and visualized in Grafana.\n\n### Stage 11 — TLS certificates \u0026 reverse proxy (Caddy) + hardening\n\nThis stage secures the application with HTTPS and adds additional server hardening.\n\n#### Part 1 — Reverse proxy + HTTPS (Caddy)\n\n**What:**  \nDeployed Caddy on the application server to act as a reverse proxy and terminate TLS.\n\n**Why:**  \nHTTPS is required for production-like deployments. 
A reverse proxy enables secure traffic, clean routing, and allows the application container to stay private (localhost only).\n\n**How:**  \n- Opened inbound port 443 on the application firewall.\n- Deployed Caddy via Ansible using Docker (`network_mode: host`).\n- Configured Caddy to serve:\n  - `clouddevopslab.eu` and `www.clouddevopslab.eu` via HTTPS (Let’s Encrypt)\n  - private-IP HTTP access for Prometheus scraping\n- Added basic security headers in the Caddyfile.\n- Updated app deployment so the Flask container is bound to `127.0.0.1:5000` (not publicly reachable).\n\n#### Part 2 — Stage 11 hardening (Option A)\n\n**What:**  \nImplemented baseline security hardening for the environment.\n\n**Why:**  \nReduce attack surface and align with least-privilege and operational security practices.\n\n**How:**  \n- Restricted SSH access to the jump server using a Terraform allowlist (`ssh_allowed_ips`).\n- Installed and enabled Fail2ban on the jump server (`sshd` jail).\n- Enabled automatic security updates (`unattended-upgrades`) on all servers.\n- Moved Grafana admin password into **Ansible Vault** (no secrets stored in Git).\n\n### Access Model\n\n- Direct SSH access is allowed only to the jump server.\n- All internal servers are accessed via the jump server using SSH agent forwarding.\n- Ansible connects as a non-root `devops` user and escalates privileges only when required.\n- Root SSH login is fully disabled.\n- All access is performed via the non-root `devops` user with sudo escalation.\n\n#### DNS Flow\n\n- Domain registered at simply.com\n- Nameservers delegated to Cloudflare\n- DNS records managed in Cloudflare\n- Application traffic will later be proxied via Cloudflare\n\n#### Current Records\n\n- `clouddevopslab.eu` → A record → application server\n- `www.clouddevopslab.eu` → A record → application server\n\nAt this stage, DNS records exist and the application is reachable via HTTPS through Caddy. 
Cloudflare proxy is still disabled (DNS-only).\n\nNote: During early stages, application IP addresses may change when infrastructure\nis recreated. A reserved IPv4 address will be introduced later to provide a stable\nDNS target.\n\n## Learning Log\n\nA chronological log describing the work done in each stage.\n\n## Next Step\n\n- \u003cs\u003eProceed to Stage 2: Containerization with Docker, where the application will be packaged into a production-ready container image.\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 3: CI/CD pipeline (GitHub Actions \u0026 GHCR Integration)\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 4: Infrastructure (Terraform – servers, networking, firewalls)\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 5: DNS \u0026 domain management (Cloudflare)\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 6: Ansible bootstrap \u0026 access control\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 7: SSH hardening\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 8: Docker installation (via Ansible)\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 9: Application deployment using Docker and GHCR\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 10: Monitoring stack (Prometheus \u0026 Grafana)\u003c/s\u003e\n- \u003cs\u003eProceed to Stage 11: TLS certificates \u0026 reverse proxy (Caddy)\u003c/s\u003e\n- Next: Stage 12: Cloudflare proxy + restrict origin access to Cloudflare IP ranges\n\nStage 11 introduced HTTPS, automatic TLS certificates, and a reverse proxy\nin front of the application. This enables secure traffic, prepares the setup\nfor Cloudflare proxying, and allows stricter firewall rules on the application server.\n\n## Git Workflow \u0026 Conventions\n\nThis repository uses a simple branching and commit strategy to keep the history clean and understandable.\n\n### Branches\n\n- `main`  \n  Always deployable and represents the most stable state. 
Release tags will be created from this branch.\n\n- `develop`  \n  Integration branch for day-to-day work. Features, fixes and infrastructure changes are merged here before going to `main`.\n\n- Short-lived branches  \n  All work is done on short-lived branches and merged via pull requests:\n  - `feature/\u003cshort-description\u003e` – new functionality\n  - `fix/\u003cshort-description\u003e` – bug fixes\n  - `infra/\u003cshort-description\u003e` – infrastructure (Terraform, Ansible, etc.)\n  - `docs/\u003cshort-description\u003e` – documentation updates\n  - `ci/\u003cshort-description\u003e` – CI/CD pipeline changes\n\n  Examples:\n  - `feature/add-metrics-endpoint`\n  - `infra/add-terraform-app-server`\n  - `docs/update-readme-git-strategy`\n  - `ci/add-docker-build-workflow`\n\n### Commit Messages\n\nCommit messages follow a light version of Conventional Commits:\n\n`\u003ctype\u003e(\u003coptional-scope\u003e): \u003cshort summary\u003e`\n\nTypes used in this project:\n\n- `feat` – new features (app or infra)\n- `fix` – bug fixes\n- `docs` – documentation changes\n- `infra` – infrastructure code changes\n- `ci` – CI/CD configuration\n- `refactor` – code changes that don’t change behaviour\n- `chore` – maintenance tasks, formatting, small cleanups\n\nExamples:\n\n- `feat(api): add /metrics endpoint`\n- `docs(readme): document phase 1 (Flask app)`\n- `infra(terraform): create linode instances for app and monitoring`\n- `ci(docker): add image build and push workflow`\n\n## Infrastructure Changes\n\nAll infrastructure and configuration changes are performed via:\n\n- Terraform (provisioning)\n- Ansible (configuration)\n\nManual changes on servers are avoided to ensure reproducibility.\n\n## License\n\nTBD – Will be added later in the 
project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftysker%2Fcloud_devops_lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftysker%2Fcloud_devops_lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftysker%2Fcloud_devops_lab/lists"}