{"id":50329141,"url":"https://github.com/tuni56/aws-data-lake-workshop","last_synced_at":"2026-05-29T08:32:30.726Z","repository":{"id":350332528,"uuid":"1206344370","full_name":"tuni56/aws-data-lake-workshop","owner":"tuni56","description":"Level 200 workshop in the AWS console: build a serverless data lake with Amazon S3, AWS Glue, and Amazon Athena. Step by step in Spanish, sales dataset included.","archived":false,"fork":false,"pushed_at":"2026-04-09T22:09:15.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-09T23:21:48.936Z","etag":null,"topics":["amazon-athena","amazon-s3","aws","aws-glue","aws-workshop","cloud-computing","data-analytics","data-engineering","data-lake","datalake","espanol","etl","parquet","serverless","sql","workshop"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tuni56.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-09T20:26:54.000Z","updated_at":"2026-04-09T22:09:21.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/tuni56/aws-data-lake-workshop","commit_stats":null,"previous_names":["tuni56/aws-data-lake-workshop"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/tuni56/aws-data-lake-workshop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuni56%2F
aws-data-lake-workshop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuni56%2Faws-data-lake-workshop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuni56%2Faws-data-lake-workshop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuni56%2Faws-data-lake-workshop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tuni56","download_url":"https://codeload.github.com/tuni56/aws-data-lake-workshop/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuni56%2Faws-data-lake-workshop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33644305,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon-athena","amazon-s3","aws","aws-glue","aws-workshop","cloud-computing","data-analytics","data-engineering","data-lake","datalake","espanol","etl","parquet","serverless","sql","workshop"],"created_at":"2026-05-29T08:32:29.397Z","updated_at":"2026-05-29T08:32:30.717Z","avatar_url":"https://github.com/tuni56.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🏗️ Data Lake on AWS with S3, Glue, and Athena\n\n**Level 200 workshop: AWS Console**\n\nLearn 
to build a serverless data lake on AWS, using S3 for storage, Glue to catalog and transform data, and Athena to query it with standard SQL, without managing any servers.\n\n---\n\n## 🎯 What will you build?\n\n```mermaid\nflowchart TD\n    subgraph Storage[\"🗄️ Amazon S3 — Data Lake Storage\"]\n        RAW[\"raw/sales/\\nsales.csv\"]\n        CURATED[\"curated/sales/\\n*.parquet\"]\n        RESULTS[\"athena-results/\"]\n    end\n\n    subgraph Catalog[\"🗂️ AWS Glue\"]\n        CRAWLER[\"Crawler\\nDiscovers schema\"]\n        DC[\"Data Catalog\\nsales_db\"]\n        JOB[\"ETL Job\\nCSV → Parquet\"]\n    end\n\n    CSV[\"📄 sales.csv\"] --\u003e|Upload| RAW\n    RAW --\u003e CRAWLER --\u003e DC\n    RAW --\u003e JOB --\u003e CURATED --\u003e DC\n    DC --\u003e ATHENA[\"💬 Amazon Athena\\nServerless SQL\"]\n    CURATED --\u003e ATHENA\n    ATHENA --\u003e RESULTS\n```\n\n\u003e 📐 [View the full architecture diagram](assets/architecture.md)\n\n---\n\n## 📋 Prerequisites\n\n- An active AWS account\n- Access to the AWS console\n- Permissions to create resources in S3, Glue, Athena, and IAM\n- Basic SQL knowledge\n\n\u003e ⚠️ **Estimated cost:** This workshop uses services that have a free tier. 
The total cost is under **$1 USD** if you clean up when you finish.\n\n---\n\n## 📚 Modules\n\n| # | Module | Time |\n|---|--------|--------|\n| [01](workshop/01-setup.md) | Initial setup and IAM | 10 min |\n| [02](workshop/02-s3-data-lake.md) | S3: data lake structure | 10 min |\n| [03](workshop/03-glue-catalog.md) | Glue: Crawler and Data Catalog | 20 min |\n| [04](workshop/04-athena-queries.md) | Athena: SQL queries | 15 min |\n| [05](workshop/05-cleanup.md) | Resource cleanup | 5 min |\n| [🏠 Bonus](workshop/bonus-particionado.md) | Partitioning in S3 *(to do at home)* | 25 min |\n\n**Estimated total duration: ~1 hour** *(+ 25 min optional bonus)*\n\n---\n\n## 🗂️ Repository structure\n\n```\naws-data-lake-workshop/\n├── README.md\n├── presentation/\n│   └── data-lake-workshop.md\n├── workshop/\n│   ├── 01-setup.md\n│   ├── 02-s3-data-lake.md\n│   ├── 03-glue-catalog.md\n│   ├── 04-athena-queries.md\n│   └── 05-cleanup.md\n├── data/\n│   └── sales.csv\n└── assets/\n```\n\n---\n\n## 🚀 Get started\n\nGo to [Module 01 → Setup](workshop/01-setup.md)\n\n---\n\n## 📎 Additional resources\n\n- [Amazon S3 documentation](https://docs.aws.amazon.com/s3/)\n- [AWS Glue documentation](https://docs.aws.amazon.com/glue/)\n- [Amazon Athena documentation](https://docs.aws.amazon.com/athena/)\n- [AWS Well-Architected Framework](https://aws.amazon.com/architecture/well-architected/)\n\n---\n\n*Workshop created for the Spanish-speaking AWS community 🌎*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuni56%2Faws-data-lake-workshop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftuni56%2Faws-data-lake-workshop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuni56%2Faws-data-lake-workshop/lists"}