{"id":29029140,"url":"https://github.com/camilajaviera91/dag-first-approach","last_synced_at":"2026-04-13T09:31:29.178Z","repository":{"id":292565519,"uuid":"981282120","full_name":"CamilaJaviera91/dag-first-approach","owner":"CamilaJaviera91","description":"This project automates the extraction, transformation, and export of sales data from a PostgreSQL database, enhances the data with exchange rate information, and exports the results in CSV and Google Sheets formats. It uses a Directed Acyclic Graph (DAG) to manage task dependencies and execute them in order.","archived":false,"fork":false,"pushed_at":"2025-06-11T01:08:46.000Z","size":645,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-26T08:05:37.047Z","etag":null,"topics":["airflow","dag","dotenv","faker","googlesheets","gspread","gspread-dataframe","matplotlib-pyplot","network","oauth2","os","pandas","postgresql","psycopg2","python","random","requests","sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CamilaJaviera91.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-10T18:44:11.000Z","updated_at":"2025-06-11T01:08:50.000Z","dependencies_parsed_at":"2025-05-10T19:33:34.994Z","dependency_job_id":"51dcba5c-bb23-4248-ae85-770bd3b80f81","html_url":"https://github.com/CamilaJaviera91/dag-first-approach","commit_stats":null,"previous_names":["camilajaviera91/dag-first-approach"],"tags_count":0,"template":false,"template_full
_name":null,"purl":"pkg:github/CamilaJaviera91/dag-first-approach","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CamilaJaviera91%2Fdag-first-approach","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CamilaJaviera91%2Fdag-first-approach/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CamilaJaviera91%2Fdag-first-approach/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CamilaJaviera91%2Fdag-first-approach/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CamilaJaviera91","download_url":"https://codeload.github.com/CamilaJaviera91/dag-first-approach/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CamilaJaviera91%2Fdag-first-approach/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31746291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T09:16:15.125Z","status":"ssl_error","status_checked_at":"2026-04-13T09:16:05.023Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","dag","dotenv","faker","googlesheets","gspread","gspread-dataframe","matplotlib-pyplot","network","oauth2","os","pandas","postgresql","psycopg2","python","random","requests","sql"],"created_at":"2025-06-26T08:05:36.144Z","updated_at":"2026-04-13T09:31:29.160Z","avatar_url":"https://github.com/CamilaJaviera91.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧭 Table of Contents\n\n- [💡 Project Description](#-project-description)\n- [🚀 Project Structure](#-project-structure)\n- [🧩 What This Project Does?](#-what-this-project-does)\n- [🛠️ Technologies Used](#️-technologies-used)\n- [🗂️ What's DAG?](#️-whats-dag)\n- [🚀 Installation and Execution](#-installation-and-execution)\n- [🧩 Sales ETL Pipeline](#-sales-etl-pipeline)\n- [🖼️ DAG Graph View](#️-dag-graph-view)\n- [🗓️ DAG Configuration](#️-dag-configuration)\n- [🗂️ Output Files](#️-output-files)\n- [✅ Sample Output](#-sample-output)\n- [❗ Troubleshooting](#-troubleshooting)\n- [📝 Notes](#-notes)\n- [📘 How to Add a DAG to Apache Airflow](#-how-to-add-a-dag-to-apache-airflow)\n- [🤝 Contributing](#-contributing)\n- [🗒️ Roadmap](#️-roadmap)\n- [📧 Questions?](#-questions)\n- [👩‍💻 Author](#-author)\n- [📚 Useful Resources](#-useful-resources)\n- [✨ Coding Standards and Best Practices](#-coding-standards-and-best-practices)\n- [📄 License](#-license)\n- [📚 Wiki](#-wiki)\n\n---\n\n## 🧪 Test Badge\n\n[![Python 
Tests](https://github.com/CamilaJaviera91/dag-first-approach/actions/workflows/python-tests.yml/badge.svg)](https://github.com/CamilaJaviera91/dag-first-approach/actions/workflows/python-tests.yml)\n\n---\n\n## 🧪 Continuous Integration (CI)\n\nThis project uses [GitHub Actions](https://github.com/features/actions) to automatically run tests on every push.\n\nCI file: [python-tests.yml](.github/workflows/python-tests.yml)\n\n✅ Ensures code quality  \n✅ Prevents regressions\n\n---\n\n# 💡 DAG-Based ETL Pipeline for Sales Reporting\n\n## 🧠 Project Description:\n\nThis project automates the **extraction**, **transformation**, and **export** of sales data using `Apache Airflow`. It pulls **data** from a **PostgreSQL** database, enriches it with **USD to CLP exchange rate** information, and **exports** the final dataset to both a **CSV file** and a **Google Sheet**.\n\n\u003cbr\u003e\n\nThe **pipeline** is designed as a **Directed Acyclic Graph** (`DAG`) to manage task dependencies and ensure a reliable and repeatable workflow.\n\n---\n\n## 🚀 Project Structure:\n\n```bash\ndag-first-approach/\n├── project_airflow_etl/\n│   ├── dags/\n│   │   └── etl_sales_report.py       # Airflow DAG definition\n│   ├── data/\n│   │   ├── dag.csv\n│   │   ├── report.csv                # Final report file\n│   │   └── sales.png                 # Yearly sales\n│   ├── logs/                         # Airflow logs\n│   ├── plugins/                      # Custom Airflow plugins\n│   ├── src/\n│   │   └── etl_modules/              # ETL module scripts\n│   │       ├── __init__.py\n│   │       ├── connection.py\n│   │       ├── enrich.py\n│   │       ├── export.py\n│   │       ├── extract.py\n│   │       ├── generate_sales_plot.py\n│   │       ├── google_sheets.py\n│   │       └── usd_to_clp.py\n│   ├── test/                         # For testing on GitHub\n│   │   ├── test_connection.py\n│   │   ├── test_enrich.py\n│   │   ├── test_export.py\n│   │   ├── test_extract.py\n│   │   ├── 
test_generate_sales_plot.py\n│   │   ├── test_google_sheets.py\n│   │   └── test_usd_to_clp.py\n│   ├── airflow.cfg                   # Airflow configuration file\n│   ├── airflow.db                    # Airflow database (SQLite for local use)\n│   ├── docker-compose.yaml           # Docker setup for Airflow\n│   ├── requirements.txt              # Python dependencies\n└── README.md\n```\n\n---\n\n## 🧩 What This Project Does?:\n\n- **Extracts** data from a PostgreSQL database using a custom SQL query.\n\n- **Fetches** the current USD to CLP exchange rate from a public API.\n\n- **Enriches** the data by converting sales totals from USD to CLP.\n\n- **Exports** the final dataset:\n\n    - as a CSV file (`report.csv`)\n\n    - to a Google Sheet\n\n---\n\n## 🛠️ Technologies Used:\n\n- Python\n\n- Apache Airflow\n\n- PostgreSQL\n\n- Google Sheets API\n\n- Docker (via `docker-compose`)\n\n- Pandas, Requests, Matplotlib\n\n---\n\n## 🗂️ What's DAG?:\n\nA **Directed Acyclic Graph (DAG)** is a graph where:\n\n1. **Directed:** All edges have a direction (from one node to another)\n\n2. **Acyclic:** No cycles exist—you can’t loop back to a previous node\n\n---\n\n### 📋 Common Uses of DAGs:\n\n- Task scheduling (e.g., Airflow, build systems like Make)\n\n- Version control systems (`e.g., Git`)\n\n- Data processing pipelines\n\n- Compilers and expression trees\n\n---\n\n## 🚀 Installation and Execution:\n\n1. Clone the repository:\n\n```\ngit clone https://github.com/CamilaJaviera91/dag-first-approach.git\ncd dag-first-approach\n```\n\n2. Create a Virtual Environment:\n\n```\npython3 -m venv venv\nsource venv/bin/activate # On Windows: venv\\Scripts\\activate\n```\n\n3. Install the required dependencies:\n\n```\npip install -r requirements.txt\n```\n\n4. 
Configure Environment Variables:\n\nCreate a `.env` file in the root directory and add the following:\n\n```\nDB_HOST=your_database_host\nDB_PORT=your_database_port\nDB_NAME=your_database_name\nDB_USER=your_database_user\nDB_PASSWORD=your_database_password\nDB_SCHEMA=your_database_schema #optional\n\nGOOGLE_SHEET_ID=your_google_sheet_id\nGOOGLE_SERVICE_ACCOUNT_FILE=path/to/your/service_account.json\n```\n\n5. Initialize the Airflow Database:\n\n```\nairflow db init\n```\n\n6. Set Up Google Sheets API\n\n    - Follow this [guide](https://developers.google.com/workspace/sheets/api/quickstart/python?hl=es-419) to:\n\n        1. Create a project in the Google Developers Console.\n\n        2. Enable the **Google Sheets API** and **Google Drive API**.\n\n        3. Download the service account JSON credentials.\n\n        4. Set the path to this file in `GOOGLE_SERVICE_ACCOUNT_FILE` (the variable defined in `.env` above).\n\n    - Make sure to share your target Google Sheet with the service account email.\n\n7. Start the PostgreSQL Service:\n\n```\nsudo systemctl start postgresql\n```\n\n8. Start Airflow Services:\n\n```\nairflow webserver --port 8080\nairflow scheduler\n```\n\n9. 
Access the Airflow Web Interface:\n\nNavigate to http://localhost:8080 in your web browser.\n\n---\n\n## 🧩 Sales ETL Pipeline\n\nThis project defines an Apache Airflow DAG that automates a complete ETL process:\n\n- 📥 Extracts sales data from a PostgreSQL database.\n\n- 💱 Fetches the current USD to CLP exchange rate.\n\n- 🧪 Enriches the data by converting USD totals into CLP.\n\n- 📊 Generates a sales plot by year.\n\n- 💾 Exports the enriched data to a CSV file.\n\n- ☁️ Sends the data to Google Sheets for easy access.\n\n---\n\n## 🖼️ DAG Graph View\n\nThis is the task flow as represented in Airflow:\n\n![DAG Screenshot](project_airflow_etl/data/dag.png)\n\n---\n\n## 🗓️ DAG Configuration\n\n| Parameter      | Value           |\n| -------------- | --------------- |\n| **DAG ID**     | `sales_etl_dag` |\n| **Schedule**   | `@daily`        |\n| **Catchup**    | `False`         |\n| **Start Date** | `2024-01-01`    |\n| **Owner**      | `Camila`        |\n\n---\n\n## 🗂️ Output Files:\n\n- data/report.csv\n\n- data/sales.png\n\n- Google Spreadsheet: `Sales Report → ReportSheet`\n\n---\n\n## ✅ Sample Output:\n\n### 📊 sales.png\n\n![sales](project_airflow_etl/data/sales.png)\n\n### 📂 report.csv\n\n| year | store          | total        | total_clp     |\n|------|----------------|--------------|---------------|\n| 2020 | Teno-3         | 1,292,370.99 | 1,219,364,953 |\n| 2020 | Cauquenes-5    | 1,298,515.67 | 1,225,162,520 |\n| 2020 | Villa Alegre-2 | 1,325,040.86 | 1,250,189,302 |\n| 2020 | Longaví-9      | 1,353,795.29 | 1,277,319,394 |\n| 2020 | Constitución-4 | 1,353,981.94 | 1,277,495,500 |\n\n### 📄 Sales Report (Google Sheets)\n\n- [Link to Sales Report](https://docs.google.com/spreadsheets/d/1PszowXv_7GFDleYF1D-4mhRgbrHmDlGkVTWiCCJm2lk/edit?gid=435698635#gid=435698635)\n\n---\n\n## ❗Troubleshooting:\n\n- **Connection Errors:** Check your database credentials and network access.\n\n- **Google Sheets Permissions:** Make sure the service account has access to edit the target 
sheet.\n\n- **Missing Environment Variables:** Ensure `.env` is properly set and loaded.\n\n---\n\n## 📝 Notes:\n\n- Ensure the database is accessible and credentials are valid\n\n- The service account must have permission to edit the target Google Sheet\n\n- You can customize the SQL query, filenames, and sheet names\n\n---\n\n## 📘 How to Add a DAG to Apache Airflow and Display It in the Webserver\n\nFollow these steps to add your `DAG` to Apache Airflow and make it visible in the Airflow web interface.\n\n1. 📂 **Place Your DAG in the `dags` Directory**\n\nAirflow loads DAGs from a specific folder, typically located at:\n\n```\n~/airflow/dags/\n```\n\n- If you've changed the path in your `airflow.cfg` (`dags_folder`), use that custom directory instead.\n\n2. 📝 **Create Your DAG File**\n\nCreate a new Python file inside the `dags` folder. For example:\n\n```\n~/airflow/dags/my_example_dag.py\n```\n\n3. 🔁 **Restart Airflow Services**\n\nAfter placing your DAG file, restart the Airflow scheduler and webserver:\n\n```\nairflow scheduler\nairflow webserver\n```\n\n4. 🌐 **Open the Airflow Web UI**\n\nVisit the Airflow UI in your browser:\n\n```\nhttp://localhost:8080\n```\n\n- You should see your DAG (`my_example_dag`) listed. Enable it and trigger it as needed.\n\n### 🛠️ Troubleshooting: Adding a DAG to Apache Airflow\n\nIf your DAG doesn't appear:\n\n- ✅ Ensure the file name ends with `.py`\n\n- ✅ Make sure the `dag_id` is unique and the syntax is valid\n\n- ✅ Confirm the file is located in the correct `dags_folder`\n\n- ✅ Check the Airflow scheduler logs for errors:\n\n```\nairflow scheduler --log-level INFO\n```\n\n---\n\n## 🤝 Contributing:\n\nContributions are welcome! 
Please follow these steps:\n\n- Fork the repository.\n\n- Create a new branch: `git checkout -b feature/YourFeatureName`\n\n- Commit your changes: `git commit -m 'Add some feature'`\n\n- Push to the branch: `git push origin feature/YourFeatureName`\n\n- Open a pull request.\n\n---\n\n## 🗒️ Roadmap\n\n- [x] Extraction from PostgreSQL.\n- [x] Enrichment with exchange rate.\n- [x] Export to CSV and Google Sheets.\n- [ ] Integration with other data sources (e.g., S3, external APIs).\n- [ ] Real-time dashboard visualization.\n\n---\n\n## 📧 Questions?:\n\nIf you get stuck or need help customizing the pipeline, feel free to open an issue or reach out!\n\n---\n\n## 👩‍💻 Author:\n\n**Camila Javiera Muñoz Navarro**  \n[🔗 LinkedIn](https://www.linkedin.com/in/camilajmn/)  \n[🐙 GitHub](https://github.com/CamilaJaviera91)\n\n---\n\n## 📚 Useful Resources\n\n- [Apache Airflow Docs](https://airflow.apache.org/docs/)\n- [Google Sheets API Python](https://developers.google.com/sheets/api/quickstart/python)\n- [Docker Compose for Airflow](https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml)\n\n---\n\n## ✨ Coding Standards and Best Practices\n\n- Modular structure for maintainability\n- Separation of concerns in ETL steps\n- CI with unit tests using `pytest`\n- Secure use of `.env` for credentials\n- Logging and exception handling in DAGs\n\n---\n\n## 📄 License:\n\n- This project is licensed under the **MIT License**. \n\n- ⚠️ **Note:** Never commit your `.env` or `service_account.json` file. 
Use `.gitignore` and GitHub secrets for CI/CD.\n\n\n---\n\n## 📚 Wiki\n\n- [Extended usage guide](https://github.com/CamilaJaviera91/dag-first-approach/wiki)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamilajaviera91%2Fdag-first-approach","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcamilajaviera91%2Fdag-first-approach","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamilajaviera91%2Fdag-first-approach/lists"}