{"id":30087884,"url":"https://github.com/hadiuzzaman524/python-clean-architecture","last_synced_at":"2026-05-03T10:34:27.560Z","repository":{"id":306621657,"uuid":"1026270630","full_name":"hadiuzzaman524/python-clean-architecture","owner":"hadiuzzaman524","description":"A scalable COVID-19 ETL pipeline built with Python, Airflow, and BigQuery, following Clean Architecture and Domain-Driven Design principles. Designed for modularity, testability, and production-ready data workflows in a Dockerized environment.","archived":false,"fork":false,"pushed_at":"2025-07-27T04:52:41.000Z","size":2005,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-17T19:44:49.021Z","etag":null,"topics":["airflow","airflow-dags","bigquery","clean-architecture","clean-code","data-engineering","docker","domain-driven-design","etl-pipeline","postgresql","python","sqlalchemy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hadiuzzaman524.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-25T15:39:55.000Z","updated_at":"2025-07-27T04:52:45.000Z","dependencies_parsed_at":"2025-09-17T19:34:29.195Z","dependency_job_id":"679f19f7-5264-4aea-b713-da981c05438e","html_url":"https://github.com/hadiuzzaman524/python-clean-architecture","commit_stats":null,"previous_names":["hadiuzzaman524/python-clean-architecture"],"tags_count":0,"te
mplate":false,"template_full_name":null,"purl":"pkg:github/hadiuzzaman524/python-clean-architecture","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadiuzzaman524%2Fpython-clean-architecture","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadiuzzaman524%2Fpython-clean-architecture/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadiuzzaman524%2Fpython-clean-architecture/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadiuzzaman524%2Fpython-clean-architecture/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hadiuzzaman524","download_url":"https://codeload.github.com/hadiuzzaman524/python-clean-architecture/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadiuzzaman524%2Fpython-clean-architecture/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32566444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airflow-dags","bigquery","clean-architecture","clean-code","data-engineering","docker","domain-driven-design","etl-pipeline","postgresql","python","sqlalchemy"],"created_at":"2025-08-09T04:01:56.954Z","updated_at":"2026-05-03T10:34:27.535Z","avatar_url":"https://github.com/hadiuzzaman524.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# COVID-19 ETL Pipeline\n\nA production-grade ETL pipeline for processing COVID-19 data from BigQuery public datasets, featuring robust data transformation and storage capabilities.\n\n---\n\n## 🏗️ Architecture Overview\n\nThis project is built using **Clean Architecture** and **Domain-Driven Design (DDD)** principles.  \n**Why?**  \n- **Separation of concerns:** Each layer has a clear responsibility, making code easier to maintain and extend.\n- **Testability:** Business logic is isolated from infrastructure, so you can write fast, reliable unit tests.\n- **Scalability:** You can swap out databases, APIs, or other integrations with minimal changes to core logic.\n- **Inspiration for developers:** This structure is ideal for learning how to build robust, production-grade data pipelines.\n\n**Layers:**\n- **Domain Layer:**  \n  Contains business logic, use cases, and value objects (Calculating COVID statistics, validating data). 
\n- **Data Layer:**  \n  Models, repositories, and data sources (Mapping BigQuery results to Python objects, saving records to PostgreSQL).\n- **Infrastructure Layer:**  \n  Integrations with external systems (BigQuery, PostgreSQL).\n- **Presentation Layer:**  \n  Orchestration, dependency injection, and entry points (Main pipeline runner, Airflow DAGs).\n\n---\n\n## 📊 Data Source\n\n- **BigQuery Public Dataset:**  \n  `bigquery-public-data.covid19_open_data.covid19_open_data`\n\n---\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.11+\n- PostgreSQL database (Docker)\n- Google Cloud credentials (for BigQuery access)\n\n### 1. Environment Setup\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\n```\n\n### 2. Database Setup\n\nRun PostgreSQL using Docker:\n\n```bash\ndocker run -d \\\n  --name my-postgres \\\n  -p 5432:5432 \\\n  -e POSTGRES_PASSWORD=your_db_password \\\n  postgres\n```\n\n### 3. Configuration\n\n**BigQuery Setup Instructions:**\n\n1. Go to [Google Cloud Console](https://console.cloud.google.com/).\n2. Create or select your project (e.g., `carbon-zone-466205-r5`).\n3. Navigate to **IAM \u0026 Admin \u003e Service Accounts**.\n4. Create a new service account (or use an existing one).\n5. Grant it the necessary BigQuery permissions (e.g., BigQuery Data Viewer).\n6. Click on the service account, go to **Keys**, and create a new key (JSON).\n7. Download the JSON file.\n8. **Place the downloaded JSON file inside your project's `config/` folder.**\n   - Example: `config/carbon-zone-466205-r5-baaa1a665c04.json`\n9. 
Make sure the path in your config matches the filename.\n\nThis will allow your pipeline to authenticate and access BigQuery data.\n\n\nRename `config/app_config.toml.sample` file to `config/app_config.toml` and\nedit your config in [config/app_config.toml](config/app_config.toml.sample):\n\n```toml\n[postgres]\nHOST = \"192.168.0.236\"  # Your IP\nUSERNAME = \"db_user_name\"\nPASSWORD = \"db_password\"\nPORT = \"5432\"\nDB_NAME = \"your_db_name\"\n\n[bigquery]\nPROJECT_ID = \"carbon-zone-4603345386205-r5\"\nSERVICE_ACCOUNT_FILEPATH = \"config/carbon-zone-4uyt205-r5-baaa1a665c04.json\"\n```\n\n### 4. Create Database Table\n\n```sql\nCREATE TABLE covid_daily_records (\n    date DATE NOT NULL,\n    country_code VARCHAR(10) NOT NULL,\n    country_name TEXT,\n    new_confirmed INTEGER,\n    new_deceased INTEGER,\n    cumulative_deceased INTEGER,\n    cumulative_tested INTEGER,\n    population_male INTEGER,\n    population_female INTEGER,\n    smoking_prevalence NUMERIC,\n    diabetes_prevalence NUMERIC,\n    PRIMARY KEY (country_code, date)\n);\n```\n\n### 5. Run ETL Pipeline Locally\n\n```bash\npython main.py --cron-name covid_data_orchestrator --start-date 2020-08-01 --end-date 2020-08-02\n```\n\n---\n\n## 🗓️ Scheduling with Airflow (Optional)\n\n### 1. Start Airflow Locally\n\n```bash\ndocker-compose up --build\n```\n\n- Access Airflow UI at [http://localhost:8080](http://localhost:8080/dags/covid_data_pipeline)\n- Login with default credentials (username: `airflow`, password: `airflow`)\n\n**🔍 Covid Data Pipeline Overview**\n\n![Airflow DAGs Demo 1](images/airflow_demo.png)\n\n**⏱️ Scheduled Execution**\n\n![Airflow DAGs Demo 2](images/airflow_demo_2.png)\n\n### 2. 
Disable Example DAGs in Airflow UI (Optional)\n\n- If example DAGs appear in your Airflow UI, just restart Airflow using:\n  ```bash\n  docker-compose down\n  docker volume rm airflow-data-pipeline_postgres-db-volume\n  docker-compose up --build\n  ```\n---\n\n\n## 🧪 Testing\n\nRun all tests and check coverage:\n\n```bash\nPYTHONPATH=$(pwd) pytest --cov=etl_covid_19 --cov-report=term-missing test/\n```\n\n---\n\n## 📁 Project Structure\n\n```\netl_covid_19/\n├── service_locator.py\n├── data/\n│   ├── data_source/\n│   ├── model/\n│   └── repository/\n├── domain/\n│   ├── services/\n│   ├── use_cases/\n│   └── value_objects/\n├── infrastructure/\n│   ├── bigquery/\n│   └── database/\n├── presentation/\n│   ├── base_cron_job.py\n│   └── covid_data_orchestrator.py\nconfig/\n│   ├── app_config.toml\n│   ├── config_loader.py\n│   ├── config_model.py\n│   └── airflow.cfg\ndags/\n│   ├── covid_data_etl.py\n│   └── .airflowignore\ntest/\n├── data/\n│   ├── data_source/\n│   ├── model/\n│   └── repository/\n├── domain/\n│   ├── services/\n│   ├── use_cases/\n│   └── value_objects/\n├── infrastructure/\n│   ├── bigquery/\n│   └── database/\n├── presentation/\n│   ├── test_base_cron_job.py\n│   └── test_covid_data_orchestrator.py\n├── setup.py\n├── main.py\n├── docker-compose.yaml\n├── dockerfile\n....\n```\n\n---\n\n## 🌟 Why Follow This Project?\n\n- **Learn Clean Architecture \u0026 DDD:**  \n  See how to structure real-world data pipelines for maintainability and scalability.\n- **Production-Ready Patterns:**  \n  Use dependency injection, configuration management, and robust error handling.\n- **Easy Testing:**  \n  Write unit and integration tests with clear separation of logic.\n- **Airflow Integration:**  \n  Schedule and monitor ETL jobs with industry-standard tools.\n- **Inspiration:**  \n  Adopt best practices for your own data engineering projects.\n\n---\n\n## 📞 Support\n\n- 📧 Email: hadiuzzaman908@gmail.com\n- 🐛 Issues: [GitHub 
Issues](https://github.com/hadiuzzaman524/python-clean-architecture/issues)\n\n## License\n\nThis project is licensed under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.  \n© 2025 MD Hadiuzzaman. For learning purposes only. Commercial use is prohibited.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadiuzzaman524%2Fpython-clean-architecture","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhadiuzzaman524%2Fpython-clean-architecture","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadiuzzaman524%2Fpython-clean-architecture/lists"}