{"id":21262994,"url":"https://github.com/ddzikri/mini-project","last_synced_at":"2026-05-15T21:36:54.729Z","repository":{"id":236877654,"uuid":"793168865","full_name":"ddzikri/mini-project","owner":"ddzikri","description":"Mini Project Data Engineer at Alterra Academy","archived":false,"fork":false,"pushed_at":"2024-05-31T07:00:11.000Z","size":10721,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-22T18:40:31.213Z","etag":null,"topics":["cleaning-data","dataset","etl-pipeline","firebase","gcp"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ddzikri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-28T16:12:24.000Z","updated_at":"2025-01-23T01:28:12.000Z","dependencies_parsed_at":"2024-05-31T08:23:33.589Z","dependency_job_id":"ea056a69-9162-45a5-9811-d167a587aa99","html_url":"https://github.com/ddzikri/mini-project","commit_stats":null,"previous_names":["ddzikri/mini-project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ddzikri/mini-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddzikri%2Fmini-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddzikri%2Fmini-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddzikri%2Fmini-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddzikri%2Fmini-project/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ddzikri","download_url":"https://codeload.github.com/ddzikri/mini-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddzikri%2Fmini-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33080777,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-15T20:25:35.270Z","status":"ssl_error","status_checked_at":"2026-05-15T20:25:34.732Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cleaning-data","dataset","etl-pipeline","firebase","gcp"],"created_at":"2024-11-21T04:59:55.287Z","updated_at":"2026-05-15T21:36:54.694Z","avatar_url":"https://github.com/ddzikri.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Extract, Transform, Load (ETL) Pipeline And Visualization\r\n\r\n## About Project\r\nThis project runs an ETL process on the Social-Economic Countries data, a collection of data from 1960 to the present that covers economic and social information for countries around the world. 
\r\n\r\n## Tech Stacks\r\nTools and frameworks used in this project:\r\n- Python\r\n- Python libraries (pandas, etc.)\r\n- VS Code\r\n- GitHub\r\n- Jupyter Notebook\r\n- Firebase\r\n- MySQL\r\n- World Bank API\r\n- XML\r\n- JSON\r\n- CSV\r\n- DB\r\n- others\r\n\r\n## Architecture Diagram\r\n![ETL Diagram](https://github.com/ddzikri/mini-project/blob/main/ETL_DIAGRAM.png?raw=true)\r\n\r\n## Setup\r\n### Step 1: Environment Preparation\r\nMake sure the required tools and libraries are installed:\r\n- Install Python and the libraries you need.\r\n- Configure Jupyter Notebook.\r\n- Configure Firebase Admin.\r\n\r\n### Step 2: Data Extraction\r\n1. Download the data files from [this link](https://github.com/yudhaislamisulistya/mini-project-de-alta).\r\n2. Collect the data (CSV, XML, API, DB, and other sources) in a single folder.\r\n3. Open Jupyter Notebook.\r\n4. Import the required Python libraries.\r\n5. Write Python code with pandas so that each extracted dataset is loaded as a DataFrame.\r\n\r\n### Step 3: Data Transformation\r\n1. Data cleaning\r\n    - Handle missing values\r\n    - Remove duplicate rows\r\n    - Replace and regex\r\n\r\n2. Adjust data types\r\n3. Drop columns that are not needed\r\n4. Impute missing values in gdp_data using the SimpleImputer technique\r\n5. Remove outliers\r\n6. Scale features\r\n7. Create dummy variables where the dataset calls for them\r\n8. Feature engineering\r\n9. Merge the datasets into the final dataset.\r\n\r\n### Step 4: Load Data\r\n1. Google Firebase Admin\r\n    - Write a Python script that saves the final data to Firebase Admin when the data load is very large.\r\n2. MySQL Workbench\r\n    - Write a Python script that saves the final data to a MySQL Workbench database (local file) when the data load is small.\r\n\r\n\r\n### Step 5: Data Visualization\r\n1. Write a prompt script that uses AI to analyze the data visualizations (optional).\r\n2. 
Use Matplotlib, Seaborn, and Plotly Express to make the visualizations engaging.\r\n\r\n### Step 6: Apache Airflow on WSL Ubuntu\r\n1. Install Apache Airflow on WSL Ubuntu.\r\n2. Create a dags folder and add the required files.\r\n3. Start Airflow with the command 'airflow standalone'.\r\n4. View the resulting DAGs at localhost:8080.\r\n\r\n### Final Words\r\nDon't give up!\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddzikri%2Fmini-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fddzikri%2Fmini-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddzikri%2Fmini-project/lists"}