{"id":48768514,"url":"https://github.com/mkaspulanwar/p7_bigdata_machine_learning","last_synced_at":"2026-04-13T09:01:48.435Z","repository":{"id":350969270,"uuid":"1208970622","full_name":"mkaspulanwar/p7_bigdata_machine_learning","owner":"mkaspulanwar","description":"Praktikum Big Data Week 7: Implementasi Machine Learning menggunakan Random Forest untuk prediksi traffic Smart City AI dengan pipeline data terintegrasi dan dashboard interaktif berbasis Streamlit.","archived":false,"fork":false,"pushed_at":"2026-04-13T03:05:55.000Z","size":530518,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-13T03:23:37.095Z","etag":null,"topics":["big-data","data-pipeline","random-forest","streamlit","time-series","traffic-prediction"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkaspulanwar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-13T01:04:16.000Z","updated_at":"2026-04-13T03:06:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mkaspulanwar/p7_bigdata_machine_learning","commit_stats":null,"previous_names":["mkaspulanwar/p7_bigdata_machine_learning"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mkaspulanwar/p7_bigdata_machine_learning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/mkaspulanwar%2Fp7_bigdata_machine_learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkaspulanwar%2Fp7_bigdata_machine_learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkaspulanwar%2Fp7_bigdata_machine_learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkaspulanwar%2Fp7_bigdata_machine_learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkaspulanwar","download_url":"https://codeload.github.com/mkaspulanwar/p7_bigdata_machine_learning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkaspulanwar%2Fp7_bigdata_machine_learning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31746113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T06:26:45.479Z","status":"ssl_error","status_checked_at":"2026-04-13T06:26:44.645Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-pipeline","random-forest","streamlit","time-series","traffic-prediction"],"created_at":"2026-04-13T09:01:44.355Z","updated_at":"2026-04-13T09:01:48.430Z","avatar_url":"https://github.com/mkaspulanwar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Big Data Practicum Week 7: Machine Learning for 
Traffic Prediction (Smart City AI)\n\n![Python](https://img.shields.io/badge/Python-3.12-blue?logo=python\u0026logoColor=white)\n![Scikit-Learn](https://img.shields.io/badge/Scikit--Learn-RandomForest-F7931E?logo=scikitlearn\u0026logoColor=white)\n![Streamlit](https://img.shields.io/badge/Streamlit-Interactive%20Dashboard-FF4B4B?logo=streamlit\u0026logoColor=white)\n![Pandas](https://img.shields.io/badge/Pandas-Data%20Processing-150458?logo=pandas\u0026logoColor=white)\n![Joblib](https://img.shields.io/badge/Model-Serialized%20with%20Joblib-2E7D32)\n![Smart City](https://img.shields.io/badge/Use%20Case-Traffic%20Prediction-1565C0)\n\n## Practicum Team\n\n| Role | Name | NIM (Student ID) | GitHub Profile |\n| :--- | :--- | :--- | :--- |\n| **Project Developer** | M. Kaspul Anwar | 230104040212 | [![](https://img.shields.io/badge/GitHub-mkaspulanwar-181717?style=flat\u0026logo=github)](https://github.com/mkaspulanwar) |\n| **Course Lecturer** | Muhayat, M. IT | - | [![](https://img.shields.io/badge/GitHub-muhayat--lab-181717?style=flat\u0026logo=github)](https://github.com/muhayat-lab) |\n\n---\n\n## Week 7 Practicum Summary\n\nThe Week 7 practicum focuses on implementing **Machine Learning for Smart City Traffic Prediction** as a continuation of Week 6 (real-time analytics and large-scale visualization).\n\nIn Week 6, the system could already ingest, process, and visualize transportation data in near real time. In Week 7, the project adds a **predictive analytics** layer that predicts traffic density with a machine learning model.\n\n## Practicum Objectives\n\n1. Prepare the traffic data so it is suitable for model training.\n2. Build a baseline regression model to predict the number of vehicles.\n3. Integrate the model into an interactive dashboard for fast inference.\n4. Document the end-to-end flow from raw data to prediction.\n5. Maintain architectural continuity from Week 6 to Week 7.\n\n## Week 7 Feature Scope\n\n1. 
**Data cleaning pipeline** for the smart city traffic dataset.\n2. **Time-based feature engineering** (`hour`, `day`) and a historical feature (`lag1`).\n3. **Model training** using `RandomForestRegressor`.\n4. **Model persistence** to the artifact `models/traffic_model_v1.pkl`.\n5. **Prediction dashboard** built on Streamlit for exploring metrics and simulating predictions.\n\n## System Architecture (Week 6 -\u003e Week 7)\n\n```mermaid\nflowchart LR\n    A[\"Raw Traffic Dataset CSV\"] --\u003e B[\"Data Cleaning (Pandas)\"]\n    B --\u003e C[\"Clean Dataset (CSV)\"]\n    C --\u003e D[\"Feature Engineering (hour, day, lag1)\"]\n    D --\u003e E[\"Model Training (RandomForestRegressor)\"]\n    E --\u003e F[\"Model Artifact (.pkl)\"]\n    C --\u003e G[\"Traffic Dashboard (Streamlit)\"]\n    F --\u003e G\n    G --\u003e H[\"Interactive Prediction \u0026 Monitoring\"]\n\n    I[\"Week 6 Transportation Streaming\"] --\u003e G\n```\n\n## Project Structure (Latest)\n\n```bash\nbigdata-project/\n├── .venv/                                         # Local virtual environment\n├── alerts/                                        # Alert module for the transportation use case\n│   ├── __init__.py\n│   └── transportation_alert.py                    # Rule-based alert (traffic/fare)\n├── analytics/                                     # Analytics \u0026 machine learning modules\n│   ├── __init__.py\n│   ├── transportation_analytics.py                # KPI, trend, anomaly detection (Week 6)\n│   └── traffic_ml_model_v1.py                     # Traffic prediction model training (Week 7)\n├── dashboard/                                     # Streamlit dashboard applications\n│   ├── dashboard_streamlit.py                     # Real-time e-commerce dashboard\n│   ├── dashboard_transportation.py                # Decision-oriented transportation dashboard\n│   └── traffic_dashboard_v1.py                    # Traffic prediction dashboard (Week 7)\n├── data/\n│   ├── checkpoints/                               # 
Spark streaming checkpoint\n│   │   └── transportation/\n│   ├── clean/                                     # Cleaned data\n│   │   └── traffic_smartcity_clean_v1.csv\n│   ├── curated/                                   # Business aggregation data (Week 6)\n│   ├── raw/\n│   │   ├── ecommerce_raw.csv                      # Main raw dataset for batch processing\n│   │   └── traffic_smartcity_v1.csv               # Smart city traffic dataset (Week 7)\n│   └── serving/                                   # Dashboard-ready data\n│       ├── avg_transaction/\n│       ├── category_revenue/\n│       ├── stream/                                # E-commerce streaming output\n│       ├── top_products/\n│       ├── total_revenue/\n│       └── transportation/                        # Transportation streaming output\n├── logs/\n│   ├── batch_pipeline.log                         # Batch pipeline process log\n│   └── stream_checkpoint/                         # E-commerce streaming checkpoint\n├── models/\n│   └── traffic_model_v1.pkl                       # Traffic prediction model artifact (Week 7)\n├── screenshots/                                   # Documentation screenshots of practicum results\n│   ├── struktur_project.png\n│   ├── scripts_cleaning.png\n│   ├── data_cleaning_selesai.png\n│   ├── scripts_modeling.png\n│   ├── model_berhasil_disimpan.png\n│   ├── scripts_dashboard.png\n│   ├── dashboard_berjalan.png\n│   ├── dashboard_1.png\n│   ├── dashboard_2.png\n│   └── nilai_prediksi.png\n├── scripts/                                       # Main practicum pipelines\n│   ├── analytics_layer.py                         # Analytics + serving layer (e-commerce)\n│   ├── batch_pipeline_enterprise.py               # Batch processing pipeline\n│   ├── streaming_layer.py                         # E-commerce streaming ingestion\n│   ├── transaction_generator.py                   # E-commerce transaction generator\n│   ├── traffic_data_cleaning_v1.py                # Traffic data cleaning (Week 
7)\n│   └── transportation/\n│       ├── streaming_trip_layer.py                # Transportation streaming ingestion\n│       └── trip_generator.py                      # Transportation trip generator\n├── stream_data/                                   # Simulated streaming input data\n│   └── transportation/\n├── .gitignore\n├── CONTRIBUTING.md\n├── LICENSE\n└── README.md\n```\n\n## Screenshot Evidence\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eProject Structure\u003c/b\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eCleaning Script\u003c/b\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/struktur_project.png\"/\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/scripts_cleaning.png\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eData Cleaning Completed\u003c/b\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eModeling Script\u003c/b\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/data_cleaning_selesai.png\"/\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/scripts_modeling.png\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eModel Saved Successfully\u003c/b\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eDashboard Script\u003c/b\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/model_berhasil_disimpan.png\"/\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/scripts_dashboard.png\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eDashboard Running\u003c/b\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eDashboard 1\u003c/b\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg 
src=\"screenshots/dashboard_berjalan.png\"/\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/dashboard_1.png\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003eDashboard 2\u003c/b\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003cb\u003ePrediction Value\u003c/b\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/dashboard_2.png\"/\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"screenshots/nilai_prediksi.png\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## Week 7 Dataset\n\nThe main dataset is located at:\n\n- `data/raw/traffic_smartcity_v1.csv`\n\nData summary:\n\n1. Number of rows: **168 records**.\n2. Time range: **2025-01-01 00:00:00 to 2025-01-07 23:00:00**.\n3. Frequency: **hourly**.\n\nColumn schema:\n\n| Column | Type | Description |\n| :--- | :--- | :--- |\n| `datetime` | datetime | Timestamp of the traffic observation |\n| `traffic` | int/float | Number of vehicles at that timestamp |\n\n## Week 7 Workflow Explained (End-to-End)\n\n### 1) Data Cleaning\n\nScript: `scripts/traffic_data_cleaning_v1.py`\n\nMain steps:\n\n1. Read the raw data from `data/raw/traffic_smartcity_v1.csv`.\n2. Parse the `datetime` column into a datetime type.\n3. Sort the data by time.\n4. Drop rows containing null values (`dropna`).\n5. Save the output to `data/clean/traffic_smartcity_clean_v1.csv`.\n\nRun:\n\n```bash\npython scripts/traffic_data_cleaning_v1.py\n```\n\n### 2) Machine Learning Modeling\n\nScript: `analytics/traffic_ml_model_v1.py`\n\nMain steps:\n\n1. Read the clean data.\n2. Create derived features:\n   - `hour` = hour of day (0-23)\n   - `day` = day index (0-6)\n   - `lag1` = traffic from the previous period\n3. Set the target `y = traffic`.\n4. Train a `RandomForestRegressor` model (baseline).\n5. 
Save the model to `models/traffic_model_v1.pkl`.\n\nRun:\n\n```bash\npython analytics/traffic_ml_model_v1.py\n```\n\n### 3) Prediction Dashboard\n\nScript: `dashboard/traffic_dashboard_v1.py`\n\nDashboard features:\n\n1. Displays simple KPIs (`Avg Traffic`, `Max Traffic`).\n2. Displays a traffic trend chart.\n3. Provides interactive inputs (`hour`, `day`, `lag1`) for prediction simulation.\n4. Displays the predicted number of vehicles.\n\nRun:\n\n```bash\nstreamlit run dashboard/traffic_dashboard_v1.py\n```\n\n## Environment Setup\n\n### 1) Prerequisites\n\n1. Python 3.10+ (3.12 recommended).\n2. Pip and a virtual environment.\n3. Java (optional, only if you also run the Week 6 Spark pipeline).\n\n### 2) Create a Virtual Environment\n\nLinux/macOS:\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\n```\n\nPowerShell:\n\n```powershell\npython -m venv .venv\n.venv\\Scripts\\Activate.ps1\n```\n\n### 3) Install Dependencies\n\n```bash\npip install pandas scikit-learn joblib streamlit matplotlib pyspark pyarrow\n```\n\n## Quick Run (Recommended for Practicum 7)\n\nRun in order from the project root:\n\n```bash\npython scripts/traffic_data_cleaning_v1.py\npython analytics/traffic_ml_model_v1.py\nstreamlit run dashboard/traffic_dashboard_v1.py\n```\n\n## Success Validation\n\nThe practicum is considered successful if:\n\n1. The file `data/clean/traffic_smartcity_clean_v1.csv` is created after cleaning.\n2. The file `models/traffic_model_v1.pkl` is created after training.\n3. The dashboard opens without errors and displays the metrics and the trend chart.\n4. The prediction button produces a predicted vehicle count from the user's input.\n\n\n## Integration with Practicum 6\n\nThe Week 6 components remain in the repo (transportation streaming, analytics, alerts, and dashboard). As a result, this repository now covers two kinds of analytics:\n\n1. **Descriptive + Real-Time Analytics** (Week 6).\n2. 
**Predictive Analytics (Machine Learning)** (Week 7).\n\nThis approach represents a more complete smart city data platform flow: from traffic observation to traffic prediction.\n\n## Troubleshooting\n\n1. If you get a `No module named ...` error, make sure the virtual environment is active and the dependencies are installed.\n2. If the dashboard cannot find the model, rerun the model training script.\n3. If the clean data is not found, rerun the cleaning script.\n4. If the dashboard appears empty, make sure the file `data/clean/traffic_smartcity_clean_v1.csv` contains valid data.\n\n## Current Baseline Limitations\n\n1. The model does not yet use a train/test split or formal evaluation metrics (MAE/RMSE/R2).\n2. The features are still simple (`hour`, `day`, `lag1`) and do not yet include weather or city events.\n3. There is no automated periodic retraining yet.\n\n## Future Development Plan\n\n1. Add measurable model evaluation (MAE, RMSE, R2) and visual error analysis.\n2. Add external features (weather, holidays, events, area density).\n3. Build a scheduled retraining pipeline and model versioning.\n4. Integrate predictions into the real-time alert layer for proactive traffic management.\n\n## Closing\n\nWeek 7 extends the Week 6 foundation from real-time monitoring alone into a system that is beginning to have predictive capability. As a result, this repository is now better suited as a **Smart City AI** prototype that combines ingestion, analytics, visualization, and traffic prediction in a single integrated flow.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkaspulanwar%2Fp7_bigdata_machine_learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkaspulanwar%2Fp7_bigdata_machine_learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkaspulanwar%2Fp7_bigdata_machine_learning/lists"}