{"id":23845771,"url":"https://github.com/kabeera1007/bike_data_play","last_synced_at":"2026-04-10T07:46:05.291Z","repository":{"id":270148887,"uuid":"909336611","full_name":"kabeera1007/Bike_Data_Play","owner":"kabeera1007","description":"End to End Data Engineering project with multiple ETL \u0026 ELT pipelines.","archived":false,"fork":false,"pushed_at":"2025-01-27T19:04:28.000Z","size":357,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-22T06:14:45.084Z","etag":null,"topics":["airflow","anaconda","bigquery","cloud","dbt","docker","gcs","python","spark","terraform"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kabeera1007.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-28T11:57:31.000Z","updated_at":"2025-01-27T19:04:32.000Z","dependencies_parsed_at":"2025-06-29T14:32:31.801Z","dependency_job_id":null,"html_url":"https://github.com/kabeera1007/Bike_Data_Play","commit_stats":null,"previous_names":["kabeera1008/bike_data_play","kabeera1007/bike_data_play"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kabeera1007/Bike_Data_Play","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kabeera1007%2FBike_Data_Play","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kabeera1007%2FBike_Data_Play/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kabeera1007%2FBike_D
ata_Play/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kabeera1007%2FBike_Data_Play/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kabeera1007","download_url":"https://codeload.github.com/kabeera1007/Bike_Data_Play/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kabeera1007%2FBike_Data_Play/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262609114,"owners_count":23336631,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","anaconda","bigquery","cloud","dbt","docker","gcs","python","spark","terraform"],"created_at":"2025-01-02T20:26:21.581Z","updated_at":"2026-04-10T07:46:05.193Z","avatar_url":"https://github.com/kabeera1007.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Project Name: Bike Data Play - Divvy Bike-Sharing Analysis\n\n## Description\n\nThis project processes and analyzes Divvy bike-sharing data from Chicago (2020–2024) using various tools and techniques.\n![Workflow](https://github.com/kabeera1007/Bike_data_play/blob/master/workflow001.png)\n\n## Workflow\n\nThis project integrates several tools and processes to manage the workflow:\n\n***Tools***:\n  \n- **DBT**: Data transformation and analysis.\n- **Airflow**: Task scheduling and orchestration.\n- **Spark**: Data processing.\n- **Docker**: Containerization.\n- **Terraform**: Infrastructure management.\n- **GCS**: Cloud storage.\n\n***Steps***:\n\n- **Step 1**:  ETL and 
ELT using Spark.\n- **Step 2**:  ELT using dbt and GCS.\n- **Step 3**:  ELT using Docker, Terraform, and Airflow.\n\n## Project Structure\n\nThe project is organized as follows:\n\n- **analyses/**: Contains DBT analysis scripts.\n- **dags/**: Airflow DAGs for task scheduling.\n- **macros/**: Custom DBT macros.\n- **models/**: DBT models for data transformation.\n- **scripts/**: Project setup scripts.\n- **seeds/**: Raw data for seeding DBT models.\n- **snapshots/**: DBT snapshots for table versioning.\n- **spark_notebooks/**: Jupyter Notebooks for Spark-based analysis.\n- **terraf/**: Terraform configuration files.\n- **tests/**: DBT tests for data quality.\n- **.gitignore**: Files and directories excluded from version control.\n- **Dockerfile**: Docker configuration for the project.\n- **docker-compose.yaml**: Docker Compose configuration for container orchestration.\n- **requirements.txt**: Python dependencies for the project.\n\n## Data\n\nThe dataset contains Divvy bike-sharing trip data from 2020 to 2024.\n\n- **Rows**: 20 million+\n\nThe columns include:\n\n- **ride_id**: Unique ID assigned to each Divvy trip.\n- **rideable_type**: Type of vehicle used (bike or scooter).\n- **started_at**: Start date and time of the trip.\n- **ended_at**: End date and time of the trip.\n- **start_station_name**: Name of the start station.\n- **start_station_id**: Unique ID of the start station.\n- **end_station_name**: Name of the end station.\n- **end_station_id**: Unique ID of the end station.\n- **start_lat**: Latitude of the start station.\n- **start_lng**: Longitude of the start station.\n- **end_lat**: Latitude of the end station.\n- **end_lng**: Longitude of the end station.\n- **member_casual**: Whether the rider is a Divvy member or a casual user.\n\n[Link to Dataset](https://divvy-tripdata.s3.amazonaws.com/index.html)\n\n## Installation\n\nThe complete project is hosted on Google Cloud. 
\n\n## Visualization\n\n[Link to Visualization](https://lookerstudio.google.com/reporting/ccd00616-ec8b-443f-b6e3-c6e6446bfc8c)\n\n### Prerequisites\n\nTo run this project, ensure the following tools are installed:\n\n1. **Python** (version X.X.X)\n2. **Docker** (for containerization)\n3. **DBT** (for data transformation)\n4. **Terraform** (for infrastructure management)\n5. **Airflow** (for task scheduling)\n6. **Spark** (analytics engine)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkabeera1007%2Fbike_data_play","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkabeera1007%2Fbike_data_play","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkabeera1007%2Fbike_data_play/lists"}