{"id":23564081,"url":"https://github.com/sayed-ashfaq/delhivery-dataanalysis","last_synced_at":"2026-04-30T11:35:22.417Z","repository":{"id":269243040,"uuid":"906835776","full_name":"sayed-ashfaq/Delhivery-DataAnalysis","owner":"sayed-ashfaq","description":" In this project, I conducted basic analysis, feature engineering, normalization, and outlier handling, along with statistical and non-parametric testing to extract insights.","archived":false,"fork":false,"pushed_at":"2024-12-22T03:50:59.000Z","size":7156,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-17T16:31:50.106Z","etag":null,"topics":["feature-engineering","normalization","outlier-detection","pandas","python","scikit-learn","statistcal-tests","statistical-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sayed-ashfaq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-22T03:41:24.000Z","updated_at":"2024-12-22T03:51:02.000Z","dependencies_parsed_at":"2024-12-22T04:27:04.281Z","dependency_job_id":"bc8630ca-3fb0-432f-af81-4dfef8fcbea3","html_url":"https://github.com/sayed-ashfaq/Delhivery-DataAnalysis","commit_stats":null,"previous_names":["sayed-ashfaq/delhivery-dataanalysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sayed-ashfaq%2FDelhivery-DataAnalysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/sayed-ashfaq%2FDelhivery-DataAnalysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sayed-ashfaq%2FDelhivery-DataAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sayed-ashfaq%2FDelhivery-DataAnalysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sayed-ashfaq","download_url":"https://codeload.github.com/sayed-ashfaq/Delhivery-DataAnalysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254468690,"owners_count":22076349,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["feature-engineering","normalization","outlier-detection","pandas","python","scikit-learn","statistcal-tests","statistical-analysis"],"created_at":"2024-12-26T17:12:38.977Z","updated_at":"2026-04-30T11:35:17.355Z","avatar_url":"https://github.com/sayed-ashfaq.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Delhivery Data Analysis  \n\n## About Delhivery  \nDelhivery is the largest and fastest-growing fully integrated logistics provider in India as of Fiscal 2021. The company aims to build the operating system for commerce through a blend of world-class infrastructure, high-quality logistics operations, and cutting-edge engineering and technology capabilities.  \n\nThe data team at Delhivery leverages vast datasets to enhance business intelligence, drive operational efficiency, and maintain profitability, creating a significant competitive edge.  
\n\n---\n\n## Objective  \nThe goal of this project is to process and analyze data generated by Delhivery's logistics operations to:  \n1. **Clean, sanitize, and manipulate raw data** to derive actionable insights.  \n2. **Create useful features** for the data science team to develop forecasting models.  \n\n---\n\n## Dataset  \nThe dataset consists of records from Delhivery's logistics and operational data pipeline.  \n\n### **Key Features**:  \n- **`data`**: Indicates if the record is training or testing data.  \n- **`trip_creation_time`**: Timestamp of trip creation.  \n- **`route_schedule_uuid`**: Unique identifier for a route schedule.  \n- **`route_type`**: Type of transportation (`FTL`, `Carting`).  \n  - **FTL**: Full Truck Load shipments, faster delivery as there are no intermediate pickups/drop-offs.  \n  - **Carting**: Delivery system using smaller vehicles (carts).  \n- **`trip_uuid`**: Unique identifier for a trip (a trip can involve multiple source and destination centers).  \n- **`source_center`**: ID of the trip's origin center.  \n- **`source_name`**: Name of the trip's origin center.  \n- **`destination_center`**: ID of the destination center.  \n- **`destination_name`**: Name of the destination center.  \n- **`od_start_time`**: Trip start time.  \n- **`od_end_time`**: Trip end time.  \n- **`start_scan_to_end_scan`**: Total time taken for delivery from source to destination.  \n- **`actual_distance_to_destination`**: Actual distance in kilometers between source and destination.  \n- **`actual_time`**: Cumulative time taken to complete the delivery.  \n- **`osrm_time`**: Time calculated by the Open-Source Routing Machine (OSRM) considering shortest paths and typical traffic conditions (cumulative).  \n- **`osrm_distance`**: Distance calculated by OSRM (cumulative).  \n- **`segment_actual_time`**: Time taken for a segment of the delivery.  \n- **`segment_osrm_time`**: OSRM-calculated time for a delivery segment.  
\n- **`segment_osrm_distance`**: OSRM-calculated distance for a delivery segment.  \n\n### **Additional Fields**:  \nSome fields with currently unclear meanings, like `is_cutoff`, `cutoff_factor`, `cutoff_timestamp`, and `factor`, are included for completeness and may be explored further.  \n\n---\n\n## Process Overview  \n\n### 1. **Feature Engineering**:  \n- Derived meaningful metrics such as:  \n  - **`time_diff_hours`**: Time difference between `od_start_time` and `od_end_time`.  \n  - Extracted components from timestamps (e.g., month, year, day of the week).  \n  - Split and standardized source and destination names into city, place code, and state.  \n\n### 2. **Data Cleaning**:  \n- Handled missing values using appropriate imputation techniques.  \n- Detected and handled outliers using boxplots and the IQR method.  \n\n### 3. **Categorical Feature Handling**:  \n- Applied one-hot encoding to variables like `route_type` for better interpretability in downstream models.  \n\n### 4. **Normalization and Standardization**:  \n- Used MinMaxScaler and StandardScaler for numerical columns to align features to a uniform scale.  \n\n---\n\n## Key Insights  \n\n1. **Route Type Insights**:  \n   - FTL routes are faster and more efficient for long distances compared to Carting.  \n\n2. **Source and Destination Patterns**:  \n   - High-frequency routes indicate key operational hubs that could benefit from resource optimization.  \n\n3. **Time Efficiency**:  \n   - Delivery times vary significantly by route type, season, and traffic conditions.  \n\n4. **OSRM vs. Actual Metrics**:  \n   - Discrepancies between OSRM-calculated and actual times/distances highlight areas for improving routing algorithms.  \n\n---\n\n## Tools and Libraries  \nThis project utilized the following tools:  \n- **Python**:  \n  - `Pandas` for data manipulation.  \n  - `Matplotlib` and `Seaborn` for visualization.  \n  - `scikit-learn` for preprocessing and scaling.  
\n- **Jupyter Notebook**: For interactive analysis and documentation.  \n\n---\n\n## Repository Structure  \n- **`data/`**: Contains the dataset used for analysis.  \n- **`notebooks/`**: Jupyter Notebooks documenting the analysis process.  \n- **`visualizations/`**: Saved plots and charts.  \n- **`README.md`**: Overview of the project (this file).  \n\n---\n\n## Next Steps  \nFuture directions for this project include:  \n1. Developing predictive models for delivery time and distance.  \n2. Investigating patterns in the unknown fields (`is_cutoff`, `cutoff_factor`, etc.).  \n3. Implementing clustering techniques to identify high-demand routes.  \n\n---\n\n## Acknowledgments  \n- **Dataset Source**: Provided by Scaler for this analysis.  \n- **Python Libraries**: Thanks to the open-source Python community for providing versatile data analysis tools.  \n\n---\n\n## License  \nThis project is licensed for educational and non-commercial use only. If utilizing any part of this repository, please credit the author.  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsayed-ashfaq%2Fdelhivery-dataanalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsayed-ashfaq%2Fdelhivery-dataanalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsayed-ashfaq%2Fdelhivery-dataanalysis/lists"}