https://github.com/sayed-ashfaq/delhivery-dataanalysis
In this project, I conducted basic analysis, feature engineering, normalization, and outlier handling, along with statistical and non-parametric testing to extract insights.
feature-engineering normalization outlier-detection pandas python scikit-learn statistcal-tests statistical-analysis
- Host: GitHub
- URL: https://github.com/sayed-ashfaq/delhivery-dataanalysis
- Owner: sayed-ashfaq
- Created: 2024-12-22T03:41:24.000Z (18 days ago)
- Default Branch: main
- Last Pushed: 2024-12-22T03:50:59.000Z (18 days ago)
- Last Synced: 2024-12-22T04:27:02.882Z (17 days ago)
- Topics: feature-engineering, normalization, outlier-detection, pandas, python, scikit-learn, statistcal-tests, statistical-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Delhivery Data Analysis
## About Delhivery
Delhivery is the largest and fastest-growing fully integrated logistics provider in India as of Fiscal 2021. The company aims to build the operating system for commerce through a blend of world-class infrastructure, high-quality logistics operations, and cutting-edge engineering and technology capabilities. The data team at Delhivery leverages vast datasets to enhance business intelligence, drive operational efficiency, and maintain profitability, creating a significant competitive edge.
---
## Objective
The goal of this project is to process and analyze data generated by Delhivery's logistics operations to:
1. **Clean, sanitize, and manipulate raw data** to derive actionable insights.
2. **Create useful features** for the data science team to develop forecasting models.

---
## Dataset
The dataset consists of records from Delhivery's logistics and operational data pipeline.

### **Key Features**:
- **`data`**: Indicates if the record is training or testing data.
- **`trip_creation_time`**: Timestamp of trip creation.
- **`route_schedule_uuid`**: Unique identifier for a route schedule.
- **`route_type`**: Type of transportation (`FTL`, `Carting`).
  - **FTL**: Full Truck Load shipments, faster delivery as there are no intermediate pickups/drop-offs.
  - **Carting**: Delivery system using smaller vehicles (carts).
- **`trip_uuid`**: Unique identifier for a trip (a trip can involve multiple source and destination centers).
- **`source_center`**: ID of the trip's origin center.
- **`source_name`**: Name of the trip's origin center.
- **`destination_center`**: ID of the destination center.
- **`destination_name`**: Name of the destination center.
- **`od_start_time`**: Trip start time.
- **`od_end_time`**: Trip end time.
- **`start_scan_to_end_scan`**: Total time taken for delivery from source to destination.
- **`actual_distance_to_destination`**: Actual distance in kilometers between source and destination.
- **`actual_time`**: Cumulative time taken to complete the delivery.
- **`osrm_time`**: Time calculated by the Open-Source Routing Machine (OSRM) considering shortest paths and typical traffic conditions (cumulative).
- **`osrm_distance`**: Distance calculated by OSRM (cumulative).
- **`segment_actual_time`**: Time taken for a segment of the delivery.
- **`segment_osrm_time`**: OSRM-calculated time for a delivery segment.
- **`segment_osrm_distance`**: OSRM-calculated distance for a delivery segment.

### **Additional Fields**:
Some fields whose meanings are currently unclear, such as `is_cutoff`, `cutoff_factor`, `cutoff_timestamp`, and `factor`, are included for completeness and may be explored further.

---
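The distinction between cumulative and per-segment fields can be checked directly once the data is loaded. The sketch below is a minimal illustration, not the notebook's code: the rows are made up, `segment_running_total` and `cumulative_gap` are hypothetical column names, and it assumes the cumulative fields are running totals of the segment fields within a trip, which is how the descriptions read but is not confirmed by the source.

```python
import io

import pandas as pd

# Made-up sample in CSV form; only the column names come from the field list.
raw = io.StringIO(
    "trip_uuid,trip_creation_time,segment_actual_time,actual_time\n"
    "trip-A,2018-09-20 02:35:36,14,14\n"
    "trip-A,2018-09-20 02:35:36,10,24\n"
    "trip-A,2018-09-20 02:35:36,16,40\n"
    "trip-B,2018-09-21 11:00:00,30,30\n"
    "trip-B,2018-09-21 11:00:00,25,55\n"
)

# Parse the timestamp column while reading so it is usable for date features.
records = pd.read_csv(raw, parse_dates=["trip_creation_time"])

# Running total of segment times within each trip (assumed relationship).
records["segment_running_total"] = (
    records.groupby("trip_uuid")["segment_actual_time"].cumsum()
)

# Gap between the reported cumulative time and the running total.
# Non-zero gaps would indicate the assumption above does not hold for that row.
records["cumulative_gap"] = records["actual_time"] - records["segment_running_total"]

print(records[["trip_uuid", "segment_actual_time", "actual_time", "cumulative_gap"]])
```

On real data the gap column is a quick way to see whether the segment and cumulative fields agree before aggregating rows to trip level.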
## Process Overview
### 1. **Feature Engineering**:
- Derived meaningful metrics such as:
  - **`time_diff_hours`**: Time difference between `od_start_time` and `od_end_time`.
- Extracted components from timestamps (e.g., month, year, day of the week).
- Split and standardized source and destination names into city, place code, and state.

### 2. **Data Cleaning**:
- Handled missing values using appropriate imputation techniques.
- Identified outliers with boxplots and handled them with the IQR (interquartile range) method.

### 3. **Categorical Feature Handling**:
- Applied one-hot encoding to variables like `route_type` so that downstream models can consume them as numeric inputs.

### 4. **Normalization and Standardization**:
- Used `MinMaxScaler` and `StandardScaler` for numerical columns to align features to a uniform scale.

---
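The four steps of the process overview can be sketched end to end on a toy frame. This is a minimal illustration, not the notebook's actual code. The sample rows are made up, the `City_PlaceCode (State)` naming pattern is an assumption, every derived column name other than `time_diff_hours` is hypothetical, and the 1.5 × IQR fences are the conventional choice rather than a value confirmed by the source.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up rows for illustration; only the column names come from the Dataset section.
df = pd.DataFrame({
    "route_type": ["FTL", "Carting", "FTL", "Carting", "Carting"],
    "source_name": [
        "Anand_VUNagar_DC (Gujarat)",
        "Pune_Tathawde_H (Maharashtra)",
        "Anand_VUNagar_DC (Gujarat)",
        "Pune_Tathawde_H (Maharashtra)",
        "Pune_Tathawde_H (Maharashtra)",
    ],
    "trip_creation_time": ["2018-09-20 02:35:36"] * 5,
    "od_start_time": ["2018-09-20 03:00:00"] * 5,
    "od_end_time": [
        "2018-09-20 05:00:00",
        "2018-09-20 03:45:00",
        "2018-09-20 05:30:00",
        "2018-09-20 04:00:00",
        "2018-09-20 18:00:00",
    ],
    "actual_time": [120.0, 45.0, 150.0, 60.0, 900.0],
})

# 1. Feature engineering: parse timestamps, then derive duration and date parts.
for col in ["trip_creation_time", "od_start_time", "od_end_time"]:
    df[col] = pd.to_datetime(df[col])
df["time_diff_hours"] = (
    (df["od_end_time"] - df["od_start_time"]).dt.total_seconds() / 3600
)
df["creation_year"] = df["trip_creation_time"].dt.year
df["creation_month"] = df["trip_creation_time"].dt.month
df["creation_dayofweek"] = df["trip_creation_time"].dt.dayofweek  # Monday = 0

# Split names into city, place code, and state (pattern is an assumption).
pattern = r"^(?P<city>[^_]+)_(?P<place_code>.+) \((?P<state>[^)]+)\)$"
parts = df["source_name"].str.extract(pattern)
df = df.join(parts.add_prefix("source_"))

# 2. Data cleaning: keep rows inside the 1.5 * IQR fences for actual_time.
q1 = df["actual_time"].quantile(0.25)
q3 = df["actual_time"].quantile(0.75)
iqr = q3 - q1
mask = df["actual_time"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[mask].copy()

# 3. Categorical handling: one-hot encode route_type.
encoded = pd.get_dummies(clean, columns=["route_type"])

# 4. Scaling: min-max to [0, 1] and standardization to zero mean, unit variance.
encoded["actual_time_minmax"] = (
    MinMaxScaler().fit_transform(encoded[["actual_time"]]).ravel()
)
encoded["actual_time_std"] = (
    StandardScaler().fit_transform(encoded[["actual_time"]]).ravel()
)

print(encoded.filter(like="actual_time"))
```

Dropping the outlier rows is only one way to handle them; capping values at the fences is a common alternative, and the source does not say which was used. In a modeling workflow the scalers would normally be fitted on the training split only and then applied to the test split.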
## Key Insights
1. **Route Type Insights**:
   - FTL routes are faster and more efficient for long distances compared to Carting.
2. **Source and Destination Patterns**:
   - High-frequency routes indicate key operational hubs that could benefit from resource optimization.
3. **Time Efficiency**:
   - Delivery times vary significantly by route type, season, and traffic conditions.
4. **OSRM vs. Actual Metrics**:
   - Discrepancies between OSRM-calculated and actual times/distances highlight areas for improving routing algorithms.

---
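The OSRM-versus-actual comparison in the last insight can be expressed as a per-trip ratio summarized by route type. The following is a minimal sketch with made-up numbers: `time_ratio` is a hypothetical column name, and the values are illustrative rather than results from the dataset.

```python
import pandas as pd

# Made-up trip-level rows; only the column names come from the Dataset section.
trips = pd.DataFrame({
    "route_type": ["FTL", "FTL", "Carting", "Carting"],
    "actual_time": [300.0, 450.0, 60.0, 90.0],
    "osrm_time": [200.0, 300.0, 30.0, 30.0],
})

# A ratio above 1 means the trip took longer than the OSRM estimate.
trips["time_ratio"] = trips["actual_time"] / trips["osrm_time"]

# Summarize the discrepancy per route type; the median is robust to outliers.
summary = trips.groupby("route_type")["time_ratio"].agg(["median", "mean", "count"])

print(summary)
```

The same pattern applies to `actual_distance_to_destination` against `osrm_distance`. A ratio is used here instead of a raw difference so that long and short trips are comparable.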
## Tools and Libraries
This project utilized the following tools:
- **Python**:
  - `pandas` for data manipulation.
  - `Matplotlib` and `Seaborn` for visualization.
  - `scikit-learn` for preprocessing and scaling.
- **Jupyter Notebook**: For interactive analysis and documentation.

---
## Repository Structure
- **`data/`**: Contains the dataset used for analysis.
- **`notebooks/`**: Jupyter Notebooks documenting the analysis process.
- **`visualizations/`**: Saved plots and charts.
- **`README.md`**: Overview of the project (this file).

---
## Next Steps
Future directions for this project include:
1. Developing predictive models for delivery time and distance.
2. Investigating patterns in the unknown fields (`is_cutoff`, `cutoff_factor`, etc.).
3. Implementing clustering techniques to identify high-demand routes.

---
## Acknowledgments
- **Dataset Source**: Provided by Scaler for this analysis.
- **Python Libraries**: Thanks to the open-source Python community for providing versatile data analysis tools.

---
## License
This project is licensed for educational and non-commercial use only. If utilizing any part of this repository, please credit the author.