Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/stephen-adwini-badu/02.-automatidata-project
The objective was to predict taxi fare prices in New York City using data provided by the New York City Taxi & Limousine Commission. The project included data exploration, cleaning, and visualization, as well as building machine learning models (Random Forest, Logistic Regression, etc.) to estimate fare prices and identify the factors that influence them.
- Host: GitHub
- URL: https://github.com/stephen-adwini-badu/02.-automatidata-project
- Owner: Stephen-Adwini-Badu
- Created: 2025-01-13T15:57:36.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-01-22T00:19:11.000Z (30 days ago)
- Last Synced: 2025-01-22T01:22:55.525Z (30 days ago)
- Topics: data-science, jupyter-notebook, linear-regression, machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 1.16 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Automatidata Project
## Overview
This project analyzes taxi fare data to derive actionable insights. It has two primary objectives:

1. **Two-Sample T-Test Analysis**: Test whether the average total fare differs between customers who pay by credit card and customers who pay in cash.
2. **Regression Model Development**: Build a predictive model that estimates taxi fares from the available data.

## Objectives
### Objective 1: Two-Sample T-Test
- **Hypotheses**:
  - **Null Hypothesis (H₀):** There is no difference in the mean fare amount between credit card and cash payment users.
  - **Alternative Hypothesis (H₁):** There is a difference in the mean fare amount between credit card and cash payment users.
- **Insight Goal**: Determine whether encouraging credit card payments could generate more revenue for taxi drivers.

### Objective 2: Regression Model
- **Goal**: Build a regression model that predicts taxi fares from the provided dataset.
- **Use Case**: Improve pricing strategies and fare-estimation systems for taxi services.

## Methodology
### Data Preparation
- Import the packages needed for statistical and machine learning analysis.
- Clean and preprocess the data to ensure it is accurate and consistent for analysis.

### Analysis and Testing
1. **Two-Sample T-Test**:
   - Compare the mean fare amounts of two groups: credit card users and cash users.
   - Use the p-value to determine whether the difference is statistically significant.
2. **Regression Modeling**:
   - Engineer features and identify the predictors most relevant to fare estimation.
   - Split the data into training and testing subsets.
   - Train the model and evaluate it using metrics such as R-squared and mean squared error.

## Insights
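The two-sample t-test described under Methodology can be illustrated with a minimal Python sketch. It assumes a pandas DataFrame with NYC TLC-style columns `payment_type` (1 = credit card, 2 = cash) and `total_amount`; those column names and codes, the 0.05 significance level, and the choice of Welch's variant (`equal_var=False` in `scipy.stats.ttest_ind`) are assumptions, not details taken from this project. The demo data is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_fares_by_payment(df: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Welch two-sample t-test on total_amount: credit card vs. cash riders.

    Assumes TLC-style coding: payment_type 1 = credit card, 2 = cash.
    """
    credit = df.loc[df["payment_type"] == 1, "total_amount"]
    cash = df.loc[df["payment_type"] == 2, "total_amount"]
    # equal_var=False selects Welch's t-test (no equal-variance assumption).
    t_stat, p_value = stats.ttest_ind(credit, cash, equal_var=False)
    return {
        "credit_mean": float(credit.mean()),
        "cash_mean": float(cash.mean()),
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        # Reject H0 (equal means) when the p-value falls below alpha.
        "reject_h0": bool(p_value < alpha),
    }


# Synthetic demo data only (NOT the project's dataset).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "payment_type": np.repeat([1, 2], 500),
    "total_amount": np.concatenate([
        rng.normal(17.5, 4.0, 500),  # hypothetical credit card fares
        rng.normal(13.5, 4.0, 500),  # hypothetical cash fares
    ]),
})

result = compare_fares_by_payment(demo)
print(result)
```

Welch's test is a common default when the two groups may have unequal variances; passing `equal_var=True` instead runs Student's t-test, which assumes equal variances.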
- Interpret the t-test results to guide payment-method recommendations.
- Evaluate the regression model's performance to confirm that its predictions are accurate.

## Results
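The regression steps listed under Methodology (split the data, train the model, evaluate it with R-squared and mean squared error) might look like the sketch below, which uses scikit-learn's `LinearRegression`. The feature names `trip_distance` and `trip_duration_min`, the target `fare_amount`, the 80/20 split, and the fare formula are illustrative assumptions; the printed metrics describe the synthetic data only, not this project's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_fare_model(df, features, target="fare_amount", test_size=0.2, seed=42):
    """Split, fit a linear regression, and score it on the held-out rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "r2": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
    }
    return model, metrics


# Synthetic trips (NOT the project's data); the fare formula is made up.
rng = np.random.default_rng(1)
n = 1000
trips = pd.DataFrame({
    "trip_distance": rng.uniform(0.5, 10.0, n),
    "trip_duration_min": rng.uniform(5.0, 40.0, n),
})
trips["fare_amount"] = (
    3.0
    + 2.5 * trips["trip_distance"]
    + 0.3 * trips["trip_duration_min"]
    + rng.normal(0.0, 1.5, n)  # random noise around the made-up formula
)

model, metrics = fit_fare_model(trips, ["trip_distance", "trip_duration_min"])
print(metrics)
```

Scoring only on the held-out split gives a fairer estimate of how the model would perform on unseen trips than scoring on the training rows.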
- Key findings from the statistical test and regression analysis will guide data-driven decisions for improving taxi service operations.

### Feature Importance
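The README does not show how feature importance was computed. Since the project description mentions Random Forest, one common approach is to read the impurity-based `feature_importances_` attribute of a fitted scikit-learn `RandomForestRegressor`. The sketch below does this on synthetic data; the feature names and the data-generating formula are hypothetical, and `passenger_count` is deliberately pure noise.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic trips (NOT the project's data); passenger_count does not affect the fare.
rng = np.random.default_rng(2)
n = 1000
X = pd.DataFrame({
    "trip_distance": rng.uniform(0.5, 10.0, n),
    "trip_duration_min": rng.uniform(5.0, 40.0, n),
    "passenger_count": rng.integers(1, 7, n),  # integers 1..6
})
y = (
    3.0
    + 2.5 * X["trip_distance"]
    + 0.3 * X["trip_duration_min"]
    + rng.normal(0.0, 1.5, n)
)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor features with many distinct values; `sklearn.inspection.permutation_importance` on a held-out set is a common cross-check.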
