https://github.com/stephen-adwini-badu/02.-automatidata-project

The objective was to predict taxi fare prices in New York City using data provided by the New York City Taxi & Limousine Commission. The project included data exploration, cleaning, and visualization, as well as building machine learning models (Random Forest, Logistic Regression, etc.) to estimate fare prices and identify the factors influencing them.

data-science jupyter-notebook linear-regression machine-learning


# Automatidata Project

## Overview
This project analyzes New York City taxi fare data to support payment and pricing decisions. It has two primary objectives:

1. **Two-Sample T-Test Analysis**: Test whether the average total fare amount differs between customers who pay by credit card and customers who pay in cash.
2. **Regression Model Development**: Build a predictive model to estimate taxi fares based on available data.

## Objectives
### Objective 1: Two-Sample T-Test
- **Hypotheses**:
  - **Null Hypothesis (H₀):** There is no difference in the average fare amount between credit card and cash payment users.
  - **Alternative Hypothesis (H₁):** There is a difference in the average fare amount between credit card and cash payment users.
- **Insight Goal**: Determine whether promoting credit card payments could potentially generate more revenue for taxi drivers.
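
The hypotheses above can be checked with a few lines of SciPy. This is a minimal sketch, not the notebook's actual code: the column names `total_amount` and `payment_type`, the codes `1` (credit card) and `2` (cash), the 5% significance level, and the choice of Welch's unequal-variance test are all assumptions, and the data below is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_payment_groups(df, amount_col="total_amount", payment_col="payment_type",
                           card_code=1, cash_code=2, alpha=0.05):
    """Two-sample t-test on the mean amount: credit card vs. cash.

    Column names and payment codes are assumed, not taken from the notebook.
    """
    card = df.loc[df[payment_col] == card_code, amount_col].dropna()
    cash = df.loc[df[payment_col] == cash_code, amount_col].dropna()
    # equal_var=False selects Welch's t-test (no equal-variance assumption).
    t_stat, p_value = stats.ttest_ind(card, cash, equal_var=False)
    return {
        "card_mean": card.mean(),
        "cash_mean": cash.mean(),
        "t_stat": t_stat,
        "p_value": p_value,
        "reject_null": bool(p_value < alpha),
    }


# Synthetic stand-in data, NOT the TLC dataset.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "payment_type": np.repeat([1, 2], 500),
    "total_amount": np.concatenate([
        rng.normal(16.0, 5.0, 500),  # simulated credit card fares
        rng.normal(13.0, 5.0, 500),  # simulated cash fares
    ]),
})
result = compare_payment_groups(demo)
print(result)
```

Passing `equal_var=True` instead would run the pooled-variance (Student) version of the test. Rejecting H₀ only shows that the group means differ; it does not by itself show that switching a customer to card payment would raise the fare.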

### Objective 2: Regression Model
- **Goal**: Construct a regression model to predict taxi fares using the provided dataset.
- **Use Case**: Enhance pricing strategies and fare estimation systems for taxi services.

## Methodology
### Data Preparation
- Import the packages needed for statistical and machine learning analysis.
- Clean and preprocess the data so that it is accurate and consistent for analysis.
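
As a rough illustration of the cleaning step, the sketch below drops duplicates, parses timestamps, and removes rows that cannot be valid trips. The cleaning rules, the `clean_trips` helper, the derived `duration_min` column, and the column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, `trip_distance`, `fare_amount`) are assumptions for illustration; the README does not list the steps the notebook applies, and the rows below are made up.

```python
import pandas as pd

DATETIME_COLS = ("tpep_pickup_datetime", "tpep_dropoff_datetime")  # assumed names


def clean_trips(df):
    """Return a cleaned copy of a trip table (hypothetical cleaning rules)."""
    out = df.drop_duplicates().copy()
    # Parse timestamps; values that do not match the format become NaT.
    for col in DATETIME_COLS:
        out[col] = pd.to_datetime(out[col], format="%Y-%m-%d %H:%M:%S", errors="coerce")
    out = out.dropna(subset=[*DATETIME_COLS, "fare_amount", "trip_distance"])
    # Keep only rows with a positive fare and a positive distance.
    out = out[(out["fare_amount"] > 0) & (out["trip_distance"] > 0)].copy()
    # Derive trip duration in minutes and drop non-positive durations.
    delta = out["tpep_dropoff_datetime"] - out["tpep_pickup_datetime"]
    out["duration_min"] = delta.dt.total_seconds() / 60
    out = out[out["duration_min"] > 0]
    return out.reset_index(drop=True)


# Made-up rows: one duplicate, one bad timestamp, one invalid fare/distance.
raw = pd.DataFrame({
    "tpep_pickup_datetime": ["2017-03-25 08:55:43", "2017-03-25 08:55:43",
                             "2017-04-11 14:53:28", "not a date",
                             "2017-05-01 10:00:00"],
    "tpep_dropoff_datetime": ["2017-03-25 09:09:47", "2017-03-25 09:09:47",
                              "2017-04-11 15:19:58", "2017-04-11 15:19:58",
                              "2017-05-01 10:12:00"],
    "trip_distance": [3.34, 3.34, 1.8, 2.0, 0.0],
    "fare_amount": [13.0, 13.0, 16.0, 9.0, -5.0],
})
cleaned = clean_trips(raw)
print(cleaned)
```

Whether rows such as zero-distance trips should be dropped or corrected is a judgment call that depends on the data; the thresholds here are placeholders.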

### Analysis and Testing
1. **Two-Sample T-Test**:
   - Compare the means of fare amounts between two groups: credit card and cash users.
   - Assess the p-value to determine statistical significance.

2. **Regression Modeling**:
   - Feature engineering to identify relevant predictors for fare estimation.
   - Split the data into training and testing subsets.
   - Train and evaluate the model using metrics such as R-squared and mean squared error.
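
The regression steps listed above (split, train, evaluate) can be sketched with scikit-learn as follows. This assumes an ordinary linear regression and two hypothetical predictors, `trip_distance` and `duration_min`; the README does not say which features or model the notebook settles on, and the data here is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_fare_model(df, feature_cols, target_col="fare_amount", test_size=0.2, seed=42):
    """Split the data, fit a linear model, and score it on the held-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df[target_col], test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    metrics = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
    return model, metrics


# Synthetic stand-in data, NOT the TLC dataset.
rng = np.random.default_rng(1)
n = 1000
distance = rng.uniform(0.5, 15.0, n)
duration = 3.0 * distance + rng.normal(0.0, 2.0, n)
fare = 2.5 + 2.0 * distance + 0.3 * duration + rng.normal(0.0, 1.0, n)
demo = pd.DataFrame({
    "trip_distance": distance,
    "duration_min": duration,
    "fare_amount": fare,
})

model, metrics = fit_fare_model(demo, ["trip_distance", "duration_min"])
print(metrics)
```

Both metrics are computed on the test subset only, so they estimate how the model behaves on trips it has not seen during training.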

## Insights
- Analyze the results of the t-test to guide payment method recommendations.
- Evaluate the performance of the regression model to ensure it provides accurate predictions.

## Results
- Key findings from the statistical test and regression analysis will guide data-driven decisions for improving taxi service operations.

### Feature Importance
![Feature importance plot](https://github.com/user-attachments/assets/5f25c477-6b72-494d-beb0-1058f0f87416)
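
A ranking like the one in the plot could be produced along the following lines. This sketch assumes a random forest regressor and its impurity-based `feature_importances_`; the README does not state which model or importance measure generated the figure, and the feature names and data below are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_feature_importance(X, y, n_estimators=100, seed=0):
    """Fit a random forest and return its feature importances, largest first."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    importance = pd.Series(model.feature_importances_, index=X.columns)
    return importance.sort_values(ascending=False)


# Synthetic stand-in data: only trip_distance actually drives the target.
rng = np.random.default_rng(2)
n = 500
X = pd.DataFrame({
    "trip_distance": rng.uniform(0.5, 15.0, n),
    "passenger_count": rng.integers(1, 7, n),
    "pickup_hour": rng.integers(0, 24, n),
})
y = 2.5 + 2.5 * X["trip_distance"] + rng.normal(0.0, 1.0, n)

importance = rank_feature_importance(X, y)
print(importance)
```

With matplotlib installed, `importance.plot.barh()` would draw the ranking as a horizontal bar chart. Impurity-based importances sum to 1 and describe what the fitted model relies on, not a causal effect on fares.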