Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/surayasumona/ford_used_car_analysis
Exploratory Data Analysis with Python
data-science data-visualization matplotlib numpy pandas python seaborn
- Host: GitHub
- URL: https://github.com/surayasumona/ford_used_car_analysis
- Owner: SurayaSumona
- Created: 2021-07-08T20:57:08.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-09T11:02:16.000Z (over 3 years ago)
- Last Synced: 2024-11-24T16:12:11.581Z (3 months ago)
- Topics: data-science, data-visualization, matplotlib, numpy, pandas, python, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 4.79 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Analysis and Data Visualization of Ford Used Cars
This dataset contains information about used Ford cars. The columns are described below.

**Target variable:**
* **Price:** selling price of the car

**Features:**
* **model:** list of the Ford cars
* **year:** when the car was made
* **transmission:** type of gearbox, which adapts the output of the internal combustion engine to the drive wheels
* **mileage:** total distance the car has been driven, as shown on the odometer
* **fuelType:** type of fuel the car uses
* **mpg:** fuel economy, the number of miles the car can travel on one gallon of fuel
* **engineSize:** the volume of fuel and air that can be pushed through the car's cylinders

### Goal of this project:
#### Learn data visualization and predict the resale price of used cars using a machine learning algorithm
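The exploratory steps this project walks through (reading the data into a DataFrame, checking data types and missing values, summarizing numerical features, computing percentages of unique values, and ranking the top-selling models) can be sketched with pandas roughly as follows. This is a minimal illustration and not the notebook's actual code: the sample rows are made up, the lowercase column names (including `price`) are assumptions based on the column descriptions, and the file name in the comment is hypothetical.

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset; only the column
# names follow the README, and their exact spelling is an assumption.
# With the real file this would be something like pd.read_csv("ford.csv"),
# where the file name is hypothetical.
df = pd.DataFrame({
    "model": ["Fiesta", "Focus", "Fiesta", "Kuga", "Focus", "Fiesta"],
    "year": [2017, 2018, 2019, 2017, 2016, 2018],
    "price": [12000, 14000, 13000, 16500, 9500, 11000],
    "transmission": ["Manual", "Manual", "Automatic", "Manual", "Manual", "Manual"],
    "mileage": [15944, 9083, 12456, 26191, 38000, 20500],
    "fuelType": ["Petrol", "Petrol", "Petrol", "Diesel", "Diesel", "Petrol"],
    "mpg": [57.7, 57.7, 48.7, 54.3, 67.3, 65.7],
    "engineSize": [1.0, 1.0, 1.0, 2.0, 1.5, 1.0],
})

print(df.dtypes)        # data type of each column
print(df.isna().sum())  # number of missing values per column
print(df.describe())    # basic statistics of the numerical features

# Percentage of rows per transmission type: normalize the counts,
# scale to percent, round, then reset the index and name the columns.
transmission_pct = (
    df["transmission"]
    .value_counts(normalize=True)
    .mul(100)
    .round(2)
    .rename_axis("transmission")
    .reset_index(name="percentage")
)
print(transmission_pct)

# Top 5 models by number of cars, with average and total selling price.
model_summary = (
    df.groupby("model")["price"]
    .agg(cars_sold="count", average_price="mean", total_sale="sum")
    .sort_values("cars_sold", ascending=False)
    .head(5)
)
print(model_summary)
```

The same `value_counts` pattern applies to `model` and `fuelType`, and each summary table can be passed to a bar plot.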
**Exploratory Data Analysis**:
* Read the data into a pandas DataFrame
* Check the data types and missing values
* Check the basic statistics of numerical features
* Find the percentage of each unique value in the categorical variables, then reset the index, rename the columns, and round the results

**Exploring the data using different data visualization plots**:
* Barplot
* Scatterplot
* Trendline or Regression plot
* Histogram
* Distribution plot
* ECDF (Empirical Cumulative Distribution Function)
* Boxplot
* Violinplot

**EDA using GroupBy/Pivot_Table and Barplot based on some features such as model, transmission, and fuelType**
* What are the top 5 selling car models in the dataset?
* What's the average selling price of the top 5 selling car models?
* What's the total sale of the top 5 selling car models?

# Machine Learning Algorithms
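The linear regression topics this section covers, the straight-line equation y = mx + c and the error metrics MAE, MAPE, MSE, RMSE and R-squared, can be sketched from scratch in plain Python. This is an illustrative sketch rather than the notebook's code: `fit_line` and `regression_metrics` are hypothetical helper names, the data points are made up, and a single feature is assumed so that the model really is a straight line.

```python
import math

def fit_line(x, y):
    """Least-squares slope m and intercept c for the line y = m*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx            # feature coefficient (slope)
    c = mean_y - m * mean_x  # bias coefficient (y-intercept)
    return m, c

def regression_metrics(y_true, y_pred):
    """Common regression error metrics; assumes no true value is zero (for MAPE)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    ss_res = sum(e ** 2 for e in errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }

# Made-up example: mileage (in thousands) as the feature, price as the target.
mileage = [10, 20, 30, 40, 50]
price = [15000, 13500, 12500, 10500, 9500]

m, c = fit_line(mileage, price)
predicted = [m * x + c for x in mileage]
metrics = regression_metrics(price, predicted)

print(f"slope m = {m}, intercept c = {c}")
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A negative slope here means the predicted price falls as mileage rises. MAE and RMSE are in the same units as the price, MAPE is a percentage, and R-squared is unitless. In practice the metrics would be computed on held-out test data rather than on the rows used for fitting.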
**Supervised Learning: Linear Regression and Regression accuracy metrics**:
* Understanding the equation of a straight line
* feature coefficient (slope, gradient, m)
* bias coefficient (y-intercept, c)
* loss function, cost function, objective function, error function
* Mean Absolute Error (MAE)
* Mean Absolute Percentage Error (MAPE)
* Mean Squared Error (MSE)
* Root Mean Squared Error (RMSE)
* R-squared or coefficient of determination
* Prediction result evaluation

### Reference of this Dataset: https://www.kaggle.com/aishwaryamuthukumar/cars-dataset-audi-bmw-ford-hyundai-skoda-vw