https://github.com/imnotamr/datasets-used
A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics
https://github.com/imnotamr/datasets-used
ai classification data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning models python regression-models
Last synced: 3 months ago
JSON representation
A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics
- Host: GitHub
- URL: https://github.com/imnotamr/datasets-used
- Owner: imnotamr
- Created: 2024-10-29T00:32:16.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-29T00:52:29.000Z (7 months ago)
- Last Synced: 2025-01-02T13:18:34.437Z (5 months ago)
- Topics: ai, classification, data-analysis, data-science, data-visualization, deep-learning, jupyter-notebook, machine-learning, models, python, regression-models
- Homepage: https://github.com/imnotamr/Datasets-Used
- Size: 6.83 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 📊 Datasets for Machine Learning Projects
This directory contains various datasets used for machine learning, data analysis, and visualization tasks. Each dataset serves a specific purpose and covers different domains, providing a range of options for exploratory data analysis, regression, classification, and more.## "I haven't uploaded all of the datasets I've used yet :)" ##
# 📁 Dataset Descriptions
1) Advertising.csv
• Description: Contains data on advertising expenditures across different channels (TV, Radio, Newspaper) and their impact on sales.
• Usage: Great for regression analysis and predictive modeling to analyze the impact of advertising on sales.2) SVM_Data.csv
• Description: A dataset designed for Support Vector Machine (SVM) classification tasks.
• Usage: Suitable for experimenting with SVM algorithms, tuning hyperparameters, and evaluating model accuracy.4) Salary.csv
• Description: Contains information on individuals' salary levels based on factors like education, experience, and position.
• Usage: Ideal for regression modeling, predicting salary levels based on features, and understanding salary trends.6) adult.csv
• Description: UCI Adult dataset, used to predict income levels based on demographic features.
• Usage: Commonly used for classification tasks, especially binary classification on predicting income levels (e.g., above or below 50K).8) car_data.csv
• Description: Information on various car attributes and evaluations.
• Usage: Useful for classification tasks, such as evaluating car acceptability based on features like buying price, maintenance cost, and safety.10) insurance.csv
• Description: Insurance data, including demographic and medical history factors, along with insurance charges.
• Usage: Great for regression analysis, predicting insurance costs, and understanding factors influencing charges.12) kc_house_data.csv
• Description: Contains house prices and associated features from the King County housing dataset.
• Usage: Suitable for regression modeling, predicting house prices, and feature importance analysis.14) titanic.csv
• Description: Famous Titanic dataset containing details about passengers and their survival status.
• Usage: Ideal for binary classification, survival analysis, and feature engineering experiments.16) IBM Dataset.csv
• Description: IBM HR Analytics dataset, used to predict employee attrition and understand HR factors.
• Usage: Useful for classification tasks, especially for predicting employee turnover.18) Iris.csv
• Description: Classic Iris dataset containing measurements of different iris flower species.
• Usage: Commonly used for classification tasks and exploring clustering algorithms.20) SampleSuperstore.csv
• Description: Data from a sample superstore, including sales, profits, and product categories.
• Usage: Useful for analyzing sales performance, regional trends, and market segmentation.22) data.csv
• Description: General dataset for testing and exploratory analysis.
• Usage: Used for various data analysis tasks.24) diabetes.csv
• Description: Diabetes dataset containing health-related features used to predict diabetes onset.
• Usage: Suitable for binary classification and medical analysis tasks.26) fifa.csv
• Description: FIFA dataset with player statistics.
• Usage: Ideal for data exploration, player comparison, and sports analytics.28) pendigits-test.csv & pendigits-train.csv
• Description: Handwritten digit recognition dataset with training and testing splits.
• Usage: Commonly used for classification tasks and testing digit recognition algorithms.30) regression_project_data.csv
• Description: Dataset for regression analysis.
• Usage: Used for regression modeling and feature analysis.32) tips.csv
• Description: Dataset containing restaurant tips, including features like total bill, tip amount, and day.
• Usage: Great for exploring regression, correlation, and understanding factors affecting tips.34) titanic.csv (duplicate)
• Description: Duplicate of the Titanic dataset, used for different experimental setups.
• Usage: Same as above, for binary classification and survival analysis.# 📚 How to Use
Load the Data
Each dataset is in CSV format and can be loaded using pandas in Python:python
Copy code
import pandas as pddata = pd.read_csv('path/to/dataset.csv')
Exploratory Data Analysis
Perform initial analysis to understand the features, data types, and any necessary preprocessing steps:python
Copy code
data.info()
data.describe()
Machine Learning
These datasets are suitable for a variety of machine learning tasks such as regression, classification, clustering, and feature engineering.# 💡 Suggested Applications
Regression Models: Advertising.csv, insurance.csv, kc_house_data.csv
Classification Models: adult.csv, car_data.csv, titanic.csv, Iris.csv, diabetes.csv
Exploratory Data Analysis: SampleSuperstore.csv, fifa.csv
Time-Series & Sequential Data: pendigits-train.csv and pendigits-test.csv
# 📄 License
These datasets are publicly available for educational and research purposes. Please ensure proper citation when using these datasets for publications or sharing results.