{"id":23828561,"url":"https://github.com/imnotamr/datasets-used","last_synced_at":"2026-05-19T14:11:18.783Z","repository":{"id":259996797,"uuid":"880012325","full_name":"imnotamr/Datasets-Used","owner":"imnotamr","description":"A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics","archived":false,"fork":false,"pushed_at":"2024-10-29T00:52:29.000Z","size":7165,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-02T13:18:34.437Z","etag":null,"topics":["ai","classification","data-analysis","data-science","data-visualization","deep-learning","jupyter-notebook","machine-learning","models","python","regression-models"],"latest_commit_sha":null,"homepage":"https://github.com/imnotamr/Datasets-Used","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/imnotamr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-29T00:32:16.000Z","updated_at":"2024-10-29T01:46:41.000Z","dependencies_parsed_at":"2024-10-29T01:38:46.107Z","dependency_job_id":null,"html_url":"https://github.com/imnotamr/Datasets-Used","commit_stats":null,"previous_names":["imnotamr/datasets-used"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imnotamr%2FDatasets-Used","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imnotamr%2FDatasets-Used/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/imnotamr%2FDatasets-Used/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imnotamr%2FDatasets-Used/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/imnotamr","download_url":"https://codeload.github.com/imnotamr/Datasets-Used/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240115817,"owners_count":19750084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","classification","data-analysis","data-science","data-visualization","deep-learning","jupyter-notebook","machine-learning","models","python","regression-models"],"created_at":"2025-01-02T13:18:46.915Z","updated_at":"2025-10-12T06:38:53.447Z","avatar_url":"https://github.com/imnotamr.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📊 Datasets for Machine Learning Projects\nThis directory contains various datasets used for machine learning, data analysis, and visualization tasks. 
Each dataset serves a specific purpose and covers different domains, providing a range of options for exploratory data analysis, regression, classification, and more.\n\n## \"I haven't uploaded all of the datasets I've used yet :)\" ## \n\n# 📁 Dataset Descriptions\n1)\tAdvertising.csv\n•\tDescription: Contains data on advertising expenditures across different channels (TV, Radio, Newspaper) and their impact on sales.\n•\tUsage: Great for regression analysis and predictive modeling to analyze the impact of advertising on sales.\n\n2)\tSVM_Data.csv\n•\tDescription: A dataset designed for Support Vector Machine (SVM) classification tasks.\n•\tUsage: Suitable for experimenting with SVM algorithms, tuning hyperparameters, and evaluating model accuracy.\n\n3)\tSalary.csv\n•\tDescription: Contains information on individuals' salary levels based on factors like education, experience, and position.\n•\tUsage: Ideal for regression modeling, predicting salary levels based on features, and understanding salary trends.\n\n4)\tadult.csv\n•\tDescription: UCI Adult dataset, used to predict income levels based on demographic features.\n•\tUsage: Commonly used for classification tasks, especially binary classification on predicting income levels (e.g., above or below 50K).\n\n5)\tcar_data.csv\n•\tDescription: Information on various car attributes and evaluations.\n•\tUsage: Useful for classification tasks, such as evaluating car acceptability based on features like buying price, maintenance cost, and safety.\n\n6)\tinsurance.csv\n•\tDescription: Insurance data, including demographic and medical history factors, along with insurance charges.\n•\tUsage: Great for regression analysis, predicting insurance costs, and understanding factors influencing charges.\n\n7)\tkc_house_data.csv\n•\tDescription: Contains house prices and associated features from the King County housing dataset.\n•\tUsage: Suitable for regression modeling, predicting house prices, and feature importance 
analysis.\n\n8)\ttitanic.csv\n•\tDescription: Famous Titanic dataset containing details about passengers and their survival status.\n•\tUsage: Ideal for binary classification, survival analysis, and feature engineering experiments.\n\n9)\tIBM Dataset.csv\n•\tDescription: IBM HR Analytics dataset, used to predict employee attrition and understand HR factors.\n•\tUsage: Useful for classification tasks, especially for predicting employee turnover.\n\n10)\tIris.csv\n•\tDescription: Classic Iris dataset containing measurements of different iris flower species.\n•\tUsage: Commonly used for classification tasks and exploring clustering algorithms.\n\n11)\tSampleSuperstore.csv\n•\tDescription: Data from a sample superstore, including sales, profits, and product categories.\n•\tUsage: Useful for analyzing sales performance, regional trends, and market segmentation.\n\n12)\tdata.csv\n•\tDescription: General dataset for testing and exploratory analysis.\n•\tUsage: Used for various data analysis tasks.\n\n13)\tdiabetes.csv\n•\tDescription: Diabetes dataset containing health-related features used to predict diabetes onset.\n•\tUsage: Suitable for binary classification and medical analysis tasks.\n\n14)\tfifa.csv\n•\tDescription: FIFA dataset with player statistics.\n•\tUsage: Ideal for data exploration, player comparison, and sports analytics.\n\n15)\tpendigits-test.csv \u0026 pendigits-train.csv\n•\tDescription: Handwritten digit recognition dataset with training and testing splits.\n•\tUsage: Commonly used for classification tasks and testing digit recognition algorithms.\n\n16)\tregression_project_data.csv\n•\tDescription: Dataset for regression analysis.\n•\tUsage: Used for regression modeling and feature analysis.\n\n17)\ttips.csv\n•\tDescription: Dataset containing restaurant tips, including features like total bill, tip amount, and day.\n•\tUsage: Great for exploring regression, correlation, and understanding factors affecting tips.\n\n18)\ttitanic.csv 
(duplicate)\n•\tDescription: Duplicate of the Titanic dataset, used for different experimental setups.\n•\tUsage: Same as above, for binary classification and survival analysis.\n\n\n# 📚 How to Use\n## Load the Data\nEach dataset is in CSV format and can be loaded using pandas in Python:\n\n    import pandas as pd\n\n    data = pd.read_csv('path/to/dataset.csv')\n\n## Exploratory Data Analysis\nPerform initial analysis to understand the features, data types, and any necessary preprocessing steps:\n\n    data.info()\n    data.describe()\n\n## Machine Learning\nThese datasets are suitable for a variety of machine learning tasks such as regression, classification, clustering, and feature engineering.\n\n# 💡 Suggested Applications\n•\tRegression Models: Advertising.csv, insurance.csv, kc_house_data.csv\n•\tClassification Models: adult.csv, car_data.csv, titanic.csv, Iris.csv, diabetes.csv\n•\tExploratory Data Analysis: SampleSuperstore.csv, fifa.csv\n•\tTime-Series \u0026 Sequential Data: pendigits-train.csv and pendigits-test.csv\n\n# 📄 License\nThese datasets are publicly available for educational and research purposes. Please ensure proper citation when using these datasets for publications or sharing results.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimnotamr%2Fdatasets-used","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimnotamr%2Fdatasets-used","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimnotamr%2Fdatasets-used/lists"}