{"id":26055767,"url":"https://github.com/lovnishverma/datasets","last_synced_at":"2026-01-31T07:03:18.311Z","repository":{"id":176043627,"uuid":"654876045","full_name":"lovnishverma/datasets","owner":"lovnishverma","description":"This repository contains various datasets for data analysis, machine learning, and educational purposes","archived":false,"fork":false,"pushed_at":"2026-01-21T09:52:13.000Z","size":42537,"stargazers_count":13,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-21T21:31:03.068Z","etag":null,"topics":["csv","dataset","kaggle-dataset"],"latest_commit_sha":null,"homepage":"https://www.kaggle.com/datasets/princelv84/csv-datasets","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lovnishverma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"lovnishverma","custom":["https://buymeacoffee.com/lovnishverma","https://www.paypal.com/paypalme/princelv"]}},"created_at":"2023-06-17T07:50:22.000Z","updated_at":"2026-01-21T09:52:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"833f1df4-e2d4-4e32-9e57-f37ab4bc1eb7","html_url":"https://github.com/lovnishverma/datasets","commit_stats":null,"previous_names":["lovnishverma/datasets"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lovnishverma/datasets","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovnishverma%
2Fdatasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovnishverma%2Fdatasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovnishverma%2Fdatasets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovnishverma%2Fdatasets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lovnishverma","download_url":"https://codeload.github.com/lovnishverma/datasets/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovnishverma%2Fdatasets/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28932600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T04:05:25.756Z","status":"ssl_error","status_checked_at":"2026-01-31T04:02:35.005Z","response_time":128,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","dataset","kaggle-dataset"],"created_at":"2025-03-08T10:21:21.510Z","updated_at":"2026-01-31T07:03:18.305Z","avatar_url":"https://github.com/lovnishverma.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/lovnishverma","https://buymeacoffee.com/lovnishverma","https://www.paypal.com/paypalme/princelv"],"categories":[],"sub_categories":[],"readme":"# My Datasets Repository\n\nThis repository contains various datasets for data analysis, machine learning, and 
educational purposes. Below is a brief description of each dataset available in this repository.\n\n### Want to download any CSV file for local use? Follow the steps below: 👇\n\n\u003col\u003e\n  \u003cli\u003eGo to a CSV file in the repository of your choice\u003c/li\u003e\n  \u003cli\u003eIn the bar at the top right, just above the file contents, click the \"Raw\" button\u003c/li\u003e\n  \u003cli\u003eA page with unstyled, comma-separated data will appear\u003c/li\u003e\n  \u003cli\u003eCopy the page URL\u003c/li\u003e\n  \u003cli\u003eCreate a folder on your desktop\u003c/li\u003e\n  \u003cli\u003eOpen that folder in your favourite code editor and create a simple Python file inside it. Name it as you please.\u003c/li\u003e\n  \u003cli\u003eCopy the code from the section below into that file\u003c/li\u003e\n  \u003cli\u003eRun the Python file\u003c/li\u003e\n  \u003cli\u003eThe CSV file will be downloaded after a short while, depending on its size\u003c/li\u003e\n  \u003cli\u003eNow you are ready to use it locally!\u003c/li\u003e\n\n\u003c/ol\u003e\n\n  ``` \n  import requests\n  import pandas as pd\n\n  # Paste the raw file URL you copied in step 4\n  url = '{(copied url here)}' \n  res = requests.get(url, allow_redirects=True)\n  res.raise_for_status()  # stop here if the download failed\n  # Save the response body to a local CSV file\n  with open('download_file_name.csv','wb') as file:\n      file.write(res.content)\n  # Load the saved file into a DataFrame\n  download_file_name = pd.read_csv('download_file_name.csv') \n  ```  \n\n\n## Available Datasets\n\n### 1. BMI_Data.csv\n   - Contains Body Mass Index (BMI) data.\n   - Useful for health and fitness analysis.\n\n### 2. departments.csv\n   - Contains department-related information.\n   - Useful for organizational data processing.\n\n### 3. employees.csv\n   - Contains employee details.\n   - Can be used for HR analytics and workforce management.\n\n### 4. iris.csv\n   - Classic Iris dataset for machine learning.\n   - Contains different species of iris flowers with their measurements.\n\n### 5. 
item_similarity_df.csv\n   - Contains item similarity data.\n   - Useful for recommendation system development.\n\n### 6. movies.csv\n   - Dataset containing information about movies.\n   - Useful for movie recommendation models.\n\n### 7. music_genre.csv\n   - Contains music genre classification data.\n   - Can be used for genre prediction models.\n\n### 8. nielit.patt\n   - Not a dataset; it is for an AVR custom marker.\n\n### 9. pandas.csv\n   - Sample dataset for practicing pandas library operations.\n   - Useful for learning data manipulation.\n\n### 10. pandas_tutorial1.csv\n   - Another dataset for pandas tutorials.\n   - Contains structured data for training purposes.\n\n### 11. ratings.csv\n   - Contains user ratings for various items.\n   - Useful for collaborative filtering and recommendation systems.\n\n### 12. sample.csv\n   - A sample dataset.\n   - Can be used for testing and learning purposes.\n\n### 13. test.csv\n   - A test dataset.\n   - Used for validation and experimentation.\n\n[Explore More Datasets on my Kaggle](https://www.kaggle.com/datasets/princelv84/csv-datasets)\n\n## Usage\nThese datasets can be used for:\n- Machine learning projects\n- Data analysis and visualization\n- Educational and tutorial purposes\n\n## How to Contribute\nIf you have additional datasets to contribute, feel free to upload them and update this README with the necessary descriptions.\n\n## License\nThese datasets are provided for educational and research purposes. 
Please check individual datasets for any specific license information.\n\n---\nFor any questions or suggestions, feel free to raise an issue or contact Lovnish Verma.\n\n# 📊 Machine Learning Dataset Sources\n\nA list of public datasets for machine learning, AI, data science, and analytics projects.\n\n---\n\n## 🔹 General-Purpose ML Repositories\n\n- [UCI Machine Learning Repository](https://archive.ics.uci.edu/) – Classic datasets used in academic ML research.\n- [Kaggle Datasets](https://www.kaggle.com/datasets) – User-contributed datasets with competitions and notebooks.\n- [Google Dataset Search](https://datasetsearch.research.google.com/) – Dataset-specific search engine.\n- [AWS Open Data Registry](https://registry.opendata.aws/) – Public datasets hosted on AWS.\n- [Microsoft Azure Open Datasets](https://azure.microsoft.com/en-us/services/open-datasets/) – Curated datasets for training on Azure.\n- [OpenML](https://www.openml.org/) – Collaborative platform for sharing datasets and experiments.\n- [Papers with Code – Datasets](https://paperswithcode.com/datasets) – ML benchmarks tied to research papers.\n- [Hugging Face Datasets](https://huggingface.co/datasets) – NLP, vision, and multimodal datasets.\n- [Zenodo](https://zenodo.org/) – Scientific datasets with citation support.\n- [Figshare](https://figshare.com/) – Open-access research datasets.\n- [Data World](https://data.world/) – Community platform for data sharing.\n- [Awesome Public Datasets (GitHub)](https://github.com/awesomedata/awesome-public-datasets) – Curated list across domains.\n- [FiveThirtyEight Data](https://data.fivethirtyeight.com/) – Datasets used in data journalism.\n- [Quandl](https://www.quandl.com/) – Financial and economic data.\n\n---\n\n## 🔹 Government \u0026 Open Data Portals\n\n- [India AI – Dataset Repository](https://indiaai.gov.in/datasets) – Indian AI project datasets.\n- [Data.gov.in](https://data.gov.in/) – Indian government open data.\n- [Data.gov (USA)](https://data.gov/) 
– US federal open datasets.\n- [EU Open Data Portal](https://data.europa.eu/en) – Data from European institutions.\n- [UK Data Service](https://ukdataservice.ac.uk/) – Economic and social research datasets (UK).\n- [Canada Open Government](https://open.canada.ca/en/open-data) – Datasets from Canada.\n- [Australia Data Portal](https://data.gov.au/) – Australian government datasets.\n\n---\n\n## 🔹 Domain-Specific Datasets\n\n### 🖼️ Computer Vision\n\n- [ImageNet](http://www.image-net.org/) – Large-scale image classification dataset.\n- [COCO Dataset](https://cocodataset.org/) – Object detection, segmentation, and captioning.\n- [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html) – Annotated image data.\n- [Stanford Dogs Dataset](https://www.kaggle.com/jessicali9530/stanford-dogs-dataset) – Fine-grained image classification.\n\n### 🌐 Web \u0026 NLP\n\n- [Common Crawl](https://commoncrawl.org/) – Large-scale web crawl data.\n- [Wikipedia Dumps](https://dumps.wikimedia.org/) – Raw Wikipedia text.\n- [Project Gutenberg](https://www.gutenberg.org/) – Public domain books for NLP.\n- [TREC Question Classification](https://cogcomp.seas.upenn.edu/Data/QA/QC/) – NLP benchmark dataset.\n\n### 🧬 Bio, Medical \u0026 Health\n\n- [PhysioNet](https://physionet.org/) – Physiological and clinical data.\n- [MIMIC-III](https://mimic.physionet.org/) – ICU medical data (de-identified).\n- [NIH Biomedical Data](https://datascience.nih.gov/data) – NIH open data portal.\n- [Cancer Imaging Archive](https://www.cancerimagingarchive.net/) – Medical imaging data for cancer research.\n\n### 🗣️ Speech \u0026 Audio\n\n- [OpenSLR](https://www.openslr.org/) – Speech recognition datasets.\n- [LibriSpeech ASR](https://www.openslr.org/12/) – Audiobook dataset for speech recognition.\n\n### 🗺️ Maps \u0026 Geospatial\n\n- [OpenStreetMap (Geofabrik)](https://download.geofabrik.de/) – Extracts of OSM data.\n- [Google Open Buildings](https://sites.research.google/open-buildings/) 
– Global building footprints.\n\n---\n\n## ✅ Quick Access Table\n\n| Name | Domain | Link |\n|------|--------|------|\n| UCI ML Repo | General | [Link](https://archive.ics.uci.edu/) |\n| Kaggle | General | [Link](https://www.kaggle.com/datasets) |\n| IndiaAI | Govt (India) | [Link](https://indiaai.gov.in/datasets) |\n| Data.gov.in | Govt (India) | [Link](https://data.gov.in/) |\n| Data.gov | Govt (USA) | [Link](https://data.gov/) |\n| Data World | General | [Link](https://data.world/) |\n| Hugging Face | NLP/ML | [Link](https://huggingface.co/datasets) |\n| Papers with Code | Benchmarks | [Link](https://paperswithcode.com/datasets) |\n| Zenodo | Research | [Link](https://zenodo.org/) |\n\n---\n\n## 📌 Tip\n\nFor code integration and automatic downloads, you can often use Python libraries such as:\n\n```python\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\")  # Hugging Face example\n```\n\nYou can also automate downloads from Kaggle via API:\n\n```bash\nkaggle datasets download -d username/dataset-name\n```\n\n---\n\nFeel free to contribute more sources via pull request!\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovnishverma%2Fdatasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flovnishverma%2Fdatasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovnishverma%2Fdatasets/lists"}