{"id":15885734,"url":"https://github.com/machinecurve/extra_keras_datasets","last_synced_at":"2026-04-04T06:04:14.108Z","repository":{"id":55987332,"uuid":"232652169","full_name":"machinecurve/extra_keras_datasets","owner":"machinecurve","description":"📃🎉 Additional datasets for tensorflow.keras","archived":false,"fork":false,"pushed_at":"2024-06-25T12:16:16.000Z","size":2535,"stargazers_count":32,"open_issues_count":6,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-03-16T06:51:58.037Z","etag":null,"topics":["data-science","dataset","datasets","deep-learning","emnist-digits","emnist-letters","iris","iris-classification","iris-dataset","keras","keras-datasets","keras-tensorflow","lowercase-handwritten-letters","machine-learning","neural-networks","svhn","tensorflow"],"latest_commit_sha":null,"homepage":"https://machinecurve.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/machinecurve.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"custom":["https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=48TCZ8MKGUZNE\u0026source=url"]}},"created_at":"2020-01-08T20:25:22.000Z","updated_at":"2025-01-30T08:43:40.000Z","dependencies_parsed_at":"2024-10-27T23:09:52.643Z","dependency_job_id":"78940801-ed09-4253-b5f4-30a6b5379756","html_url":"https://github.com/machinecurve/extra_keras_datasets","commit_stats":{"total_commits":116,"total_committers":3,"mean_commits":"38.666666666666664","dds":"0.017241379310344862","last_synced_commit":"a209532b33d629
09b417a85a1428d597fb70b16f"},"previous_names":["christianversloot/extra_keras_datasets"],"tags_count":31,"template":false,"template_full_name":null,"purl":"pkg:github/machinecurve/extra_keras_datasets","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinecurve%2Fextra_keras_datasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinecurve%2Fextra_keras_datasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinecurve%2Fextra_keras_datasets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinecurve%2Fextra_keras_datasets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/machinecurve","download_url":"https://codeload.github.com/machinecurve/extra_keras_datasets/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinecurve%2Fextra_keras_datasets/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31389392,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T04:26:24.776Z","status":"ssl_error","status_checked_at":"2026-04-04T04:23:34.147Z","response_time":60,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","dataset","datasets","deep-learning","emnist-digits","emnist-letters","iris","iris-classification","iris-dataset","keras","keras-datasets","keras-tensorflow","lowercase-handwritten-letters","machine-learning","neural-networks","svhn","tensorflow"],"created_at":"2024-10-06T05:07:10.767Z","updated_at":"2026-04-04T06:04:14.086Z","avatar_url":"https://github.com/machinecurve.png","language":"Python","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=48TCZ8MKGUZNE\u0026source=url"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n \u003cimg src=\"assets/mc_logo.png\" width=\"200\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cb\u003e📃🎉 Additional datasets for \u003ccode\u003etensorflow.keras\u003c/code\u003e\u003c/b\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003ePowered by MachineCurve at www.machinecurve.com 🚀\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/christianversloot/extra_keras_datasets/workflows/Tests/badge.svg\"\u003e\u003cimg src=\"https://github.com/christianversloot/extra_keras_datasets/workflows/Build and Publish/badge.svg\"\u003e\u003c/p\u003e\n\nHi there, and welcome to the `extra-keras-datasets` module! 
This extension to the original `tensorflow.keras.datasets` module offers easy access to additional datasets, in ways almost equal to how you're currently importing them.\n\n_The `extra-keras-datasets` module is not affiliated, associated, authorized, endorsed by, or in any way officially connected with TensorFlow, Keras, or any of its subsidiaries or its affiliates. The official TensorFlow and Keras websites can be found at https://www.tensorflow.org/ and https://keras.io/._\n\n_The names TensorFlow, Keras, as well as related names, marks, emblems and images are registered trademarks of their respective owners._\n\n## Table of Contents\n- [Table of Contents](#table-of-contents)\n- [How to use this module?](#how-to-use-this-module-)\n * [Dependencies](#dependencies)\n * [Installation procedure](#installation-procedure)\n- [Datasets](#datasets)\n * [EMNIST-Balanced](#emnist-balanced)\n * [EMNIST-ByClass](#emnist-byclass)\n * [EMNIST-ByMerge](#emnist-bymerge)\n * [EMNIST-Digits](#emnist-digits)\n * [EMNIST-Letters](#emnist-letters)\n * [EMNIST-MNIST](#emnist-mnist)\n * [KMNIST-KMNIST](#kmnist-kmnist)\n * [KMNIST-K49](#kmnist-k49)\n * [SVHN-Normal](#svhn-normal)\n * [SVHN-Extra](#svhn-extra)\n * [STL-10](#stl-10)\n * [Iris](#iris)\n * [Wine Quality dataset](#wine-quality-dataset)\n * [USPS Handwritten Digits Dataset](#usps-handwritten-digits-dataset)\n- [Contributors and other references](#contributors-and-other-references)\n- [License](#license)\n\n## How to use this module?\n### Dependencies\n**Make sure to install TensorFlow!**\nThis package makes use of the TensorFlow 2.x package and specifically `tensorflow.keras`. Therefore, make sure to install TensorFlow - you can do so in the following way:\n\n* `pip install tensorflow`\n\n### Installation procedure\nInstalling is really easy, and can be done with [PIP](https://pypi.org/project/extra-keras-datasets/): `pip install extra-keras-datasets`. 
The package depends on `numpy`, `scipy`, `pandas` and `scikit-learn`, which will be automatically installed.\n\n## Datasets\n\n### EMNIST-Balanced\nExtended MNIST (EMNIST) contains digits as well as uppercase and lowercase handwritten letters. `EMNIST-Balanced` contains 131.600 characters across 47 balanced classes.\n\n```\nfrom extra_keras_datasets import emnist\n(input_train, target_train), (input_test, target_test) = emnist.load_data(type='balanced')\n```\n\n\u003ca href=\"./assets/emnist-balanced.png\"\u003e\u003cimg src=\"./assets/emnist-balanced.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### EMNIST-ByClass\nExtended MNIST (EMNIST) contains digits as well as uppercase and lowercase handwritten letters. `EMNIST-ByClass` contains 814.255 characters across 62 unbalanced classes.\n\n```\nfrom extra_keras_datasets import emnist\n(input_train, target_train), (input_test, target_test) = emnist.load_data(type='byclass')\n```\n\n\u003ca href=\"./assets/emnist-byclass.png\"\u003e\u003cimg src=\"./assets/emnist-byclass.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### EMNIST-ByMerge\nExtended MNIST (EMNIST) contains digits as well as uppercase and lowercase handwritten letters. `EMNIST-ByMerge` contains 814.255 characters across 47 unbalanced classes.\n\n```\nfrom extra_keras_datasets import emnist\n(input_train, target_train), (input_test, target_test) = emnist.load_data(type='bymerge')\n```\n\n\u003ca href=\"./assets/emnist-bymerge.png\"\u003e\u003cimg src=\"./assets/emnist-bymerge.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### EMNIST-Digits\nExtended MNIST (EMNIST) contains digits as well as uppercase and lowercase handwritten letters. 
`EMNIST-Digits` contains 280.000 characters across 10 balanced classes (digits only).\n\n```\nfrom extra_keras_datasets import emnist\n(input_train, target_train), (input_test, target_test) = emnist.load_data(type='digits')\n```\n\n\u003ca href=\"./assets/emnist-digits.png\"\u003e\u003cimg src=\"./assets/emnist-digits.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### EMNIST-Letters\nExtended MNIST (EMNIST) contains digits as well as uppercase and lowercase handwritten letters. `EMNIST-Letters` contains 145.600 characters across 26 balanced classes (letters only).\n\n```\nfrom extra_keras_datasets import emnist\n(input_train, target_train), (input_test, target_test) = emnist.load_data(type='letters')\n```\n\n\u003ca href=\"./assets/emnist-letters.png\"\u003e\u003cimg src=\"./assets/emnist-letters.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### EMNIST-MNIST\nExtended MNIST (EMNIST) contains digits as well as uppercase and lowercase handwritten letters. 
`EMNIST-MNIST` contains 70.000 characters across 10 balanced classes (equal to `keras.datasets.mnist`).\n\n```\nfrom extra_keras_datasets import emnist\n(input_train, target_train), (input_test, target_test) = emnist.load_data(type='mnist')\n```\n\n\u003ca href=\"./assets/emnist-mnist.png\"\u003e\u003cimg src=\"./assets/emnist-mnist.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### KMNIST-KMNIST\nKuzushiji-MNIST is a drop-in replacement for the MNIST dataset: it contains 70.000 28x28 grayscale images of Japanese Kuzushiji characters.\n\n```\nfrom extra_keras_datasets import kmnist\n(input_train, target_train), (input_test, target_test) = kmnist.load_data(type='kmnist')\n```\n\n\u003ca href=\"./assets/kmnist-kmnist.png\"\u003e\u003cimg src=\"./assets/kmnist-kmnist.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### KMNIST-K49\nKuzushiji-49 extends Kuzushiji-MNIST and contains 270.912 images across 49 classes.\n\n```\nfrom extra_keras_datasets import kmnist\n(input_train, target_train), (input_test, target_test) = kmnist.load_data(type='k49')\n```\n\n\u003ca href=\"./assets/kmnist-k49.png\"\u003e\u003cimg src=\"./assets/kmnist-k49.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### SVHN-Normal\nThe Street View House Numbers dataset (SVHN) contains 32x32 cropped images of house numbers obtained from Google Street View. There are 73.257 digits for training and 26.032 digits for testing. 
**Noncommercial** use is allowed only: [see the SVHN website for more information](http://ufldl.stanford.edu/housenumbers/).\n\n```\nfrom extra_keras_datasets import svhn\n(input_train, target_train), (input_test, target_test) = svhn.load_data(type='normal')\n```\n\n\u003ca href=\"./assets/svhn-normal.png\"\u003e\u003cimg src=\"./assets/svhn-normal.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### SVHN-Extra\nSVHN-Extra extends SVHN-Normal with 531.131 less difficult samples and contains a total of 604.388 digits for training and 26.032 digits for testing. **Noncommercial** use is allowed only: [see the SVHN website for more information](http://ufldl.stanford.edu/housenumbers/).\n\n```\nfrom extra_keras_datasets import svhn\n(input_train, target_train), (input_test, target_test) = svhn.load_data(type='extra')\n```\n\n\u003ca href=\"./assets/svhn-extra.png\"\u003e\u003cimg src=\"./assets/svhn-extra.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### STL-10\nThe STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, self-taught learning algorithms. It contains 5.000 training images and 8.000 testing images, and represents 10 classes in total (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck).\n\n```\nfrom extra_keras_datasets import stl10\n(input_train, target_train), (input_test, target_test) = stl10.load_data()\n```\n\n\u003ca href=\"./assets/stl10.png\"\u003e\u003cimg src=\"./assets/stl10.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### Iris\nThis is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda \u0026 Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. 
One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.\n\nPredicted attribute: class of iris plant.\n\n```\nfrom extra_keras_datasets import iris\n(input_train, target_train), (input_test, target_test) = iris.load_data(test_split=0.2)\n```\n\n\u003ca href=\"./assets/iris.png\"\u003e\u003cimg src=\"./assets/iris.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### Wine Quality dataset\nThis dataset presents wine qualities related to red and white vinho verde wine samples, from the north of Portugal. According to the creators, \"[the] goal is to model wine quality based on physicochemical tests\". Various chemical properties of the wine are available (`inputs`), as well as the quality score (`targets`) for the wine.\n\n* Input structure: (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, wine type)\n* Target structure: quality score between 0 and 10\n\n```\nfrom extra_keras_datasets import wine_quality\n(input_train, target_train), (input_test, target_test) = wine_quality.load_data(which_data='both', test_split=0.2, shuffle=True)\n```\n\n\u003ca href=\"./assets/wine_quality.jpg\"\u003e\u003cimg src=\"./assets/wine_quality.jpg\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n### USPS Handwritten Digits Dataset\nThis dataset presents thousands of 16x16 grayscale images of handwritten digits, derived from real USPS mail.\n\n* Input structure: 16x16 image\n* Target structure: digit ranging from 0.0 - 9.0 describing the input\n\n```\nfrom extra_keras_datasets import usps\n(input_train, target_train), (input_test, target_test) = usps.load_data()\n```\n\n\u003ca href=\"./assets/usps.png\"\u003e\u003cimg src=\"./assets/usps.png\" width=\"100%\" style=\"border: 3px solid #f6f8fa;\" /\u003e\u003c/a\u003e\n\n---\n\n## Contributors and other 
references\n* **EMNIST dataset:**\n * Cohen, G., Afshar, S., Tapson, J., \u0026 van Schaik, A. (2017). EMNIST: an extension of MNIST to handwritten letters. Retrieved from http://arxiv.org/abs/1702.05373\n * [tlindbloom](https://stackoverflow.com/users/4008755/tlindbloom) on StackOverflow: [loading EMNIST-letters dataset](https://stackoverflow.com/questions/51125969/loading-emnist-letters-dataset/53547262#53547262) in [emnist.py](./emnist.py).\n* **KMNIST dataset:**\n * Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., \u0026 Ha, D. (2018). Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718. Retrieved from https://arxiv.org/abs/1812.01718\n* **SVHN dataset:**\n * Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., \u0026 Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. Retrieved from http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf / http://ufldl.stanford.edu/housenumbers/\n* **STL-10 dataset:**\n * Coates, A., Ng, A., \u0026 Lee, H. (2011, June). An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 215-223). Retrieved from http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf\n* **Iris dataset:**\n * Fisher,R.A. \"The use of multiple measurements in taxonomic problems\" Annual Eugenics, 7, Part II, 179-188 (1936); also in \"Contributions to Mathematical Statistics\" (John Wiley, NY, 1950).\n* **Wine Quality dataset:**\n * P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.\n* **USPS Handwritten Digits Dataset**\n * Hull, J. J. (1994). A database for handwritten text recognition research. 
IEEE Transactions on pattern analysis and machine intelligence, 16(5), 550-554.\n\n## License\nThe licensable parts of this repository are licensed under an [MIT License](./LICENSE), so you're free to use this repo in your machine learning projects / blogs / exercises, and so on. Happy engineering! 🚀\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinecurve%2Fextra_keras_datasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmachinecurve%2Fextra_keras_datasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinecurve%2Fextra_keras_datasets/lists"}