{"id":17997922,"url":"https://github.com/tmattio/tf_datasets","last_synced_at":"2025-03-26T04:31:36.919Z","repository":{"id":103558561,"uuid":"95946472","full_name":"tmattio/tf_datasets","owner":"tmattio","description":"Python scripts to download public datasets and generate tfrecords.","archived":false,"fork":false,"pushed_at":"2017-10-20T01:35:20.000Z","size":54,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-20T16:16:40.269Z","etag":null,"topics":["dataset","tensorflow"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tmattio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-01T06:26:41.000Z","updated_at":"2020-04-05T14:40:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"32b03e85-6a23-43ef-915e-543be1342996","html_url":"https://github.com/tmattio/tf_datasets","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmattio%2Ftf_datasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmattio%2Ftf_datasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmattio%2Ftf_datasets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmattio%2Ftf_datasets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tmattio","download_url":"https://c
odeload.github.com/tmattio/tf_datasets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245589265,"owners_count":20640254,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","tensorflow"],"created_at":"2024-10-29T21:23:07.090Z","updated_at":"2025-03-26T04:31:36.903Z","avatar_url":"https://github.com/tmattio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tensorflow Datasets\n\nPython scripts to download public datasets and generate tfrecords.\n\n## Features\n\n* Show dataset download progress\n* Show dataset conversion progress\n* Create the TFRecords datasets using multiple threads\n* Split the TFRecords datasets into several shards\n\n## Usage\n\nTo install Tensorflow Datasets, clone the repository and install from source:\n\n    git clone git@github.com:tmattio/tf_datasets.git\n    cd tf_datasets\n    make install\n\nTo download and create a dataset, use the `tf_make_dataset` command installed with the package:\n\n    # Create the MNIST dataset\n    tf_make_dataset --dataset_name=mnist --dataset_dir=data/mnist --cleanup\n\nTo use a dataset from Python:\n\n    import tf_datasets as tfd\n\n    mnist = tfd.get_dataset('mnist', './data/mnist')\n    mnist.download()\n    mnist.extract()\n    mnist.convert()\n    mnist.cleanup()\n\n    # This will raise an error if the dataset does not exist\n    images, labels = mnist.load('train')\n\n## Supported Datasets\n\n### Image Classification\n\n* **mnist** - [MNIST](http://yann.lecun.com/exdb/mnist/): The MNIST 
database of handwritten digits\n* **flowers** - [Flowers](https://github.com/tensorflow/models/blob/master/slim/datasets/flowers.py): The TensorFlow flowers dataset.\n* **cifar10** - [Cifar-10](https://www.cs.toronto.edu/~kriz/cifar.html): CIFAR-10 is a labeled subset of the 80 million tiny images dataset.\n* **cifar100** - [Cifar-100](https://www.cs.toronto.edu/~kriz/cifar.html): CIFAR-100 is a labeled subset of the 80 million tiny images dataset.\n\n### Object Detection\n\n* **fddb** - [FDDB](http://vis-www.cs.umass.edu/fddb/): Face Detection Data Set and Benchmark\n* **wider_face** - [WIDER Face](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/): WIDER FACE: A Face Detection Benchmark\n* **svhn** - [SVHN](http://ufldl.stanford.edu/housenumbers/): The Street View House Numbers (SVHN) Dataset\n\n## TODO\n\n* Add unit tests\n* Add loads method for the datasets\n* Create an API to download already-created datasets\n* Support the Caltech Pedestrian dataset\n* Support the MSCoco dataset\n* Support the Pascal VOC 2007/2012 datasets\n* Support the CBSR-Webface dataset\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftmattio%2Ftf_datasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftmattio%2Ftf_datasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftmattio%2Ftf_datasets/lists"}