{"id":17921630,"url":"https://github.com/rosejn/torch-datasets","last_synced_at":"2025-03-24T01:32:34.911Z","repository":{"id":4476621,"uuid":"5615576","full_name":"rosejn/torch-datasets","owner":"rosejn","description":"A collection of machine learning datasets for use with Torch7.","archived":false,"fork":false,"pushed_at":"2014-03-12T13:12:09.000Z","size":436,"stargazers_count":36,"open_issues_count":3,"forks_count":19,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-19T01:08:12.410Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rosejn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-08-30T13:56:13.000Z","updated_at":"2025-01-21T04:05:20.000Z","dependencies_parsed_at":"2022-08-06T17:00:18.534Z","dependency_job_id":null,"html_url":"https://github.com/rosejn/torch-datasets","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosejn%2Ftorch-datasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosejn%2Ftorch-datasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosejn%2Ftorch-datasets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosejn%2Ftorch-datasets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rosejn","download_url":"https://codeload.github.com/rosejn/torch-datasets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kin
d":"github","repositories_count":245194534,"owners_count":20575770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-28T20:35:10.641Z","updated_at":"2025-03-24T01:32:31.303Z","avatar_url":"https://github.com/rosejn.png","language":"Lua","funding_links":[],"categories":["Lua"],"sub_categories":["Tools","[Tools](#tools-1)","Speech Recognition"],"readme":"# Datasets\n\nA collection of easy-to-use datasets for training and testing machine learning\nalgorithms with Torch7.\n\n\n## Usage\n\n    require('dataset/mnist')\n    m = Mnist.dataset()\n    m:size()                      -- =\u003e 60000\n    m:sample(100)                 -- =\u003e {data = tensor, class = label}\n\n    -- scale values to the range [0,1] (by default they are in [0,255])\n    m = dataset.Mnist({scale = {0, 1}})\n\n    -- or normalize (subtract the mean and divide by the standard deviation)\n    m = dataset.Mnist({normalize = true})\n\n    -- only import a subset of the data (imports full 60,000 samples otherwise),\n    -- sorted by class label\n    m = dataset.Mnist({size = 1000, sort = true})\n\n\nTo process a randomly shuffled ordering of the dataset:\n\n    for sample in m:sampler() do\n      net:forward(sample.data)\n    end\n\n\nOr access mini-batches:\n\n    local batch = m:mini_batch(1)\n\n    -- or use directly\n    net:forward(m:mini_batch(1).data)\n\n    -- set the batch size using an options table\n    local batch = m:mini_batch(1, {size = 100})\n\n\nTo process the full dataset in randomly shuffled mini-batches:\n\n    for batch in m:mini_batches() do\n      net:forward(batch.data)\n    
end\n\n\nGenerate a 10-frame animation for each sample by randomly rotating, translating,\nand/or zooming within the given ranges:\n\n    local anim_options = {\n        frames      = 10,\n        rotation    = {-20, 20},\n        translation = {-5, 5, -5, 5},\n        zoom        = {0.6, 1.4}\n    }\n    s = m:sampler({animate = anim_options})\n\n\nStandard pipeline options can be used to add post-processing stages (e.g. binarize and flatten):\n\n    s = m:sampler({pad = 5, binarize = true, flatten = true})\n\n\nPass a custom pipeline for processing samples:\n\n    s = m:sampler({pipeline = my_pipeline})\n\n\nCreate a dataset from a directory of images:\n\n    require 'dataset/imageset'\n    d = ImageSet.dataset({dir='your-data-directory'})\n    while true do   -- display samples in a loop, reusing window w\n      w = image.display({image=d().data, win=w})\n      util.sleep(1/10)\n    end\n\nCreate a dataset from a directory of videos:\n\n    require 'dataset/videoset'\n    d = VideoSet.dataset({dir='KTH'})\n    while true do   -- display samples in a loop, reusing window w\n      w = image.display({image=d().data, win=w})\n      util.sleep(1/10)\n    end\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosejn%2Ftorch-datasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frosejn%2Ftorch-datasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosejn%2Ftorch-datasets/lists"}