https://github.com/rosejn/torch-datasets
A collection of machine learning datasets for use with Torch7.
- Host: GitHub
- URL: https://github.com/rosejn/torch-datasets
- Owner: rosejn
- License: bsd-3-clause
- Created: 2012-08-30T13:56:13.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2014-03-12T13:12:09.000Z (almost 12 years ago)
- Last Synced: 2025-03-19T01:08:12.410Z (12 months ago)
- Language: Lua
- Size: 426 KB
- Stars: 36
- Watchers: 8
- Forks: 19
- Open Issues: 3
- Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-machine-master - torch-datasets - Scripts to load several popular datasets including: (Lua)
- awesome-machine-learning - torch-datasets - Scripts to load several popular datasets including: (Lua / [Tools](#tools-1))
- awesome-machine-learning - torch-datasets - Scripts to load several popular datasets including: (Lua / Speech Recognition)
- awesome-machine-learning - torch-datasets - Scripts to load several popular datasets including: (Lua)
- fucking-awesome-machine-learning - torch-datasets - Scripts to load several popular datasets including: (Lua / [Tools](#tools-1))
- awesome-machine-learning-cn - torch-datasets - official site
- awesome-advanced-metering-infrastructure - torch-datasets - Scripts to load several popular datasets including: (Lua / Speech Recognition)
README
# Datasets
A collection of easy-to-use datasets for training and testing machine learning
algorithms with Torch7.
## Usage
require('dataset/mnist')
m = dataset.Mnist()
m:size() -- => 60000
m:sample(100) -- => {data = tensor, class = label}
-- scale values between [0,1] (by default they are in the range [0,255])
m = dataset.Mnist({scale = {0, 1}})
-- or normalize (subtract mean and divide by std)
m = dataset.Mnist({normalize = true})
-- only import a subset of the data (imports full 60,000 samples otherwise),
-- sorted by class label
m = dataset.Mnist({size = 1000, sort = true})
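The constructor options above can presumably be combined in a single call; the combination below is illustrative and has not been verified against the library:

```lua
-- Assumed combination of the documented options: normalized values,
-- a 1,000-sample subset, sorted by class label
m = dataset.Mnist({size = 1000, normalize = true, sort = true})
```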
To process a randomly shuffled ordering of the dataset:
for sample in m:sampler() do
net:forward(sample.data)
end
Or access mini-batches:
local batch = m:mini_batch(1)
-- or use directly
net:forward(m:mini_batch(1).data)
-- set the batch size using an options table
local batch = m:mini_batch(1, {size = 100})
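Based on the usage above, a mini-batch bundles its samples into a single tensor under the `data` field; one way to check its shape (the exact layout is an assumption):

```lua
local batch = m:mini_batch(1, {size = 100})
print(batch.data:size())  -- batch dimensions, e.g. 100 x 28 x 28 (assumed layout)
```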
To process the full dataset in randomly shuffled mini-batches:
for batch in m:mini_batches() do
net:forward(batch.data)
end
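As a point of reference, here is a minimal training sketch built around the mini-batch loop above. Only `m:mini_batches()` and the `data`/`class` fields come from the usage shown here; the `nn` network, the flattening step, and the learning rate are illustrative assumptions, not part of this library:

```lua
-- Minimal training sketch (assumptions: batch.data is a tensor of images and
-- batch.class holds 1-based class labels usable by ClassNLLCriterion)
require 'nn'

local net = nn.Sequential()
net:add(nn.Linear(784, 128))  -- 784 = 28 x 28 flattened MNIST pixels (assumed shape)
net:add(nn.Tanh())
net:add(nn.Linear(128, 10))
net:add(nn.LogSoftMax())

local criterion = nn.ClassNLLCriterion()

for batch in m:mini_batches() do
  -- flatten each image in the batch to a 1D vector
  local input = batch.data:view(batch.data:size(1), -1)

  local output = net:forward(input)
  local loss = criterion:forward(output, batch.class)

  net:zeroGradParameters()
  net:backward(input, criterion:backward(output, batch.class))
  net:updateParameters(0.01)  -- plain SGD step; learning rate chosen arbitrarily
end
```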
Generate 10-frame animations for each sample, which randomly rotate, translate,
and/or zoom within the ranges passed:
local anim_options = {
frames = 10,
rotation = {-20, 20},
translation = {-5, 5, -5, 5},
zoom = {0.6, 1.4}
}
s = dataset:sampler({animate = anim_options})
Standard pipeline options can be used to add post-processing stages (e.g. binarize and flatten):
s = dataset:sampler({pad = 5, binarize = true, flatten = true})
Pass a custom pipeline for processing samples:
s = dataset:sampler({pipeline = my_pipeline})
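The shape of a custom pipeline is not documented here; the sketch below assumes the `pipeline` option accepts a function that takes a sample table (`{data = ..., class = ...}`) and returns a (possibly transformed) sample. The name `my_pipeline` and the rescaling step are purely illustrative; check the library's pipeline module for the actual contract.

```lua
-- Hypothetical custom pipeline stage (assumed interface: sample table in,
-- sample table out)
local function my_pipeline(sample)
  sample.data = sample.data:float():div(255)  -- e.g. rescale pixel values to [0, 1]
  return sample
end

s = dataset:sampler({pipeline = my_pipeline})
```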
Create a dataset from a bunch of images in a directory:
require 'dataset/imageset'
d = ImageSet.dataset({dir = 'your-data-directory'})
while true do
  w = image.display({image = d().data, win = w})
  util.sleep(1/10)
end
Create a dataset from a bunch of videos in a directory:
require 'dataset/videoset'
d = VideoSet.dataset({dir = 'KTH'})
while true do
  w = image.display({image = d().data, win = w})
  util.sleep(1/10)
end