https://github.com/kasanaa/mnist
This repo contains a mirror of the files from http://yann.lecun.com/exdb/mnist/. The mirror exists because the download links on that site sometimes don't work and the download of the files gets stuck for several minutes.
learn mnist-data mnist-dataset mnist-image-dataset
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/kasanaa/mnist
- Owner: KaSaNaa
- License: apache-2.0
- Created: 2024-06-14T16:06:00.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-06-14T16:13:13.000Z (11 months ago)
- Last Synced: 2025-02-15T08:38:09.253Z (2 months ago)
- Topics: learn, mnist-data, mnist-dataset, mnist-image-dataset
- Homepage:
- Size: 10.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MNIST
This repo contains a mirror of the files from http://yann.lecun.com/exdb/mnist/.
See that website for more details on MNIST.

The reason for this mirror is that the download links on the site sometimes don't work and the download of the files gets stuck for several minutes.
This only happens with `wget`, and only on some computers.
With `curl`, it was always possible to download the files.

To download the files from within a script, you can use the following links:
- `https://raw.githubusercontent.com/fgnt/mnist/master/train-images-idx3-ubyte.gz`
- `https://raw.githubusercontent.com/fgnt/mnist/master/train-labels-idx1-ubyte.gz`
- `https://raw.githubusercontent.com/fgnt/mnist/master/t10k-images-idx3-ubyte.gz`
- `https://raw.githubusercontent.com/fgnt/mnist/master/t10k-labels-idx1-ubyte.gz`

The following Python code downloads the files (it originally came from https://cntk.ai/pythondocs/CNTK_103A_MNIST_DataLoader.html):
```python
def get_mnist():
    # The code to download the MNIST data originally came from
    # https://cntk.ai/pythondocs/CNTK_103A_MNIST_DataLoader.html
    import gzip
    import numpy as np
    import os
    import struct
    from urllib.request import urlretrieve

    def load_data(src, num_samples):
        print("Downloading " + src)
        gzfname, h = urlretrieve(src, "./delete.me")
        print("Done.")
        try:
            with gzip.open(gzfname) as gz:
                # Read magic number. IDX headers are big-endian, so use ">I"
                # to keep the check independent of the machine's byte order.
                magic = struct.unpack(">I", gz.read(4))[0]
                if magic != 0x00000803:
                    raise Exception("Invalid file: unexpected magic number.")
                # Read number of entries.
                n = struct.unpack(">I", gz.read(4))[0]
                if n != num_samples:
                    raise Exception(
                        "Invalid file: expected {0} entries.".format(num_samples)
                    )
                # Read image dimensions.
                crow = struct.unpack(">I", gz.read(4))[0]
                ccol = struct.unpack(">I", gz.read(4))[0]
                if crow != 28 or ccol != 28:
                    raise Exception(
                        "Invalid file: expected 28 rows/cols per image."
                    )
                # Read data.
                res = np.frombuffer(
                    gz.read(num_samples * crow * ccol), dtype=np.uint8
                )
        finally:
            os.remove(gzfname)
        # Scale the uint8 pixel values to floats in [0, 1).
        return res.reshape((num_samples, crow, ccol)) / 256

    def load_labels(src, num_samples):
        print("Downloading " + src)
        gzfname, h = urlretrieve(src, "./delete.me")
        print("Done.")
        try:
            with gzip.open(gzfname) as gz:
                # Read magic number (big-endian, see load_data).
                magic = struct.unpack(">I", gz.read(4))[0]
                if magic != 0x00000801:
                    raise Exception("Invalid file: unexpected magic number.")
                # Read number of entries.
                n = struct.unpack(">I", gz.read(4))[0]
                if n != num_samples:
                    raise Exception(
                        "Invalid file: expected {0} rows.".format(num_samples)
                    )
                # Read labels.
                res = np.frombuffer(gz.read(num_samples), dtype=np.uint8)
        finally:
            os.remove(gzfname)
        return res.reshape((num_samples))

    def try_download(data_source, label_source, num_samples):
        data = load_data(data_source, num_samples)
        labels = load_labels(label_source, num_samples)
        return data, labels

    # Not sure why, but Yann LeCun's website no longer supports simple
    # downloaders (e.g. urlretrieve and wget fail, while curl works).
    # Since not everyone has Linux, use a mirror from the uni server.
    # server = 'http://yann.lecun.com/exdb/mnist'
    server = 'https://raw.githubusercontent.com/fgnt/mnist/master'

    # URLs for the train image and label data
    url_train_image = f'{server}/train-images-idx3-ubyte.gz'
    url_train_labels = f'{server}/train-labels-idx1-ubyte.gz'
    num_train_samples = 60000

    print("Downloading train data")
    train_features, train_labels = try_download(
        url_train_image, url_train_labels, num_train_samples
    )

    # URLs for the test image and label data
    url_test_image = f'{server}/t10k-images-idx3-ubyte.gz'
    url_test_labels = f'{server}/t10k-labels-idx1-ubyte.gz'
    num_test_samples = 10000

    print("Downloading test data")
    test_features, test_labels = try_download(
        url_test_image, url_test_labels, num_test_samples
    )

    return train_features, train_labels, test_features, test_labels
```
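
The loader above validates each file by reading its IDX header: a 4-byte magic number followed by one 32-bit size per dimension. The sketch below illustrates that header layout on a tiny synthetic file built in memory, so it runs without a network connection. The names `read_idx_header` and `make_fake_idx` are hypothetical helpers for this illustration and are not part of this repo or of the CNTK loader. It assumes the fields are big-endian and that the low byte of the magic number holds the number of dimensions, as described on the MNIST website.

```python
import gzip
import io
import struct

IMAGE_MAGIC = 0x00000803  # unsigned-byte data, 3 dimensions (images)
LABEL_MAGIC = 0x00000801  # unsigned-byte data, 1 dimension (labels)


def read_idx_header(raw_gz):
    """Return (magic, dims) for a gzip-compressed IDX file held in memory."""
    with gzip.open(io.BytesIO(raw_gz)) as gz:
        magic = struct.unpack(">I", gz.read(4))[0]
        ndim = magic & 0xFF  # low byte of the magic number: dimension count
        dims = struct.unpack(">" + "I" * ndim, gz.read(4 * ndim))
    return magic, dims


def make_fake_idx(magic, dims):
    """Hypothetical helper: build a small zero-filled IDX file for the demo."""
    size = 1
    for d in dims:
        size *= d
    header = struct.pack(">I", magic) + struct.pack(">" + "I" * len(dims), *dims)
    return gzip.compress(header + bytes(size))


image_header = read_idx_header(make_fake_idx(IMAGE_MAGIC, (2, 28, 28)))
label_header = read_idx_header(make_fake_idx(LABEL_MAGIC, (2,)))
print(image_header)  # (2051, (2, 28, 28))
print(label_header)  # (2049, (2,))
```

The same four magic-number bytes read as a little-endian integer give `0x3080000` for images and `0x1080000` for labels, which is why those constants only match when the header is unpacked in little-endian byte order.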