https://github.com/guillep/idx-reader
Reader for the IDX format written in Pharo
- Host: GitHub
- URL: https://github.com/guillep/idx-reader
- Owner: guillep
- License: mit
- Created: 2017-02-23T11:00:52.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-03-01T16:45:38.000Z (almost 8 years ago)
- Last Synced: 2024-10-31T06:42:07.687Z (2 months ago)
- Language: Smalltalk
- Size: 30.3 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Idx-reader
This package implements a reader for the IDX format, written in Pharo. IDX is a format designed to store vectors and multidimensional matrices. It is used by the MNIST dataset of handwritten digits (http://yann.lecun.com/exdb/mnist/).
## Idx Format
The following description of the format is taken from the original website (http://yann.lecun.com/exdb/mnist/).
The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types. The basic format is
```
header
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data
```

The header (also called a magic number in the original description) is a 4-byte big-endian integer where:
- The first 2 bytes are always 0.
- The third byte codes the type of the data:
  - 0x08: unsigned byte
  - 0x09: signed byte
  - 0x0B: short (2 bytes)
  - 0x0C: int (4 bytes)
  - 0x0D: float (4 bytes)
  - 0x0E: double (8 bytes)
- The fourth byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices, and so on.

The sizes in each dimension are 4-byte big-endian integers.
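
As an illustration of the header layout, the following playground sketch decodes a header by hand. It does not use `IdxReader`; the variable names are hypothetical, and the sample bytes are assumed to match the header published for the MNIST training image file (unsigned bytes, three dimensions).

```smalltalk
"Illustrative sketch, independent of IdxReader: decode an IDX header by hand.
The sample bytes are an assumption modelled on the MNIST training image file."
| header typeCode dimensionCount sizes elementWidths dataByteCount |
header := #[0 0 8 3 0 0 234 96 0 0 0 28 0 0 0 28].

"Bytes 1-2 are always zero, byte 3 is the type code, byte 4 the dimension count."
typeCode := header at: 3.
dimensionCount := header at: 4.

"Each size is a 4-byte big-endian integer stored right after the magic number."
sizes := (1 to: dimensionCount) collect: [ :dimension |
    | offset |
    offset := 4 * dimension.
    (1 to: 4) inject: 0 into: [ :value :i |
        (value bitShift: 8) + (header at: offset + i) ] ].

"Width in bytes of one element, keyed by the type code from the list above."
elementWidths := Dictionary newFrom: {
    16r08 -> 1. 16r09 -> 1. 16r0B -> 2.
    16r0C -> 4. 16r0D -> 4. 16r0E -> 8 }.

"Total size of the data section that follows the header."
dataByteCount := (sizes inject: 1 into: [ :product :each | product * each ])
    * (elementWidths at: typeCode).

sizes. "#(60000 28 28)"
dataByteCount. "47040000"
```

Evaluating the last two expressions with "Print it" shows the decoded sizes and the number of data bytes expected after the header.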
The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.

## Installation
Evaluate the following Metacello expression to load the package in your Pharo environment.
```smalltalk
Metacello new
baseline: 'IdxReader';
repository: 'github://guillep/idx-reader';
load.
```## Usage
An IdxReader works as a stream. You first create a reader on a binary file stream:
```smalltalk
reader := IdxReader onStream: (File named: 'path/to/your/idxfile') readStream.
```

And then you ask it for the next element:
```smalltalk
matrix := reader next.
```

The returned object is an array whose nesting depends on the file you are reading. For example, if your file contains one-dimensional data, you get a flat array of values. Likewise, if your file contains 2-dimensional data, you get an array of arrays.
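
To make the relation between the flat, C-style data layout and an array of arrays concrete, here is a minimal sketch that groups row-major values into rows. It is not `IdxReader`'s implementation; the sizes, values, and variable names are made up for illustration.

```smalltalk
"Illustrative sketch, not IdxReader's implementation: group flat row-major
data into an array of rows. Sizes and values are made up."
| rowCount columnCount flat matrix |
rowCount := 2.
columnCount := 3.
flat := #(10 11 12 20 21 22).

"With 1-based row and column, the element sits at flat index
(row - 1) * columnCount + column, so the column index changes the fastest."
matrix := (1 to: rowCount) collect: [ :row |
    (1 to: columnCount) collect: [ :column |
        flat at: (row - 1) * columnCount + column ] ].

matrix. "#(#(10 11 12) #(20 21 22))"
(matrix at: 2) at: 1. "20"
```

Under this layout, `(matrix at: row) at: column` reads one element of 2-dimensional data.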