An open API service indexing awesome lists of open source software.

https://github.com/pckosek/easy_tfrecords

Python package to assist tfrecord file read and write
https://github.com/pckosek/easy_tfrecords

python tensorflow tensorflow-tutorials tfrecord tfrecords

Last synced: about 1 month ago
JSON representation

Python package to assist tfrecord file read and write

Awesome Lists containing this project

README

        

# easy_tfrecords

### this package is designed to assist reading and writing to tfrecord files in an intuitive way that preserves dtype and data structure

### _Purpose_:

The tfrecord format is a fast and powerful way of feeding data to a tensorflow model; it can automatically batch, randomize and iterate your data across epochs without special instructions. The **problem** with using tfrecord files comes from orchestrating the madness of matching feature structures across the reader, writer and fetcher.



The **easy_tfrecords** module contains methods and classes that allow you to write to and read from tfrecord files in a straightforward, extensible manner.

### _Features_:

- create tfrecord files
- read from single or multiple tfrecord files
- selectively read data from tfrecord files
- examine the data structure of tfrecord files

### _Usage_:

#### **Writing**

- Import data into python however you normally would (excel, pandas, csv, matlab, etc.)
- Reshape each of your arrays of features to `shape=[N, x[, y[, z[, etc.]]]]` where N is the number of features.
- Add multiple lists of features to the file as key-value pairs
#### **Reading**

- Create a reader class object, specifying your file list (can be length 1), optionally specifying batch size and shuffe spec.
- pass a list of which inputs to read from the file

#### Example Code:
```python
import numpy as np
import tensorflow as tf

from easy_tfrecords import create_tfrecords, easy_tfrecords as records

# CREATE SOME TEST DATA
x = np.array([[0, 0, 0, 0], [0, 0, 0, 0]], np.int32)
trainX = np.asarray( [x, x+1, x+2] )

y = np.array([0.25], np.float32)
trainY = np.asarray( [y, y+1, y+2] )

# CREATE AND SAVE TO A FEW TFRECORDS FILES
create_tfrecords('tfr_1.tf', x=trainX, y=trainY)
create_tfrecords('tfr_2.tf', x=trainX+10, y=trainY+10)
create_tfrecords('tfr_3.tf', x=trainX+100, y=trainY+100, z=trainY+100)

# INSTANTIATE THE RECORDS OBJECT
rec = records(files=['data_1.tf', 'data_2.tf'],
shuffle=False,
batch_size=1,
keys=['x', 'y'])

next_factory = rec.get_next_factory()

batch_x = next_factory['x']
batch_y = next_factory['y']

with tf.Session() as sess:

sess.run(rec.get_initializer())

for n in range(10):
print('------------')
print('n => {}\n'.format(n))

x_eval, y_eval = sess.run( [batch_x, batch_y] )
print('x_eval=\n{}\n'.format(x_eval))
print('y_eval=\n{}'.format(y_eval))

sess.close()
```
#### Output :
```
------------
n => 0

x_eval=
[[ 0.25]]

y_eval=
[[[0 0 0 0]
[0 0 0 0]]]
------------
n => 1

x_eval=
[[ 1.25]]

y_eval=
[[[1 1 1 1]
[1 1 1 1]]]
------------
n => 2

x_eval=
[[ 2.25]]

y_eval=
[[[2 2 2 2]
[2 2 2 2]]]
------------
n => 3

x_eval=
[[ 100.25]]

y_eval=
[[[100 100 100 100]
[100 100 100 100]]]
------------
n => 4

x_eval=
[[ 101.25]]

y_eval=
[[[101 101 101 101]
[101 101 101 101]]]
------------
n => 5

x_eval=
[[ 102.25]]

y_eval=
[[[102 102 102 102]
[102 102 102 102]]]
------------
n => 6

x_eval=
[[ 10.25]]

y_eval=
[[[10 10 10 10]
[10 10 10 10]]]
------------
n => 7

x_eval=
[[ 11.25]]

y_eval=
[[[11 11 11 11]
[11 11 11 11]]]
------------
n => 8

x_eval=
[[ 12.25]]

y_eval=
[[[12 12 12 12]
[12 12 12 12]]]
------------
n => 9

x_eval=
[[ 0.25]]

y_eval=
[[[0 0 0 0]
[0 0 0 0]]]
```