https://github.com/pckosek/easy_tfrecords
Python package to assist tfrecord file read and write
python tensorflow tensorflow-tutorials tfrecord tfrecords
- Host: GitHub
- URL: https://github.com/pckosek/easy_tfrecords
- Owner: pckosek
- Created: 2018-11-06T17:50:43.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-12-02T02:22:32.000Z (over 6 years ago)
- Last Synced: 2025-03-15T11:45:57.369Z (2 months ago)
- Topics: python, tensorflow, tensorflow-tutorials, tfrecord, tfrecords
- Language: Python
- Homepage:
- Size: 12.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# easy_tfrecords
### This package is designed to help you read and write tfrecord files in an intuitive way that preserves dtype and data structure
### _Purpose_:
The tfrecord format is a fast and powerful way of feeding data to a tensorflow model; it can automatically batch, randomize and iterate your data across epochs without special instructions. The **problem** with using tfrecord files comes from orchestrating the madness of matching feature structures across the reader, writer and fetcher.
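To make the matching problem concrete, here is a minimal NumPy-only sketch of what a writer and reader have to agree on. It is an illustration of the general issue, not the code easy_tfrecords uses internally; the helper names `pack` and `unpack` are hypothetical.

```python
import numpy as np

def pack(arr):
    """Flatten an array to raw bytes plus the metadata needed to restore it."""
    arr = np.ascontiguousarray(arr)
    return {"data": arr.tobytes(), "dtype": arr.dtype.str, "shape": arr.shape}

def unpack(rec):
    """Rebuild the original array from a packed record."""
    flat = np.frombuffer(rec["data"], dtype=np.dtype(rec["dtype"]))
    return flat.reshape(rec["shape"])

x = np.array([[0, 0, 0, 0], [1, 1, 1, 1]], np.int32)
packed = pack(x)

# Reader and writer agree on dtype and shape: the array round-trips intact.
restored = unpack(packed)

# Reader guesses the wrong dtype: same bytes, different (meaningless) values,
# and the original 2 x 4 shape is lost.
wrong = np.frombuffer(packed["data"], dtype=np.float32)
```

Once the data is reduced to bytes, the dtype and shape have to travel with it or be hard-coded on both sides, which is the bookkeeping this package aims to take off your hands.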
The **easy_tfrecords** module contains methods and classes that allow you to write to and read from tfrecord files in a straightforward, extensible manner.
### _Features_:
- create tfrecord files
- read from single or multiple tfrecord files
- selectively read data from tfrecord files
- examine the data structure of tfrecord files
### _Usage_:
#### **Writing**
- Import data into python however you normally would (excel, pandas, csv, matlab, etc.)
- Reshape each of your arrays of features to `shape=[N, x[, y[, z[, etc.]]]]`, where `N` is the number of examples (records) and the first axis indexes them.
- Add multiple lists of features to the file as key-value pairs
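The reshaping step can be sketched with NumPy alone. The raw array below is hypothetical stand-in data; the target shapes mirror the `trainX` and `trainY` arrays used in the example code, and the check that all arrays share the same `N` is an assumption about what the writer expects rather than documented behavior.

```python
import numpy as np

# Hypothetical raw data: 3 examples, each loaded as a flat row of 8 values.
raw = np.arange(24, dtype=np.int32).reshape(3, 8)

# Axis 0 indexes the examples; the remaining axes are the per-example shape.
train_x = raw.reshape(-1, 2, 4)    # shape (3, 2, 4)

# One scalar per example still gets an explicit trailing axis.
train_y = np.array([0.25, 1.25, 2.25], np.float32).reshape(-1, 1)    # shape (3, 1)

# Assumption: arrays written to the same file should agree on N.
n_examples = train_x.shape[0]
same_n = n_examples == train_y.shape[0]
```

Arrays shaped this way can then be passed as keyword arguments, for example `x=train_x, y=train_y`, in the same way the example code passes `trainX` and `trainY`.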
#### **Reading**
- Create a reader class object, specifying your file list (can be length 1) and, optionally, the batch size and shuffle setting.
- Pass a list of which inputs to read from the file
#### Example Code:
```python
import numpy as np
import tensorflow as tf  # tf.Session below is the TensorFlow 1.x API

from easy_tfrecords import create_tfrecords, easy_tfrecords as records

# CREATE SOME TEST DATA (3 examples per array, indexed along axis 0)
x = np.array([[0, 0, 0, 0], [0, 0, 0, 0]], np.int32)
trainX = np.asarray([x, x+1, x+2])    # shape (3, 2, 4)

y = np.array([0.25], np.float32)
trainY = np.asarray([y, y+1, y+2])    # shape (3, 1)

# CREATE AND SAVE TO A FEW TFRECORDS FILES
create_tfrecords('tfr_1.tf', x=trainX, y=trainY)
create_tfrecords('tfr_2.tf', x=trainX+10, y=trainY+10)
create_tfrecords('tfr_3.tf', x=trainX+100, y=trainY+100, z=trainY+100)

# INSTANTIATE THE RECORDS OBJECT
# files must name the files written above; only the listed keys are read
rec = records(files=['tfr_1.tf', 'tfr_2.tf', 'tfr_3.tf'],
              shuffle=False,
              batch_size=1,
              keys=['x', 'y'])

next_factory = rec.get_next_factory()
batch_x = next_factory['x']
batch_y = next_factory['y']

# the with block closes the session on exit
with tf.Session() as sess:
    sess.run(rec.get_initializer())
    for n in range(10):
        print('------------')
        print('n => {}\n'.format(n))

        x_eval, y_eval = sess.run([batch_x, batch_y])
        print('x_eval=\n{}\n'.format(x_eval))
        print('y_eval=\n{}'.format(y_eval))
```
#### Output :
```
------------
n => 0

x_eval=
[[ 0.25]]

y_eval=
[[[0 0 0 0]
  [0 0 0 0]]]
------------
n => 1

x_eval=
[[ 1.25]]

y_eval=
[[[1 1 1 1]
  [1 1 1 1]]]
------------
n => 2

x_eval=
[[ 2.25]]

y_eval=
[[[2 2 2 2]
  [2 2 2 2]]]
------------
n => 3

x_eval=
[[ 100.25]]

y_eval=
[[[100 100 100 100]
  [100 100 100 100]]]
------------
n => 4

x_eval=
[[ 101.25]]

y_eval=
[[[101 101 101 101]
  [101 101 101 101]]]
------------
n => 5

x_eval=
[[ 102.25]]

y_eval=
[[[102 102 102 102]
  [102 102 102 102]]]
------------
n => 6

x_eval=
[[ 10.25]]

y_eval=
[[[10 10 10 10]
  [10 10 10 10]]]
------------
n => 7

x_eval=
[[ 11.25]]

y_eval=
[[[11 11 11 11]
  [11 11 11 11]]]
------------
n => 8

x_eval=
[[ 12.25]]

y_eval=
[[[12 12 12 12]
  [12 12 12 12]]]
------------
n => 9

x_eval=
[[ 0.25]]

y_eval=
[[[0 0 0 0]
  [0 0 0 0]]]
```
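The tenth read (`n => 9`) returns the first record again, which is consistent with a reader that starts over once every file has been exhausted. The sketch below imitates that wrap-around in plain Python. The integers are stand-ins for the records, listed in the order the output shows them, and none of this is the package's own code.

```python
from itertools import chain, cycle, islice

# Stand-in records, grouped per file in the order they appear in the output.
file_records = [
    [0, 1, 2],
    [100, 101, 102],
    [10, 11, 12],
]

# Chain the files into one stream, then repeat it indefinitely.
stream = cycle(chain.from_iterable(file_records))

# Ten reads from nine records: the last read wraps back to the first record.
reads = list(islice(stream, 10))
print(reads)
```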