https://github.com/brookisme/tfr2human
Easy parsing of TensorFlow Records (TFRecords) into dictionaries and NumPy arrays
- Host: GitHub
- URL: https://github.com/brookisme/tfr2human
- Owner: brookisme
- Created: 2019-10-17T21:34:18.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-08-22T05:57:41.000Z (over 5 years ago)
- Last Synced: 2025-02-19T09:33:27.049Z (11 months ago)
- Language: Python
- Size: 25.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### TFR2Human
_easy parsing of TFRecords_
---
#### INSTALL
```bash
git clone https://github.com/brookisme/tfr2human.git
pip install -e tfr2human
```
---
#### PARSER
Usage (see complete [example below](#example)):
```python
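# `tfp` is assumed to be the tfr2human parser module, imported under this alias;
# `np` is numpy. The `...` values are placeholders to fill in.
TFR_LIST=...       # TFRecord files to read
FEATURE_PROPS=...  # feature name -> tf dtype
BANDS=...          # image band names
SIZE=...           # tile height/width in pixels
parser=tfp.TFRParser(
    TFR_LIST,
    specs=FEATURE_PROPS,
    band_specs=BANDS,
    dims=[SIZE,SIZE])
for i,element in enumerate(parser.dataset):
    ...
    some_image=parser.image(element,bands=SOME_IM_BANDS,dtype=np.uint8)
    some_data=parser.data(element,keys=SOME_KEYS)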
```
---
#### UTILS
Here is a quick rundown of the utility methods:
* get_batches: breaks datasets into batches.
    - this differs from TF's [batch](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset#batch) in that it returns batches of datasets to be parsed, rather than parsing a batch of elements at a time.
* image_profile: returns an image (rasterio) profile for a given lon/lat/crs/resolution/np.array
* gcs_service: returns a Google Cloud Storage client
* save_to_gcs: saves a generic file to Google Cloud Storage
* csv/image_to_gcs: saves a csv/image to Google Cloud Storage
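
The batching idea behind `get_batches` can be pictured with a minimal sketch. This is not the library's implementation: `chunk_paths` is a hypothetical helper, the file names are made up, and it assumes the input is a plain list of TFRecord paths that is split into sub-lists, each of which could then be handed to its own parser.

```python
from typing import Iterator, List, Sequence


def chunk_paths(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Hypothetical TFRecord file names, for illustration only.
tfr_paths = [f"tiles-{i:03d}.tfrecord.gz" for i in range(5)]
batches = list(chunk_paths(tfr_paths, batch_size=2))
for batch in batches:
    # Each batch is a list of files, not a batch of parsed elements.
    print(batch)
```

In contrast, TF's `batch` groups consecutive elements of a single dataset, so the unit of work there is a stack of records rather than a group of files.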
---
#### EXAMPLE
```python
import numpy as np
import tensorflow as tf
#
# CONFIG
#
NOISY=True
NOISE_REDUCER=10
RESOLUTION=20
SIZE=384
MIN_WATER_RATIO=0.005
MAX_WATER_RATIO=0.96
# pixel-count thresholds, expressed as fractions of the SIZE x SIZE tile
MAX_WATER_NO_DATA_COUNT=int((SIZE**2)*(0.25))
MAX_S1_NAN_COUNT=int((SIZE**2)*(0.01))
MAX_S1_ZERO_COUNT=int((SIZE**2)*(0.1))
# water-mask pixel value -> name of the corresponding count column
WATER_COLUMNS={
    0: 'no_data_count',
    1: 'not_water_count',
    2: 'water_count'
}
#
# TFR Feature Specs
#
WATER_BANDS=['water']
S1_BANDS=['VV','VH','angle','VV_mean','VH_mean']
BANDS=S1_BANDS+WATER_BANDS
FEATURE_PROPS={
    'tile_id': tf.string,
    'crs': tf.string,
    'year': tf.float32,
    'month': tf.float32,
    'lon': tf.float32,
    'lat': tf.float32,
    'x_offset': tf.float32,
    'y_offset': tf.float32,
    'biome_num': tf.float32,
    'biome_name': tf.string,
    'eco_id': tf.float32,
    'eco_name': tf.string,
    'grid': tf.string,
    'grid_index': tf.int64
    # 'nb_s1_images': tf.float32
}
```
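
To make the count-based settings concrete, here is a small sketch of how `WATER_COLUMNS` and a pixel-count threshold could apply to a water mask. The mask is synthetic, and the `>` comparison against `MAX_WATER_NO_DATA_COUNT` is an assumption made for illustration; the config alone does not show how the threshold is used.

```python
import numpy as np

SIZE = 384
MAX_WATER_NO_DATA_COUNT = int((SIZE**2) * 0.25)  # 25% of the tile's pixels
WATER_COLUMNS = {0: 'no_data_count', 1: 'not_water_count', 2: 'water_count'}

# Synthetic mask (illustration only): mostly "not water" (1),
# with 10 rows of "no data" (0) and 20 rows of "water" (2).
mask = np.ones((SIZE, SIZE), dtype=np.uint8)
mask[:10, :] = 0
mask[-20:, :] = 2

# Count pixels per class, then rename the classes with WATER_COLUMNS.
values, counts = np.unique(mask, return_counts=True)
raw = {int(v): int(c) for v, c in zip(values, counts)}
props = {name: raw.get(code, 0) for code, name in WATER_COLUMNS.items()}

# Assumed usage of the threshold: flag tiles with too many no-data pixels.
too_much_no_data = props['no_data_count'] > MAX_WATER_NO_DATA_COUNT
print(props, too_much_no_data)
```

Classes missing from a mask fall back to a count of 0, so every tile yields the same three columns.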
```python
#
# HELPERS
#
def process_water(parser,element):
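    # read the water band as a uint8 mask and count pixels per class
    water=parser.image(element,bands=WATER_BANDS,dtype=np.uint8)
    values,counts=np.unique(water,return_counts=True)
    props={v: c for (v,c) in zip(values,counts)}
    # rename classes 0/1/2 via WATER_COLUMNS; missing classes count as 0
    props={WATER_COLUMNS[i]: props.get(i,0) for i in range(3)}
    # ratio of water to non-water pixels (undefined when not_water_count is 0)
    water_ratio=props['water_count']/props['not_water_count']
    props['water_ratio']=water_ratio
    props['valid_water']=((MIN_WATER_RATIO<=water_ratio) and
                          (water_ratio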