https://github.com/brookisme/tfr2human

Easy parsing of TensorFlow Records (TFRecords) into dictionaries and NumPy arrays

Host: GitHub
URL: https://github.com/brookisme/tfr2human
Owner: brookisme
Created: 2019-10-17T21:34:18.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2020-08-22T05:57:41.000Z (over 5 years ago)
Last Synced: 2025-02-19T09:33:27.049Z (11 months ago)
Language: Python
Size: 25.4 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


### TFR2Human

_easy parsing of TFRecords_

---

#### INSTALL

```bash
git clone https://github.com/brookisme/tfr2human.git
pip install -e tfr2human
```

---

#### PARSER

Usage (see complete [example below](#example)):

```python
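import numpy as np
# tfp: the tfr2human parser module (its import is not shown in this README)

# Placeholders: supply your own values for each of these.
TFR_LIST=...        # TFRecord files to parse
FEATURE_PROPS=...   # feature name -> tf dtype
BANDS=...           # band names stored in the records
SIZE=...            # tile height/width in pixels

parser=tfp.TFRParser(
    TFR_LIST,
    specs=FEATURE_PROPS,
    band_specs=BANDS,
    dims=[SIZE,SIZE])

for i,element in enumerate(parser.dataset):
    ...
    # image bands as a numpy array
    some_image=parser.image(element,bands=SOME_IM_BANDS,dtype=np.uint8)
    # selected feature values
    some_data=parser.data(element,keys=SOME_KEYS)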
```

---

#### UTILS

A quick rundown of the utility methods:

* get_batches: break datasets into batches.

    - this differs from TF's [batch](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset#batch) in that it returns batches of datasets to be parsed later, rather than parsing one batch at a time.

* image_profile: returns an image (rasterio) profile for a given lon/lat/crs/resolution/np.array

* gcs_service: returns a google cloud storage client

* save_to_gcs: save generic file to google cloud storage

* csv/image_to_gcs: save csv/image to google cloud storage
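
To make the get_batches idea concrete (splitting the work into groups that are each parsed later, instead of parsing batch by batch), here is a minimal sketch in plain Python. The `chunked` helper and the file names are hypothetical and exist only for illustration; this is not tfr2human's implementation or its signature.

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield successive lists of at most `batch_size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical TFRecord file names, used only for illustration.
tfr_files = [f"tiles-{i:03d}.tfrecord" for i in range(5)]

# Each batch is a group of inputs that could be handed to its own parser later.
batches = list(chunked(tfr_files, 2))
for batch in batches:
    print(batch)
```

Each group stays unparsed until something consumes it, which is the distinction the list above draws with TF's own `batch`.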

---



#### EXAMPLE

```python
import numpy as np
import tensorflow as tf

#
# CONFIG
#
NOISY=True
NOISE_REDUCER=10
RESOLUTION=20
SIZE=384
MIN_WATER_RATIO=0.005
MAX_WATER_RATIO=0.96
# pixel-count thresholds, expressed as a fraction of the tile area
MAX_WATER_NO_DATA_COUNT=int((SIZE**2)*(0.25))
MAX_S1_NAN_COUNT=int((SIZE**2)*(0.01))
MAX_S1_ZERO_COUNT=int((SIZE**2)*(0.1))
# water-band pixel value -> name of the count column
WATER_COLUMNS={
    0: 'no_data_count',
    1: 'not_water_count',
    2: 'water_count'
}

#
# TFR Feature Specs
#
WATER_BANDS=['water']
S1_BANDS=['VV','VH','angle','VV_mean','VH_mean']
BANDS=S1_BANDS+WATER_BANDS
FEATURE_PROPS={
    'tile_id': tf.string,
    'crs': tf.string,
    'year': tf.float32,
    'month': tf.float32,
    'lon': tf.float32,
    'lat': tf.float32,
    'x_offset': tf.float32,
    'y_offset': tf.float32,
    'biome_num': tf.float32,
    'biome_name': tf.string,
    'eco_id': tf.float32,
    'eco_name': tf.string,
    'grid': tf.string,
    'grid_index': tf.int64
    # 'nb_s1_images': tf.float32
}
```
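
The thresholds in the config are pixel counts and ratios for a single tile. As a rough sketch of how such checks could be applied, the snippet below counts the classes of a small synthetic water mask with `np.unique` and compares the results against the limits. The 4x4 tile, the synthetic mask, and the reading of `MAX_WATER_NO_DATA_COUNT` as an upper bound on `no_data_count` are assumptions made for illustration; the constants mirror the config but use a tiny `SIZE`.

```python
import numpy as np

SIZE = 4  # tiny tile for illustration; the config above uses 384
MIN_WATER_RATIO = 0.005
MAX_WATER_RATIO = 0.96
MAX_WATER_NO_DATA_COUNT = int((SIZE**2) * 0.25)
WATER_COLUMNS = {0: 'no_data_count', 1: 'not_water_count', 2: 'water_count'}

# Synthetic mask (assumption): 0 = no data, 1 = not water, 2 = water
water = np.array([
    [0, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
], dtype=np.uint8)

# Count how many pixels fall into each class.
values, counts = np.unique(water, return_counts=True)
raw = {int(v): int(c) for v, c in zip(values, counts)}

# Name the counts, defaulting to 0 for classes absent from the tile.
props = {WATER_COLUMNS[i]: raw.get(i, 0) for i in range(3)}

# Guard against tiles with no "not water" pixels before dividing.
if props['not_water_count']:
    water_ratio = props['water_count'] / props['not_water_count']
else:
    water_ratio = float('inf')

valid_water = MIN_WATER_RATIO <= water_ratio <= MAX_WATER_RATIO
no_data_ok = props['no_data_count'] <= MAX_WATER_NO_DATA_COUNT
print(props, water_ratio, valid_water, no_data_ok)
```

The explicit zero check is a choice made for this sketch: a tile made up entirely of water or no-data pixels would otherwise divide by zero.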

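```python
#
# HELPERS
#
def process_water(parser,element):
  # water mask for this element as a uint8 array
  water=parser.image(element,bands=WATER_BANDS,dtype=np.uint8)
  # pixel count per class value
  values,counts=np.unique(water,return_counts=True)
  props={v: c for (v,c) in zip(values,counts)}
  # name the counts, defaulting to 0 for classes absent from the tile
  props={WATER_COLUMNS[i]: props.get(i,0) for i in range(3)}
  water_ratio=props['water_count']/props['not_water_count']
  props['water_ratio']=water_ratio
  props['valid_water']=((MIN_WATER_RATIO<=water_ratio) and
            (water_ratio<=MAX_WATER_RATIO))
  # ... the rest of this example is cut off in the source
```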
