https://github.com/brookisme/eeuploader
a CLI and python-module for uploading (lots of) images to Google Earth Engine
- Host: GitHub
- URL: https://github.com/brookisme/eeuploader
- Owner: brookisme
- Created: 2020-05-02T01:55:37.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-05-05T22:33:24.000Z (over 5 years ago)
- Last Synced: 2025-06-14T04:06:05.136Z (7 months ago)
- Language: Python
- Homepage:
- Size: 121 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
### EE Uploader
_a CLI and python-module for uploading (lots of) images to Google Earth Engine_
---
1. [Install](#install)
2. [Quick Start](#quickstart)
3. [Project Setup](#setup)
4. [EEImagesUp Docs](#pydocs)
5. [Requirements](#requirments)
---
### INSTALL
Note: During these early days eeuploader must be installed from a local clone, but it will be published to PyPI soon. Similarly, one of the requirements, [mproc](https://github.com/brookisme/mproc), should also be installed locally.
```bash
git clone https://github.com/brookisme/eeuploader.git
pushd eeuploader
pip install -e .
popd
```
---
### QUICK START EXAMPLES
Note: these examples use a feature collection file ([fc.geojson](#fcgeojson)) and args file ([upargs.yaml](#upargsyaml)) that are described in detail [below](#setup).
##### CLI
```bash
# print info before a run. output includes:
# - the number-of-features
# - an example upload manifest (defaults to the first feature)
eeuploader info fc.geojson upargs.yaml
# or using kwargs instead of arg-config file
eeuploader info fc.geojson user=brookwilliams collection=IM_COLLECTION_NAME
# - save all upload manifests to a pickle file
eeuploader info fc.geojson upargs.yaml --dest manifests.p --all true
# - save upload manifest for indices 1,10,100 to a json file
eeuploader info fc.geojson upargs.yaml --dest manifests.json --save_as json --indices 1,10,100
# upload images to a collection (as above kwargs can be used instead of an arg-config file)
# - all images
eeuploader upload fc.geojson upargs.yaml
# - the first 10 image features
eeuploader upload fc.geojson upargs.yaml --limit 10
# - image features from 10 to 20
eeuploader upload fc.geojson upargs.yaml --index_range 10,20
# - image features 3,400,24
eeuploader upload fc.geojson upargs.yaml --index_range 3,400,24
# - image features 3,400,24 with overwrite=True
eeuploader upload fc.geojson upargs.yaml --index_range 3,400,24 force=True
```
##### PYTHON
```python
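from pprint import pprint
import eeuploader.image as eup

# USER and IC are placeholders for your GEE user/project root and image-collection name
up=eup.EEImagesUp(
    USER,
    features='fc.geojson',
    collection=IC,
    start_time_key='date',
    no_data=0,
    force=False)
# print manifest for the first feature (index 0)
pprint(up.manifest(0))
# upload the first feature
up.upload(0)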
""" output (returns without waiting for task to finish)
{'id': 'XER4J2RLIOWJRYNRMC7EALXH',
'name': 'projects/earthengine-legacy/operations/XER4J2RLIOWJRYNRMC7EALXH',
'started': 'OK'}
"""
# upload_collection
up.upload_collection()
print(up.tasks)
""" output (waits for all tasks to complete to return)
[{'creation_timestamp_ms': 1588615171535,
'description': 'Ingest image: '
'"projects/earthengine-legacy/assets/users/..."',
'destination_uris': ['https://...'],
'id': 'VRJ3JFOPHL5ZFKVXTOKB4JEA',
'name': 'projects/earthengine-legacy/operations/VRJ3JFOPHL5ZFKVXTOKB4JEA',
'start_timestamp_ms': 1588615180457,
'state': 'COMPLETED',
'task_type': 'INGEST_IMAGE',
'update_timestamp_ms': 1588615262051},...]
"""
# upload a single image that is not in the feature collection
up.upload(
    uri='gs://bucket/path/to/image.tif',
    crs='epsg:32717',
    properties={
        'property_1': 123,
        'property_2': '2018-01-01',
        'property_3': 'important piece of information'
    },
    start_time='2019-04-02')
```
---
### PROJECT SETUP
1. features_collection file*: JSON containing a `features`-list, where each feature must have a `properties`-dict.
2. (optional) args file: a yaml file containing default arguments for `EEImagesUp.__init__`
* NOTE: Technically you can use `EEImagesUp` without the features_collection file, either for single uploads or by passing a feature-collection Python dict instead of a file path.
##### FEATURE COLLECTION EXAMPLE:
```json
# fc.geojson
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"id": "0",
"geometry": {
"type": "Polygon",
"coordinates": [[[-63.374182,-3.93245],[-63.328243,-3.932469],[-63.328225,-3.886331],[-63.374161,-3.886312],[-63.374182,-3.93245]]]
},
"properties": {
"gcs": "v1/data/dev/WH/1/dw_-66.3899849022_-2.5792093972-20190820.tif",
"crs": "epsg:32720",
"date": "2019-08-12",
"biome": 1,
"biome_name": "Tropical & Subtropical Moist Broadleaf Forests",
"NBPixels": 250596,
"BareGround": 0.0004948203482896778,
"BuiltArea": 0.001017574103337643,
"Clouds": 0,
"Crops": 0,
"FloodedVegetation": 0.00739836230426663,
"Grass": 0.00702724704304937,
"Scrub": 0,
"Snow Ice": 0,
"Trees": 0.4854546760522913,
"Water": 0.4986073201487654,
"S2_DATASTRIP_ID": "S2B_OPER_MSI_L2A_DS_SGS__20190812T165657_S20190812T143758_N02.13",
"S2_GEE_ID": "20190812T143759_20190812T143758_T20MMA",
"S2_GRID": "20MMA",
"S2_LEVEL": "2A",
"S2_METHOD": "firstNonNull",
"S2_PRODUCT_ID": "S2B_MSIL2A_20190812T143759_N0213_R096_T20MMA_20190812T165657",
"S2_SENSING_ORBIT_DIRECTION": "DESCENDING",
"dw_id": "dw_-63.3512901800_-3.9093045532-20190812",
"eco_region": "Purus várzea",
"flipped": false,
"folder": "v1/data/dev/WorkForce/WH/1",
"lat": -3.9093045532,
"lon": -63.35129018,
"map": "https://www.google.com/maps/@-3.9093046188354488,-63.35129165649414,14z/data=!3m1!1e3",
"timestamp": 1565568000000
}
},
...
]
}
```
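Before a large upload it can help to confirm that every feature has the expected shape. The following is a minimal sketch, not part of eeuploader: `check_feature_collection` and `REQUIRED_KEYS` are hypothetical names, and the required keys are an assumption based on the `uri_key`, `crs_key` and `start_time_key` values used in this README.

```python
import json

# Assumed required property keys (they mirror uri_key / crs_key / start_time_key
# from the example args in this README; adjust them to your own configuration).
REQUIRED_KEYS = ("gcs", "crs", "date")


def check_feature_collection(fc, required_keys=REQUIRED_KEYS):
    """Hypothetical helper: return a list of (feature_index, problem) tuples.

    An empty list means every feature has a `properties` dict containing
    all of `required_keys`.
    """
    features = fc.get("features")
    if not isinstance(features, list):
        return [(None, "missing 'features' list")]
    problems = []
    for i, feat in enumerate(features):
        props = feat.get("properties")
        if not isinstance(props, dict):
            problems.append((i, "missing 'properties' dict"))
            continue
        for key in required_keys:
            if key not in props:
                problems.append((i, f"missing property '{key}'"))
    return problems


# A tiny in-memory collection: the second feature has no `crs` property.
fc = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"gcs": "a.tif", "crs": "epsg:32720", "date": "2019-08-12"}},
    {"type": "Feature", "properties": {"gcs": "b.tif", "date": "2019-08-13"}}
  ]
}
""")
print(check_feature_collection(fc))
```

For a file on disk, load it with `json.load` first and pass the resulting dict to the helper.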
##### UPLOAD ARGS EXAMPLE:
```yaml
# upargs.yaml
user: projects/wri-datalab
collection: image_collection_name
bands: null
band_names:
- lulc
pyramiding_policy: mode
no_data: 0
exclude:
- flipped
- map
start_time_key: date
end_time_key: null
days_delta: 1
crs_key: crs
uri_key: gcs
name_key: ee_name
force: false
noisy: false
raise_error: false
```
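The CLI examples pass `key=value` pairs alongside (or instead of) the args file. The sketch below shows one way such pairs could be parsed and layered over file defaults. It is an illustration only: `parse_kwargs` is a hypothetical helper, the precedence (command-line values win) is an assumption, and the actual CLI may parse values differently.

```python
import ast


def parse_kwargs(pairs):
    """Hypothetical helper: turn ['collection=NAME', 'force=True'] into a dict."""
    out = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        try:
            # Python literals: numbers, True/False, None, lists, quoted strings
            out[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # anything else is kept as a plain string (e.g. a collection name)
            out[key] = raw
    return out


# Defaults as they might look after loading an args file into a dict
file_args = {
    "user": "projects/wri-datalab",
    "collection": "image_collection_name",
    "no_data": 0,
    "force": False,
}
cli_args = parse_kwargs(["collection=IM_COLLECTION_NAME", "force=True"])

# Assumption: command-line values override the file defaults
merged = {**file_args, **cli_args}
print(merged)
```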
---
### EEImagesUp DOCS
METHODS:
1. [Initializer](#up-init)
2. [manifest](#up-manifest)
3. [upload](#up-upload)
4. [upload_collection](#up-upload_collection)
##### EEImagesUp.\_\_init\_\_
```python
"""
Args:
user:
gee user or project root
* if it begins with "users" or "projects" the string is unaltered
* otherwise it is pre-pended with "users"
features:
features list or file path to (geo)json feature collection
* if dict or loaded from file path the features list is assumed to
be under the key "features"
* if None feat(s) or feat properties must be passed directly to the
public methods.
* otherwise feature indices can be used for manifest/upload/upload_collection
collection:
name of image_collection/folder to upload the images.
note: since the main purpose of this script is to upload many features,
it tries to make you specify a collection. if you want
to upload to your user/project folder root pass "False"
bands:
** alternatively specify `band_names` (see below) **
a manifest band list as specified here https://developers.google.com/earth-engine/image_manifest#bands
note: every image being uploaded must have the same band structure
band_names:
** ignored if `bands` is specified **
generates a manifest band list as specified here https://developers.google.com/earth-engine/image_manifest#bands
from a list of band_names
note: every image being uploaded must have the same band structure
pyramiding_policy:
one of MEAN, MODE, SAMPLE (upper or lower case is fine). default=MEAN
note: for band-level control must use `bands` not `band_names` above
no_data:
no_data value(s) or "missing_data" object described here https://developers.google.com/earth-engine/image_manifest#bands
include:
feature-property-keys to include as ee.image-properties
* if None all the feature-property-keys will be included unless `exclude` list is provided
exclude:
** ignored if `include` is provided **
feature-property-keys to exclude as ee.image-properties
start_time_key,end_time_key,crs_key,uri_key,name_key:
if start_time/end_time/crs/... not provided at run time the system will
attempt to find them in the feature-properties using these keys
days_delta:
if not False, and start_time is provided (or found with start_time_key), and end_time is
not provided or found, end_time will be created by adding `days_delta` number of days
to the start_time.
timeout:
how quickly to timeout if `wait` is set to true. defaults to TIMEOUT above.
force:
set to true to overwrite existing assets
noisy:
print progress during `upload_collection`
raise_error:
raise_errors during `upload_collection`
Usage:
import eeuploader.image as eup
import eeuploader.gee_utils as gutils
up=eup.EEImagesUp(
'projects/wri-datalab',
features='dw_organized_features.geojson',
collection='image_collection_name',
start_time_key='date',
no_data=0,
force=True)
# print nb-features and manifest for first feature
print('NB FEATURES:',len(up.features))
pprint(up.manifest(0))
# upload the first feature / print task status
# note: `upload` does not wait for task to complete.
# set `wait=True` to wait for task to complete
print(up.upload(0))
gutils.task_info(up.task_id)
# upload the first 3 features / print final task status for each
up.upload_collection(limit=3)
print(up.tasks)
"""
```
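Three of the rules in the docstring above (the `user` prefix, `include`/`exclude` filtering, and the `days_delta` default for `end_time`) can be sketched in plain Python. These functions are hypothetical illustrations of the documented behavior, not eeuploader's code. In particular, the `/` separator after `users` is an assumption.

```python
from datetime import datetime, timedelta


def normalize_user(user):
    """Leave 'users...' / 'projects...' strings alone, otherwise prepend 'users/'."""
    if user.startswith(("users", "projects")):
        return user
    return f"users/{user}"  # assumption: joined with a slash


def select_properties(props, include=None, exclude=None):
    """Pick the feature properties to keep; `exclude` is ignored if `include` is given."""
    if include is not None:
        return {k: v for k, v in props.items() if k in include}
    if exclude:
        return {k: v for k, v in props.items() if k not in exclude}
    return dict(props)


def default_end_time(start_time, days_delta=1):
    """end_time = start_time + days_delta days, both as YYYY-MM-DD strings."""
    start = datetime.strptime(start_time, "%Y-%m-%d")
    return (start + timedelta(days=days_delta)).strftime("%Y-%m-%d")


props = {"biome": 1, "flipped": False, "map": "https://..."}
print(normalize_user("brookwilliams"))
print(select_properties(props, exclude=["flipped", "map"]))
print(default_end_time("2019-08-12", days_delta=1))
```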
##### EEImagesUp.manifest
```python
""" manifest for single upload or manifests list
Args:
feat:
a feature dictionary containing a properties dictionary
from which it can pull the uri, crs, ee.image-properties, ...
uri:
google cloud storage uri (with or without the preceding "gs://")
or gcs url for image asset.
name:
name of the new ee.image. if not provided it will create a name
from the uri. `.`s will be replaced with `d` due to ee-naming policies.
tileset_id:
if not provided one will be created from the name
crs:
crs of image (for example 'epsg:4326')
properties:
updates (or adds to) the properties found in feat['properties']
start/end_time:
strings should be in YYYY-MM-DD format
if start_time is provided, end_time is None, and self.days_delta is set,
end_time will be set to start_time + self.days_delta days
features:
list of features or feature-indices. if provided, the returned manifest
will be a list of upload manifests
dest:
if dest: manifest will be saved and dest will be returned
otherwise: manifest will be returned
save_as:
* file-type: one of ['json', 'pickle']
* defaults to 'pickle'
indent:
if saving to json: indent pretty printing arg.
Returns:
* Manifest for a single upload (or a list of manifests if `features` is provided)
* Destination of the saved file if `dest` is provided
"""
```
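To make the arguments above concrete, here is a rough sketch of how an Earth Engine image manifest could be assembled from them. It follows the general manifest layout in the Earth Engine documentation linked above, but `name_from_uri`, `to_seconds` and `build_manifest` are hypothetical helpers, and the naming rule (basename without extension, `.` replaced by `d`) is an assumption about how the name is derived. The manifest that `EEImagesUp.manifest` actually returns may differ.

```python
import calendar
import posixpath
from datetime import datetime
from pprint import pprint


def name_from_uri(uri):
    """Assumed rule: basename without extension, with '.' replaced by 'd'."""
    stem, _ext = posixpath.splitext(posixpath.basename(uri))
    return stem.replace(".", "d")


def to_seconds(date_str):
    """YYYY-MM-DD (interpreted as UTC midnight) -> seconds since the epoch."""
    return calendar.timegm(datetime.strptime(date_str, "%Y-%m-%d").timetuple())


def build_manifest(asset_root, uri, crs, band_names, start_time, end_time,
                   properties=None, no_data=None, pyramiding_policy="MEAN"):
    """Hypothetical sketch of a single-tileset upload manifest."""
    if not uri.startswith("gs://"):
        uri = f"gs://{uri}"  # accept uris with or without the scheme
    name = name_from_uri(uri)
    manifest = {
        "name": f"{asset_root}/{name}",
        "tilesets": [{"id": name, "crs": crs, "sources": [{"uris": [uri]}]}],
        "bands": [
            {"id": band, "tileset_id": name, "tileset_band_index": i}
            for i, band in enumerate(band_names)
        ],
        "pyramiding_policy": pyramiding_policy.upper(),
        "start_time": {"seconds": to_seconds(start_time)},
        "end_time": {"seconds": to_seconds(end_time)},
        "properties": dict(properties or {}),
    }
    if no_data is not None:
        manifest["missing_data"] = {"values": [no_data]}
    return manifest


# asset_root and the bucket name below are made-up placeholders
manifest = build_manifest(
    asset_root="projects/earthengine-legacy/assets/users/someone/my_collection",
    uri="bucket/v1/data/dw_-66.3899849022_-2.5792093972-20190820.tif",
    crs="epsg:32720",
    band_names=["lulc"],
    start_time="2019-08-12",
    end_time="2019-08-13",
    properties={"biome": 1},
    no_data=0,
    pyramiding_policy="mode",
)
pprint(manifest)
```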
##### EEImagesUp.upload
```python
""" single upload
Note: if `wait=False` the upload will not wait for task to complete.
Args:
**feat/uri/.../start_time/end_time (see manifest doc-string)**
manifest:
upload manifest. if provided, ignores all other arguments and uploads
using this manifest
wait:
wait for task to complete
noisy:
print progress during upload
raise_error:
raise_errors during upload
Sets:
self.task_id: task id
self.task: task status
Returns:
task status
"""
```
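The `wait` and `timeout` behavior described above amounts to polling the task status until it reaches a final state. The loop below is a generic sketch of that idea and not eeuploader's implementation: `wait_for_task` is a hypothetical name, and the status lookup is injected so the example runs without Earth Engine credentials. With the real client, a function wrapping `ee.data.getTaskStatus` (which returns a list of status dicts) could be passed as `get_status`.

```python
import time

# Earth Engine task states that will not change any further
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_task(get_status, task_id, timeout=600, interval=5, sleep=time.sleep):
    """Poll `get_status(task_id)` until the task finishes or `timeout` seconds pass.

    `get_status` must return a dict with a 'state' key.
    """
    waited = 0
    while True:
        status = get_status(task_id)
        if status.get("state") in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still {status.get('state')} after {timeout}s")
        sleep(interval)
        waited += interval


# Demo with a fake status source that finishes on the third poll
fake_states = iter(["READY", "RUNNING", "COMPLETED"])


def fake_status(task_id):
    return {"id": task_id, "state": next(fake_states)}


print(wait_for_task(fake_status, "XER4J2RLIOWJRYNRMC7EALXH", interval=0))
```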
##### EEImagesUp.upload_collection
```python
""" upload set of features in batches
* This method will always wait for tasks to complete before returning.
* `nb_batches` should be understood as the max number of simultaneous
requests for ee-image-uploads
Args:
features:
* list of features or feature indices in self.features to upload
* if not provided upload all the features in self.features
limit:
* limit features to first `limit`-elements
nb_batches:
divide uploads into `nb_batches` groups and upload them simultaneously
Sets:
self.tasks: list of task status
"""
```
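The batching described above can be pictured as splitting the features into `nb_batches` groups and running the groups concurrently. The sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in for [mproc](https://github.com/brookisme/mproc), which eeuploader lists as a requirement. `split_batches`, `upload_in_batches` and `upload_one` are hypothetical names, and the real method may divide the work differently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batches(items, nb_batches):
    """Divide `items` into at most `nb_batches` groups of near-equal size."""
    nb_batches = max(1, min(nb_batches, len(items) or 1))
    return [items[i::nb_batches] for i in range(nb_batches)]


def upload_in_batches(items, upload_one, nb_batches=4):
    """Run `upload_one` over each batch in its own thread and collect all results."""
    batches = split_batches(items, nb_batches)

    def run(batch):
        # uploads within one batch happen one after another
        return [upload_one(item) for item in batch]

    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        per_batch = list(pool.map(run, batches))
    # results come back grouped by batch, not in the original item order
    return [result for batch in per_batch for result in batch]


# Demo with a stand-in upload function that just reports the feature index
def fake_upload(index):
    return {"index": index, "state": "COMPLETED"}


tasks = upload_in_batches(list(range(10)), fake_upload, nb_batches=3)
print(len(tasks), "tasks finished")
```

Because each batch runs its uploads sequentially, `nb_batches` also caps the number of uploads in flight at any moment, which matches the "max number of simultaneous requests" reading above.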
---
### REQUIREMENTS
* https://github.com/brookisme/mproc
* https://click.palletsprojects.com/en/7.x/
* https://pyyaml.org/wiki/PyYAMLDocumentation
* https://pypi.org/project/geojson/
* https://pypi.org/project/Unidecode/