# `glathida` Glacier Thickness Database

Worldwide dataset of glacier thickness observations aggregated from literature review, data submissions, and published data.

## Versions

- `v3.1.0`: The latest published version of `glathida` is available from the GTN-G website (https://doi.org/10.5904/wgms-glathida-2020-10). It is described in:

  - Welty et al. (2020). _Worldwide version-controlled database of glacier thickness observations_. Earth System Science Data, 12, 3039–3055. https://doi.org/10.5194/essd-12-3039-2020

- `v4-beta`: The next version is in development and is available from this GitLab repository (https://gitlab.com/wgms/glathida). There is not yet a timeline for its publication.

Note that the data structure has changed (see [documentation](https://gitlab.com/wgms/glathida/-/blob/main/documentation.md#data-structure)). Tables `T`, `TT`, and `TTT` are now `glacier`, `band`, and `point`. An additional `survey` table makes it possible to submit `point` measurements without a corresponding entry in `glacier`.
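
The renaming can be summarized in a small lookup table (a sketch; only the table names stated above are assumed):

```python
# Legacy (v3) table codes and their v4 names
V3_TO_V4 = {'T': 'glacier', 'TT': 'band', 'TTT': 'point'}
# 'survey' is new in v4 and has no v3 counterpart

print(V3_TO_V4['TTT'])  # point
```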

## Contribute

Bug reports, data submissions, and other issues can be posted to the issue tracker at https://gitlab.com/wgms/glathida/-/issues. Submitters are encouraged to validate their data before submission (see the [Developer guide](https://gitlab.com/wgms/glathida#developer-guide) further below).

### Submission: as issue

For users who are not familiar with Git, data can be submitted as an issue.

- Create a new issue: https://gitlab.com/wgms/glathida/-/issues/new
- In the `Choose a template` dropdown, select `Data submission`.
- Fill out and follow the instructions in the issue template.
- Link to your data or attach it directly to the issue. Ideally, the data should be a spreadsheet or CSV files with column names and values as described in the [documentation](https://gitlab.com/wgms/glathida/-/blob/main/documentation.md#data-structure).
- Click `Create issue`.

Your submission will be reviewed by a maintainer, who will create a merge request.
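
For illustration, a submission table could be assembled with pandas. The column names and values below are placeholders, not the actual schema; use the names and units defined in the [documentation](https://gitlab.com/wgms/glathida/-/blob/main/documentation.md#data-structure).

```python
import pandas as pd

# Hypothetical example rows; real column names are defined in documentation.md
point = pd.DataFrame({
    'latitude': [46.80, 46.81],
    'longitude': [10.77, 10.78],
    'thickness': [120.5, 98.0],
})
point.to_csv('point.csv', index=False)
```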

### Submission: as merge request

Users familiar with Git are encouraged to submit a merge request directly.

- Fork the repository.
- Create a new branch in your fork.
- Add data to a new subdirectory of `submissions` (for example, `submissions/{investigator name}-{survey or publication year}-{glacier name}`). Data should be CSV files structured as described in the [documentation](https://gitlab.com/wgms/glathida/-/blob/main/documentation.md#data-structure).
- Modify or remove existing data (in `/data`) as needed.
- Create a merge request.
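
The steps above can be sketched as shell commands. All names here are placeholders, and a local `git init` stands in for cloning your fork:

```shell
# In practice: fork on GitLab, then clone your fork instead of git init
git init glathida-demo && cd glathida-demo
git checkout -b smith-2024-sample-glacier
mkdir -p submissions/smith-2024-sample-glacier
# Stand-in CSV; use the column names from documentation.md
printf 'latitude,longitude,thickness\n46.80,10.77,120.5\n' \
  > submissions/smith-2024-sample-glacier/point.csv
git add submissions/
git -c user.email=you@example.com -c user.name=You \
  commit -m "Add thickness data for Sample Glacier"
# Then push the branch and open a merge request on GitLab:
# git push --set-upstream origin smith-2024-sample-glacier
```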

Continuous integration tests will automatically check your submission. If they fail, you are encouraged to commit further changes until they pass. The merge request will then be reviewed by a maintainer and, hopefully, merged!

## Developer guide

Clone the repository and move into the directory.

```sh
git clone https://gitlab.com/wgms/glathida.git
cd glathida
```

Create the `glathida` [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) environment and activate it.

```sh
conda env create --file environment.yaml
conda activate glathida
```

Run tests on the metadata (`datapackage.yaml`) and data (`data/*`).

```sh
pytest
```

Or test a data submission.

```sh
python -m tests.check_submission path/to/submission
# For example:
# python -m tests.check_submission data/24k-glacier-2019
```

## User guide (Python)

### Read data

The legacy (`v3`) data is stored as CSV files directly in the `data` subdirectory (e.g. `data/point.csv`), while new data is stored in subdirectories of `data` (for example, `data/24k-glacier-2019/point.csv`). To read all data as unified tables, use the provided `helpers.read_data` function. This assumes the `glathida` conda environment is activated (see above).

```py
from tests import helpers

dfs = helpers.read_data()
dfs.keys()
# dict_keys(['glacier', 'point', 'survey', 'band'])
```

Otherwise, the following can be used in any Python environment with `pandas` installed.

```py
from collections import defaultdict
from pathlib import Path

import pandas as pd

# Gather all CSV files under data/, including per-submission subdirectories
paths = Path('data').rglob('*.csv')
dfs = defaultdict(list)
for path in paths:
    df = pd.read_csv(path, low_memory=False)
    dfs[path.stem].append(df)
# Concatenate tables with the same name (e.g. all point.csv files)
dfs = {
    name: pd.concat(df_list, ignore_index=True)
    for name, df_list in dfs.items()
}
```

### Assign RGI ID to points

Assumes the data are already loaded in `dfs` as above.

```py
import geopandas as gpd

# Convert the point DataFrame to a GeoDataFrame
df = dfs['point']
points = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df['longitude'], df['latitude'], crs=4326)
)

# Read RGI glacier outlines
rgi = gpd.read_file('path/to/rgi')

# Add RGI ID to points using spatial indexing (for speed)
# NOTE: points.sjoin(rgi, how='left', predicate='within') may be as fast
# NOTE: Assumes the RGI ID is in column 'rgi_id' (v7). Use 'RGIId' for v6.
points['rgi_id'] = None
i_rgi, i_points = points.sindex.query(rgi.geometry, predicate='intersects')
points.loc[points.index[i_points], 'rgi_id'] = rgi.iloc[i_rgi]['rgi_id'].values
```
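
The spatial-index query pattern above can be exercised on synthetic data (boxes standing in for glacier outlines; everything here is made up for illustration):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two polygons standing in for RGI glacier outlines
polys = gpd.GeoDataFrame(
    {'rgi_id': ['A', 'B']},
    geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3)],
    crs=4326,
)
# Three points: inside A, inside B, outside both
pts = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(2.5, 2.5), Point(5, 5)], crs=4326
)

pts['rgi_id'] = None
i_poly, i_pts = pts.sindex.query(polys.geometry, predicate='intersects')
pts.loc[pts.index[i_pts], 'rgi_id'] = polys.iloc[i_poly]['rgi_id'].values
print(pts['rgi_id'].tolist())  # ['A', 'B', None]
```

The point outside both polygons keeps `None`, so unmatched points can be filtered with `pts['rgi_id'].isna()`.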