https://github.com/franckalbinet/marisco

Encoding IAEA MARIS data as NetCDF and others.

# MARISCO

The [IAEA Marine Radioactivity Information System
(MARIS)](https://maris.iaea.org) provides open access to radioactivity
measurements in marine environments. Developed by the [IAEA
Environmental
Laboratories](https://www.iaea.org/about/organizational-structure/department-of-nuclear-sciences-and-applications/division-of-iaea-environment-laboratories)
in Monaco, MARIS offers data on seawater, biota, sediment, and suspended
matter.

This Python package includes command-line tools to convert MARIS
datasets into [`NetCDF`](https://www.unidata.ucar.edu/software/netcdf/)
or `.csv` formats, enhancing compatibility with various scientific and
data analysis software.

## Core Concept: Handlers

`marisco` is built around the concept of `handlers` - specialized
modules designed to convert MARIS datasets into NetCDF format. Each
handler is tailored to a specific data provider and implemented as a
dedicated Jupyter notebook.

### Literate Programming Approach

We’ve adopted a Literate Programming approach, which means:

1. **Documentation**: Each handler serves as comprehensive
documentation.
2. **Code Reference**: The notebooks contain the actual implementation
code.
3. **Communication Tool**: They facilitate discussions with data
providers about discrepancies or inconsistencies.

### Powered by nbdev

To achieve this, we leverage [nbdev](https://nbdev.fast.ai), a powerful
tool that allows us to:

1. Write code within Jupyter notebooks
2. Automatically export relevant parts as dedicated Python modules

This approach bridges the gap between documentation and implementation,
ensuring they remain in sync.
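
As a rough illustration of the export step, the sketch below mimics the selection logic on plain strings. The `#| export` marker is nbdev's directive for flagging a cell; the `exported_cells` helper and the sample cells are hypothetical and are not nbdev's actual implementation.

```python
# Toy illustration of the export idea (not nbdev's actual implementation).
EXPORT_DIRECTIVE = "#| export"  # nbdev directive marking a cell for export


def exported_cells(cells: list[str]) -> list[str]:
    """Return the code of cells whose first line is the export directive."""
    selected = []
    for source in cells:
        lines = source.strip().splitlines()
        first_line = lines[0].strip() if lines else ""
        if first_line == EXPORT_DIRECTIVE:
            # Drop the directive line and keep the code itself.
            selected.append("\n".join(lines[1:]))
    return selected


# Hypothetical notebook cells: only the second one is flagged for export.
notebook_cells = [
    "df.head()  # exploratory cell, stays in the notebook only",
    "#| export\ndef to_upper(name):\n    return name.upper()",
]

module_source = "\n\n".join(exported_cells(notebook_cells))
print(module_source)
```

Exploratory cells stay in the notebook as documentation, while flagged cells end up in the generated module.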

### See It in Action

For a concrete example of this approach, check out our [HELCOM dataset
handler
implementation](https://fr.anckalbi.net/marisco/handlers/helcom.html).

Please note that this project is **still under development**.

We have implemented the [MARIS Legacy
handler](https://fr.anckalbi.net/marisco/handlers/maris_legacy.html) to
convert all existing datasets from the MARIS master database into NetCDF
format. For datasets that are frequently updated, such as
[HELCOM](https://fr.anckalbi.net/marisco/handlers/helcom.html),
[OSPAR](https://www.ospar.org/), and TEPCO/Fukushima-related datasets,
individual handlers are currently being developed and will be available
soon.

## Install

To install `marisco`, run:

``` console
pip install marisco
```

Once successfully installed, run the following command:

``` console
maris_init
```

This command:

1. creates a `.marisco/` directory in your home directory to hold the
configuration files listed below;
2. creates a `configs.toml` file containing default but configurable
settings (default paths, …);
3. creates a configurable `cdl.toml` file used to generate a MARIS
[NetCDF4 CDL (Common Data
Language)](https://www.unidata.ucar.edu/software/netcdf/workshops/most-recent/nc3model/Cdl.html)
template;
4. downloads several MARIS DB nomenclature/lookup tables into the
`.marisco/lut/` directory;
5. generates `maris-template.nc`, the MARIS NetCDF4 template built
from `cdl.toml` and used to encode MARIS datasets.
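
To confirm that initialization completed, a small check along these lines may help. It is a minimal sketch that assumes all the listed files live directly under `.marisco/` in your home directory; the `missing_marisco_files` helper is hypothetical and not part of `marisco`.

```python
from pathlib import Path

# Entries described in the list above. Their location directly under
# ~/.marisco is an assumption based on that description.
EXPECTED = ["configs.toml", "cdl.toml", "lut", "maris-template.nc"]


def missing_marisco_files(base: Path) -> list[str]:
    """Return the expected entries that are absent from `base`."""
    return [name for name in EXPECTED if not (base / name).exists()]


base_dir = Path.home() / ".marisco"
missing = missing_marisco_files(base_dir)
if missing:
    print(f"Missing from {base_dir}: {', '.join(missing)}")
else:
    print(f"{base_dir} looks complete")
```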

> [!TIP]
>
> If you are new to Python, please refer to [How to setup
> `Marisco` with
> Anaconda](https://github.com/franckalbinet/marisco/tree/main/install_configure_guide/windows_anaconda)
> or [How to setup `Marisco` with Windows Subsystem for Linux (WSL) and
> Visual Studio Code
> editor](https://github.com/franckalbinet/marisco/tree/main/install_configure_guide/windows_ubuntu_sub_system).

### Zotero API key

During conversion, `marisco` automatically retrieves the bibliographic
metadata of each MARIS dataset from [Zotero](https://www.zotero.org/).
For this to work, set the `ZOTERO_API_KEY` environment variable to the
MARIS Zotero API key. Please contact the MARIS team to get your API key.
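
Before launching a long conversion, it can be convenient to verify that the variable is set. The following is a minimal pre-flight sketch; the `zotero_api_key` helper is hypothetical and says nothing about how `marisco` reads the variable internally.

```python
import os


def zotero_api_key(env=os.environ) -> str:
    """Return the Zotero API key, or raise a clear error if it is not set."""
    key = env.get("ZOTERO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ZOTERO_API_KEY is not set; contact the MARIS team to obtain a key."
        )
    return key


# Usage: call this once before starting a conversion.
# key = zotero_api_key()
```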

## Getting started

### Command line utilities

Every command accepts a `-h` argument that displays its documentation.

#### `maris_init`

Create the configuration files and the MARIS NetCDF CDL (Common Data
Language) file, and download the required lookup tables (nomenclatures).

#### `maris_create_nc_template`

Generate the MARIS NetCDF template used when encoding datasets.

#### `maris_netcdfy`

Encode a MARIS dataset as NetCDF.

Positional arguments:

- `handler_name`: Handler’s name (e.g. helcom, …)
- `src`: Path to the dataset to encode
- `dest`: Path to the converted NetCDF4 file

Example:

``` console
maris_netcdfy helcom _data/accdb/mors/csv _data/output/helcom.nc
```
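
For scripted or batch conversions, the same command can be driven from Python. This is a minimal sketch using the standard `subprocess` module; the helper names `netcdfy_command` and `run_netcdfy` are hypothetical, and only the three positional arguments shown above are assumed.

```python
import subprocess


def netcdfy_command(handler_name: str, src: str, dest: str) -> list[str]:
    """Build the argument list for `maris_netcdfy` (handler, input, output)."""
    return ["maris_netcdfy", handler_name, src, dest]


def run_netcdfy(handler_name: str, src: str, dest: str) -> None:
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(netcdfy_command(handler_name, src, dest), check=True)


# Same invocation as the console example above, built but not executed here.
cmd = netcdfy_command("helcom", "_data/accdb/mors/csv", "_data/output/helcom.nc")
print(" ".join(cmd))
```

Calling `run_netcdfy(...)` with the same arguments would execute the conversion, provided `marisco` is installed and `maris_netcdfy` is on your `PATH`.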

## Development

### FAQ

#### How is `cdl.toml` created and what is it used for?

A Python dictionary named `CONFIGS_CDL`, which specifies the MARIS
NetCDF attributes, variables, dimensions, …, is first defined in
`nbs/api/configs.ipynb`. Running the command `maris_init` generates a
[`toml`](https://www.wikiwand.com/fr/TOML) version of it,
`.marisco/cdl.toml`, which is then used to create the MARIS NetCDF
template `.marisco/maris-template.nc`.

Once `marisco` is installed, the MARIS NetCDF template can be customized
further by editing `.marisco/cdl.toml` and then running the command
`maris_create_nc_template`.