Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/franckalbinet/marisco
Encoding IAEA MARIS data as NetCDF and others.
data marine-radioactivity
- Host: GitHub
- URL: https://github.com/franckalbinet/marisco
- Owner: franckalbinet
- License: apache-2.0
- Created: 2022-10-10T10:31:12.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-05T12:16:33.000Z (3 months ago)
- Last Synced: 2024-11-05T13:30:38.128Z (3 months ago)
- Topics: data, marine-radioactivity
- Language: Jupyter Notebook
- Homepage: https://fr.anckalbi.net/marisco/
- Size: 28.9 MB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# MARISCO
The [IAEA Marine Radioactivity Information System
(MARIS)](https://maris.iaea.org) provides open access to radioactivity
measurements in marine environments. Developed by the [IAEA
Environmental
Laboratories](https://www.iaea.org/about/organizational-structure/department-of-nuclear-sciences-and-applications/division-of-iaea-environment-laboratories)
in Monaco, MARIS offers data on seawater, biota, sediment, and suspended
matter.

This Python package includes command-line tools to convert MARIS
datasets into [`NetCDF`](https://www.unidata.ucar.edu/software/netcdf/)
or `.csv` formats, enhancing compatibility with various scientific and
data analysis software.

## Core Concept: Handlers
`marisco` is built around the concept of `handlers` - specialized
modules designed to convert MARIS datasets into NetCDF format. Each
handler is tailored to a specific data provider and implemented as a
dedicated Jupyter notebook.

### Literate Programming Approach
We’ve adopted a Literate Programming approach, which means:
1. **Documentation**: Each handler serves as comprehensive
documentation.
2. **Code Reference**: The notebooks contain the actual implementation
code.
3. **Communication Tool**: They facilitate discussions with data
providers about discrepancies or inconsistencies.

### Powered by nbdev
To achieve this, we leverage [nbdev](https://nbdev.fast.ai), a powerful
tool that allows us to:

1. Write code within Jupyter notebooks
2. Automatically export relevant parts as dedicated Python modules

This approach bridges the gap between documentation and implementation,
ensuring they remain in sync.

### See It in Action
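To make the notebook-to-module workflow tangible, here is a minimal sketch of what the cells of an nbdev notebook can look like. The `#| default_exp`, `#| export` and `#| hide` directives are standard nbdev; the module name `handlers.example` and the helper `normalize_nuclide` are hypothetical and do not come from `marisco`.

```python
# --- cell 1 ---------------------------------------------------------------
#| default_exp handlers.example
# nbdev directive: exported cells are written to handlers/example.py
# (hypothetical module name, chosen for illustration only).

# --- cell 2 ---------------------------------------------------------------
#| export
def normalize_nuclide(name: str) -> str:
    """Hypothetical helper: turn a label such as ' Cs-137 ' into 'cs137'."""
    return name.strip().lower().replace("-", "")

# --- cell 3 ---------------------------------------------------------------
#| hide
# Hidden cells stay out of the rendered documentation; this one holds a
# quick check that runs when the notebook is executed.
assert normalize_nuclide(" Cs-137 ") == "cs137"
```

Only the cell marked `#| export` ends up in the generated Python module, while the surrounding prose and checks remain in the notebook as documentation.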
For a concrete example of this approach, check out our [HELCOM dataset
handler
implementation](https://fr.anckalbi.net/marisco/handlers/helcom.html).

Please note that this project is **still under development**.
We have implemented the [MARIS Legacy
handler](https://fr.anckalbi.net/marisco/handlers/maris_legacy.html) to
convert all existing datasets from the MARIS master database into NetCDF
format. For datasets that are frequently updated, such as
[HELCOM](https://fr.anckalbi.net/marisco/handlers/helcom.html),
[OSPAR](https://www.ospar.org/), and TEPCO/Fukushima-related datasets,
individual handlers are currently being developed and will be available
soon.

## Install

To install `marisco`, run:
``` console
pip install marisco
```

Once successfully installed, run the following command:
``` console
maris_init
```

This command:

1. creates a `.marisco/` directory in your home directory, containing
the configuration files listed below
2. creates a `configs.toml` file containing default but configurable
settings (default paths, …)
3. creates a configurable `cdl.toml` file used to generate a MARIS
[NetCDF4 CDL (Common Data
Language)](https://www.unidata.ucar.edu/software/netcdf/workshops/most-recent/nc3model/Cdl.html)
template
4. downloads several MARIS DB nomenclature/lookup tables into the
`.marisco/lut/` directory
5. generates `maris-template.nc`, the MARIS NetCDF4 template built
from `cdl.toml` and used to encode MARIS datasets

> [!TIP]
>
> If you are new to Python, please refer to [How to setup
> `Marisco` with
> Anaconda](https://github.com/franckalbinet/marisco/tree/main/install_configure_guide/windows_anaconda)
> or [How to setup `Marisco` with Windows Subsystem for Linux (WSL) and
> Visual Studio Code
> editor](https://github.com/franckalbinet/marisco/tree/main/install_configure_guide//windows_ubuntu_sub_system).

### Zotero API key
Upon conversion, `marisco` will automatically retrieve the bibliographic
metadata of each MARIS dataset from [Zotero](https://www.zotero.org/).
To do so, define the environment variable `ZOTERO_API_KEY` and set it to
the MARIS Zotero API key. Please contact the MARIS team to get your API
key.

## Getting started
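Before running any conversion, it can help to confirm that the setup described above is complete. The following is a minimal sketch and not part of `marisco`: the function name `check_marisco_setup` is hypothetical, and it assumes that `configs.toml`, `cdl.toml`, `lut/` and `maris-template.nc` all sit directly under `~/.marisco/`.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Assumption: every artifact created by `maris_init` lives directly
# under ~/.marisco/ with these names.
EXPECTED = ["configs.toml", "cdl.toml", "lut", "maris-template.nc"]


def check_marisco_setup(
    base: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the setup looks complete."""
    base = base if base is not None else Path.home() / ".marisco"
    env = env if env is not None else os.environ

    # One message per expected file or directory that is absent.
    problems = [
        f"missing: {base / name}" for name in EXPECTED if not (base / name).exists()
    ]
    # The Zotero key must be defined and non-empty.
    if not env.get("ZOTERO_API_KEY"):
        problems.append("ZOTERO_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_marisco_setup():
        print(problem)
```

The script prints nothing when everything is in place; each printed line points to a step that still needs attention (re-running `maris_init` or exporting the API key).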
### Command line utilities
Every command accepts a `-h` argument that displays its documentation.
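When driving these tools from a script, the built-in help can be collected programmatically. This is a minimal sketch that relies only on the `-h` argument mentioned above and on the three command names documented in this section; the function name `show_help` is hypothetical.

```python
import shutil
import subprocess

# Command names as documented in this section.
COMMANDS = ["maris_init", "maris_create_nc_template", "maris_netcdfy"]


def show_help(command: str) -> str:
    """Return the `-h` output of a command, or a hint if it is not installed."""
    if shutil.which(command) is None:
        return f"{command}: not found on PATH (is marisco installed?)"
    result = subprocess.run([command, "-h"], capture_output=True, text=True)
    # Help text normally goes to stdout; fall back to stderr just in case.
    return result.stdout or result.stderr


if __name__ == "__main__":
    for name in COMMANDS:
        print(show_help(name))
```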
#### `maris_init`
Create the configuration files and the MARIS NetCDF CDL (Common Data
Language) file, and download the required lookup tables (nomenclatures).

#### `maris_create_nc_template`

Generate the MARIS NetCDF template to be used when encoding datasets.
#### `maris_netcdfy`
Encode a MARIS dataset as NetCDF.

Positional arguments:

- `handler_name`: Handler’s name (e.g. helcom, …)
- `str`: Path to the dataset to encode
- `dest`: Path to the converted NetCDF4 file

Example:
``` console
maris_netcdfy helcom _data/accdb/mors/csv _data/output/helcom.nc
```

## Development
### FAQ
#### How is `cdl.toml` created and what is it used for?
A Python dictionary named `CONFIGS_CDL`, which specifies the MARIS NetCDF
attributes, variables, dimensions, …, is first defined in
`nbs/api/configs.ipynb`. Running the command `maris_init` generates a
[`toml`](https://www.wikiwand.com/fr/TOML) version of it,
`.marisco/cdl.toml`, which is then used to create the MARIS NetCDF
template `.marisco/maris-template.nc`.

Once `marisco` is installed, the MARIS NetCDF template can be customized
further by editing `.marisco/cdl.toml` directly and then running the
command `maris_create_nc_template`.