{"id":15043336,"url":"https://github.com/franckalbinet/marisco","last_synced_at":"2025-04-14T20:52:37.065Z","repository":{"id":61227926,"uuid":"548881844","full_name":"franckalbinet/marisco","owner":"franckalbinet","description":"Encoding IAEA MARIS data as NetCDF and others.","archived":false,"fork":false,"pushed_at":"2025-02-18T10:30:33.000Z","size":34213,"stargazers_count":4,"open_issues_count":8,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-28T09:04:43.544Z","etag":null,"topics":["data","iaea","marine-radioactivity","netcdf4"],"latest_commit_sha":null,"homepage":"https://fr.anckalbi.net/marisco/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/franckalbinet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-10T10:31:12.000Z","updated_at":"2025-02-24T18:15:46.000Z","dependencies_parsed_at":"2024-01-29T14:41:34.209Z","dependency_job_id":"d63770c1-e790-4724-ab17-d2ea123db0c5","html_url":"https://github.com/franckalbinet/marisco","commit_stats":{"total_commits":244,"total_committers":3,"mean_commits":81.33333333333333,"dds":"0.20491803278688525","last_synced_commit":"c11be583b07687917282636df5f86b3f17ea571d"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Fmarisco","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Fmarisco/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/franckalbinet%2Fmarisco/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Fmarisco/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/franckalbinet","download_url":"https://codeload.github.com/franckalbinet/marisco/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248960988,"owners_count":21189990,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","iaea","marine-radioactivity","netcdf4"],"created_at":"2024-09-24T20:48:52.349Z","updated_at":"2025-04-14T20:52:37.049Z","avatar_url":"https://github.com/franckalbinet.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MARISCO\n\n\n\u003c!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! --\u003e\n\nThe [IAEA **M**arine **R**adioactivity **I**nformation **S**ystem\n(MARIS)](https://maris.iaea.org) provides open access to radioactivity\nmeasurements in marine environments. 
Developed by the [IAEA Marine\nEnvironmental\nLaboratories](https://www.iaea.org/about/organizational-structure/department-of-nuclear-sciences-and-applications/division-of-iaea-environment-laboratories)\nin Monaco, MARIS offers data on seawater, biota, sediment, and suspended\nmatter.\n\nThis Python package includes command-line tools to convert MARIS\ndatasets into [`NetCDF`](https://www.unidata.ucar.edu/software/netcdf/)\nor `.csv` formats, enhancing compatibility with various scientific and\ndata analysis software.\n\n## Core Concept: Handlers\n\n`marisco` is built around the concept of `handlers` - specialized\nmodules designed to convert MARIS datasets into NetCDF format. Each\nhandler is tailored to a specific data provider and implemented as a\ndedicated Jupyter notebook.\n\n### Literate Programming Approach\n\nWe’ve adopted a Literate Programming approach, which means:\n\n1.  **Documentation**: Each handler serves as comprehensive\n    documentation.\n2.  **Code Reference**: The notebooks contain the actual implementation\n    code.\n3.  **Communication Tool**: They facilitate discussions with data\n    providers about discrepancies or inconsistencies.\n\n### Powered by nbdev\n\nTo achieve this, we leverage [nbdev](https://nbdev.fast.ai), a powerful\ntool that allows us to:\n\n1.  Write code within Jupyter notebooks\n2.  
Automatically export relevant parts as dedicated Python modules\n\nThis approach bridges the gap between documentation and implementation,\nensuring they remain in sync.\n\n### See It in Action\n\nFor a concrete example of this approach, check out our [OSPAR dataset\nhandler\nimplementation](https://fr.anckalbi.net/marisco/handlers/ospar.html).\n\n### List of currently available handlers\n\nMARISCO includes a suite of specialized data handlers designed to:\n\n- Convert provider-specific data formats into standardized MARIS NetCDF\n  files\n- Ensure data quality and consistency across providers\n- Facilitate integration with the MARIS marine radioactivity database\n- Support automated data processing workflows\n\nThe following handlers are currently implemented:\n\n| Handler | Description | Link to Data Source |\n|----|----|----|\n| [MARIS Legacy](https://fr.anckalbi.net/marisco/handlers/maris_legacy.html) | All legacy MARIS datasets from the MARIS Master Database | \\- |\n| [HELCOM](https://fr.anckalbi.net/marisco/handlers/helcom.html) | HELCOM marine environment protection datasets | [HELCOM](https://helcom.fi/about-us) |\n| [OSPAR](https://fr.anckalbi.net/marisco/handlers/ospar.html) | OSPAR marine environment datasets | [ODIMS OSPAR](https://odims.ospar.org/en/) |\n| [TEPCO](https://fr.anckalbi.net/marisco/handlers/tepco.html) | TEPCO Fukushima monitoring data | [TEPCO Monitoring](https://radioactivity.nsr.go.jp/ja/list/349/list-1.html) |\n| [GEOTRACES](https://fr.anckalbi.net/marisco/handlers/geotraces.html) | BODC GEOTRACES oceanographic data | [GEOTRACES IDP2021](https://www.geotraces.org/geotraces-intermediate-data-product-2021/) |\n\n## Install\n\nNow, to install `marisco` simply run\n\n``` console\npip install marisco\n```\n\nOnce successfully installed, run the following command:\n\n``` console\nmaris_init\n```\n\nThis command:\n\n1.  
creates a `.marisco/` directory containing various\n    configuration/configurable files (listed below) in your `/home`\n    directory;\n2.  creates a `configs.toml` file containing default but configurable\n    settings (default paths, …);\n3.  downloads several MARIS DB nomenclature/lookup tables into\n    the `.marisco/lut/` directory;\n4.  downloads `maris-template.nc`, the MARIS NetCDF4 template.\n\n### Zotero API key\n\nUpon conversion, `marisco` will automatically retrieve the bibliographic\nmetadata of each MARIS dataset from [Zotero](https://www.zotero.org/).\nTo do so, define the environment variable\n`ZOTERO_API_KEY` containing the MARIS Zotero API key. Please contact the\nMARIS team to get your API key.\n\n## Getting started\n\n### Command line utilities\n\nAll commands accept a `-h` argument that displays their documentation.\n\n#### `maris_init`\n\nDownloads the configuration file, the MARIS NetCDF template and the required lookup\ntables (nomenclatures).\n\n#### `maris_to_nc`\n\nConvert `helcom`, `geotraces`, `tepco` or `ospar` marine radioactivity\ndatasets to MARIS NetCDF4 format.\n\n    usage: maris_to_nc [-h] [--src SRC] ds dest\n\n    positional arguments:\n      ds          Name of the dataset to encode as NetCDF4\n      dest        Output path for NetCDF file\n\n    options:\n      -h, --help  show this help message and exit\n      --src SRC   Optional input data path only required for the 'GEOTRACES' dataset\n\nFor instance: `maris_to_nc ospar 191-OSPAR-2024.nc`\n\n#### `maris_db_to_nc`\n\nThe MARIS Master Database integrates two types of datasets:\n\n- Historical datasets retrieved from published scientific papers\n- Ongoing monitoring data from international programs like `HELCOM`,\n  `OSPAR`, `TEPCO`, and `GEOTRACES`\n\nThis command-line utility converts MARIS datasets from their legacy\nformat to NetCDF4, making them more accessible for modern data analysis\nworkflows. 
Users can either convert the entire database or specify\nparticular datasets by their reference IDs for selective conversion.\n\n    usage: maris_db_to_nc [-h] [--ref_ids REF_IDS] src dest\n\n    Convert MARIS legacy database to NetCDF4 format. If ref_ids is provided as comma-separated values, only encodes those subsets.\n\n    positional arguments:\n      src                Path to MARIS database dump as `.txt` file\n      dest               Output path for NetCDF file(s)\n\n    options:\n      -h, --help         show this help message and exit\n      --ref_ids REF_IDS  Optional comma-separated reference IDs (e.g., \"123,456,789\") (default: )\n\nFor instance:\n\n- `maris_db_to_nc \"~/pro/data/maris/2024-11-20 MARIS_QA_shapetype_id=1.txt\" ~/pro/tmp/output`  \n- or\n  `maris_db_to_nc \"~/pro/data/maris/2024-11-20 MARIS_QA_shapetype_id=1.txt\" ~/pro/tmp/output --ref_ids=\"16,30\"`\n  for a subset of the MARIS Master Database.\n\n#### `maris_nc_to_csv`\n\nThis utility converts NetCDF files to CSV files that conform to the\nMARIS Standard format, originally designed for OpenRefine workflows.\n\nAlthough MARISCO has now superseded OpenRefine in the data preparation\npipeline, the MARIS master database continues to require CSV inputs in\nthis legacy format. 
This command-line utility, built with the MARISCO\nlibrary, handles the conversion process.\n\n    usage: maris_nc_to_csv [-h] src dest\n\n    Converts NetCDF files into CSV files that follow the MARIS Standard format.\n\n    positional arguments:\n      src         Input path and filename for NetCDF file\n      dest        Output path and filename (without extension) for CSV file\n\n    options:\n      -h, --help  show this help message and exit\n\nFor instance:\n`maris_nc_to_csv ~/pro/tmp/output/191-OSPAR-2024.nc ~/pro/tmp/output/191-OSPAR-2024`\n\n\u003e [!TIP]\n\u003e\n\u003e ### Note\n\u003e\n\u003e When specifying the destination path (e.g.,\n\u003e `~/pro/tmp/output/191-OSPAR-2024`), the utility automatically appends\n\u003e the MARIS sample type to the filename. For example:\n\u003e\n\u003e - `191-OSPAR-2024_BIOTA.csv` for biological samples\n\u003e\n\u003e While this specific example produces only a BIOTA file, the utility\n\u003e can generate multiple files (one per sample type) depending on the\n\u003e content of the source dataset. This reflects the NetCDF4 file\n\u003e structure, where each MARIS sample type is stored as a separate group\n\u003e within the file.\n\n## Development\n\nThe MARIS NetCDF template is generated from the `nbs/files/cdl/maris.cdl`\nCommon Data Language (CDL) file as defined by\n[Unidata](https://docs.unidata.ucar.edu/). To generate the MARIS NetCDF\ntemplate `nbs/files/nc/maris-template.nc`, install the\n[NetCDF-C](https://pjbartlein.github.io/REarthSysSci/install_netCDF.html)\nutilities, then run the following from the `Marisco` home directory:\n\n``` console\nncgen -4 -o nc/maris-template.nc cdl/maris.cdl\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffranckalbinet%2Fmarisco","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffranckalbinet%2Fmarisco","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffranckalbinet%2Fmarisco/lists"}