https://github.com/linkml/prefixmaps
Semantic prefix map registry
bioregistry linkml obofoundry prefixmaps
- Host: GitHub
- URL: https://github.com/linkml/prefixmaps
- Owner: linkml
- License: apache-2.0
- Created: 2022-08-22T21:27:19.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-14T23:39:36.000Z (about 1 year ago)
- Last Synced: 2024-05-08T00:27:36.159Z (12 months ago)
- Topics: bioregistry, linkml, obofoundry, prefixmaps
- Language: Python
- Homepage: https://linkml.io/prefixmaps/
- Size: 5.61 MB
- Stars: 9
- Watchers: 8
- Forks: 3
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
# prefixmaps
A Python library for retrieving semantic prefix maps.
A semantic prefix map maps a prefix (e.g. `skos`) to a namespace (e.g. `http://www.w3.org/2004/02/skos/core#`).
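
To make the idea concrete, here is a minimal plain-Python sketch of what a prefix map does. It does not use this library at all; the `expand` and `compress` helpers and the `PREFIX_MAP` dictionary are hypothetical names chosen for illustration only.

```python
from typing import Dict, Optional

# A prefix map is, at its simplest, a mapping from prefix to namespace.
# (Illustrative data; these are the standard W3C skos and owl namespaces.)
PREFIX_MAP: Dict[str, str] = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand(curie: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Expand a CURIE such as 'skos:Concept' to a URI, or return None."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local_id


def compress(uri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Contract a URI to a CURIE, preferring the longest matching namespace."""
    matches = [(p, ns) for p, ns in prefix_map.items() if uri.startswith(ns)]
    if not matches:
        return None
    prefix, namespace = max(matches, key=lambda item: len(item[1]))
    return f"{prefix}:{uri[len(namespace):]}"


print(expand("skos:Concept", PREFIX_MAP))
print(compress("http://www.w3.org/2002/07/owl#Class", PREFIX_MAP))
```

In practice the expansion and contraction logic is better left to a dedicated library; this sketch only shows the data structure involved.
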
This repository and the corresponding library are designed to satisfy the following requirements:
- generation of prefix maps in headers of RDF documents
- use in tools that expand CURIEs and short-form identifiers to URIs that can be used as subjects of RDF triples
- coverage of prefixes from multiple different domains
- no single authoritative source of either prefixes or prefix-namespace mappings (clash-resilient)
- preferred semantic namespace is prioritized over web URLs
- authority preferred prefix is prioritized where possible
- each individual prefix map is case-insensitive bijective
- prefix map composition and custom ordering of prefixmaps
- lightweight / low footprint
- fast (TODO)
- network-independence / versioned prefix maps
- optional ability to retrieve latest from external authority on network

What this is NOT intended for:
- a general source of metadata about either prefixes or namespaces
- a mechanism for resolving identifiers to web URLs for humans to find information

## Installation
```shell
pip install prefixmaps
```

## Usage
To use in combination with the [curies](https://github.com/cthoyt/curies) library:
```python
from prefixmaps import load_converter
from curies import Converter

converter: Converter = load_converter(["obo", "bioregistry.upper", "linked_data", "prefixcc"])

>>> converter.expand("CHEBI:1")
'http://purl.obolibrary.org/obo/CHEBI_1'
>>> converter.expand("GEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
>>> converter.expand("owl:Class")
'http://www.w3.org/2002/07/owl#Class'
>>> converter.expand("FlyBase:FBgn123")
'http://identifiers.org/fb/FBgn123'
```

### Alternate orderings / clash resilience
- prefix.cc uses the prefix `geo` for geosparql `http://www.opengis.net/ont/geosparql#`
- OBO uses prefix `GEO` for the [Geographical Entity Ontology](https://obofoundry.org/ontology/geo), expanding to `http://purl.obolibrary.org/obo/GEO_`
- the Bioregistry uses the prefix [`geo`](https://bioregistry.io/registry/geo) for NCBI GEO, and "re-mints" a [`geogeo`](https://bioregistry.io/registry/geogeo) prefix for the OBO ontology

If we prioritize prefix.cc the OBO prefix is ignored:
```python
converter = load_converter(["prefixcc", "obo"])

>>> converter.expand("GEO:1")
>>> converter.expand("geo:1")
'http://www.opengis.net/ont/geosparql#1'
```

Even though prefix expansion is case-sensitive, we intentionally block conflicts that differ only in case.
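
The priority behaviour can be illustrated with a small plain-Python sketch. This is not the library's actual implementation; `merge_prefix_maps` is a hypothetical helper, written under the assumption that earlier maps win and that a later entry is dropped whenever its prefix or namespace matches an earlier one when compared case-insensitively.

```python
from typing import Dict, List


def merge_prefix_maps(ordered_maps: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge prefix maps in priority order (first map wins).

    An entry is skipped if its prefix or its namespace was already
    claimed by a higher-priority entry, compared case-insensitively.
    """
    merged: Dict[str, str] = {}
    seen_prefixes = set()
    seen_namespaces = set()
    for prefix_map in ordered_maps:
        for prefix, namespace in prefix_map.items():
            if prefix.lower() in seen_prefixes:
                continue  # e.g. GEO is blocked once geo has been taken
            if namespace.lower() in seen_namespaces:
                continue  # keep the merged map bijective
            merged[prefix] = namespace
            seen_prefixes.add(prefix.lower())
            seen_namespaces.add(namespace.lower())
    return merged


# The two clashing entries described above.
PREFIXCC = {"geo": "http://www.opengis.net/ont/geosparql#"}
OBO = {"GEO": "http://purl.obolibrary.org/obo/GEO_"}

prefixcc_first = merge_prefix_maps([PREFIXCC, OBO])
obo_first = merge_prefix_maps([OBO, PREFIXCC])
print(prefixcc_first)
print(obo_first)
```

With prefix.cc first only `geo` survives; with OBO first only `GEO` survives, which mirrors the behaviour shown in the examples in this section.
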
If we push `bioregistry` at the start of the list then GEOGEO can be used as the prefix for the OBO ontology:
```python
converter = load_converter(["bioregistry", "prefixcc", "obo"])

>>> converter.expand("geo:1")
'http://identifiers.org/geo/1'
>>> converter.expand("GEO:1")
>>> converter.expand("GEOGEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
```

Note that from the OBO perspective, GEOGEO is non-canonical.

We get similar results using the upper-normalized variant of `bioregistry`:
```python
converter = load_converter(["bioregistry.upper", "prefixcc", "obo"])

>>> converter.expand("GEO:1")
'http://identifiers.org/geo/1'
>>> converter.expand("geo:1")
>>> converter.expand("GEOGEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
```

Users of OBO ontologies will want to place OBO at the start of the list:
```python
converter = load_converter(["obo", "bioregistry.upper", "prefixcc"])

>>> converter.expand("geo:1")
>>> converter.expand("GEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
>>> converter.expand("GEOGEO:1")
```

Note that under this ordering there is no prefix for NCBI GEO. This is not
a major limitation as there is no canonical semantic rendering of NCBI
GEO. This could be added in future with a unique OBO prefix.

You can use the ready-made "merged" prefix set, which prioritizes OBO:
```python
converter = load_converter("merged")

>>> converter.expand("GEOGEO:1")
>>> converter.expand("GEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
>>> converter.expand("geo:1")
```

### Network independence and requesting latest versions
By default, the library uses metadata distributed alongside the package. This has certain advantages in terms
of reproducibility, but it means that if a new ontology or prefix is added to an upstream source, you won't see it.

To refresh and use the latest upstream:
```python
converter = load_converter("obo", refresh=True)
```

This will perform a fetch from http://obofoundry.org/registry/obo_prefixes.ttl
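
Because a refresh depends on the network, one possible pattern is to try the latest upstream first and fall back to the bundled data if the fetch fails. The sketch below is only an illustration: `load_latest_or_bundled` is a hypothetical helper, and catching a broad `Exception` is an assumption, since the exact exception raised on a failed fetch is not documented here. The loader is passed in as an argument so that the helper itself has no dependencies.

```python
from typing import Any, Callable


def load_latest_or_bundled(loader: Callable[..., Any], contexts: Any) -> Any:
    """Try to load refreshed prefix maps; fall back to the packaged copy.

    `loader` is expected to accept `contexts` plus a `refresh` keyword,
    as `load_converter` does in the example above.
    """
    try:
        return loader(contexts, refresh=True)
    except Exception:
        # Assumption: any failure here (e.g. no network) is recoverable
        # by using the data distributed alongside the package.
        return loader(contexts)


# Intended usage (requires prefixmaps to be installed):
#   from prefixmaps import load_converter
#   converter = load_latest_or_bundled(load_converter, "obo")
```

The trade-off is reproducibility: results may differ between runs depending on whether the refresh succeeded, so pinned workflows may prefer the default bundled data.
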
## Context Metadata
See the description fields in [contexts.curated.yaml](src/prefixmaps/data/contexts.curated.yaml).
## Repository organization
Data files containing pre-built prefix maps from sources like OBO and Bioregistry are distributed alongside the Python package.
Location:
* [src/prefixmaps/data](src/prefixmaps/data/)
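
As a rough illustration of how such a file could be consumed without the library, the sketch below parses CSV text using the four fields listed under "CSV field descriptions" (context, prefix, namespace, canonical). The sample rows are made up for illustration (`chebi_alias` is not a real prefix), and the presence of a header row and the `True`/`False` encoding of the canonical column are assumptions, not something confirmed by this README.

```python
import csv
import io
from typing import Dict

# Hypothetical sample data; real files live in src/prefixmaps/data.
SAMPLE_CSV = """context,prefix,namespace,canonical
obo,GEO,http://purl.obolibrary.org/obo/GEO_,True
obo,CHEBI,http://purl.obolibrary.org/obo/CHEBI_,True
obo,chebi_alias,http://purl.obolibrary.org/obo/CHEBI_,False
"""


def canonical_prefix_map(csv_text: str) -> Dict[str, str]:
    """Build a prefix -> namespace dict from the rows marked canonical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["prefix"]: row["namespace"]
        for row in reader
        if row["canonical"].strip().lower() == "true"
    }


def is_bijective(prefix_map: Dict[str, str]) -> bool:
    """Check that prefixes and namespaces are unique, ignoring case."""
    prefixes = {p.lower() for p in prefix_map}
    namespaces = {n.lower() for n in prefix_map.values()}
    return len(prefixes) == len(prefix_map) == len(namespaces)


prefix_map = canonical_prefix_map(SAMPLE_CSV)
print(prefix_map)
print(is_bijective(prefix_map))
```
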
### CSV field descriptions
1. context: a unique handle for this context. This MUST be the same as the basename of the file
2. prefix: corresponds to http://www.w3.org/ns/shacl#prefix
3. namespace: corresponds to http://www.w3.org/ns/shacl#namespace
4. canonical: true if this satisfies bijectivity

### Refreshing the Data
The data can be refreshed in several ways:
1. Locally, you can use `tox` with:
```shell
pip install tox tox-poetry
tox -e refresh
```
2. Manually running and automatically committing via [this GitHub Actions workflow](https://github.com/linkml/prefixmaps/blob/main/.github/workflows/refresh.yaml).
3. Running the makefile (warning: this requires some pre-configuration):
```shell
make etl
```

TODO: make a GitHub Action that auto-releases new versions

Note that PRs should *not* be made against the individual CSV files. These are generated from upstream sources.
We temporarily house a small number of curated prefixmaps such as [linked_data.curated.yaml](https://github.com/linkml/prefixmaps/blob/main/src/prefixmaps/data/linked_data.curated.yaml), with the CSV generated from the YAML.
Our goal is to ultimately cede these to upstream sources.
## Requesting new prefixes
This repo is NOT a prefix registry. Its job is simply to aggregate
different prefix maps. Request changes upstream.