https://github.com/dataoneorg/mnlite

# mnlite

Light weight read-only DataONE member node in Python Flask

## Development Notes

Create a member node (MN) with the node identifier `urn:node:mn_1`:

----
workon mnlite
export FLASK_APP=mnlite
mkdir -p instance/nodes/mn_1
flask m_node new_node mn_1
----

Add a subject to the MN:

----
opersist -f instance/nodes/mn_1 sub -o create -n "Dave" -s 'https://orcid.org/0000-0002-6513-4996'
----

Adjust the node configuration to specify `default_submitter`, `default_owner`, `base_url`, and `contact_subject`:

----
{
    "node": {
        "node_id": "urn:node:mn_1",
        "state": "up",
        "name": "Unnamed member node: mn_1",
        "description": "No description available for this node.",
        "base_url": "http://localhost:5000/mn_1/",
        "schedule": {
            "hour": "*",
            "day": "*",
            "min": "0,10,20,30,40,50",
            "mon": "*",
            "sec": "5",
            "wday": "*",
            "year": "*"
        },
        "subject": "http://localhost:5000/mn_1",
        "contact_subject": "https://orcid.org/0000-0002-6513-4996"
    },
    "content_database": "sqlite:///content.db",
    "log_database": "sqlite:///eventlog.db",
    "data_folder": "data",
    "created": "2021-02-19T15:17:09+0000",
    "default_submitter": "https://orcid.org/0000-0002-6513-4996",
    "default_owner": "https://orcid.org/0000-0002-6513-4996",
    "spider": {
        "sitemap_urls": [
            "https://datadryad.org/sitemap.xml"
        ]
    }
}
----
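A quick check of the edited configuration can catch a missing key or a JSON syntax error before the service is started. The following is a minimal sketch and not part of mnlite: the function name `missing_settings` is hypothetical, and the placement of each key (top level or under `node`) is assumed from the example configuration above.

```python
from __future__ import annotations

import json

# Keys named in the text above. Their locations (top level vs. "node")
# are an assumption taken from the example configuration.
REQUIRED_TOP_LEVEL = ("default_submitter", "default_owner")
REQUIRED_NODE = ("base_url", "contact_subject")


def missing_settings(config_text: str) -> list[str]:
    """Return the required settings that are absent or empty.

    Raises json.JSONDecodeError if config_text is not valid JSON,
    for example when a comma between entries is missing.
    """
    config = json.loads(config_text)
    node = config.get("node", {})
    missing = [key for key in REQUIRED_TOP_LEVEL if not config.get(key)]
    missing += [f"node.{key}" for key in REQUIRED_NODE if not node.get(key)]
    return missing


if __name__ == "__main__":
    # Deliberately incomplete configuration for illustration.
    example = """
    {
        "node": {
            "node_id": "urn:node:mn_1",
            "base_url": "http://localhost:5000/mn_1/"
        },
        "default_owner": "https://orcid.org/0000-0002-6513-4996"
    }
    """
    print(missing_settings(example))
```

An empty list means all four settings named above are present.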

The `mnlite` service:

----
workon mnlite
export FLASK_APP=mnlite
export FLASK_ENV=development
flask run
----

The `soscan` service:

----
workon mnlite

----

Collecting content from a source:

Content collection is implemented as a Scrapy-based crawler. Given a sitemap,
it crawls the listed URLs and adds discovered SO:Dataset entries to the
persistence store.

Settings are in `settings.py`.

----
workon mnlite
scrapy crawl JsonldSpider -s STORE_PATH=instance/nodes/mn_3
----
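To illustrate what "discovering SO:Dataset entries" can involve, the sketch below pulls JSON-LD blocks out of an HTML page and keeps those typed as a schema.org Dataset. It is not the `JsonldSpider` implementation: it uses only the Python standard library instead of Scrapy, the names `JsonLdScriptParser` and `extract_datasets` are hypothetical, and it matches literal `@type` strings without performing JSON-LD context processing.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser

# Literal type values treated as a schema.org Dataset (an assumption).
DATASET_TYPES = ("Dataset", "https://schema.org/Dataset", "http://schema.org/Dataset")


class JsonLdScriptParser(HTMLParser):
    """Collect the text of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._chunks: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._chunks))
            self._in_jsonld = False


def _is_dataset(node: dict) -> bool:
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return any(t in DATASET_TYPES for t in types)


def extract_datasets(html: str) -> list[dict]:
    """Return the JSON-LD nodes in html whose @type is a Dataset."""
    parser = JsonLdScriptParser()
    parser.feed(html)
    parser.close()
    datasets: list[dict] = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than failing the page
        # A block may hold a single node, a list of nodes, or an @graph list.
        if isinstance(doc, list):
            nodes = doc
        elif isinstance(doc, dict):
            graph = doc.get("@graph")
            nodes = graph if isinstance(graph, list) else [doc]
        else:
            nodes = []
        datasets.extend(n for n in nodes if isinstance(n, dict) and _is_dataset(n))
    return datasets
```

Calling `extract_datasets(page_html)` returns a list of dictionaries, one per Dataset node found on the page.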

To count sitemap loc entries only:

----
scrapy crawl JsonldSpider -s STORE_PATH=instance/nodes/mnTestDRYAD -L INFO -a count_only=1
----
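For a rough cross-check of that count outside the crawler, the `loc` entries of a single sitemap document can be tallied with the standard library. This is a sketch only: the function name `count_locs` is hypothetical, and unlike a crawler it does not follow sitemap index files to nested sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_locs(sitemap_xml: bytes) -> int:
    """Count <loc> elements in one sitemap or sitemap index document."""
    root = ET.fromstring(sitemap_xml)
    return sum(1 for _ in root.iter(f"{SITEMAP_NS}loc"))


if __name__ == "__main__":
    example = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://example.org/a</loc></url>"
        b"<url><loc>https://example.org/b</loc></url>"
        b"</urlset>"
    )
    print(count_locs(example))
```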

## Model

### Thing

A digital entity. The persistent identifier is the SHA-256 hash of the
bytes. An instance of Thing may be any digital object such as metadata,
data, and so forth.

A Thing may have multiple Identifiers.

A Thing has exactly one SHA-256 hash. It also has a single SHA-1 hash and
a single MD5 hash, each unique within the constraints of its hashing
algorithm.
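The three hashes of a Thing can be computed directly from its bytes. The sketch below uses Python's `hashlib`; the function name `content_hashes` and the dictionary keys are illustrative assumptions, not names taken from mnlite.

```python
from __future__ import annotations

import hashlib


def content_hashes(data: bytes) -> dict[str, str]:
    """Return hex digests of data for the three algorithms named above."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
    }


if __name__ == "__main__":
    hashes = content_hashes(b"example content")
    # The SHA-256 digest would serve as the persistent identifier.
    print(hashes["sha256"])
```

Identical bytes always produce identical digests, which is what makes the SHA-256 value usable as a content-derived identifier.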

### Identifier

Captures metadata associated with a minted identifier. Note that this
describes the identifier itself: its creation and other management aspects.
The content referenced by an identifier is described by the Thing.

An Identifier may be associated with more than one Thing. This can happen
in several situations:

1. Different representations of the same thing. For example, a digital
entity may be conceptually the same though serialized differently.

2. Different aspects of the same thing. For example, a DOI may resolve to
a landing page that describes a Thing.

3. Erroneous duplication: the Identifier is used to reference two or more
distinct, unrelated entities.

4. The identifier refers to a conceptualization of a thing. For example, a
series identifier in the DataONE system refers to the most recent version of
some thing.

In practice, while an identifier may be intended to be a globally unique
reference to a specific digital thing, the most reliable mechanism to achieve
this is using an identifier derived from the content of the thing.

Identifiers generated from a hash of the content of a Thing will be unique,
subject to the constraints of the hashing algorithm. It is assumed here
that, for all practical purposes, a SHA-256 identifier will always refer
to exactly one digital entity.

Identifiers may refer to a physical thing. Physical things do not exist
digitally. Hence, a digital entity can only be associated with a physical
thing; it cannot be that thing. As such, where an identifier is used to
refer to both a physical thing and a digital thing, the digital thing must
result from some observation of the physical thing. The outcomes of such
an observation may take many different forms, such as metadata, data
records, images, and other digital entities.

### Relation

Documents a relationship between two identifiers.

Where identifiers refer to specific digital entities, the relation is
unambiguous.

Ambiguous relationships arise where identifiers may refer to more than one
Thing. The degree of ambiguity varies with the precision of the identifiers.

A relationship between physical things implies a physical association, for
example a sub-sample, a sibling sample from a batch, geographic co-location,
and so forth.

Relations exist within a Context. As of this writing, Contexts are
defined by a label.
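The distinction between unambiguous and ambiguous relations can be made concrete with a small data structure. This is a sketch under stated assumptions rather than the mnlite schema: the field names `subject_id`, `predicate`, `object_id`, and `context`, and the helper `is_unambiguous`, are hypothetical, and the index maps each identifier to the set of SHA-256 values of the Things it refers to.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    subject_id: str  # identifier the relation starts from (hypothetical name)
    predicate: str   # kind of relationship (hypothetical name)
    object_id: str   # identifier the relation points to (hypothetical name)
    context: str     # label of the Context the relation exists within


def is_unambiguous(relation: Relation, things_by_identifier: dict[str, set[str]]) -> bool:
    """True when both identifiers refer to exactly one Thing each."""
    return all(
        len(things_by_identifier.get(identifier, ())) == 1
        for identifier in (relation.subject_id, relation.object_id)
    )


if __name__ == "__main__":
    index = {
        "doi:10.x/a": {"sha256:aa"},
        "doi:10.x/b": {"sha256:bb", "sha256:cc"},  # refers to two Things
    }
    rel = Relation("doi:10.x/a", "describes", "doi:10.x/b", "default")
    print(is_unambiguous(rel, index))
```

An identifier missing from the index is treated as not unambiguous, since it cannot be tied to a single Thing.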

### AccessRule

Defines how Subjects may interact with a Thing.

A Thing may have multiple AccessRules.

An AccessRule may have multiple Subjects.

### Subject

Identifies an actor that may interact with a Thing.
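The relationships among Thing, AccessRule, and Subject described above can be sketched with dataclasses. Everything here is illustrative: the field names, the `permission` string, and the `is_permitted` helper are assumptions, not the mnlite model definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subject:
    identifier: str  # e.g. an ORCID URL


@dataclass
class AccessRule:
    permission: str  # hypothetical permission label such as "read"
    subjects: set[Subject] = field(default_factory=set)


@dataclass
class Thing:
    sha256: str
    access_rules: list[AccessRule] = field(default_factory=list)


def is_permitted(thing: Thing, subject: Subject, permission: str) -> bool:
    """True if any rule on the Thing grants permission to the subject."""
    return any(
        rule.permission == permission and subject in rule.subjects
        for rule in thing.access_rules
    )


if __name__ == "__main__":
    dave = Subject("https://orcid.org/0000-0002-6513-4996")
    thing = Thing(sha256="<sha256 of the bytes>", access_rules=[AccessRule("read", {dave})])
    print(is_permitted(thing, dave, "read"))
```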

### Request

Holds metadata associated with a request, such as an HTTP request resolving
an Identifier or retrieving a Thing.