# mandaten-download-generator-service

Microservice that generates the dump files (CSV, TTL) of mandatendatabank
asynchronously. A cron job is embedded in the service to trigger an export at
the preconfigured frequency.

## Installation

To add the service to your stack, add the following snippet to
`docker-compose.yml`:

```yaml
services:
  export:
    image: lblod/mandaten-download-generator-service:1.0.0
    volumes:
      - ./data/files:/share
      - ./config/type-exports.js:/config/type-exports.js
```

Don't forget to update the dispatcher configuration to route requests to the
export service. Files may then be served by the
[mu-file-service](https://github.com/mu-semtech/file-service).
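The dispatcher rule could look as follows. This is a sketch for a recent, Elixir-based mu-dispatcher; the `export` host name and the `/export-tasks` path are assumptions based on the stack snippet above, so adapt them to your own configuration:

```elixir
match "/export-tasks/*path" do
  # Forward requests to the export service defined in docker-compose.yml
  Proxy.forward conn, path, "http://export/export-tasks/"
end
```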

## Model

The tasks are modelled in agreement with the
[cogs:Job](http://vocab.deri.ie/cogs#Job) and
[task:Task](http://redpencil.data.gift/vocabularies/tasks/Task) models. The full
description should be available on
[data.gift](https://redpencil.data.gift/vocabularies/tasks) (TODO). See also
e.g. [jobs-controller-service](https://github.com/lblod/job-controller-service)
for more information on the model.

### Prefixes

```sparql
PREFIX mu: <http://mu.semte.ch/vocabularies/core/>
PREFIX task: <http://redpencil.data.gift/vocabularies/tasks/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
PREFIX ext: <http://mu.semte.ch/vocabularies/ext/>
PREFIX oslc: <http://open-services.net/ns/core#>
PREFIX cogs: <http://vocab.deri.ie/cogs#>
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX export: <> # namespace URI elided in the source
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
```

### Export

A file resulting from an export task.

#### Class

`export:Export`

##### Properties

Name | Predicate | Range | Definition
--- | --- | --- | ---
uuid | mu:uuid | xsd:string |
classification | export:classification | skos:Concept |
fileName | nfo:fileName | xsd:string |
format | dct:format | xsd:string |
created | dct:created | xsd:dateTime |
fileSize | nfo:fileSize | xsd:integer |
extension | dbpedia:fileExtension | xsd:string |

## Configuration

### CSV export

The SPARQL query to execute for the CSV export must be specified in
`/config/csv-export.sparql`. Note that the variable names in the `SELECT`
clause will be used as column headers in the export.
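For instance, a minimal `/config/csv-export.sparql` could look as follows (a hypothetical query; the `person:Person` and `foaf:name` terms are purely illustrative):

```sparql
PREFIX person: <http://www.w3.org/ns/person#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name
WHERE {
  ?person a person:Person ;
          foaf:name ?name .
}
```

The resulting CSV export will have `person` and `name` as column headers.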

### TTL export

The Turtle export must be specified in `/config/type-exports.js`. This config
specifies a prefix mapping and a list of RDF types, each with a set of required
and optional properties that must be exported for that type. An additional
filter for the `WHERE` clause can be specified per type.

E.g.

```javascript
export default {
  prefixes: {
    mandaat: "http://data.vlaanderen.be/ns/mandaat#",
    person: "http://www.w3.org/ns/person#",
    foaf: "http://xmlns.com/foaf/0.1/"
  },
  types: [
    {
      type: "mandaat:Mandataris",
      requiredProperties: [
        "mandaat:start",
        "mandaat:eind"
      ],
      optionalProperties: [
        "mandaat:status"
      ],
      additionalFilter: ""
    },
    {
      type: "person:Person",
      optionalProperties: [
        "foaf:name"
      ],
      additionalFilter: ""
    }
  ]
}
```

### Environment variables

The following environment variables can be configured:

* `EXPORT_CRON_PATTERN`: cron pattern to configure the frequency of the cron
job. The pattern follows the format specified in
[node-cron](https://www.npmjs.com/package/cron#available-cron-patterns).
Defaults to `0 0 */2 * * *`, i.e. run every 2 hours.
* `EXPORT_FILE_BASE`: base name of the export file. Defaults to `mandaten`. The
export file will be named `{EXPORT_FILE_BASE}-{timestamp}.{csv|ttl}`.
* `EXPORT_TTL_BATCH_SIZE`: batch size used as `LIMIT` in the `CONSTRUCT` SPARQL
queries per type. Defaults to `1000`. To get a complete export, make sure
`EXPORT_TTL_BATCH_SIZE * number_of_matching_triples` doesn't exceed the
maximum number of triples returned by the database (e.g. `ResultSetMaxRows` in
Virtuoso).
* `RETRY_CRON_PATTERN`: cron pattern to configure the frequency of the function
that retries failed tasks. The pattern follows the format specified in
[node-cron](https://www.npmjs.com/package/cron#available-cron-patterns).
Defaults to `0 */10 * * * *`, i.e. run every 10 minutes.
* `NUMBER_OF_RETRIES`: defines the number of times a failed task will be
retried.
* `FILES_GRAPH`: graph where the files must be stored. Defaults to
`http://mu.semte.ch/graphs/system/jobs`.
* `JOBS_GRAPH`: graph where the jobs must be stored. Defaults to
`http://mu.semte.ch/graphs/system/jobs`.
* `TASK_OPERATION_URI` (required): the operation URI (a resource you can attach
a `skos:prefLabel` to) identifying this instance of the service, e.g.
`http://lblod.data.gift/id/jobs/concept/TaskOperation/exportMandatarissen`.
* `EXPORT_CLASSIFICATION_URI`: the classification of the export, to ease
filtering. Defaults to
`http://redpencil.data.gift/id/exports/concept/GenericExport`.
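The batching controlled by `EXPORT_TTL_BATCH_SIZE` can be pictured as a series of `CONSTRUCT` queries per configured type. The sketch below is hypothetical (the exact query shape is internal to the service); it only illustrates how the batch size acts as a `LIMIT`:

```sparql
PREFIX mandaat: <http://data.vlaanderen.be/ns/mandaat#>

# One batch of the mandaat:Mandataris export;
# EXPORT_TTL_BATCH_SIZE appears as the LIMIT, and the OFFSET
# advances by the batch size until no more results are returned.
CONSTRUCT {
  ?s a mandaat:Mandataris ;
     mandaat:start ?start .
}
WHERE {
  ?s a mandaat:Mandataris ;
     mandaat:start ?start .
}
LIMIT 1000
OFFSET 2000
```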

## REST API

### POST /export-tasks

Trigger a new export asynchronously.

Returns `202 Accepted` if the export started successfully. The location
response header contains an endpoint to monitor the task status.

Returns `503 Service Unavailable` if an export is already running.

### GET /export-tasks/:id

Get the status of an export task.

Returns `200 OK` with a task resource in the response body. Task status is one
of `ongoing`, `done`, `cancelled` or `failed`.
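Assuming the service is exposed via the dispatcher on `/export-tasks` (the host and port below are placeholders for your own stack), triggering and monitoring an export could look like:

```shell
# Trigger a new export; the Location response header points to the task resource
curl -i -X POST http://localhost/export-tasks

# Poll the task status using the id from the Location header
curl http://localhost/export-tasks/<task-id>
```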

## Development

Add the following snippet to your stack during development:

```yaml
services:
  export:
    image: semtech/mu-javascript-template:1.8.0
    ports:
      - 8888:80
    environment:
      NODE_ENV: "development"
    volumes:
      - /path/to/your/code:/app/
      - ./data/exports:/data/exports
      - ./config/export:/config
```

## Caveats/TODOs

- A migration is advisable if you previously used 0.x.x versions in your stack,
  to convert the old task model to `cogs:Job`.
- The service needs to be linked directly to Virtuoso: the current latest
  version (v0.6.0-beta.6) of mu-auth has no support for `CONSTRUCT` queries.
- From a data model perspective, the retry behaviour might be confusing. In the
  current implementation, a failed task does not mean the task will stop; it
  only ends once the threshold of retries is reached.
- An option should be added to allow periodic cleanup of the jobs and related
  exports.
- The name of the service could be more generic.