# mandaten-download-generator-service

Microservice that generates the dump files (CSV, TTL) of mandatendatabank
asynchronously. A cron job is embedded in the service to trigger an export at
the preconfigured frequency.

## Installation

To add the service to your stack, add the following snippet to
`docker-compose.yml`:

```yaml
services:
  export:
    image: lblod/mandaten-download-generator-service:1.0.0
    volumes:
      - ./data/files:/share
      - ./config/type-exports.js:/config/type-exports.js
```

Don't forget to update the dispatcher configuration to route requests to the
export service. Files may then be served by the
[mu-file-service](https://github.com/mu-semtech/file-service).
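The dispatcher rule could look as follows. This is a sketch for a recent, Elixir-based mu-dispatcher; the `export` host name and the `/export-tasks` path are assumptions based on the stack snippet above, so adapt them to your own configuration:

```elixir
match "/export-tasks/*path" do
  # Forward requests to the export service defined in docker-compose.yml
  Proxy.forward conn, path, "http://export/export-tasks/"
end
```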

## Model

The tasks are modelled in agreement with the
[cogs:Job](http://vocab.deri.ie/cogs#Job) and
[task:Task](http://redpencil.data.gift/vocabularies/tasks/Task) models. The full
description should be available on
[data.gift](https://redpencil.data.gift/vocabularies/tasks) (TODO). See also
e.g. [jobs-controller-service](https://github.com/lblod/job-controller-service)
for more information on the model.

### Prefixes

```sparql
PREFIX mu: <http://mu.semte.ch/vocabularies/core/>
PREFIX task: <http://redpencil.data.gift/vocabularies/tasks/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
PREFIX ext: <http://mu.semte.ch/vocabularies/ext/>
PREFIX oslc: <http://open-services.net/ns/core#>
PREFIX cogs: <http://vocab.deri.ie/cogs#>
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX export: <> # namespace URI elided in the source
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
```

### Export

A file resulting from an export task.

#### Class

`export:Export`

##### Properties

Name | Predicate | Range | Definition
--- | --- | --- | ---
uuid | mu:uuid | xsd:string |
classification | export:classification | skos:Concept |
fileName | nfo:fileName | xsd:string |
format | dct:format | xsd:string |
created | dct:created | xsd:dateTime |
fileSize | nfo:fileSize | xsd:integer |
extension | dbpedia:fileExtension | xsd:string |

## Configuration

### CSV export

The SPARQL query to execute for the CSV export must be specified in
`/config/csv-export.sparql`. Note that the variable names in the `SELECT`
clause will be used as column headers in the export.
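For instance, a minimal `/config/csv-export.sparql` could look as follows (a hypothetical query; the `person:Person` and `foaf:name` terms are purely illustrative):

```sparql
PREFIX person: <http://www.w3.org/ns/person#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name
WHERE {
  ?person a person:Person ;
          foaf:name ?name .
}
```

The resulting CSV export will have `person` and `name` as column headers.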

### TTL export

The Turtle export must be specified in `/config/type-exports.js`. This config
specifies a prefix mapping and a list of RDF types, each with a set of required
and optional properties that must be exported for that type. An additional
filter for the `WHERE` clause can be specified per type.

E.g.

```javascript
export default {
  prefixes: {
    mandaat: "http://data.vlaanderen.be/ns/mandaat#",
    person: "http://www.w3.org/ns/person#",
    foaf: "http://xmlns.com/foaf/0.1/"
  },
  types: [
    {
      type: "mandaat:Mandataris",
      requiredProperties: [
        "mandaat:start",
        "mandaat:eind"
      ],
      optionalProperties: [
        "mandaat:status"
      ],
      additionalFilter: ""
    },
    {
      type: "person:Person",
      optionalProperties: [
        "foaf:name"
      ],
      additionalFilter: ""
    }
  ]
}
```

### Environment variables

The following environment variables can be configured:

* `EXPORT_CRON_PATTERN`: cron pattern to configure the frequency of the cron
job. The pattern follows the format specified in
[node-cron](https://www.npmjs.com/package/cron#available-cron-patterns).
Defaults to `0 0 */2 * * *`, i.e. run every 2 hours.
* `EXPORT_FILE_BASE`: base name of the export file. Defaults to `mandaten`. The
export file will be named `{EXPORT_FILE_BASE}-{timestamp}.{csv|ttl}`.
* `EXPORT_TTL_BATCH_SIZE`: batch size used as `LIMIT` in the `CONSTRUCT` SPARQL
queries per type. Defaults to `1000`. To get a complete export, make sure
`EXPORT_TTL_BATCH_SIZE * number_of_matching_triples` doesn't exceed the
maximum number of triples returned by the database (e.g. `ResultSetMaxRows` in
Virtuoso).
* `RETRY_CRON_PATTERN`: cron pattern to configure the frequency of the function
that retries failed tasks. The pattern follows the format specified in
[node-cron](https://www.npmjs.com/package/cron#available-cron-patterns).
Defaults to `0 */10 * * * *`, i.e. run every 10 minutes.
* `NUMBER_OF_RETRIES`: defines the number of times a failed task will be
retried.
* `FILES_GRAPH`: graph where the files must be stored. Defaults to
`http://mu.semte.ch/graphs/system/jobs`.
* `JOBS_GRAPH`: graph where the jobs must be stored. Defaults to
`http://mu.semte.ch/graphs/system/jobs`.
* `TASK_OPERATION_URI` (required): the operation URI (a resource you can attach
a `skos:prefLabel` to) identifying this instance of the service, e.g.
`http://lblod.data.gift/id/jobs/concept/TaskOperation/exportMandatarissen`.
* `EXPORT_CLASSIFICATION_URI`: the classification of the export, to ease
filtering. Defaults to
`http://redpencil.data.gift/id/exports/concept/GenericExport`.
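The batching controlled by `EXPORT_TTL_BATCH_SIZE` can be pictured as a series of `CONSTRUCT` queries per configured type. The sketch below is hypothetical (the exact query shape is internal to the service); it only illustrates how the batch size acts as a `LIMIT`:

```sparql
PREFIX mandaat: <http://data.vlaanderen.be/ns/mandaat#>

# One batch of the mandaat:Mandataris export;
# EXPORT_TTL_BATCH_SIZE appears as the LIMIT, and the OFFSET
# advances by the batch size until no more results are returned.
CONSTRUCT {
  ?s a mandaat:Mandataris ;
     mandaat:start ?start .
}
WHERE {
  ?s a mandaat:Mandataris ;
     mandaat:start ?start .
}
LIMIT 1000
OFFSET 2000
```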

## REST API

### POST /export-tasks

Trigger a new export asynchronously.

Returns `202 Accepted` if the export started successfully. The location
response header contains an endpoint to monitor the task status.

Returns `503 Service Unavailable` if an export is already running.

### GET /export-tasks/:id

Get the status of an export task.

Returns `200 OK` with a task resource in the response body. Task status is one
of `ongoing`, `done`, `cancelled` or `failed`.
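Assuming the service is exposed via the dispatcher on `/export-tasks` (the host and port below are placeholders for your own stack), triggering and monitoring an export could look like:

```shell
# Trigger a new export; the Location response header points to the task resource
curl -i -X POST http://localhost/export-tasks

# Poll the task status using the id from the Location header
curl http://localhost/export-tasks/<task-id>
```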

## Development

Add the following snippet to your stack during development:

```yaml
services:
  export:
    image: semtech/mu-javascript-template:1.8.0
    ports:
      - 8888:80
    environment:
      NODE_ENV: "development"
    volumes:
      - /path/to/your/code:/app/
      - ./data/exports:/data/exports
      - ./config/export:/config
```

## Caveats/TODOs

- A migration is advisable if you previously used 0.x.x versions in your stack,
  to convert the old task model to `cogs:Job`.
- The service needs to be linked directly to Virtuoso: the current latest
  version (v0.6.0-beta.6) of mu-auth has no support for `CONSTRUCT` queries.
- From a data model perspective, the retry behaviour might be confusing. In the
  current implementation, a failed task does not mean the task will stop; it
  only ends once the threshold of retries is reached.
- An option should be added to allow periodic cleanup of the jobs and related
  exports.
- The name of the service could be more generic.