Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lblod/mandaten-download-generator-service
Microservice to generate the dump files of mandatendatabank asynchronously
- Host: GitHub
- URL: https://github.com/lblod/mandaten-download-generator-service
- Owner: lblod
- License: MIT
- Created: 2018-04-10T08:16:17.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-10-16T18:48:49.000Z (2 months ago)
- Last Synced: 2024-10-18T17:32:46.997Z (2 months ago)
- Topics: mu-service
- Language: JavaScript
- Size: 213 KB
- Stars: 0
- Watchers: 18
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
# mandaten-download-generator-service
Microservice that generates the dump files (CSV, TTL) of mandatendatabank
asynchronously. A cron job is embedded in the service to trigger an export at
the preconfigured frequency.

## Installation
To add the service to your stack, add the following snippet to
`docker-compose.yml`:
```yaml
services:
  export:
    image: lblod/mandaten-download-generator-service:1.0.0
    volumes:
      - ./data/files:/share
      - ./config/type-exports.js:/config/type-exports.js
```

Don't forget to update the dispatcher configuration to route requests to the
export service. Files may then be served by the
[mu-file-service](https://github.com/mu-semtech/file-service).
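
A matching mu-dispatcher rule might look as follows. This is a sketch: the
`export` host name comes from the snippet above, the `/export-tasks` path
matches the REST API below, and you should adapt both to your own stack.

```elixir
# Sketch for config/dispatcher/dispatcher.ex — the `export` host name and
# the /export-tasks path are assumptions based on the snippets in this README.
match "/export-tasks/*path" do
  Proxy.forward conn, path, "http://export/export-tasks/"
end
```

## Model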
The tasks are modelled in agreement with
[cogs:Job](http://vocab.deri.ie/cogs#Job) and
[task:Task](http://redpencil.data.gift/vocabularies/tasks/Task). The full
description should be available on
[data.gift](https://redpencil.data.gift/vocabularies/tasks) (TODO). See also
e.g. [job-controller-service](https://github.com/lblod/job-controller-service)
for more information on the model.

### Prefixes
```sparql
# NOTE: the export: namespace IRI below is an assumption; the other IRIs are
# the conventional namespaces for these prefixes.
PREFIX mu: <http://mu.semte.ch/vocabularies/core/>
PREFIX task: <http://redpencil.data.gift/vocabularies/tasks/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
PREFIX ext: <http://mu.semte.ch/vocabularies/ext/>
PREFIX oslc: <http://open-services.net/ns/core#>
PREFIX cogs: <http://vocab.deri.ie/cogs#>
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX export: <http://redpencil.data.gift/vocabularies/exports/>
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
```

### Export
A file resulting from an export task.
#### Class
`export:Export`
##### Properties
Name | Predicate | Range | Definition
--- | --- | --- | ---
uuid | mu:uuid | xsd:string |
classification | export:classification | skos:Concept |
fileName | nfo:fileName | xsd:string |
format | dct:format | xsd:string |
created | dct:created | xsd:dateTime |
fileSize | nfo:fileSize | xsd:integer |
extension | dbpedia:fileExtension | xsd:string |
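
As an illustration, a query along the following lines would fetch the most
recent export file. This is a sketch: it assumes the `export:` namespace IRI
from the prefix list above and uses the default classification URI documented
under the environment variables below.

```sparql
# Sketch: look up the most recent export file of the default classification.
PREFIX export: <http://redpencil.data.gift/vocabularies/exports/>
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?file ?fileName ?created WHERE {
  ?file a export:Export ;
        export:classification <http://redpencil.data.gift/id/exports/concept/GenericExport> ;
        nfo:fileName ?fileName ;
        dct:created ?created .
}
ORDER BY DESC(?created)
LIMIT 1
```

## Configuration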
### CSV export
The SPARQL query to execute for the CSV export must be specified in
`/config/csv-export.sparql`. Note that the variable names in the `SELECT`
clause will be used as column headers in the export.
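
For example, a minimal `/config/csv-export.sparql` could look as follows (a
sketch; the `mandaat:` predicates are taken from the TTL example below and are
merely illustrative):

```sparql
# Sketch of /config/csv-export.sparql — the columns in the CSV will be
# named "mandataris", "start" and "eind", after the SELECT variables.
PREFIX mandaat: <http://data.vlaanderen.be/ns/mandaat#>

SELECT ?mandataris ?start ?eind WHERE {
  ?mandataris a mandaat:Mandataris ;
              mandaat:start ?start ;
              mandaat:eind ?eind .
}
```

### TTL export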
The Turtle export must be specified in `/config/type-exports.js`. This config
specifies a prefix mapping and a list of RDF types with a set of required and
optional properties that must be exported per type. An additional filter for
the `WHERE` clause can be specified per type. E.g.:
```javascript
export default {
  prefixes: {
    mandaat: "http://data.vlaanderen.be/ns/mandaat#",
    person: "http://www.w3.org/ns/person#",
    foaf: "http://xmlns.com/foaf/0.1/"
  },
  types: [
    {
      type: "mandaat:Mandataris",
      requiredProperties: [
        "mandaat:start",
        "mandaat:eind"
      ],
      optionalProperties: [
        "mandaat:status"
      ],
      additionalFilter: ""
    },
    {
      type: "person:Person",
      optionalProperties: [
        "foaf:name"
      ],
      additionalFilter: ""
    }
  ]
}
```

### Environment variables
The following environment variables can be configured (a combined
`docker-compose.yml` example follows the list):
* `EXPORT_CRON_PATTERN`: cron pattern to configure the frequency of the cron
job. The pattern follows the format as specified in
[node-cron](https://www.npmjs.com/package/cron#available-cron-patterns).
Defaults to `0 0 */2 * * *`, i.e. run every 2 hours.
* `EXPORT_FILE_BASE`: base name of the export file. Defaults to `mandaten`. The
export file will be named `{EXPORT_FILE_BASE}-{timestamp}.{csv|ttl}`.
* `EXPORT_TTL_BATCH_SIZE`: batch size used as `LIMIT` in the `CONSTRUCT` SPARQL
queries per type. Defaults to `1000`. To have a complete export, make sure
`EXPORT_TTL_BATCH_SIZE * number_of_matching_triples` doesn't exceed the
maximum number of triples returned by the database (e.g. `ResultSetMaxRows` in
Virtuoso).
* `RETRY_CRON_PATTERN`: cron pattern to configure the frequency of the function
that retries failed tasks. The pattern follows the format as specified in
[node-cron](https://www.npmjs.com/package/cron#available-cron-patterns).
Defaults to `0 */10 * * * *`, i.e. run every 10 minutes.
* `NUMBER_OF_RETRIES`: defines the number of times a failed task will be
  retried.
* `FILES_GRAPH`: graph where the files must be stored. Defaults to
  `http://mu.semte.ch/graphs/system/jobs`.
* `JOBS_GRAPH`: graph where the jobs must be stored. Defaults to
  `http://mu.semte.ch/graphs/system/jobs`.
* `TASK_OPERATION_URI` (required): the operation URI (a thing you can attach a
  `skos:prefLabel` to) of the instance of this service. E.g.
  `http://lblod.data.gift/id/jobs/concept/TaskOperation/exportMandatarissen`.
* `EXPORT_CLASSIFICATION_URI`: the classification of the export, to ease
  filtering. Defaults to
  `http://redpencil.data.gift/id/exports/concept/GenericExport`.
### POST /export-tasks
Trigger a new export asynchronously.
Returns `202 Accepted` if the export started successfully. The `Location`
response header contains an endpoint to monitor the task status.

Returns `503 Service Unavailable` if an export is already running.
### GET /export-tasks/:id
Get the status of an export task.
Returns `200 OK` with a task resource in the response body. Task status is one
of `ongoing`, `done`, `cancelled` or `failed`.
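
As an illustration, a client could trigger an export and poll the task status
along these lines (a sketch; it assumes the service is dispatched on
`/export-tasks` and that the task resource follows the JSON:API shape common
to mu services):

```javascript
// Sketch: trigger an export and poll until the task reaches a final status.
// The JSON:API response shape (data.attributes.status) is an assumption.
async function runExport(baseUrl) {
  const res = await fetch(`${baseUrl}/export-tasks`, { method: "POST" });
  if (res.status === 503) throw new Error("an export is already running");
  const statusPath = res.headers.get("Location"); // e.g. /export-tasks/:id

  for (;;) {
    const task = await (await fetch(`${baseUrl}${statusPath}`)).json();
    const status = task.data.attributes.status;
    if (status !== "ongoing") return status; // done, cancelled or failed
    await new Promise((r) => setTimeout(r, 5000)); // poll every 5 seconds
  }
}
```

## Development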
Add the following snippet to your stack during development:
```yaml
services:
  export:
    image: semtech/mu-javascript-template:1.8.0
    ports:
      - 8888:80
    environment:
      NODE_ENV: "development"
    volumes:
      - /path/to/your/code:/app/
      - ./data/exports:/data/exports
      - ./config/export:/config
```

## Caveats/TODOs
- A migration is advisable if you previously used 0.x.x versions in your
  stack, to convert the old task model to `cogs:Job`.
- The service needs to be linked directly to Virtuoso, since the current
  latest version of mu-auth (v0.6.0-beta.6) does not support `CONSTRUCT`
  queries.
- From a data model perspective, the retrying of tasks might be confusing: in
  the current implementation a task marked as failed is not necessarily
  finished; it only stops for good once the retry threshold is reached.
- An option should be added to allow periodic cleanup of the jobs and related
exports.
- The name of the service could be made more generic.