https://github.com/Wikimedia-Sverige/DCAT

A project aimed at generating a DCAT-AP for Wikibase installations in general and Wikidata in particular.

DCAT-AP for Wikibase
=================

# Note
This repo stems from the time before the tool was integrated into the Wikimedia dumping infrastructure. The up-to-date repository can be found at [wikimedia/operations-dumps-dcat](https://github.com/wikimedia/operations-dumps-dcat).

---
A project aimed at generating a [DCAT-AP](https://joinup.ec.europa.eu/system/files/project/c3/22/18/DCAT-AP_Final_v1.00.html)
document for [Wikibase](http://wikiba.se) installations
in general and [Wikidata](http://wikidata.org) in particular.

Takes into account access through:

* Content negotiation (various formats)
* MediaWiki API (various formats)
* Entity dumps e.g. json, ttl (assumes that these are compressed)

An example result can be found at [lokal-profil / dcatap.rdf](https://gist.github.com/lokal-profil/8086dc6bf2398d84a311).
The live DCAT-AP description of Wikidata can be found [here](https://dumps.wikimedia.org/wikidatawiki/entities/dcatap.rdf).
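
The first access route listed above, content negotiation, can be illustrated with a minimal Python sketch. It only builds a request and does not send it. The function name `negotiation_request`, the entity id `Q42` and the `text/turtle` media type are assumptions chosen for illustration; the base URL is the example `accessURL` value from the *Config* section.

```python
import urllib.request


def negotiation_request(access_url: str, entity_id: str, media_type: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one representation of an entity.

    The Accept header tells the server which media type the client prefers;
    this is the standard HTTP mechanism behind content negotiation.
    """
    url = access_url.rstrip("/") + "/" + entity_id
    return urllib.request.Request(url, headers={"Accept": media_type})


if __name__ == "__main__":
    # Hypothetical entity id and media type, for illustration only.
    req = negotiation_request("http://www.example.org/entity/", "Q42", "text/turtle")
    print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup. Which media types a given installation answers to depends on its setup.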

## To use

1. Copy `config.example.json` to `config.json` and change the contents
   to match your installation. Refer to the *Config* section below for
   an explanation of the individual configuration parameters.
2. Copy `catalog.example.json` to a suitable place (e.g. on-wiki) and
   update the translations to fit your Wikibase installation. Set the
   location of this file as `catalog-i18n` in the config file.
3. Create the `dcatap.rdf` file by running `php DCAT.php` or
   `php DCAT.php --config="" --dumpDir="" --outputDir=""`,
   where every option can be left out.
   The options are:
    1. `--config`: the relative path to the JSON file containing the
       configuration, defaults to `./config.json`
    2. `--dumpDir`: the relative path to the directory containing the
       dumps (if any), defaults to the `directory` parameter in the
       config file
    3. `--outputDir`: the relative path to the directory where the
       `dcatap.rdf` file should be created, defaults to the `directory`
       parameter in the config file
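
When the tool is invoked from another script, the command line from step 3 can be assembled programmatically. The sketch below is a hypothetical Python wrapper, not part of the tool. It uses only the three options documented above and omits any option that is not set, so the tool falls back to its defaults.

```python
import shlex
from typing import List, Optional


def build_dcat_command(
    config: Optional[str] = None,
    dump_dir: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Return the argument list for running DCAT.php.

    Options left as None are omitted entirely.
    """
    cmd = ["php", "DCAT.php"]
    options = (
        ("--config", config),
        ("--dumpDir", dump_dir),
        ("--outputDir", output_dir),
    )
    for flag, value in options:
        if value is not None:
            cmd.append(f"{flag}={value}")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_dcat_command(config="./config.json", output_dir="./public")
    print(shlex.join(cmd))
    # To actually run the tool: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.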

## Translations

* Translations which are generic to the tool are handled by [Intuition](https://github.com/Krinkle/intuition)
and should be translated through [translatewiki.net](https://translatewiki.net).
* Translations which are specific to a project/catalog are added to
the location specified in the `catalog-i18n` parameter of the config
file.

## Config

The following is a key-by-key explanation of the config file.

* `directory`: Relative path to the directory containing the dump
  subcategories (if any) and for the final DCAT file.
* `api-enabled`: (`Boolean`) Is API access activated for the MediaWiki
  installation?
* `dumps-enabled`: (`Boolean`) Is JSON dump generation activated for the
  Wikibase installation?
* `uri`: URL used as basis for RDF identifiers,
  e.g. *http://www.example.org/about*
* `catalog-homepage`: URL for the homepage of the Wikibase installation,
  e.g. *http://www.example.org*
* `catalog-issued`: ISO date at which the Wikibase installation was
  first issued, e.g. *2000-12-24*
* `catalog-license`: License of the catalog, i.e. of the DCAT file
  itself (not the contents of the Wikibase installation),
  e.g. *http://creativecommons.org/publicdomain/zero/1.0/*
* `catalog-i18n`: URL or path to the JSON file containing i18n strings for
  catalog title and description. Can be an on-wiki page,
  e.g. *https://www.example.org/w/index.php?title=MediaWiki:DCAT.json&action=raw*
* `keywords`: (`array`) List of keywords applicable to all of the datasets
* `themes`: (`array`) List of thematic ids in accordance with
  [Eurovoc](http://eurovoc.europa.eu/), e.g. *2191* for
  http://eurovoc.europa.eu/2191
* `publisher`:
    * `name`: Name of the publisher
    * `homepage`: URL for the homepage of the publisher
    * `email`: Contact e-mail for the publisher, should be a function
      address, e.g. *[email protected]*
    * `publisherType`: Publisher type according to [ADMS](http://purl.org/adms/publishertype/1.0),
      e.g. *NonProfitOrganisation*
* `contactPoint`:
    * `name`: Name of the contact point
    * `email`: E-mail for the contact point, should ideally be a
      function address, e.g. *[email protected]*
    * `vcardType`: Type of contact point, either `Organization` or
      `Individual`
* `ld-info`:
    * `accessURL`: URL to the content negotiation endpoint of the
      Wikibase installation, e.g. *http://www.example.org/entity/*
    * `mediatype`: (`object`) List of [IANA media types](http://www.iana.org/assignments/media-types/)
      available through content negotiation, in the format *file-ending:media-type*
    * `license`: License of the data in the distribution, e.g.
      *http://creativecommons.org/publicdomain/zero/1.0/*
* `api-info`:
    * `accessURL`: URL to the MediaWiki API endpoint of the wiki,
      e.g. *http://www.example.org/w/api.php*
    * `mediatype`: (`object`) List of non-deprecated formats available
      through the API, see `ld-info:mediatype` above for formatting
    * `license`: See `ld-info:license` above
* `dump-info`:
    * `accessURL`: URL to the directory where the *.json.gz* files
      reside (`$1` is replaced on the fly by the actual filename),
      e.g. *http://example.org/dumps/$1*
    * `mediatype`: (`object`) List of media types, e.g.
      `{"json": "application/json"}`
    * `compression`: (`object`) List of compression formats, in the
      format *name:file-ending*, e.g. `{"gzip": "gz"}`
    * `license`: See `ld-info:license` above
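
To show how the keys above fit together, the following Python sketch assembles a config as a dictionary and serialises it to JSON. It is an illustration under assumptions, not a copy of `config.example.json`. The URLs, date, theme id and the `mediatype`/`compression` examples for `dump-info` come from the descriptions above. Everything marked as a placeholder, the helper names, the treatment of `publisher` and `contactPoint` as separate top-level keys, and the representation of theme ids as strings are assumptions.

```python
import json

CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"


def example_config() -> dict:
    """Return an illustrative config using the keys documented above."""
    return {
        "directory": "public/dumps",  # placeholder path
        "api-enabled": True,
        "dumps-enabled": True,
        "uri": "http://www.example.org/about",
        "catalog-homepage": "http://www.example.org",
        "catalog-issued": "2000-12-24",
        "catalog-license": CC0,
        "catalog-i18n": "https://www.example.org/w/index.php?title=MediaWiki:DCAT.json&action=raw",
        "keywords": ["knowledge base"],  # placeholder keyword
        "themes": ["2191"],  # Eurovoc id; string representation is an assumption
        "publisher": {
            "name": "Example Organisation",  # placeholder
            "homepage": "http://www.example.org",
            "email": "info@example.org",  # placeholder function address
            "publisherType": "NonProfitOrganisation",
        },
        "contactPoint": {
            "name": "Example contact",  # placeholder
            "email": "info@example.org",  # placeholder function address
            "vcardType": "Organization",
        },
        "ld-info": {
            "accessURL": "http://www.example.org/entity/",
            # file-ending:media-type pairs; this selection is a placeholder
            "mediatype": {"json": "application/json", "ttl": "text/turtle"},
            "license": CC0,
        },
        "api-info": {
            "accessURL": "http://www.example.org/w/api.php",
            "mediatype": {"json": "application/json"},  # placeholder selection
            "license": CC0,
        },
        "dump-info": {
            "accessURL": "http://example.org/dumps/$1",
            "mediatype": {"json": "application/json"},
            "compression": {"gzip": "gz"},  # name:file-ending
            "license": CC0,
        },
    }


def expand_access_url(template: str, filename: str) -> str:
    """Mimic the documented `$1` substitution in `dump-info:accessURL`."""
    return template.replace("$1", filename)


if __name__ == "__main__":
    cfg = example_config()
    print(json.dumps(cfg, indent=4))
    # Hypothetical dump filename, for illustration only.
    print(expand_access_url(cfg["dump-info"]["accessURL"], "20150101.json.gz"))
```

The output of `json.dumps` can be saved as `config.json` and adjusted to the installation. `expand_access_url` only mirrors the substitution described for `accessURL`; the tool performs its own replacement when generating the file.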