🕸 YALC: Yet Another LOD Cloud (registry of Linked Open Datasets).
https://github.com/TriplyDB/YALC


[![](https://img.shields.io/badge/datasets-338-brightgreen)](datasets)
[![](https://img.shields.io/badge/organizations-102-brightgreen)](organizations)
[![](https://img.shields.io/badge/tooLittleInfo-25-yellow)](datasets/too-little-info)
[![](https://img.shields.io/badge/todo-14-red)](datasets/todo)
[![](https://img.shields.io/badge/errors-56-red)](datasets/errors)

# YALC: Yet Another LOD Cloud

This repository contains configuration files for Linked Open Datasets
that are published on the web. These datasets can be freely used at
https://triplydb.com.

## Get started

Go to https://triplydb.com and use the search bar to search for
datasets.

## Contribute

If your favorite Linked Dataset is not yet available at
https://triplydb.com, you can add its configuration in a [pull
request](https://github.com/TriplyDB/YALC/pulls) or you can open a
‘Dataset request’ [issue](https://github.com/TriplyDB/YALC/issues).

## Repository structure

This repository contains the following directories:


- `/datasets`: Contains one configuration file for each dataset.
  - `/datasets/errors`: Contains one configuration file for each dataset that cannot yet be uploaded because it contains errors.
  - `/datasets/todo`: Contains one configuration file for each dataset that cannot yet be uploaded because some functionality is still missing.
  - `/datasets/too-little-info`: Contains partial configuration files for datasets for which too little information is available at the moment.
- `/img`: Contains images that are used in dataset and organization configurations.
- `/organizations`: Contains one configuration file for each organization.
- `/rdf`: Contains the RDF definitions that are used by the configuration files in this repository, and includes small RDF datasets that are part of the LOD Cloud but for which we could not find an online publication elsewhere.

## Configuration format

The configuration files in YALC all follow a JSON configuration
format. The following subsections document the format for dataset
configuration files and the format for organization configuration
files.

### Dataset configuration format

The dataset configuration format is used for the files in the
[`/datasets`](datasets) subdirectory. Each file contains a JSON
object with the following keys:

- `"about"`

  Zero or more topics that characterize what the dataset is about.

  Topic values must also appear in the topic hierarchy. Topic values are specified with their IRI local name. For example, `"eGov"` denotes the topic with IRI `https://triplydb.com/Triply/yalc/id/topic/eGov`.

  For vocabularies, this key must include the value `"vocabulary"`.

  This key is optional. If it is omitted, the dataset has zero topics.

- `"asset"`

  Links to binary files that are part of the dataset.

  Examples include documentation files (DOCX, PDF, ODT) that occur in the dataset or describe it, and media files (images, sounds, videos) that occur in the dataset.

  This key is optional.

- `"description"`

  The description of the dataset.

  This must be at least 50 characters and at most 1,000 characters long.

  This key is required.

- `"diagram"`

  An image showing a diagrammatic overview of the dataset. This value must be the local name of a file in the `/img` subdirectory.

  This key is optional.

- `"graph"`

  Specifies the named graph in which the content of the default graph is stored.

  This key is optional.

  If this key is not specified and exactly one prefix is specified, then the prefix IRI is used as the default graph name.

- `"homepage"`

  The URL of the web page that is the authoritative location on the Internet for human-readable information about the dataset.

  Sometimes a dataset does not have its own web page. In such cases it is possible to specify a web page that describes or mentions the dataset.

  This key is optional. If it is omitted, the dataset has no homepage.

- `"id"`

  Identifies the dataset and (optionally) its organization and version. The format for this value is `"ORGANIZATION/DATASET@VERSION"`.

  The values for ORGANIZATION and DATASET must be at least 2 and at most 40 characters long. They must consist of digits (`[0-9]`), letters (`[A-Za-z]`), and hyphens (`-`).

  The value for ORGANIZATION must correspond with a file named `ORGANIZATION.json` in the `/organizations` subdirectory. See the [organization configuration format](#organization-configuration-format) section for more information.

  The value of VERSION must follow one of the following formats:

  | VERSION format | VERSION_OBJECT format |
  | --- | --- |
  | `MAJOR[.MINOR[.PATCH]]` | `{"@type": "SemanticVersion", "major": "MAJOR", "minor": "MINOR", "patch": "PATCH"}` |
  | `YYYY` | `{"@type": "TemporalVersion", "year": "YYYY"}` |
  | `YYYY-MM` | `{"@type": "TemporalVersion", "yearMonth": "YYYY-MM"}` |
  | `YYYY-MM-DD` | `{"@type": "TemporalVersion", "date": "YYYY-MM-DD"}` |

  If the organization that issues the dataset is not known, the `ORGANIZATION/` prefix can be omitted. If omitted, the ‘none’ organization will be used.

  If the version of the dataset is not known, the `@VERSION` suffix can be omitted. If omitted, `"1.0.0"` is used as the dataset version.

  The DATASET part of this value is required.

- `"image"`

  The logo or image for this dataset. This value must be the local name of a file in the `/img` subdirectory.

  This key is optional. If it is omitted, the image of the dataset's organization (if any) is used.

- `"license"`

  The license of this dataset. The value must be one of the following:

  - https://creativecommons.org/licenses/by-nc/4.0/
  - https://creativecommons.org/licenses/by-sa/3.0/
  - https://creativecommons.org/licenses/by/1.0/
  - https://creativecommons.org/licenses/by/2.0/
  - https://creativecommons.org/licenses/by/2.5/
  - https://creativecommons.org/licenses/by/3.0/
  - https://creativecommons.org/licenses/by/4.0/
  - https://creativecommons.org/publicdomain/zero/1.0/
  - https://opendatacommons.org/licenses/by/1-0/
  - https://opendatacommons.org/licenses/odbl/1.0/
  - https://opendatacommons.org/licenses/pddl/1-0/
  - https://opensource.org/licenses/BSD-3-Clause

  This key is optional. Since datasets in YALC must have a license, the value https://creativecommons.org/licenses/by/4.0/ is used if this key is omitted.

- `"name"`

  The display name of the dataset.

  This key is optional.

- `"namespace"`

  Zero or more IRIs that denote namespaces for this dataset.

  IRIs within these namespaces have this dataset as their authority. Ideally, every IRI has an authoritative dataset to which it belongs. An IRI can belong to at most one authoritative dataset.

  This key is optional. If it is omitted and the `"prefix"` key is present, then the IRIs that appear in the value of the `"prefix"` key are used as the namespaces.

- `"prefix"`

  A JSON object containing RDF prefix declarations. The keys of this JSON object are aliases that can be used to denote their corresponding IRI values.

  This key is optional.

- `"successor"`

  Allows an outdated version of a dataset to point to its successor version.

  The value notation is identical to the notation that is used for the `"id"` key.

- `"url"`

  Zero or more URLs from which an RDF serialization of the dataset can be downloaded.

  This key may be omitted if one or more namespaces are specified (see the documentation of the `"namespace"` key for more details).
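
The rules for the `"id"` key can be sketched in Python as follows. This is a minimal illustration, not the parser that YALC uses: the function names `parse_dataset_id` and `version_object` are hypothetical, and the example id `w3c/example@2` is made up. Two points are assumptions because the text above leaves them open: a bare four-digit version such as `2019` is read as a year, and omitted MINOR or PATCH parts are left out of the version object.

```python
import re

# Name rule from the "id" key: 2 to 40 digits, letters, or hyphens.
NAME = r"[0-9A-Za-z-]{2,40}"
ID_RE = re.compile(rf"(?:(?P<org>{NAME})/)?(?P<dataset>{NAME})(?:@(?P<version>.+))?")
SEMVER_RE = re.compile(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def version_object(version: str) -> dict:
    """Map a VERSION string to the VERSION_OBJECT shape from the table."""
    # Assumption: a bare four-digit value such as "2019" is read as a year.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        return {"@type": "TemporalVersion", "date": version}
    if re.fullmatch(r"\d{4}-\d{2}", version):
        return {"@type": "TemporalVersion", "yearMonth": version}
    if re.fullmatch(r"\d{4}", version):
        return {"@type": "TemporalVersion", "year": version}
    match = SEMVER_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    obj = {"@type": "SemanticVersion"}
    # Assumption: MINOR and PATCH are only included when they were given.
    for key, part in zip(("major", "minor", "patch"), match.groups()):
        if part is not None:
            obj[key] = part
    return obj


def parse_dataset_id(value: str) -> dict:
    """Split "ORGANIZATION/DATASET@VERSION" and apply the documented defaults."""
    match = ID_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid dataset id: {value!r}")
    return {
        # Documented defaults: the 'none' organization and version "1.0.0".
        "organization": match["org"] or "none",
        "dataset": match["dataset"],
        "version": version_object(match["version"] or "1.0.0"),
    }


print(parse_dataset_id("w3c/example@2"))
print(parse_dataset_id("example"))
```

Returning a plain dictionary keeps the sketch close to the JSON shapes in the table; a real implementation may well resolve the year-versus-major ambiguity differently.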



The following example shows the full dataset configuration for file
`datasets/[email protected]`. Notice the following details:
- The value for key `"id"` is `"w3c/[email protected]"`, whose prefix corresponds to the organization configuration file [`organizations/w3c.json`](organizations/w3c.json).
- The value for key `"image"` is
`"owl.png"`, which corresponds to file [`img/owl.png`](img/owl.png).
- While the `"namespace"` key is not specified, its value is implicitly set to `"http://www.w3.org/2002/07/owl#"`, which is specified in the `"prefix"` key.
- While the `"url"` key is not specified, its value is implicitly set to `"http://www.w3.org/2002/07/owl#"`, because this is the (implicit) value of the `"namespace"` key.

```json
{
  "about": "vocabulary",
  "description": "This ontology partially describes the built-in classes and properties that together form the basis of the RDF/XML syntax of OWL 2. The content of this ontology is based on Tables 6.1 and 6.2 in Section 6.4 of the OWL 2 RDF-Based Semantics specification, available at http://www.w3.org/TR/owl2-rdf-based-semantics/.\n\nPlease note that those tables do not include the different annotations (labels, comments and `rdfs:isDefinedBy` links) used in this file. Also note that the descriptions provided in this ontology do not provide a complete and correct formal description of either the syntax or the semantics of the introduced terms (please see the OWL 2 recommendations for the complete and normative specifications).\n\nFurthermore, the information provided by this ontology may be misleading if not used with care. This ontology SHOULD NOT be imported into OWL ontologies. Importing this file into an OWL 2 DL ontology will cause it to become an OWL 2 Full ontology and may have other, unexpected, consequences.",
  "homepage": "https://www.w3.org/TR/owl2-overview",
  "id": "w3c/[email protected]",
  "image": "owl.png",
  "name": "Web Ontology Language (OWL)",
  "prefix": {
    "owl": "http://www.w3.org/2002/07/owl#"
  }
}
```
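
The implicit values noted for this example can be made explicit with a small helper. The following Python sketch is an illustration under assumptions, not YALC's own code: the name `resolve_defaults` is hypothetical, single values and lists are both accepted for multi-valued keys (the example uses a plain string for `"about"`), and the fallback of `"url"` to the namespaces follows the explanation given with the example.

```python
DEFAULT_LICENSE = "https://creativecommons.org/licenses/by/4.0/"
TOPIC_BASE = "https://triplydb.com/Triply/yalc/id/topic/"


def as_list(value) -> list:
    """Normalize a missing, single, or list value to a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def resolve_defaults(config: dict) -> dict:
    """Return a copy of a dataset configuration with implicit values filled in."""
    resolved = dict(config)
    prefix_iris = list(config.get("prefix", {}).values())
    # "namespace" falls back to the IRIs declared under "prefix".
    namespaces = as_list(config.get("namespace")) or prefix_iris
    resolved["namespace"] = namespaces
    # "url" falls back to the (possibly implicit) namespaces.
    resolved["url"] = as_list(config.get("url")) or namespaces
    # "graph" falls back to the prefix IRI when exactly one prefix is declared.
    if "graph" not in config and len(prefix_iris) == 1:
        resolved["graph"] = prefix_iris[0]
    # A missing license defaults to CC BY 4.0.
    resolved["license"] = config.get("license", DEFAULT_LICENSE)
    # Topics are given as IRI local names; expand them to full topic IRIs.
    resolved["about"] = [TOPIC_BASE + topic for topic in as_list(config.get("about"))]
    return resolved


# Trimmed version of the OWL example, keeping only the keys that matter here.
owl = {"about": "vocabulary", "prefix": {"owl": "http://www.w3.org/2002/07/owl#"}}
for key, value in resolve_defaults(owl).items():
    print(key, "=", value)
```

The helper returns a new dictionary instead of modifying its input, so the configuration as written in the file stays available for comparison.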

### Organization configuration format

The organization configuration format is used for files in the
[`/organizations`](organizations) subdirectory. Each file contains a JSON object with the following keys:

- `"description"`

  The description of the organization.

  This must be at least 50 characters and at most 1,000 characters long.

  This key is required.

- `"homepage"`

  The URL of the main web page for human-readable information about the organization.

  This key is optional. If it is omitted, the organization has no homepage.

- `"id"`

  Identifies the organization.

  The value must be at least 2 and at most 40 characters long. It must consist of digits (`[0-9]`), letters (`[A-Za-z]`), and hyphens (`-`).

- `"image"`

  The logo or image for the organization. This value must be the local name of a file in the `/img` subdirectory.

  This key is optional. If it is omitted, the image `img/rdf.png` is used.

- `"name"`

  The display name of the organization.

  This key is optional.



The following example shows the full organization configuration file
[`organizations/w3c.json`](organizations/w3c.json):

```json
{
  "description": "The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards. Led by Web inventor and Director Tim Berners-Lee and CEO Jeffrey Jaffe, W3C's mission is to lead the Web to its full potential. Contact W3C for more information.",
  "homepage": "https://www.w3.org",
  "id": "w3c",
  "image": "w3c.png",
  "name": "World Wide Web Consortium (W3C)"
}
```
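
Before opening a pull request, an organization file can be checked against the constraints listed above. The Python sketch below is one possible way to do that and is not part of the repository: `validate_organization` and its `image_names` parameter are hypothetical names. Since the text does not say whether `"id"` is required, the sketch only checks it when it is present.

```python
import re

# Rule for "id": 2 to 40 digits, letters, or hyphens.
ID_RE = re.compile(r"[0-9A-Za-z-]{2,40}")


def validate_organization(config: dict, image_names: set) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    description = config.get("description")
    if description is None:
        problems.append('"description" is required')
    elif not 50 <= len(description) <= 1000:
        problems.append('"description" must be 50 to 1,000 characters long')
    # Assumption: "id" is only validated when present.
    if "id" in config and not ID_RE.fullmatch(config["id"]):
        problems.append('"id" must be 2 to 40 digits, letters, or hyphens')
    # "image" must name a file in /img; image_names holds those file names.
    if "image" in config and config["image"] not in image_names:
        problems.append(f'image {config["image"]!r} not found in /img')
    return problems


organization = {
    "description": "A hypothetical organization used only to illustrate the checks.",
    "id": "example-org",
    "image": "example.png",
}
print(validate_organization(organization, {"rdf.png"}))
```

Collecting all problems in a list, instead of stopping at the first one, lets a contributor fix a file in a single pass.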

### Semantic definitions

The configuration files in this repository can themselves be processed as Linked Data. This is achieved by the following definition files:


- `/rdf/yalc.jsonld`: Configuration files can be processed as RDF by including the context stored in this file.
- `/rdf/yalc.trig`: Definitions for the classes and properties that are used in the configuration files.
- `/rdf/topics.jsonld`: Topic hierarchy that is used to tag datasets.

The Linked Data version of the configuration files is itself published [over here](https://triplydb.com/Triply/YALC).
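
Including the context can be sketched as follows. This Python snippet only shows the general JSON-LD mechanism of attaching an `"@context"` to a configuration object. The function name `as_json_ld` is hypothetical, and the tiny context used here is a made-up stand-in, not the actual contents of `/rdf/yalc.jsonld`.

```python
import json


def as_json_ld(config: dict, context_document: dict) -> dict:
    """Return a copy of a configuration with a JSON-LD "@context" attached."""
    # A JSON-LD context document usually wraps its definitions in "@context";
    # a bare mapping of terms is accepted here as well.
    context = context_document.get("@context", context_document)
    return {"@context": context, **config}


# Hypothetical stand-in for the context stored in /rdf/yalc.jsonld.
context_document = {"@context": {"name": "https://example.org/def/name"}}
organization = {"id": "w3c", "name": "World Wide Web Consortium (W3C)"}
print(json.dumps(as_json_ld(organization, context_document), indent=2))
```

The resulting object could then be handed to any JSON-LD processor to obtain RDF; which processor to use is left open here.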

## Pull request details

Pull requests can be created to add new datasets, or to improve existing datasets.

### Pull request for a new dataset

In order to add a new Linked Open Dataset to this repository, create a [pull request](https://github.com/TriplyDB/YALC/pulls) that includes at least the following:

- A dataset file `datasets/DATASET.json` whose contents adhere to the [dataset configuration format](#dataset-configuration-format).

If the organization that is specified in the dataset file does not yet have an organization file, it must be included as well:

- An organization file `organizations/ORGANIZATION.json` whose contents adhere to the [organization configuration format](#organization-configuration-format).

If the dataset and/or organization file specifies an image that does not yet belong to the `/img` subdirectory, then it must be added as well:

- A dataset image file and/or an organization image file in the `/img` directory. The following image formats can be used: JPEG, PNG, SVG. SVG images are preferred, since they are smaller in size and do not suffer from resolution issues.
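
The checklist above can be turned into a quick local check. The Python sketch below is a hypothetical helper, not a script shipped with YALC: `missing_files` takes the path of a local checkout and a dataset configuration, and lists the organization and image files that the pull request would still need to add. Only the dataset's own `"image"` is checked; the image named in a new organization file would need the same check.

```python
from pathlib import Path


def missing_files(repo: Path, dataset_config: dict) -> list:
    """List files that a pull request still needs for a dataset configuration."""
    needed = []
    dataset_id = dataset_config["id"]
    # An "ORGANIZATION/" prefix requires organizations/ORGANIZATION.json.
    if "/" in dataset_id:
        organization = dataset_id.split("/", 1)[0]
        needed.append(repo / "organizations" / f"{organization}.json")
    # An "image" value must name a file in the img directory.
    if "image" in dataset_config:
        needed.append(repo / "img" / dataset_config["image"])
    return [path for path in needed if not path.exists()]


# Example with a made-up dataset id, run from the root of a local checkout.
config = {"id": "example-org/example-dataset@2021", "image": "example.svg"}
for path in missing_files(Path("."), config):
    print("missing:", path)
```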

### Pull request for an existing dataset

Feel free to improve the configurations for existing datasets.