{"id":15772765,"url":"https://github.com/mjanez/ckan-mqa","last_synced_at":"2026-02-09T07:04:11.947Z","repository":{"id":181993945,"uuid":"667500580","full_name":"mjanez/ckan-mqa","owner":"mjanez","description":"Docker Compose for Metadata Quality Assessment (MQA) on CKAN and European Data Portal catalogs","archived":false,"fork":false,"pushed_at":"2025-01-09T10:58:43.000Z","size":210,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-16T18:35:27.534Z","etag":null,"topics":["ckan","dcat-ap","edp","geodcat-ap","metadata","metadata-quality","mqa"],"latest_commit_sha":null,"homepage":"https://github.com/mjanez/ckan-docker","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mjanez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-17T16:39:13.000Z","updated_at":"2025-01-09T10:58:47.000Z","dependencies_parsed_at":"2024-08-01T03:06:33.617Z","dependency_job_id":null,"html_url":"https://github.com/mjanez/ckan-mqa","commit_stats":null,"previous_names":["mjanez/ckan-mqa"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mjanez/ckan-mqa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjanez%2Fckan-mqa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjanez%2Fckan-mqa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjanez%2Fckan-mqa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjanez%2F
ckan-mqa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mjanez","download_url":"https://codeload.github.com/mjanez/ckan-mqa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjanez%2Fckan-mqa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29258625,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-09T04:11:57.159Z","status":"ssl_error","status_checked_at":"2026-02-09T04:11:56.117Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ckan","dcat-ap","edp","geodcat-ap","metadata","metadata-quality","mqa"],"created_at":"2024-10-04T15:41:59.881Z","updated_at":"2026-02-09T07:04:10.237Z","avatar_url":"https://github.com/mjanez.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eDocker Metadata Quality Assessment (MQA) for CKAN/EDP catalogs\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/mjanez/ckan-mqa\"\u003e\u003cimg src=\"https://img.shields.io/badge/%20ckan-mqa-brightgreen\" alt=\"mqa2ckan version\"\u003e\u003c/a\u003e\u003ca href=\"https://opensource.org/licenses/MIT\"\u003e \u003cimg src=\"https://img.shields.io/badge/license-Unlicense-brightgreen\" alt=\"License: 
Unlicense\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/mjanez/ckan-mqa/actions/workflows/docker/badge.svg\" alt=\"License: Unlicense\"\u003e\u003c/a\u003e\n\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#overview\"\u003eOverview\u003c/a\u003e •\n    \u003ca href=\"#quick-start\"\u003eQuick start\u003c/a\u003e •\n    \u003ca href=\"#debug\"\u003eDebug\u003c/a\u003e •\n    \u003ca href=\"#containers\"\u003eContainers\u003c/a\u003e •\n    \u003ca href=\"#references\"\u003eDCAT-AP info\u003c/a\u003e\n\u003c/p\u003e\n\n**Requirements**:\n* [Docker](https://docs.docker.com/get-docker/)\n\n## Overview\n`ckan-mqa` offers a Docker Compose solution for performing [Metadata Quality Assessment (MQA)](https://data.europa.eu/mqa/methodology) on both CKAN endpoints and the European Data Portal catalogs. MQA helps ensure the accuracy, completeness, and reliability of metadata, which in turn improves data interoperability and accessibility.\n\nThis Docker Compose configuration enhances a Python MQA program [^1] to integrate the MQA toolset with CKAN endpoints and European Data Portal catalogs, so that users can assess metadata quality in depth. The setup provides an efficient way to run comprehensive quality checks on various metadata attributes, including data relevance, schema compliance, data format consistency, and adherence to standard vocabularies.\n\n\u003e [!TIP]\n\u003e It can be tested against a CKAN-based open data portal such as [mjanez/ckan-docker](https://github.com/mjanez/ckan-docker)[^2]\n\n### [Metadata Quality Assessment Methodology](https://data.europa.eu/mqa/methodology)\nThe MQA measures the quality of various indicators; each indicator is explained in the tables below. The results of the checks are stored as Data Quality Vocabulary ([DQV](https://www.w3.org/TR/vocab-dqv/)). 
DQV is a specification of the W3C that is used to describe the quality of a dataset.\n\n **Dimension**    | **Maximal points** \n:----------------:|:------------------:\n Findability      | 100                \n Accessibility    | 100                \n Interoperability | 110                \n Reusability      | 75                 \n Contextuality    | 20                 \n *Sum*              | 405    \n\nThe dimensions are derived from the FAIR principles:\n* **Findability**\nThe following table describes the metrics that help people and machines in finding datasets. A maximum of 100 points can be scored in this area.\n\n* **Accessibility**\nThe following table describes which metrics are used to determine whether access to the data referenced by the distributions is guaranteed. A maximum of 100 points can be scored in this area.\n\n* **Interoperability**\nThe following table describes the metrics used to determine whether a distribution is considered interoperable. According to the assumption 'identical content with several distributions', only the distribution with the highest number of points is used to calculate the points. A maximum of 110 points can be scored in this area.\n\n* **Reusability**\nThe following table describes which metrics are used to check the reusability of the data. A maximum of 75 points can be scored in this area.\n\n* **Contextuality**\nThe following table shows some lightweight properties that provide more context to the user. A maximum of 20 points can be scored in this area.\n\n![5 MQA_dimensions png](https://github.com/mjanez/ckan-mqa/assets/96422458/0c54d8c3-e454-4a6a-bcd6-ebc0a0dae080)\n\nThe final rating is assigned via four rating groups. The mapping of the points to the rating category is shown in the table below. The representation of the rating in the MQA is expressed exclusively via the rating categories. 
This enables providers to achieve the highest rating even with a slight deduction of points.\n\n **Rating** | **Range of points** \n:----------:|:-------------------:\n Excellent  | 351 – 405           \n Good       | 221 – 350           \n Sufficient | 121 – 220           \n Bad        | 0 – 120             \n\n\n#### Example of ckan-mqa results summary \n\n **Dimension**    | **Indicator/property**                    | **Count** | **Population** | **Percentage** | **Points** | **Weight** \n:----------------:|:-----------------------------------------:|:---------:|:--------------:|:--------------:|:----------:|:----------:\n Findability      | dcat:keyword                              | 46        | 46             | 1.0            | 30.0       | 30         \n Findability      | dcat:theme                                | 46        | 46             | 1.0            | 30.0       | 30         \n Findability      | dct:spatial                               | 42        | 46             | 0.91           | 18.26      | 20         \n Findability      | dct:temporal                              | 0         | 46             | 0.0            | 0          | 20         \n Accessibility    | dcat:accessURL code=200                   | 255       | 255            | 1.0            | 50.0       | 50         \n Accessibility    | dcat:downloadURL                          | 0         | 255            | 0.0            | 0          | 20         \n Accessibility    | dcat:downloadURL code=200                 | 0         | 255            | 0.0            | 0          | 30         \n Interoperability | dct:format                                | 255       | 255            | 1.0            | 20.0       | 20         \n Interoperability | dcat:mediaType                            | 255       | 255            | 1.0            | 10.0       | 10         \n Interoperability | dct:format/dcat:mediaType from vocabulary | 378       | 510            | 0.74           | 7.41       | 10         \n 
Interoperability | dct:format non-proprietary                | 131       | 255            | 0.51           | 10.27      | 20         \n Interoperability | dct:format machine-readable               | 252       | 255            | 0.99           | 19.76      | 20         \n Interoperability | DCAT-AP compliance                        | 0         | 46             | 0.0            | 0          | 30         \n Reusability      | dct:license                               | 255       | 255            | 1.0            | 20.0       | 20         \n Reusability      | dct:license from vocabulary               | 245       | 255            | 0.96           | 9.61       | 10         \n Reusability      | dct:accessRights                          | 46        | 46             | 1.0            | 10.0       | 10         \n Reusability      | dct:accessRights from vocabulary          | 0         | 46             | 0.0            | 0          | 5          \n Reusability      | dcat:contactPoint                         | 46        | 46             | 1.0            | 20.0       | 20         \n Reusability      | dct:publisher                             | 46        | 46             | 1.0            | 10.0       | 10         \n Contextuality    | dct:rights                                | 255       | 255            | 1.0            | 5.0        | 5          \n Contextuality    | dcat:byteSize                             | 0         | 255            | 0.0            | 0          | 5          \n Contextuality    | dct:issued                                | 46        | 46             | 1.0            | 5.0        | 5          \n Contextuality    | dct:modified                              | 46        | 46             | 1.0            | 5.0        | 5          \n Total points     | Rating: Good                              |           |                | 0.69           | 280.31     | 405        \n\n                              \n## Quick start\nFirst copy the `.env.example` template as 
`.env` and configure it by changing the `CKAN_CATALOG_URL`, as well as the DCAT-AP Profile version (`DCATAP_FILES_VERSION`), if needed.\n\n```bash\ncp .env.example .env\n```\n\nCustom environment variables:\n- `CKAN_CATALOG_URL`: URL of the CKAN catalog to be downloaded (e.g. `http://localhost:5000/catalog.rdf?q=organization:test`).\n- `APP_DIR`: Path to the application folder in Docker.\n- `TZ`: Timezone.\n- `DCATAP_FILES_VERSION`: DCAT-AP version (available: 2.0.1, 2.1.0, 2.1.1).\n- `UPDATE_VOCABS`: Update vocabs from the EU Publications Office at start (`True` or `False`).\n- `CKAN_METADATA_TYPE`: CKAN Metadata elements type: `ckan_uris` for a GeoDCAT-AP schema with all elements described by URIs (e.g. `dct:format` = \u003chttp://publications.europa.eu/resource/authority/file-type/XML\u003e) or `ckan` if a default CKAN schema with label metadata elements is used (e.g. `dct:format` = \"XML\").\n\n### With docker compose\nTo deploy the environment, `docker compose` will build the latest image ([`ghcr.io/mjanez/ckan-mqa:latest`](https://github.com/mjanez/ckan-mqa/pkgs/container/ckan-mqa)).\n\n```bash\ngit clone https://github.com/mjanez/ckan-mqa\ncd ckan-mqa\n\ndocker compose up --build\n\n# Or detached mode\ndocker compose up -d --build\n```\n\n\u003e [!NOTE]\n\u003e Deploy the dev (local build) `docker-compose.dev.yml` with:\n\u003e\n\u003e```bash\n\u003e docker compose -f docker-compose.dev.yml up --build\n\u003e```\n\u003e\n\u003eIf needed, to build a specific container simply run:\n\u003e\n\u003e```bash\n\u003e  docker build -t target_name xxxx/\n\u003e```\n\n### Without Docker\nDependencies:\n```bash\npython3 -m pip install --user pipx\npipx install pdm\npdm install --no-self\n```\n\nRun:\n```bash\npdm run python ckan2mqa/ckan2mqa.py\n```\n\n## Debug\n### VSCode\n1. Build and run container.\n2. Attach Visual Studio Code to container.\n3. 
Start debugging on `ckan2mqa.py` Python file (`Debug the currently active Python file`).\n\n## Containers\nList of *containers*:\n### Base images\n| Repository | Type | Docker tag | Size | Notes |\n| --- | --- | --- | --- | --- |\n| python 3.11| base image | `python/python:3.11-slim` | 45.57 MB |  - |\n\n### Built images\n| Repository | Type | Docker tag | Size | Notes |\n| --- | --- | --- | --- | --- |\n| mjanez/ckan-mqa| custom image | `mjanez/ckan-mqa:v*.*.*` | 264 MB |  Tag version. |\n| mjanez/ckan-mqa| custom image | `mjanez/ckan-mqa:latest` | 264 MB |  Latest stable version. |\n| mjanez/ckan-mqa| custom image | `mjanez/ckan-mqa:main` | 264 MB |  Dev version.  |\n\n\n## References\n### DCAT-AP Validator Validation Cases\nThe different cases to validate in the [DCAT-AP Validator](https://www.itb.ec.europa.eu/shacl/dcat-ap/upload) are based on the level of completeness of the checks and the incorporation of background knowledge (vocabularies). Each case is designed for a specific data exchange scenario.\nThe following describes each case and recommends which one you should use for a CKAN catalog:\n\n#### Case 1: DCAT-AP Base Zero (no background knowledge)\nIncludes all constraints required for technical coherence, excluding range class membership constraints and controlled vocabulary usage.\n\n*SHACL Profiles*: \n* [2.1.1](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes.ttl)\n* [3.0.0](https://github.com/SEMICeu/DCAT-AP/blob/master/releases/3.0.0/html/shacl/shapes.ttl)\n\n\n#### Case 2: DCAT-AP Ranges Zero (no background knowledge)\nIncludes all range class membership constraints.\n\n*SHACL Profiles*: \n* [2.1.1](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_range.ttl)\n* [3.0.0](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/range.ttl)\n\n#### Case 3: DCAT-AP Base (with background knowledge)\nExtends Case 1 with background knowledge, including 
all vocabularies used in DCAT-AP.\n\n*SHACL Profiles*: \n* 2.1.1: [`shapes`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes.ttl) and [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_imports.ttl)\n* 3.0.0: [`shapes`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/shapes.ttl) and [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/imports.ttl)\n\n#### Case 4: DCAT-AP Ranges (with background knowledge)\nExtends Case 2 with background knowledge, adding validation of range class membership and vocabulary standards compliance.\n\n*SHACL Profiles*:\n* 2.1.1: [`range`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_range.ttl) and [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_imports.ttl)\n* 3.0.0: [`range`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/range.ttl) and [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/imports.ttl)\n\n#### Case 5: DCAT-AP Recommendations (with background knowledge)\nIncludes all constraints related to recommended properties.\n\n*SHACL Profiles*: \n* 2.1.1: [`shapes recommended`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes_recommended.ttl) and [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_imports.ttl)\n* 3.0.0: [`shapes recommended`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/shapes_recommended.ttl) and [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/imports.ttl)\n\n\n#### Case 6: DCAT-AP Controlled Vocabularies\nIncludes all constraints related to controlled vocabularies.\n\n*SHACL Profiles*: \n* 2.1.1: [`vocabularies 
shape`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_mdr-vocabularies.shape.ttl) and [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_mdr_imports.ttl)\n* 3.0.0: [`vocabularies shape`](https://github.com/SEMICeu/DCAT-AP/raw/3.0.0/releases/3.0.0/html/shacl/mdr-vocabularies.shape.ttl) and [`mdr imports`](https://github.com/SEMICeu/DCAT-AP/raw/3.0.0/releases/3.0.0/html/shacl/mdr_imports.ttl)\n\n\n#### Case 7: DCAT-AP Full (with background knowledge)\nThe union of Cases 3, 4, 5, and 6.\n\n*SHACL Profiles*: \n* 2.1.1: [`shapes`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes.ttl), [`shapes recommended`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_shapes_recommended.ttl), [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_imports.ttl), [`range`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_range.ttl) and [`deprecateduris`](https://github.com/SEMICeu/DCAT-AP/raw/2.1.1-draft/releases/2.1.1/dcat-ap_2.1.1_shacl_deprecateduris.ttl)\n* 3.0.0: [`shapes`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/shapes.ttl), [`shapes recommended`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/shapes_recommended.ttl), [`imports`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/imports.ttl), [`range`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/range.ttl) and [`deprecateduris`](https://github.com/SEMICeu/DCAT-AP/raw/gh-pages/releases/3.0.0-draft/html/shacl/deprecateduris.ttl)\n\n\n#### Recommendation:\nFor most use cases, `Case 3: DCAT-AP Base (with background knowledge)` is recommended. 
It provides comprehensive validation of basic coherence and vocabulary standards compliance.\nIf your CKAN catalog uses controlled vocabularies, consider using `Case 6: DCAT-AP Controlled Vocabularies` or `Case 7: DCAT-AP Full (with background knowledge)` for more exhaustive validation.\nRemember, the choice of the appropriate validation case depends on your specific needs and data exchange context.\n\n\u003e [!TIP]\n\u003e \n\u003e DCAT-AP:\n\u003e  - https://github.com/SEMICeu/DCAT-AP/tree/master/releases\n\u003e  - https://semiceu.github.io/DCAT-AP/releases/3.0.0/#validation-of-dcat-ap\n\u003e\n\u003eEU Vocabularies: https://op.europa.eu/en/web/eu-vocabularies/dcat-ap\n\u003e\n\u003eValidator: \n\u003e- https://www.itb.ec.europa.eu/shacl/dcat-ap/upload\n\u003e-  https://github.com/ISAITB/validator-resources-dcat-ap/tree/master#\n\u003e\n\u003eDCAT-AP Country profiles:\n\u003e  - https://github.com/diggsweden/DCAT-AP-SE\n\u003e  - https://github.com/opendata-swiss/dcat_ap_ch\n\u003e\n\u003eSHACLs: https://github.com/ISAITB/validator-resources-dcat-ap/blob/baca3adf63d31ee415fa5e769249053ae211414c/resources/config.properties\n\n## License\nCopyright (c) the respective contributors.\nIt is open and licensed under the GNU Affero General Public License (AGPL) v3.0 whose full text may be found at:\nhttp://www.fsf.org/licensing/licenses/agpl-3.0.html\n\n[^1]: Program to test MQA evaluation: Javier Nogueras (jnog@unizar.es), Javier Lacasta (jlacasta@unizar.es), Manuel Ureña (maurena@ujaen.es), F. Javier Ariza (fjariza@ujaen.es), Héctor Ochoa Ortiz (719509@unizar.es). 
Trafair Project 2020.\n[^2]: A custom installation of Docker Compose with specific extensions for spatial data and [GeoDCAT-AP](https://github.com/SEMICeu/GeoDCAT-AP)/[INSPIRE](https://github.com/INSPIRE-MIF/technical-guidelines) metadata [profiles](https://en.wikipedia.org/wiki/Geospatial_metadata).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmjanez%2Fckan-mqa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmjanez%2Fckan-mqa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmjanez%2Fckan-mqa/lists"}