{"id":47201554,"url":"https://github.com/pkiraly/metadata-qa-api","last_synced_at":"2026-03-13T13:11:37.509Z","repository":{"id":38997002,"uuid":"60538424","full_name":"pkiraly/metadata-qa-api","owner":"pkiraly","description":"Metadata Quality Assessment Framework API","archived":false,"fork":false,"pushed_at":"2026-02-26T10:38:38.000Z","size":3701,"stargazers_count":20,"open_issues_count":66,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-02-26T13:10:37.759Z","etag":null,"topics":["code4lib","csv","json","xml"],"latest_commit_sha":null,"homepage":"http://pkiraly.github.io/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pkiraly.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-06-06T15:20:20.000Z","updated_at":"2026-02-26T10:38:41.000Z","dependencies_parsed_at":"2026-02-26T10:05:03.012Z","dependency_job_id":null,"html_url":"https://github.com/pkiraly/metadata-qa-api","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/pkiraly/metadata-qa-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkiraly%2Fmetadata-qa-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkiraly%2Fmetadata-qa-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkiraly%2Fmetadata-qa-api/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/pkiraly%2Fmetadata-qa-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pkiraly","download_url":"https://codeload.github.com/pkiraly/metadata-qa-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkiraly%2Fmetadata-qa-api/sbom","scorecard":{"id":736465,"data":{"date":"2025-08-11","repo":{"name":"github.com/pkiraly/metadata-qa-api","commit":"607774ce51e32b094eeb49b60cc63821b7e7ca2b"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":5,"checks":[{"name":"Maintained","score":10,"reason":"30 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 0/11 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and 
uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":10,"reason":"GitHub workflow tokens follow principle of least privilege","details":["Info: topLevel 'contents' permission set to 'read': .github/workflows/maven.yml:6","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/maven.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/pkiraly/metadata-qa-api/maven.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/maven.yml:27: update your workflow using https://app.stepsecurity.io/secureworkflow/pkiraly/metadata-qa-api/maven.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/maven.yml:36: update your workflow using https://app.stepsecurity.io/secureworkflow/pkiraly/metadata-qa-api/maven.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/maven.yml:43: update your workflow using https://app.stepsecurity.io/secureworkflow/pkiraly/metadata-qa-api/maven.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/maven.yml:58: update your workflow using https://app.stepsecurity.io/secureworkflow/pkiraly/metadata-qa-api/maven.yml/main?enable=pin","Info:   0 out of   4 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its 
build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.md:0","Info: FSF or OSI recognized license: GNU General Public License v3.0: LICENSE.md:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v0.9.8 not signed: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/229294653","Warn: release artifact v0.9.4 not signed: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/149746936","Warn: release artifact v0.9.3 not signed: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/112268965","Warn: release artifact v0.9.1 not signed: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/103226762","Warn: release artifact v0.9.0 not signed: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/83746734","Warn: release artifact v0.9.8 does not have provenance: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/229294653","Warn: release artifact v0.9.4 does not have provenance: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/149746936","Warn: release artifact v0.9.3 does not have provenance: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/112268965","Warn: release artifact v0.9.1 does not have provenance: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/103226762","Warn: release artifact v0.9.0 does not have provenance: https://api.github.com/repos/pkiraly/metadata-qa-api/releases/83746734"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":10,"reason":"SAST tool detected","details":["Info: SAST configuration detected: Sonar","Info: all commits (21) are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":5,"reason":"5 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-w33c-445m-f8w7","Warn: Project is vulnerable to: GHSA-389x-839f-4rhx","Warn: Project is vulnerable to: GHSA-xq3w-v528-46rv","Warn: Project is vulnerable to: GHSA-4g8c-wm8x-jfhw","Warn: Project is vulnerable to: GHSA-qh8g-58pp-2wxh"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-22T16:01:05.208Z","repository_id":38997002,"created_at":"2025-08-22T16:01:05.208Z","updated_at":"2025-08-22T16:01:05.208Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30467786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-13T11:00:43.441Z","status":"ssl_error","status_checked_at":"2026-03-13T11:00:23.173Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code4lib","csv","json","xml"],"created_at":"2026-03-13T13:11:36.870Z","updated_at":"2026-03-13T13:11:37.497Z","avatar_url":"https://github.com/pkiraly.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Metadata Quality Assessment Framework API\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15788411.svg)](https://doi.org/10.5281/zenodo.15788411)\n\nThis project is the central piece of the Metadata Quality Assurance\nFramework, every other project is built on top of it. 
It provides\na general framework for measuring metadata quality in different \ndigital collections.\n\n  * [Quality dimensions](#quality-dimensions)\n  * [Running as command-line application](#running-as-command-line-application)\n  * [Using the library](#using-the-library)\n  * [Defining schema with a MQA Schema file](#defining-schema-with-a-mqa-schema-file)\n    + [Rules](#rules)\n      - [Cardinality](#cardinality)\n        * [`minCount \u003cnumber\u003e`](#mincount-number)\n        * [`maxCount \u003cnumber\u003e`](#maxcount-number)\n      - [Value Range](#value-range)\n        * [`minExclusive \u003cnumber\u003e`](#minexclusive-number)\n        * [`minInclusive \u003cnumber\u003e`](#mininclusive-number)\n        * [`maxExclusive \u003cnumber\u003e`](#maxexclusive-number)\n        * [`maxInclusive \u003cnumber\u003e`](#maxinclusive-number)\n      - [String constraints](#string-constraints)\n        * [`minLength \u003cnumber\u003e`](#minlength-number)\n        * [`maxLength \u003cnumber\u003e`](#maxlength-number)\n        * [`hasValue \u003cString\u003e`](#hasvalue-string)\n        * [`in [String1, ..., StringN]`](#in-string1--stringn)\n        * [`pattern \u003cregular expression\u003e`](#pattern-regular-expression)\n        * [`minWords \u003cnumber\u003e`](#minwords-number)\n        * [`maxWords \u003cnumber\u003e`](#maxwords-number)\n      - [Comparision of properties](#comparision-of-properties)\n        * [`equals \u003cfield label\u003e`](#equals-field-label)\n        * [`disjoint \u003cfield label\u003e`](#disjoint-field-label)\n        * [`lessThan \u003cfield label\u003e`](#lessthan-field-label)\n        * [`lessThanOrEquals \u003cfield label\u003e`](#lessthanorequals-field-label)\n      - [Logical operators](#logical-operators)\n        * [`and [\u003crule1\u003e, ..., \u003cruleN\u003e]`](#and-rule1--rulen)\n        * [`or [\u003crule1\u003e, ..., \u003cruleN\u003e]`](#or-rule1--rulen)\n        * [`not [\u003crule1\u003e, ..., 
\u003cruleN\u003e]`](#not-rule1--rulen)\n      - [Other constraints](#other-constraints)\n        * [`contentType [type1, ..., typeN]`](#contenttype-type1--typen)\n        * [`validLink \u003cboolean\u003e`](#validLink-boolean)\n        * [`unique \u003cboolean\u003e`](#unique-boolean)\n        * [`dependencies [id1, id2, ..., idN]`](#dependencies-id1-id2--idn)\n        * [`dimension [criteria1, criteria2, ..., criteriaN]`](#dimension-criteria1-criteria2--criterian)\n        * [`hasLanguageTag \u003canyOf|oneOf|allOf\u003e`](#haslanguagetag-anyofoneofallof)\n        * [`isMultilingual \u003cboolean\u003e`](#ismultilingual-boolean)\n      - [General properties](#general-properties)\n        * [`id \u003cString\u003e`](#id-string)\n        * [`description \u003cString\u003e`](#description-string)\n        * [`failureScore \u003cinteger\u003e`](#failurescore-integer)\n        * [`successScore \u003cinteger\u003e`](#successscore-integer)\n        * [`hidden \u003cboolean\u003e`](#hidden-boolean)\n        * [`skip \u003cboolean\u003e`](#skip-boolean)\n        * [`debug \u003cboolean\u003e`](#debug-boolean)\n      - [Set rules via Java API](#set-rules-via-java-api)\n  * [Defining MeasurementConfiguration with a configuration file](#defining-measurementconfiguration-with-a-configuration-file)\n  * [Using an experimental version](#using-an-experimental-version)\n  * [More info](#more-info)\n\n\n## Quality dimensions\n\nThe framework measures the following features:\n\n* _completeness_: it tells how complete your records are, i.e. what ratio of data\n  elements defined in the metadata schema is available in the records. It\n  can also collect information about the existence of the fields, and their \n  cardinality (how many times they occur in a record)\n* _uniqueness and TF-IDF score_: it calculates the TF-IDF scores for field\n  values. 
It is useful to learn how unique or how frequent the data values are.\n* _rule catalogue_: one can set different rules or constraints against the\n  data values. It checks if these rules are followed. One can set scores for\n  failure and success cases.\n* _multilingual saturation_: how multilingual your records are. It requires XML\n  or RDF based multilingual annotation. It reports the number of tagged\n  literals, number of distinct language tags, number of tagged literals per\n  language tag, average number of languages per property for which there is\n  at least one language-tagged literal\n* _language extractor_: it extracts the language tag of data elements if any.\n\nIn addition to these there are some helper calculators:\n\n* _extractor_: extracts and outputs values from the record\n* _annotator_: injects metadata into the output (e.g. some values that help\n  further processing, such as file name, date, identifier or other information\n  about the measurement, which are not available within the records)\n* _indexer_: indexes particular data elements with Solr before measurement. It\n  is a necessary step for measuring TF-IDF or uniqueness\n\n## Running as command-line application\n\nusage:\n\n```\n./mqa -i \u003cfile\u003e -s \u003cfile\u003e -m \u003cfile\u003e\n      [-f \u003cformat\u003e] [-h \u003carg\u003e] [-o \u003cfile\u003e] [-r \u003cpath\u003e] [-v \u003cformat\u003e] [-w \u003cformat\u003e] [-z]\n```\n* `-i,--input \u003cfile\u003e` Input file.\n* `-n,--inputFormat \u003cformat\u003e` (optional, String) The format of the input file. Right now it supports two JSON variants:\n  * `ndjson`: line delimited JSON in which every line is a new record (the default value)\n  * `json-array`: JSON file that contains an array of objects\n* `-s,--schema \u003cfile\u003e` MQA Schema file describing the metadata structure to run assessment against.\n* `-v,--schemaFormat \u003cformat\u003e` Format of MQA Schema file: json, yaml. 
Default: based on file extension, else json.\n* `-m,--measurements \u003cfile\u003e` Configuration file for measurements.\n* `-w,--measurementsFormat \u003cformat\u003e` Format of measurements config file: json, yaml. Default: based on file extension, else json.\n* `-o,--output \u003cfile\u003e` Output file.\n* `-f,--outputFormat \u003cformat\u003e` Format of the output: json, ndjson (new line delimited JSON), csv, csvjson (json encoded in csv; useful for RDB bulk loading). Default: ndjson.\n* `-r,--recordAddress \u003cpath\u003e` An XPath or JSONPath expression to separate individual records in XML or JSON files.\n* `-z,--gzip` Flag to indicate that input is gzipped.\n* `-h,--headers \u003carg\u003e` Headers to copy from source\n\n## Using the library\n\nIf you want to apply it to your collection you have to define a schema,\nwhich represents an existing metadata schema, and configure the basic facade,\nwhich will run the calculation.\n\nFirst, add the library into your project's `pom.xml` file:\n\n```xml\n\u003cdependencies\u003e\n  ...\n  \u003cdependency\u003e\n    \u003cgroupId\u003ede.gwdg.metadata\u003c/groupId\u003e\n    \u003cartifactId\u003emetadata-qa-api\u003c/artifactId\u003e\n    \u003cversion\u003e0.9.4\u003c/version\u003e\n  \u003c/dependency\u003e\n\u003c/dependencies\u003e\n```\n\nDefine a configuration:\n```Java\nMeasurementConfiguration config = new MeasurementConfiguration()\n  // we will measure completeness now\n  .enableCompletenessMeasurement();\n```\n\nYou can also create a \u003ca href=\"#defining-measurementconfiguration-with-a-configuration-file\"\u003econfiguration file\u003c/a\u003e.\n\nDefine a schema:\n```Java\nSchema schema = new BaseSchema()\n  // this schema will be used for a CSV file\n  .setFormat(Format.CSV)\n  // DataELement represents a data element, which might have \n  // a number of properties\n  .addField(\n    new DataELement(\"url\", Category.MANDATORY)\n        .setExtractable()\n  )\n  .addField(new 
DataELement(\"name\"))\n  .addField(new DataELement(\"alternateName\"))\n  ...\n  .addField(new DataELement(\"temporalCoverage\"));\n```\n\nBuild a `CalculatorFacade` object:\n\n```Java\nCalculatorFacade calculator = new CalculatorFacade(config) // use configuration\n  .setSchema(schema)   // set the schema which describes the source\n  .configure();        // finalize the configuration\n```\n\nIf you have a CSV source and you would like to reuse the headers use `setCsvReader()`:\n```Java\nCalculatorFacade calculator = new CalculatorFacade(config) // use configuration\n  .setSchema(schema)   // set the schema which describes the source\n  .setCsvReader(       // optional, if it is a CSV source\n    new CsvReader()\n      .setHeader(((CsvAwareSchema) schema).getHeader()))\n  .configure();        // finalize the configuration\n```\n\nThe configuration and the schema are the two requirements for starting the measurement.\nThe measurement itself is simple:\n\n```Java\nString csv = calculator.measure(input);\n```\n\nThe `input` should be a string formatted as JSON, XML or CSV. The output is a\ncomma separated line. `calculator.getHeader()` returns the list of the\ncolumn names.\n\nThere are a couple of alternatives, if you would like to receive a List or a Map:\n\n* `String measure(String record) throws InvalidJsonException`\nReturns a CSV string\n\n```Java\n\"0.352941,1.0,1,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0\"\n```\n* `List\u003cString\u003e measureAsList(String record) throws InvalidJsonException`\nReturns a list of strings. 
\n\n```Java\nList.of(\"0.352941\", \"1.0\", \"1\", \"1\", \"0\", \"1\", \"0\", \"0\", \"0\", \"0\", \"1\", \"0\", \"0\", \"1\", \"1\",\n        \"0\", \"0\", \"0\", \"0\", \"1\", \"1\", \"0\", \"1\", \"0\", \"0\", \"0\", \"0\", \"1\", \"0\", \"0\", \"1\", \"1\",\n        \"0\", \"0\", \"0\", \"0\");\n```\n* `List\u003cObject\u003e measureAsListOfObjects(String record) throws InvalidJsonException`\nReturns a list of objects\n\n```Java\nList.of(0.35294117647058826, 1.0, true, true, false, true, false, false, false, false, true,\n        false, false, true, true, false, false, false, false, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0,\n        0, 1, 1, 0, 0, 0, 0);\n```\n* `Map\u003cString, Object\u003e measureAsMap(String record) throws InvalidJsonException`\nReturns a map of objects. The keys of the map are the names of the metrics.\n\n```Java\nMap.of(\n  \"completeness:TOTAL\", 0.35294117647058826,\n  \"completeness:MANDATORY\", 1.0,\n  \"existence:url\", true,\n  \"existence:name\", true,\n  \"existence:alternateName\", false,\n  \"existence:description\", true,\n  \"existence:variablesMeasured\", false,\n  \"existence:measurementTechnique\", false,\n  \"existence:sameAs\", false,\n  \"existence:doi\", false,\n  \"existence:identifier\", true,\n  \"existence:author\", false,\n  \"existence:isAccessibleForFree\", false,\n  \"existence:dateModified\", true,\n  \"existence:distribution\", true,\n  \"existence:spatialCoverage\", false,\n  \"existence:provider\", false,\n  \"existence:funder\", false,\n  \"existence:temporalCoverage\", false,\n  \"cardinality:url\", 1,\n  \"cardinality:name\", 1,\n  \"cardinality:alternateName\", 0,\n  \"cardinality:description\", 1,\n  \"cardinality:variablesMeasured\", 0,\n  \"cardinality:measurementTechnique\", 0,\n  \"cardinality:sameAs\", 0,\n  \"cardinality:doi\", 0,\n  \"cardinality:identifier\", 1,\n  \"cardinality:author\", 0,\n  \"cardinality:isAccessibleForFree\", 0,\n  \"cardinality:dateModified\", 1,\n  \"cardinality:distribution\", 1,\n 
 \"cardinality:spatialCoverage\", 0,\n  \"cardinality:provider\", 0,\n  \"cardinality:funder\", 0,\n  \"cardinality:temporalCoverage\", 0\n)\n```\n\n* `String measureAsJson(String inputRecord) throws InvalidJsonException`\nReturns a JSON representation\n\n```JSON\n{\n  \"completeness\":{\n    \"completeness\":{\n      \"TOTAL\":0.35294117647058826,\n      \"MANDATORY\":1.0\n    },\n    \"existence\":{\n      \"url\":true,\n      \"name\":true,\n      \"alternateName\":false,\n      \"description\":true,\n      \"variablesMeasured\":false,\n      \"measurementTechnique\":false,\n      \"sameAs\":false,\n      \"doi\":false,\n      \"identifier\":true,\n      \"author\":false,\n      \"isAccessibleForFree\":false,\n      \"dateModified\":true,\n      \"distribution\":true,\n      \"spatialCoverage\":false,\n      \"provider\":false,\n      \"funder\":false,\n      \"temporalCoverage\":false\n    },\n    \"cardinality\":{\n      \"url\":1,\n      \"name\":1,\n      \"alternateName\":0,\n      \"description\":1,\n      \"variablesMeasured\":0,\n      \"measurementTechnique\":0,\n      \"sameAs\":0,\n      \"doi\":0,\n      \"identifier\":1,\n      \"author\":0,\n      \"isAccessibleForFree\":0,\n      \"dateModified\":1,\n      \"distribution\":1,\n      \"spatialCoverage\":0,\n      \"provider\":0,\n      \"funder\":0,\n      \"temporalCoverage\":0\n    }\n  }\n}\n```\n\n* `Map\u003cString, List\u003cMetricResult\u003e\u003e measureAsMetricResult(String inputRecord) throws InvalidJsonException`\nReturns a map with a \"raw\" format. The keys of the map are the individual\ncalculators. The values are list of MetricResult objects. 
Each has a name\n(use `getName()` method), and a map of metrics (use `getResultMap()` method).\nSince it is rather difficult to illustrate, let me give you some assertions here:\n\n```Java\nassertTrue(metrics instanceof Map);\nassertEquals(1, metrics.size());\nassertEquals(\"completeness\", metrics.keySet().iterator().next());\n// the calculator produced three metrics\nassertEquals(3, metrics.get(\"completeness\").size());\n\n// first: completeness\nassertEquals(\"completeness\", metrics.get(\"completeness\").get(0).getName());\nassertEquals(\n  Map.of(\"TOTAL\", 0.35294117647058826, \"MANDATORY\", 1.0),\n  metrics.get(\"completeness\").get(0).getResultMap());\n\n// second: existence\nassertEquals(\"existence\", metrics.get(\"completeness\").get(1).getName());\nassertEquals(\n  Set.of(\"url\", \"name\", \"alternateName\", \"description\", \"variablesMeasured\", \"measurementTechnique\",\n        \"sameAs\", \"doi\", \"identifier\", \"author\", \"isAccessibleForFree\", \"dateModified\",\n        \"distribution\", \"spatialCoverage\", \"provider\", \"funder\", \"temporalCoverage\"),\n  metrics.get(\"completeness\").get(1).getResultMap().keySet());\nassertEquals(\n  List.of(true, true, false, true, false, false, false, false, true, false, false, true, true,\n          false, false, false, false),\n  new ArrayList(metrics.get(\"completeness\").get(1).getResultMap().values()));\n\n// third: cardinality\nassertEquals(\"cardinality\", metrics.get(\"completeness\").get(2).getName());\nassertEquals(\n  Set.of(\"url\", \"name\", \"alternateName\", \"description\", \"variablesMeasured\", \"measurementTechnique\",\n        \"sameAs\", \"doi\", \"identifier\", \"author\", \"isAccessibleForFree\", \"dateModified\",\n        \"distribution\", \"spatialCoverage\", \"provider\", \"funder\", \"temporalCoverage\"),\n  metrics.get(\"completeness\").get(2).getResultMap().keySet());\nassertEquals(\n  List.of(1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0),\n  new 
ArrayList(metrics.get(\"completeness\").get(2).getResultMap().values()));\n```\n\nIf your input is a CSV file, and you already processed the lines \ninto lists of cells, you could use the same methods:\n\n* `String measure(List\u003cString\u003e record) throws InvalidJsonException`\n* `List\u003cString\u003e measureAsList(List\u003cString\u003e record) throws InvalidJsonException`\n* `List\u003cObject\u003e measureAsListOfObjects(List\u003cString\u003e record) throws InvalidJsonException`\n* `Map\u003cString, Object\u003e measureAsMap(List\u003cString\u003e record) throws InvalidJsonException`\n* `String measureAsJson(List\u003cString\u003e inputRecord) throws InvalidJsonException`\n* `Map\u003cString, List\u003cMetricResult\u003e\u003e measureAsMetricResult(List\u003cString\u003e inputRecord) throws InvalidJsonException`\n\nAn example which collects output into a StringBuffer (you can persist lines\ninto a CSV file or into a database):\n\n```Java\n// collect the output into a container. The output is a CSV file\nStringBuffer output = new StringBuffer();\n\n// get the header of the output CSV\noutput.append(calculator.getHeader());\n\n// The input could be JSON, XML or CSV. \n// You can set any kind of datasource, as long as it returns a String\nIterator\u003cString\u003e iterator = ...;\nwhile (iterator.hasNext()) {\n  try {\n    // measure the input\n    String csv = calculator.measure(iterator.next());\n    // save csv\n    output.append(csv);\n  } catch (InvalidJsonException e) {\n    // handle exception\n  }\n}\n\n// get the output\nString metrics = output.toString();\n```\n\n## Defining schema with a MQA Schema file\n\nSchemas can be defined using the **MQA Schema** language. 
A MQA Schema can be given as YAML file or as JSON file.\n\n```\nSchema schema = ConfigurationReader\n  .readSchemaYaml(\"path/to/some/configuration.yaml\")\n  .asSchema();\n```\n\nA MQA Schema in YAML syntax:\n\n```yaml\nformat: json\nfields:\n  - name: edm:ProvidedCHO/@about\n    path:  $.['providedCHOs'][0]['about']\n    indexField: id\n    extractable: true\n    rules:\n      - and:\n        - minCount: 1 \n        - maxCount: 1\n        failureScore: -10\n      - pattern: ^https?://.*$\n        successScore: 3\n    categories:\n      - MANDATORY\n  - name: Proxy/dc:title\n    path: $.['proxies'][?(@['europeanaProxy'] == false)]['dcTitle']\n    categories:\n      - DESCRIPTIVENESS\n      - SEARCHABILITY\n      - IDENTIFICATION\n      - MULTILINGUALITY\n      - CUSTOM\n  - name: Proxy/dcterms:alternative\n    path: $.['proxies'][?(@['europeanaProxy'] == false)]['dctermsAlternative']\n    categories:\n      - DESCRIPTIVENESS\n      - SEARCHABILITY\n      - IDENTIFICATION\n      - MULTILINGUALITY\ngroups:\n  - fields:\n      - Proxy/dc:title\n      - Proxy/dc:description\n    categories:\n      - MANDATORY\n```\n\nThe same in JSON syntax:\n\n```json\n{\n  \"format\": \"json\",\n  \"fields\": [\n    {\n      \"name\": \"edm:ProvidedCHO/@about\",\n      \"path\":  \"$.['providedCHOs'][0]['about']\",\n      \"indexField\": \"id\",\n      \"extractable\": true,\n      \"rules\": [\n        {\n          \"and\": [\n            {\"minCount\": 1},\n            {\"maxCount\": 1}\n          ],\n          \"failureScore\": -10\n        },\n        {\n          \"pattern\": \"^https?://.*$\",\n          \"successScore\": 3\n        }\n      ],\n      \"categories\": [\"MANDATORY\"]\n    },\n    {\n      \"name\": \"Proxy/dc:title\",\n      \"path\": \"$.['proxies'][?(@['europeanaProxy'] == false)]['dcTitle']\",\n      \"categories\": [\n        \"DESCRIPTIVENESS\",\n        \"SEARCHABILITY\",\n        \"IDENTIFICATION\",\n        \"MULTILINGUALITY\"\n      ]\n    },\n    {\n  
    \"name\": \"Proxy/dcterms:alternative\",\n      \"path\": \"$.['proxies'][?(@['europeanaProxy'] == false)]['dctermsAlternative']\",\n      \"categories\": [\n        \"DESCRIPTIVENESS\",\n        \"SEARCHABILITY\",\n        \"IDENTIFICATION\",\n        \"MULTILINGUALITY\"\n      ]\n    }\n  ],\n  \"groups\": [\n    {\n      \"fields\": [\n        \"Proxy/dc:title\",\n        \"Proxy/dc:description\"\n      ],\n      \"categories\": [\n        \"MANDATORY\"\n      ]\n    }\n  ]\n}\n```\n\nThe central piece is the `fields` array. Each item represents the properties of\na single data element (a DataELement in the API). Its properties are:\n\n* `name` (String): the name or label of the data element\n* `path` (String): an address of the data element. If the format is XML, it\n  should be an XPath expression. If the format is JSON, it should be a JSONPath\n  expression. If the format is CSV, it should be the name of the column. \n* `categories` (List\u003cString\u003e): a list of categories this field belongs to.\n  Categories can be anything; in Europeana's use case these are the core\n  functionalities the field supports\n* `extractable` (boolean): whether the field can be extracted if field\n  extraction is turned on\n* `rules` (List\u003cRule\u003e): a set of rules or constraints against which the\n  values will be checked\n* `indexField` (String): the name which can be used in a search engine connected\n  to the application (at the time of writing Apache Solr is supported)\n* `inactive` (boolean): the data element is inactive, do not run checks on this\n* `identifierField` (boolean): the data element is the identifier of the record\n* `asLanguageTagged` (boolean): treat the data element as language tagged. It works \n  for JSON where the content of the data element is encoded with an associative \n  array, where the keys are the language tags.\n\nOptionally you can set the \"canonical list\" of categories. 
It provides\ntwo additional functionalities:\n\n* if a field has a category that is not in the list, that category is\n  excluded (with a warning in the log)\n* the order of the categories in the output follows the order set in the\n  configuration.\n\nHere is an example (in YAML):\n\n```yaml\nformat: json\n...\ncategories:\n  - MANDATORY\n  - DESCRIPTIVENESS\n  - SEARCHABILITY\n  - IDENTIFICATION\n  - CUSTOM\n  - MULTILINGUALITY\n\n```\n\n### Rules\n\nOne can add constraints to the fields. These are content rules, which\nthe tool will check. In this version the tool mimics SHACL constraints.\n\n#### Cardinality\nWith these constraints one can specify how many occurrences of a data element\na record can have.\n\n##### `minCount \u003cnumber\u003e`\nSpecifies the minimum number of occurrences of the field (API: `setMinCount()` or `withMinCount()`)\n\nExample: the field should have at least one occurrence\n\n```yaml\n- name: about\n  path:  $.['about']\n  rules:\n  - minCount: 1\n```\n\n##### `maxCount \u003cnumber\u003e`\nSpecifies the maximum number of occurrences of the field (API: `setMaxCount()` or `withMaxCount()`)\n\nExample: the field should have at most one occurrence\n\n```yaml\n- name: about\n  path:  $.['about']\n  rules:\n  - maxCount: 1\n```\n\n#### Value Range\n\nYou can set a range of values within which the field's value should remain. You\ncan combine a lower and an upper bound with logical operators. 
You can specify either\nintegers or floating point numbers.\n\n##### `minExclusive \u003cnumber\u003e`\nThe minimum exclusive value ([field value] \u003e limit, API: `setMinExclusive(Double)` or `withMinExclusive(Double)`)\n\n##### `minInclusive \u003cnumber\u003e`\nThe minimum inclusive value ([field value] \u003e= limit, API: `setMinInclusive(Double)` or `withMinInclusive(Double)`)\n\n##### `maxExclusive \u003cnumber\u003e`\nThe maximum exclusive value ([field value] \u003c limit, API: `setMaxExclusive(Double)` or `withMaxExclusive(Double)`)\n\n##### `maxInclusive \u003cnumber\u003e`\nThe maximum inclusive value ([field value] \u003c= limit, API: `setMaxInclusive(Double)` or `withMaxInclusive(Double)`)\n\nExample: 1.0 \u003c= price \u003c= 2.0\n\n```yaml\n- name: price\n  path:  $.['price']\n  rules:\n    - and:\n      - minInclusive: 1.0\n      - maxInclusive: 2.0\n```\n\nExample: 1.0 \u003c price \u003c 2.0\n\n```yaml\n- name: price\n  path:  $.['price']\n  rules:\n    - and:\n      - minExclusive: 1\n      - maxExclusive: 2\n```\n\nNote: integers are interpreted as floating point numbers.\n\n#### String constraints\n\n##### `minLength \u003cnumber\u003e`\nThe minimum string length of each field value (API: `setMinLength(Integer)` or `withMinLength(Integer)`)\n\nExample: the field value should not be empty\n\n```yaml\n- name: about\n  path:  $.['about']\n  rules:\n    - minLength: 1\n```\n\n##### `maxLength \u003cnumber\u003e`\nThe maximum string length of each field value (API: `setMaxLength(Integer)` or `withMaxLength(Integer)`)\n\nExample: the value should be 3, 4, or 5 characters long.\n\n```yaml\n- name: about\n  path:  $.['about']\n  rules:\n    - and:\n      - minLength: 3\n      - maxLength: 5\n```\n\n##### `hasValue \u003cString\u003e`\nThe value should be equal to the provided value (API: `setHasValue(String)` or `withHasValue(String)`)\n\nExample: the status should be \"published\".\n\n```yaml\n- name: status\n  path:  $.['status']\n  rules:\n    - 
hasValue: published\n```\n\n##### `in [String1, ..., StringN]`\nThe string value should be one of the listed values (API: `setIn(List\u003cString\u003e)` or `withIn(List\u003cString\u003e)`)\n\nExample: the value should be either \"dataverse\", \"dataset\" or \"file\".\n\n```yaml\n- name: type\n  path:  $.['type']\n  rules:\n    - in: [dataverse, dataset, file]\n```\n\n##### `pattern \u003cregular expression\u003e`\nA regular expression that each field value must match to satisfy the condition.\nThe expression can match a part of the string or the whole string (see the Java Matcher \nobject's [find](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#find--) method).\n(API: `setPattern(String)` or `withPattern(String)`)\n\nExample: the field value should start with http:// or https:// and end with\n.jpg, .jpeg, .jpe, .jfif, .png, .tiff, .tif, .gif, .svg, .svgz, or .pdf.\n\n```yaml\n- name: thumbnail\n  path: oai:record/dc:identifier[@type='binary']\n  rules:\n    - pattern: ^https?://.*\\.(jpg|jpeg|jpe|jfif|png|tiff|tif|gif|svg|svgz|pdf)$\n```\n\n##### `minWords \u003cnumber\u003e`\nThe minimum word count of each field value (API: `setMinWords(Integer)` or `withMinWords(Integer)`)\n\nExample: the field value should have at least one word\n\n```yaml\n- name: about\n  path:  $.['about']\n  rules:\n    - minWords: 1\n```\n\n##### `maxWords \u003cnumber\u003e`\nThe maximum word count of each field value (API: `setMaxWords(Integer)` or `withMaxWords(Integer)`)\n\nExample: the value should be at least 3 characters long, but should not contain more than 2 words.\n\n```yaml\n- name: about\n  path:  $.['about']\n  rules:\n    - and:\n      - minLength: 3\n      - maxWords: 2\n```\n\n#### Comparison of properties\n\n##### `equals \u003cfield label\u003e`\nThe set of all values of a field is equal to the set of all values of another\nfield (API: `setEquals(String)` or `withEquals(String)`)\n\nExample: The ID should be equal to the ISBN number.\n\n```yaml\nfields:\n  - 
name: id\n    path:  $.['id']\n    rules:\n      - equals: isbn\n  - name: isbn\n    path:  $.['isbn']\n```\n\n##### `disjoint \u003cfield label\u003e`\nThe set of values of a field is disjoint with (shares no value with) the set of all values\nof another field (API: `setDisjoint(String)` or `withDisjoint(String)`)\n\nExample: The title should be different from the description.\n\n```yaml\nfields:\n  - name: title\n    path:  $.['title']\n    rules:\n      - disjoint: description\n  - name: description\n    path:  $.['description']\n```\n\n##### `lessThan \u003cfield label\u003e`\n\nEach value of a field is smaller than each value of another field\n(API: `setLessThan(String)` or `withLessThan(String)`)\n\nExample: the date of birth is less than the date of death\n\n```yaml\n- name: birthDate\n  path: oai:record/dc:date[@type='birth']\n  rules:\n    - lessThan: deathDate\n```\n\n##### `lessThanOrEquals \u003cfield label\u003e`\n\nEach value of a field is smaller than or equal to each value of another field\n(API: `setLessThanOrEquals(String)` or `withLessThanOrEquals(String)`)\n\nExample: the starting page of the article should be less than or equal to the\nending page:\n\n```yaml\n- name: startingPage\n  path: startingPage\n  rules:\n    - lessThanOrEquals: endingPage\n```\n\n#### Logical operators\n\nWith logical operators you can build complex rules. Each component must be a valid rule in its own right.\n\n##### `and [\u003crule1\u003e, ..., \u003cruleN\u003e]`\n\nPasses if all the rules in the set pass. (API: `setAnd(List\u003cRule\u003e)` or `withAnd(List\u003cRule\u003e)`)\n\nExample: The ID should have one and only one occurrence, and it should not be an empty string.\n\n```yaml\n- name: id\n  path: oai:record/dc:identifier[@type='providerItemId']\n  rules:\n    - and:\n      - minCount: 1\n      - maxCount: 1\n      - minLength: 1\n```\n\n##### `or [\u003crule1\u003e, ..., \u003cruleN\u003e]`\n\nPasses if at least one of the rules in the set passes. 
(API: `setOr(List\u003cRule\u003e)` or `withOr(List\u003cRule\u003e)`)\n\nExample: The thumbnail should either end with a known image extension or its\ncontent type should be one of the provided MIME image types.\n\n```yaml\n- name: thumbnail\n  path: oai:record/dc:identifier[@type='binary']\n  rules:\n    - or:\n      - pattern: ^.*\\.(jpg|jpeg|jpe|jfif|png|tiff|tif|gif|svg|svgz)$\n      - contentType: [image/jpeg, image/png, image/tiff, image/tiff-fx, image/gif, image/svg+xml]\n```\n\n##### `not [\u003crule1\u003e, ..., \u003cruleN\u003e]`\n\nPasses if none of the rules in the set pass. (API: `setNot(List\u003cRule\u003e)` or `withNot(List\u003cRule\u003e)`)\n\nExample: make sure that the title and the description are different.\n\n```yaml\n- name: title\n  path:  $.['title']\n  rules:\n    - not:\n      - equals: description\n```\n\n#### Other constraints\n\nThese rules have no parallel in SHACL.\n\n\n##### `contentType [type1, ..., typeN]`\n\nThis rule interprets the value as a URL, fetches it and extracts the content type from\nthe HTTP header, then checks if it is one of those allowed.\n\nExample: The HTTP content type should be image/jpeg, image/png, image/tiff,\nimage/tiff-fx, image/gif, or image/svg+xml.\n\n```yaml\n- name: thumbnail\n  path: oai:record/dc:identifier[@type='binary']\n  rules:\n    - contentType: [image/jpeg, image/png, image/tiff, image/tiff-fx, image/gif, image/svg+xml]\n```\n\n##### `validLink \u003cboolean\u003e`\n\n(since v0.9.8)\n\nThis rule interprets the value as a URL, then checks whether it returns a valid HTTP response.\n\nExample: The thumbnail URL should return a valid HTTP response.\n\n```yaml\n- name: thumbnail\n  path: oai:record/dc:identifier[@type='binary']\n  rules:\n    - validLink: true\n```\n\nYou can also set a timeout in milliseconds (if you do not set it, the default value is \n5000 ms, i.e. 5 seconds). 
If the request does not retrieve results within this time limit, the tool\ncloses the connection and the check returns failure. This is useful when you check\nseveral thousand URLs and some responses are slow, which would otherwise take a very\nlong time.\n\nSet timeout for 1 second:\n\n```yaml\n- name: thumbnail\n  path: oai:record/dc:identifier[@type='binary']\n  rules:\n    - validLink: true\n      timout: 1000\n```\n\n##### `unique \u003cboolean\u003e`\n\n(since v0.9.0)\n\nThis rule checks if the value of the field is unique. Prerequisite: the field\nshould have the `indexField` property, and the content should be indexed with Apache\nSolr.\n\n##### `dependencies [id1, id2, ..., idN]`\n\n(since v0.9.0)\n\nThis rule checks if other rules have already been checked and passed. It passes if\nall the rules it depends on have passed or resulted in NA; otherwise it fails. The IDs should\nbe valid, and the dependent rule should come after the ones on which\nit depends.\n\n##### `dimension [criteria1, criteria2, ..., criteriaN]`\n\n(since v0.9.0)\n\nThis checks if a linked image fits some dimension constraints (unit: pixel),\nprovided the value is a URL of an image. One can check the minimum and\nmaximum size of width, height and shorter or longer sides (in case it is not\nimportant if width or height is the shorter). 
The criteria:\n\n- `minWidth`: minimum width\n- `maxWidth`: maximum width\n- `minHeight`: minimum height\n- `maxHeight`: maximum height\n- `minShortside`: minimum length of the shorter side of the image\n- `maxShortside`: maximum length of the shorter side of the image\n- `minLongside`: minimum length of the longer side of the image\n- `maxLongside`: maximum length of the longer side of the image\n\nExample:\n\n```yaml\nformat: csv\nfields:\n- name: thumbnail\n  path: oai:record/dc:identifier[@type='binary']\n  rules:\n  - id: 3.1\n    failureScore: -9\n    dimension:\n      minWidth: 200\n      minHeight: 200\n```\n\n##### `hasLanguageTag \u003canyOf|oneOf|allOf\u003e`\n\n(since v0.9.6)\n\nIt checks if the data element value has a language tag. In XML the language tag is\nfound in the `@xml:lang` attribute. In JSON it might be encoded differently. Right now \nMQAF supports the following encoding:\n\n```json\n\"description\": {\n  \"de\": [\"Porträt\"]\n}\n```\n\nSince this kind of structure might be used for purposes other than language annotation,\nthe field itself should declare that it is expected to have language annotation:\n\n```yaml\nformat: json\nfields:\n  - name: description\n    path: $.['description']\n    asLanguageTagged: true\n```\n\nThe parameter defines if any, one or all instances should have language annotation:\n\n* `anyOf`: the test passes if at least one instance has a language tag\n* `oneOf`: the test passes if one and only one instance has a language tag\n* `allOf`: the test passes if all instances have a language tag\n\nA full example:\n\n```yaml\nformat: json\nfields:\n- name: description\n  path: $.['description']\n  asLanguageTagged: true\n  rules:\n  - hasLanguageTag: allOf\n```\n\n##### `isMultilingual \u003cboolean\u003e`\n\n(since v0.9.6)\n\nIt checks if the data element is multilingual, that is, it has at least two instances with\ndifferent language annotations.\n\n```json\n{\n  \"description\":{\n    
\"de\":[\"Portr\\u00e4t\"],\n    \"zh\":[\"\\u8096\\u50cf\"]\n  }\n}\n\n```\n\nAn example schema:\n\n```yaml\nformat: json\nfields:\n  - name: description\n    path: $.['description']\n    asLanguageTagged: true\n    rules:\n      - isMultilingual: true\n```\n\n##### `hasChildren [element1, element2, ...]`\n\n(since v0.9.7)\n\nIt checks if the data element has all the specified child elements.\n\n```xml\n\u003clido:date\u003e\n  \u003clido:earliestDate\u003e1801\u003c/lido:earliestDate\u003e\n  \u003clido:latestDate\u003e1900\u003c/lido:latestDate\u003e\n\u003c/lido:date\u003e\n```\n\nAn example schema:\n\n```yaml\nformat: xml\nfields:\n  - name: description\n    path: lido:date\n    rules:\n      - hasChildren:\n          - lido:earliestDate\n          - lido:latestDate\n```\n\nIt checks if all the `lido:date` elements have both `lido:earliestDate` and `lido:latestDate`.\n\n#### General properties\n\n##### `id \u003cString\u003e`\n\nYou can assign an identifier to the rule, which will be reflected in the output.\nIf you omit it, the system will assign a sequence number. An ID might also help if\nyou transform a human readable document such as cataloguing rules into a\nconfiguration file, and you want to keep the linkage between them. \n(API `setId(String)` or `withId(String)`)\n\n##### `description \u003cString\u003e`\n\nProvide a description to document what the particular rule is doing. It can be\nany text; it does not play a role in the calculation.\n\n##### `failureScore \u003cinteger\u003e`\n\nA score which will be assigned if the validation fails. The score should be a\nnegative or positive integer (including zero). \n(API `setFailureScore(Integer)` or `withFailureScore(Integer)`)\n\n##### `successScore \u003cinteger\u003e`\n\nA score which will be assigned if the validation passes. 
The score should be\na negative or positive integer (including zero).\n(API `setSuccessScore(Integer)` or `withSuccessScore(Integer)`)\n\n##### `naScore \u003cinteger\u003e`\n\nA score which will be assigned if the value is missing (there is no such data element).\nThe score should be a negative or positive integer (including zero).\n(API `setNaScore(Integer)` or `withNaScore(Integer)`)\n\nExample: set of rules with IDs and scores.\n\n```yaml\n- name: providerid\n  path: oai:record/dc:identifier[@type='providerid']\n  rules:\n  - and:\n    - minCount: 1\n    - minLength: 1\n      failureScore: -6\n      id: 2.1\n  - pattern: ^(DE-\\d+|DE-MUS-\\d+|http://id.zdb-services.de\\w+|\\d{8}|oai\\d{13})$\n    failureScore: -3\n    naScore: 0\n    id: 2.2\n  - pattern: ^(DE-\\d+|DE-MUS-\\d+|http://id.zdb-services.de\\w+)$\n    successScore: 6\n    naScore: 0\n    id: 2.4\n  - pattern: ^http://id.zdb-services.de\\w+$\n    successScore: 3\n    naScore: 0\n    id: 2.5\n  - pattern: ^http://d-nb.info/gnd/\\w+$\n    successScore: 3\n    naScore: 0\n    id: 2.6\n```\n\n##### `hidden \u003cboolean\u003e`\n\n(since v0.9.0)\n\nIf the rule is hidden it will be calculated, but its result will not be present\nin the overall output. It can be used together with dependencies to set up\ncompound conditions.  \n\n##### `skip \u003cboolean\u003e`\n\n(since v0.9.0)\n\nThis property excludes a particular rule from the calculation. 
This could be\nuseful in the development phase, when you have started to create a complex rule but\nhaven't yet finished it, or when the execution of the rule takes a long time (e.g.\nchecking content type or image dimension) and you would like to turn it off\ntemporarily.\n\n##### `debug \u003cboolean\u003e`\n\n(since v0.9.0)\n\nIf set, the tool logs the rule identifier, its value and the rule's result.\n\n##### `mandatory \u003cboolean\u003e`\n\n(since v0.9.0)\n\nIf set, the data element should exist, should have a value, and should pass the checks.\n\n##### `alwaysCheckDependencies \u003cboolean\u003e`\n\n(since v0.9.0)\n\nIf set, and the dependencies are set, they will be checked even if the\ndata element does not exist.\n\n##### `scope \u003canyOf|oneOf|allOf\u003e`\n\n(since v0.9.0)\n\nThe parameter defines if any, one or all instances should pass the check:\n\n* `anyOf`: the test passes if at least one instance passes the test\n* `oneOf`: the test passes if one and only one instance passes the test\n* `allOf`: the test passes if all instances pass the test\n\n#### Set rules via Java API \n\n```Java\nSchema schema = new BaseSchema()\n  .setFormat(Format.CSV)\n  .addField(\n    new DataElement(\"title\", \"title\")\n      .setRule(\n        new Rule()\n          .withDisjoint(\"description\")\n      )\n  )\n  .addField(\n    new DataElement(\"url\", \"url\")\n      .setRule(\n        new Rule()\n          .withMinCount(1)\n          .withMaxCount(1)\n          .withPattern(\"^https?://.*$\")\n      )\n  )\n  ;\n```\n\nVia configuration file (a YAML example):\n\n```yaml\nformat: csv\nfields:\n  - name: title\n    categories: [MANDATORY]\n    rules:\n      - disjoint: description\n  - name: url\n    categories: [MANDATORY]\n    extractable: true\n    rules:\n      - minCount: 1\n        maxCount: 1\n        pattern: ^https?://.*$\n```\n\nIn both cases we defined two fields. 
`title` has one constraint: its values should\nbe disjoint with the values of the `description` field (which is omitted from\nthe example). Note: if this hypothetical `description` field is not available,\nthe API writes an error message to the log. `url` should have one and only\none instance, and its value should start with \"http://\" or \"https://\".\n\nAs you can see there are two types of setters in the API: `setSomething` and \n`withSomething`. The difference is that `setSomething` returns void, while\n`withSomething` returns the Rule object, so you can use it in a chain\nsuch as `new Rule().withMinCount(1).withMaxCount(3)` \n(while `new Rule().setMinCount(1).setMaxCount(3)` doesn't work, because\n`setMinCount()` returns nothing, and one cannot call `setMaxCount(3)`\non that nothing).\n\n## Defining MeasurementConfiguration with a configuration file\n\nMeasurementConfiguration can be created from JSON or YAML configuration files\nwith the following methods:\n\n* `ConfigurationReader.readMeasurementJson(String filePath)`: reading configuration from JSON\n* `ConfigurationReader.readMeasurementYaml(String filePath)`: reading configuration from YAML\n\nAn example:\n\n```Java\nMeasurementConfiguration configuration = ConfigurationReader\n  .readMeasurementJson(\"path/to/some/configuration.json\");\n```\n\nAn example JSON file:\n\n```JSON\n{\n  \"fieldExtractorEnabled\": false,\n  \"fieldExistenceMeasurementEnabled\": true,\n  \"fieldCardinalityMeasurementEnabled\": true,\n  \"completenessMeasurementEnabled\": true,\n  \"tfIdfMeasurementEnabled\": false,\n  \"problemCatalogMeasurementEnabled\": false,\n  \"ruleCatalogMeasurementEnabled\": false,\n  \"languageMeasurementEnabled\": false,\n  \"multilingualSaturationMeasurementEnabled\": false,\n  \"collectTfIdfTerms\": false,\n  \"uniquenessMeasurementEnabled\": false,\n  \"completenessCollectFields\": false,\n  \"saturationExtendedResult\": false,\n  \"checkSkippableCollections\": false\n}\n```\n\n* 
`fieldExtractorEnabled`: Flag whether or not the field extractor is enabled\n  (default: false). (API calls: setters: `enableFieldExtractor()`,\n  `disableFieldExtractor()`, getter: `isFieldExtractorEnabled()`)\n* `fieldExistenceMeasurementEnabled`: Flag whether or not to run the field\n  existence measurement (default: true). (API calls: setters:\n  `enableFieldExistenceMeasurement()`, `disableFieldExistenceMeasurement()`,\n  getter: `isFieldExistenceMeasurementEnabled()`)\n* `fieldCardinalityMeasurementEnabled`: Flag whether or not to run the field\n  cardinality measurement (default: true). (API calls: setters:\n  `enableFieldCardinalityMeasurement()`, `disableFieldCardinalityMeasurement()`,\n  getter: `isFieldCardinalityMeasurementEnabled()`)\n* `completenessMeasurementEnabled`: Flag whether or not to run the completeness\n  measurement (default: true). (API calls: setters: `enableCompletenessMeasurement()`,\n  `disableCompletenessMeasurement()`, getter: `isCompletenessMeasurementEnabled()`)\n* `tfIdfMeasurementEnabled`: Flag whether or not to run the TF-IDF measurement\n  (default: false). 
(API calls: setters: `enableTfIdfMeasurement()`, `disableTfIdfMeasurement()`,\n  getter: `isTfIdfMeasurementEnabled()`)\n* `problemCatalogMeasurementEnabled`: Flag whether or not to run the problem catalog (default: false).\n  (API calls: setters: `enableProblemCatalogMeasurement()`, `disableProblemCatalogMeasurement()`,\n  getter: `isProblemCatalogMeasurementEnabled()`)\n* `ruleCatalogMeasurementEnabled`: Flag whether or not to run the rule catalog (default: false).\n  (API calls: setters: `enableRuleCatalogMeasurement()`, `disableRuleCatalogMeasurement()`,\n  getter: `isRuleCatalogMeasurementEnabled()`)\n* `languageMeasurementEnabled`: Flag whether or not to run the language detector (default: false).\n  (API calls: setters: `enableLanguageMeasurement()`, `disableLanguageMeasurement()`,\n  getter: `isLanguageMeasurementEnabled()`)\n* `multilingualSaturationMeasurementEnabled`: Flag whether or not to run the multilingual\n  saturation measurement (default: false). (API calls: setters:\n  `enableMultilingualSaturationMeasurement()`, `disableMultilingualSaturationMeasurement()`,\n  getter: `isMultilingualSaturationMeasurementEnabled()`)\n* `collectTfIdfTerms`: Flag whether or not to collect TF-IDF terms in the uniqueness\n  measurement (default: false). (API calls: setters: `collectTfIdfTerms(boolean)`,\n  getter: `collectTfIdfTerms()`)\n* `uniquenessMeasurementEnabled`: Flag whether or not to run the uniqueness\n  measurement (default: false). (API calls: setters: `enableUniquenessMeasurement()`,\n  `disableUniquenessMeasurement()`, getter: `isUniquenessMeasurementEnabled()`)\n* `completenessCollectFields`: Flag whether or not to run missing/empty/existing field\n  collection in completeness (default: false). 
(API calls: setters:\n  `enableCompletenessFieldCollecting(boolean)`,\n  getter: `isCompletenessFieldCollectingEnabled()`)\n* `saturationExtendedResult`: Flag whether or not to create an extended result in\n  the multilingual saturation calculation (default: false).\n  (API calls: setters: `enableSaturationExtendedResult(boolean)`,\n  getter: `isSaturationExtendedResult()`)\n* `checkSkippableCollections`: Flag whether or not to check skippable collections (default: false).\n  (API calls: setters: `enableCheckSkippableCollections(boolean)`,\n  getter: `isCheckSkippableCollections()`)\n* `String solrHost`: The hostname of the Solr server.\n  (API calls: setters: `setSolrHost(String)`, `withSolrHost(String):MeasurementConfiguration`,\n  getter: `getSolrHost()`)\n* `String solrPort`: The port of the Solr server.\n  (API calls: setters: `setSolrPort(String)`, `withSolrPort(String):MeasurementConfiguration`,\n  getter: `getSolrPort()`)\n* `String solrPath`: The path part of the Solr server URL.\n  (API calls: setters: `setSolrPath(String)`, `withSolrPath(String):MeasurementConfiguration`,\n  getter: `getSolrPath()`)\n* `boolean onlyIdInHeader`: the Rules should return the ID in the header instead of a\n  generated value. 
(API calls: setters: `setOnlyIdInHeader(boolean)`,\n  `withOnlyIdInHeader(boolean):MeasurementConfiguration`,\n  getter: `isOnlyIdInHeader()`)\n\n## Using an experimental version\n  \nIf you want to try an experimental version (which has `SNAPSHOT` in its\nversion name), you have to enable the retrieval of those versions in the `pom.xml` file:\n\n```xml\n\u003crepositories\u003e\n  \u003crepository\u003e\n    \u003cid\u003esonatypeSnapshots\u003c/id\u003e\n    \u003cname\u003eSonatype Snapshots\u003c/name\u003e\n    \u003curl\u003ehttps://oss.sonatype.org/content/repositories/snapshots\u003c/url\u003e\n    \u003creleases\u003e\n      \u003cenabled\u003efalse\u003c/enabled\u003e\n    \u003c/releases\u003e\n    \u003csnapshots\u003e\n      \u003cenabled\u003etrue\u003c/enabled\u003e\n    \u003c/snapshots\u003e\n  \u003c/repository\u003e\n\u003c/repositories\u003e\n\n\u003cdependencies\u003e\n  ...\n  \u003cdependency\u003e\n    \u003cgroupId\u003ede.gwdg.metadataqa\u003c/groupId\u003e\n    \u003cartifactId\u003emetadata-qa-api\u003c/artifactId\u003e\n    \u003cversion\u003e0.9-SNAPSHOT\u003c/version\u003e\n  \u003c/dependency\u003e\n\u003c/dependencies\u003e\n```\n\nThanks to Miel Vander Sande ([@mielvds](https://github.com/mielvds)) for the hint!\n\n## More info\n\nSince version 0.8-SNAPSHOT the project requires Java 11.\n\nFor the usage and implementation of the API see https://github.com/pkiraly/europeana-qa-api.\n\nJavadoc for the current development version of the API: https://pkiraly.github.io/metadata-qa-api.\n\n[![Build Status](https://travis-ci.org/pkiraly/metadata-qa-api.svg?branch=main)](https://travis-ci.com/pkiraly/metadata-qa-api)\n[![Coverage Status (@coveralls)](https://coveralls.io/repos/github/pkiraly/metadata-qa-api/badge.svg?branch=main)](https://coveralls.io/github/pkiraly/metadata-qa-api?branch=main)\n[![Coverage Status 
(@codecov)](https://codecov.io/gh/pkiraly/metadata-qa-api/branch/main/graph/badge.svg?token=HLLGXJSVZL)](https://codecov.io/gh/pkiraly/metadata-qa-api)\n[![javadoc](https://javadoc.io/badge2/de.gwdg.metadataqa/metadata-qa-api/javadoc.svg)](https://javadoc.io/doc/de.gwdg.metadataqa/metadata-qa-api)\n[![Maven Central](https://img.shields.io/maven-central/v/de.gwdg.metadataqa/metadata-qa-api)](https://search.maven.org/artifact/de.gwdg.metadataqa/metadata-qa-api)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpkiraly%2Fmetadata-qa-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpkiraly%2Fmetadata-qa-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpkiraly%2Fmetadata-qa-api/lists"}