{"id":50822694,"url":"https://github.com/ranjithguggilla/iso19115-validator","last_synced_at":"2026-06-13T15:36:41.252Z","repository":{"id":358580870,"uuid":"1240233123","full_name":"ranjithguggilla/iso19115-validator","owner":"ranjithguggilla","description":"ISO 19115-2, CF-1.8, and ACDD-1.3 metadata linter with Schematron policy rules, YAML DSL, FAIR scoring, and FastAPI web UI","archived":false,"fork":false,"pushed_at":"2026-05-18T02:42:34.000Z","size":175,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-18T04:51:48.350Z","etag":null,"topics":["acdd","cf-conventions","data-curation","fair-data","iso19115","marine-science","metadata","netcdf","oceanography","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ranjithguggilla.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-15T22:53:00.000Z","updated_at":"2026-05-18T02:42:37.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ranjithguggilla/iso19115-validator","commit_stats":null,"previous_names":["ranjithguggilla/iso19115-validator"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ranjithguggilla/iso19115-validator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranjithguggilla%2Fiso19115-validator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranjithguggilla%2Fiso19115-validator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranjithguggilla%2Fiso19115-validator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranjithguggilla%2Fiso19115-validator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ranjithguggilla","download_url":"https://codeload.github.com/ranjithguggilla/iso19115-validator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ranjithguggilla%2Fiso19115-validator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34290346,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acdd","cf-conventions","data-curation","fair-data","iso19115","marine-science","metadata","netcdf","oceanography","python"],"created_at":"2026-06-13T15:36:40.490Z","updated_at":"2026-06-13T15:36:41.245Z","avatar_url":"https://github.com/ranjithguggilla.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# iso19115-validator\n\n**CLI and web-based metadata linter for ISO 19115-2, CF-1.8, and ACDD-1.3 geospatial standards.**\n\nValidates XML metadata records and NetCDF file attributes against international standards, institutional policies, and FAIR data principles — entirely offline, with no external API calls.\n\n[![CI](https://github.com/ranjithguggilla/iso19115-validator/actions/workflows/ci.yml/badge.svg)](https://github.com/ranjithguggilla/iso19115-validator/actions)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://python.org)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)\n\n![Validation Report](docs/screenshots/validation_report.png)\n\n![FAIR Score](docs/screenshots/fair_score.png)\n\n---\n\n## Why This Exists\n\nGeospatial data repositories require metadata that conforms to ISO 19115-2, the Climate and Forecast (CF) Conventions, and the Attribute Convention for Data Discovery (ACDD). Manual metadata review is tedious and error-prone. Existing validators are often online-only, focused on a single standard, or lack actionable fix suggestions.\n\n`isolint` combines structural validation (XSD), policy enforcement (Schematron), convention checking (CF/ACDD), and custom institutional rules (YAML DSL) into a single offline tool with exact XPath error locations and concrete fix suggestions.\n\n---\n\n## Features\n\n| Capability | Description |\n|---|---|\n| **XSD structural validation** | Checks required/recommended ISO 19115-2 elements with XPath locations |\n| **Schematron policy rules** | Validates dates, geographic bounds, topic categories, URLs, abstracts |\n| **CF-1.8 convention checking** | Inspects NetCDF global and variable attributes against CF standard |\n| **ACDD-1.3 compliance** | Checks required/recommended/suggested discovery attributes |\n| **YAML rules DSL** | Define custom institutional policies without writing Python |\n| **SHA-256 checksum verification** | Validates MANIFEST.sha256 and per-file sidecar checksums |\n| **FAIR self-scoring** | Scores Findable/Accessible/Interoperable/Reusable with letter grade |\n| **Metadata diff** | Compares two XML or NetCDF files and reports structural differences |\n| **Auto-suggestions** | Generates prioritized improvement recommendations |\n| **FastAPI web UI** | Browser-based drag-and-drop validation dashboard |\n| **Multiple output formats** | Text (Rich terminal), JSON, Markdown compliance reports |\n| **Fully offline** | No external API calls, no telemetry, no network required |\n\n---\n\n## Architecture\n\n```\n                    ┌──────────────────────────────────────┐\n                    │            isolint CLI                │\n                    │  check · suggest · diff · fair · serve│\n                    └──────────┬───────────────────────────┘\n                               │\n                    ┌──────────▼───────────────────────────┐\n                    │        ValidationEngine               │\n                    │  orchestrates all validation layers   │\n                    └──┬────┬────┬────┬────┬───────────────┘\n                       │    │    │    │    │\n           ┌───────────┘    │    │    │    └──────────┐\n           ▼                ▼    ▼    ▼               ▼\n    ┌──────────┐   ┌─────┐ ┌──┐ ┌────┐    ┌──────────────┐\n    │   XSD    │   │Sch- │ │CF│ │ACDD│    │  YAML Rules  │\n    │Validator │   │ematron│ │  │ │    │    │    Engine    │\n    └──────────┘   └──────┘ └──┘ └────┘    └──────────────┘\n           │           │      │     │              │\n           └───────────┴──────┴─────┴──────────────┘\n                               │\n                    ┌──────────▼───────────────────────────┐\n                    │       ComplianceReport                │\n                    │  findings · JSON · Markdown · Rich    │\n                    └──────────────────────────────────────┘\n```\n\n---\n\n## Quick Start\n\n### Installation\n\n```bash\n# From source\ngit clone https://github.com/ranjithguggilla/iso19115-validator.git\ncd iso19115-validator\npip install -e \".[dev]\"\n\n# Verify installation\nisolint --version\n```\n\n### Validate Metadata\n\n```bash\n# Validate an ISO 19115-2 XML file\nisolint check metadata.xml\n\n# Validate a directory of metadata + NetCDF files\nisolint check /path/to/data/package/\n\n# Get JSON report\nisolint check metadata.xml --format json -o report.json\n\n# Get Markdown report\nisolint check metadata.xml --format markdown -o report.md\n\n# Apply custom institutional rules\nisolint check metadata.xml --rules my_rules.yaml\n```\n\n### Get Improvement Suggestions\n\n```bash\nisolint suggest metadata.xml\nisolint suggest /path/to/data/ --format json\n```\n\n### Compare Two Metadata Files\n\n```bash\nisolint diff old_metadata.xml new_metadata.xml\nisolint diff v1.nc v2.nc --format json\n```\n\n### Compute FAIR Score\n\n```bash\nisolint fair metadata.xml\nisolint fair /path/to/data/ --format json\n```\n\n### Start Web UI\n\n```bash\nisolint serve\n# Opens at http://127.0.0.1:8000\n```\n\n---\n\n## How It Works — Step by Step\n\n### Step 1: File Discovery\n\nWhen pointed at a directory, the engine scans for:\n- `*.xml` → ISO 19115-2 validation (XSD + Schematron)\n- `*.nc` → CF-1.8 + ACDD-1.3 attribute checking\n- `*.sha256` → Checksum verification\n\n### Step 2: XSD Structural Validation\n\nFor each XML file, the validator checks:\n\n1. **Well-formedness** — Can lxml parse the document without errors?\n2. **Required elements** — Are `fileIdentifier`, `language`, `contact`, `dateStamp`, and `identificationInfo` present?\n3. **Recommended elements** — Are `abstract`, `topicCategory`, `extent`, and `dataQualityInfo` present?\n4. **Namespace declarations** — Does the root element declare ISO TC211 namespaces?\n\nEach finding includes the exact XPath to the offending (or missing) element.\n\n### Step 3: Schematron Policy Rules\n\nSeven semantic assertions enforce data quality beyond structure:\n\n| Rule | Check | Severity |\n|------|-------|----------|\n| SCH-001 | Date stamps use ISO 8601 format | Error |\n| SCH-002 | Geographic bounding box coordinates are valid | Error |\n| SCH-003 | Topic categories from controlled vocabulary | Error |\n| SCH-004 | Online resource URLs are well-formed | Warning |\n| SCH-005 | Responsible party has name (org or individual) | Warning |\n| SCH-006 | No empty `gco:CharacterString` elements | Warning |\n| SCH-007 | Abstract is at least 50 characters | Warning |\n\n### Step 4: CF-1.8 Convention Checking\n\nFor NetCDF files, the checker inspects:\n\n- **Global attributes**: `Conventions` must reference CF; `title` is required\n- **Variable attributes**: Each data variable needs `standard_name` or `long_name` plus `units`\n- **Coordinate variables**: Must have `units`; `time` should have `calendar`\n- **Standard names**: Validated against a curated lookup table of common oceanographic names\n\n### Step 5: ACDD-1.3 Compliance\n\nThree-tier attribute classification:\n\n- **Required** (4 attrs): `title`, `summary`, `keywords`, `Conventions`\n- **Recommended** (16 attrs): Including `creator_name`, `license`, `geospatial_*`, `time_coverage_*`\n- **Suggested** (14 attrs): Including `publisher_*`, `platform`, `instrument`\n\nCross-attribute consistency: lat min \u003c lat max, time start \u003c time end.\n\n### Step 6: Custom Rules (YAML DSL)\n\nOrganizations define rules in YAML without writing Python:\n\n```yaml\nrules:\n  - id: INST-001\n    description: \"Dataset must have a DOI\"\n    severity: error\n    check:\n      type: xpath_exists\n      xpath: \"//gmd:identifier//gco:CharacterString[starts-with(., '10.')]\"\n    suggestion: \"Register with DataCite or Zenodo.\"\n\n  - id: INST-002\n    description: \"License must be specified in NetCDF\"\n    severity: error\n    check:\n      type: attr_exists\n      attribute: license\n    suggestion: \"Add license='CC-BY-4.0' to NetCDF global attributes.\"\n```\n\n**Available rule types:**\n- `xpath_exists` — XML element must exist\n- `xpath_not_empty` — XML element must have content\n- `xpath_regex` — XML element text must match regex pattern\n- `attr_exists` — NetCDF global attribute must exist\n- `attr_regex` — NetCDF attribute value must match regex\n- `file_exists` — Named file must exist in directory\n\n### Step 7: Report Generation\n\nReports are produced in three formats:\n\n**Rich terminal** (default) — colored severity indicators, XPath locations, fix suggestions\n\n**JSON** — machine-readable for CI/CD integration:\n```json\n{\n  \"target\": \"metadata.xml\",\n  \"passed\": false,\n  \"summary\": {\"errors\": 3, \"warnings\": 2, \"info\": 4},\n  \"findings\": [\n    {\n      \"severity\": \"error\",\n      \"message\": \"Required element missing: gmd:contact\",\n      \"xpath\": \"//gmd:contact\",\n      \"rule_id\": \"XSD-010\",\n      \"suggestion\": \"Add the required element gmd:contact.\"\n    }\n  ]\n}\n```\n\n**Markdown** — for documentation and pull request comments\n\n---\n\n## FAIR Self-Scoring\n\nThe FAIR scorer evaluates metadata against the four FAIR principles:\n\n| Principle | What's Checked |\n|-----------|---------------|\n| **F**indable | Unique identifier, rich metadata (title/abstract/keywords), dataset ID |\n| **A**ccessible | Online resource URLs, contact information |\n| **I**nteroperable | XML namespaces, vocabulary references, cross-dataset links |\n| **R**eusable | License/constraints, provenance/lineage, community standards |\n\nEach principle scores 0.0–1.0. The overall score is the mean of all four. Letter grades: A (≥90%), B (≥80%), C (≥70%), D (≥60%), F (\u003c60%).\n\n```\nFAIR Score: 46% (Grade: F)\n\n  Findable       ████████░░░░░░░░░░░░ 44%\n  Accessible     ██████████░░░░░░░░░░ 50%\n  Interoperable  ██████░░░░░░░░░░░░░░ 33%\n  Reusable       ██████████░░░░░░░░░░ 50%\n```\n\n---\n\n## YAML Rules DSL\n\nThe YAML rules DSL lets institutions define custom validation policies without touching Python. Rules are loaded at runtime and applied alongside the built-in checks.\n\n### Built-in Rule Sets\n\n| Rule Set | File | Description |\n|----------|------|-------------|\n| Oceanographic | `isolint/rules/oceanographic.yaml` | Rules for marine observation datasets |\n| Institutional | `isolint/rules/institutional.yaml` | Template for organizational policies |\n\n### Writing Custom Rules\n\nCreate a YAML file:\n\n```yaml\nname: my-organization\nversion: \"1.0\"\n\nrules:\n  - id: ORG-001\n    description: \"File identifier must use our naming convention\"\n    severity: error\n    check:\n      type: xpath_regex\n      xpath: \"//gmd:fileIdentifier/gco:CharacterString\"\n      pattern: \"^ORG-\\\\d{4}-\\\\d+\"\n    suggestion: \"Use format ORG-YYYY-NNN.\"\n\n  - id: ORG-002\n    description: \"README must be present in data package\"\n    severity: warning\n    check:\n      type: file_exists\n      filename: \"README.txt\"\n    suggestion: \"Include a README.txt.\"\n```\n\nApply with:\n```bash\nisolint check /data/package --rules my_rules.yaml\n```\n\n---\n\n## Web UI\n\nStart the FastAPI dashboard:\n\n```bash\nisolint serve --port 8000\n```\n\n**Endpoints:**\n\n| Method | Path | Description |\n|--------|------|-------------|\n| GET | `/` | HTML dashboard with drag-and-drop upload |\n| POST | `/validate` | Validate uploaded files |\n| POST | `/suggest` | Get improvement suggestions |\n| POST | `/fair` | Compute FAIR score |\n| GET | `/health` | Health check |\n\nThe web UI is a single-page application with a dark theme. Upload XML or NetCDF files, click Validate/Suggest/FAIR Score, and see results inline.\n\n---\n\n## Project Structure\n\n```\niso19115-validator/\n├── isolint/                    # Core package\n│   ├── __init__.py             # Package exports\n│   ├── engine.py               # ValidationEngine — orchestrates all layers\n│   ├── report.py               # ComplianceReport and Finding models\n│   ├── xsd_validator.py        # XSD structural validation\n│   ├── schematron.py           # Schematron policy rules\n│   ├── cf_checker.py           # CF-1.8 convention checker\n│   ├── acdd_checker.py         # ACDD-1.3 compliance checker\n│   ├── checksum_validator.py   # SHA-256 manifest verification\n│   ├── yaml_rules.py           # YAML rules DSL engine\n│   ├── fair.py                 # FAIR self-scoring module\n│   ├── diff.py                 # Metadata diff engine\n│   ├── suggest.py              # Auto-suggestion engine\n│   ├── cli.py                  # Click CLI (check, suggest, diff, fair, serve)\n│   ├── web.py                  # FastAPI web interface\n│   ├── rules/                  # Built-in YAML rule sets\n│   │   ├── oceanographic.yaml  # Marine observation rules\n│   │   └── institutional.yaml  # Template for organizational policies\n│   └── schemas/                # Schema references\n├── tests/                      # 93 tests across 9 modules\n│   ├── conftest.py             # Shared fixtures\n│   ├── test_xsd_validator.py   # XSD validation tests\n│   ├── test_schematron.py      # Schematron policy tests\n│   ├── test_report.py          # Report model tests\n│   ├── test_checksum.py        # Checksum verification tests\n│   ├── test_yaml_rules.py      # YAML DSL tests\n│   ├── test_fair.py            # FAIR scoring tests\n│   ├── test_diff.py            # Diff engine tests\n│   ├── test_suggest.py         # Suggestion engine tests\n│   ├── test_engine.py          # Integration tests\n│   ├── test_cli.py             # CLI command tests\n│   └── test_web.py             # FastAPI endpoint tests\n├── examples/                   # Sample metadata files\n│   ├── valid_metadata.xml      # Complete ISO 19115-2 record\n│   ├── incomplete_metadata.xml # Deliberately incomplete (for testing)\n│   └── sample_rules.yaml       # Example YAML rules\n├── docs/\n│   ├── METHODS.md              # Technical methods documentation\n│   ├── sample_report_valid.json\n│   ├── sample_report_incomplete.json\n│   └── sample_fair_score.md\n├── scripts/\n│   └── generate_sample_report.py\n├── .github/workflows/ci.yml   # CI: lint + test matrix + build\n├── pyproject.toml              # Project metadata and dependencies\n├── Makefile                    # Development shortcuts\n├── CHANGELOG.md                # Version history\n└── LICENSE                     # MIT\n```\n\n---\n\n## Validation Rule Reference\n\n### XSD Rules (XSD-xxx)\n\n| Rule ID | Severity | Description |\n|---------|----------|-------------|\n| XSD-001 | Error | XML syntax error |\n| XSD-002 | Warning | Root element not MD_Metadata or MI_Metadata |\n| XSD-010 | Error | Required element missing |\n| XSD-011 | Warning | Required element is empty |\n| XSD-020 | Info | Recommended element missing |\n| XSD-030 | Warning | No ISO TC211 namespaces declared |\n\n### Schematron Rules (SCH-xxx)\n\n| Rule ID | Severity | Description |\n|---------|----------|-------------|\n| SCH-001 | Error | Invalid ISO 8601 date format |\n| SCH-002 | Error | Invalid geographic bounding box |\n| SCH-003 | Error | Invalid topic category |\n| SCH-004 | Warning | Malformed online resource URL |\n| SCH-005 | Warning | Responsible party missing name |\n| SCH-006 | Warning | Empty CharacterString elements |\n| SCH-007 | Warning | Abstract too short (\u003c 50 chars) |\n\n### CF Convention Rules (CF-xxx)\n\n| Rule ID | Severity | Description |\n|---------|----------|-------------|\n| CF-010 | Error | Missing required CF global attribute |\n| CF-011 | Info | Missing recommended CF global attribute |\n| CF-012 | Warning | Conventions attribute doesn't reference CF |\n| CF-020 | Warning | Variable lacks standard_name and long_name |\n| CF-021 | Info | Unrecognized standard_name |\n| CF-022 | Warning | Variable missing units attribute |\n| CF-030 | Error | Coordinate variable missing units |\n| CF-031 | Info | Time coordinate missing calendar |\n\n### ACDD Rules (ACDD-xxx)\n\n| Rule ID | Severity | Description |\n|---------|----------|-------------|\n| ACDD-010 | Error | Missing required ACDD attribute |\n| ACDD-011 | Warning | Required ACDD attribute is empty |\n| ACDD-020 | Warning | Missing recommended ACDD attributes |\n| ACDD-030 | Info | Missing suggested ACDD attributes |\n| ACDD-040 | Error | Geospatial lat min \u003e max |\n| ACDD-041 | Error | Latitude out of range |\n| ACDD-042 | Error | Longitude out of range |\n| ACDD-050 | Error | Time coverage start \u003e end |\n\n### Checksum Rules (CHK-xxx)\n\n| Rule ID | Severity | Description |\n|---------|----------|-------------|\n| CHK-001 | Warning | Malformed checksum line |\n| CHK-002 | Error | Referenced file not found |\n| CHK-003 | Error | Checksum mismatch |\n| CHK-100 | Info | Verification summary |\n\n---\n\n## Development\n\n```bash\n# Install with dev dependencies\npip install -e \".[dev]\"\n\n# Run tests\nmake test\n\n# Run tests with coverage\nmake test-cov\n\n# Lint\nmake lint\n\n# Validate example metadata\nmake check\n\n# Generate sample reports\nmake sample\n```\n\n---\n\n## Testing\n\n93 tests across 9 test modules covering:\n\n- **XSD validation**: well-formedness, required/recommended elements, namespaces\n- **Schematron rules**: date formats, geographic bounds, topic categories, URLs, abstracts\n- **Report model**: finding serialization, JSON/Markdown rendering, pass/fail logic\n- **Checksum validation**: valid/invalid digests, missing files, manifest format\n- **YAML rules**: rule loading, XPath evaluation, regex matching, file existence, disabled rules\n- **FAIR scoring**: score computation, grading, XML/NetCDF scoring\n- **Diff engine**: identical files, added/removed/changed elements, malformed input\n- **Suggestion engine**: XML suggestions, directory scanning, priority sorting\n- **CLI**: all commands (check, suggest, diff, fair), all output formats, file output\n- **Web API**: dashboard, validate, suggest, FAIR endpoints\n\n```bash\npytest tests/ -v\n# ============================== 93 passed ==============================\n```\n\n---\n\n## Security\n\n- **Offline-only**: No external API calls, no telemetry, no data leaves your machine\n- **No code execution from metadata**: XML parsing uses lxml with no XSLT or script evaluation\n- **Input validation**: All user inputs sanitized through Click parameter types\n- **Temp file cleanup**: Web uploads use Python's `tempfile` with automatic cleanup\n- **No secrets required**: Works without any API keys or tokens\n\n---\n\n## References\n\n- ISO 19115-2:2019 — Geographic information — Metadata — Part 2\n- CF Metadata Conventions v1.8 (2020). http://cfconventions.org/\n- ACDD 1.3 (2015). https://wiki.esipfed.org/ACDD_1.3\n- FAIR Data Principles (Wilkinson et al., 2016). doi:10.1038/sdata.2016.18\n- ISO/IEC 19757-3:2020 — Schematron\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Franjithguggilla%2Fiso19115-validator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Franjithguggilla%2Fiso19115-validator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Franjithguggilla%2Fiso19115-validator/lists"}