https://github.com/jmvanel/dwca-rdf
https://github.com/jmvanel/dwca-rdf
darwin-core dwca jena rdf
Last synced: 5 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/jmvanel/dwca-rdf
- Owner: jmvanel
- Created: 2020-08-20T09:04:58.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-12-15T18:56:13.000Z (over 5 years ago)
- Last Synced: 2024-01-26T10:37:01.591Z (over 2 years ago)
- Topics: darwin-core, dwca, jena, rdf
- Language: Scala
- Homepage:
- Size: 19.5 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## Convert a Darwin Core Archive into Darwin Core RDF
It's work in progress.
It leverages on existing gbif Java library
https://github.com/gbif/dwca-io
and Apache Jena.
I tried in on this Darwin Core Archive, that contains plants in my département:
https://www.gbif.org/dataset/85a97a5f-751a-49ca-ad7d-238a80c0d30c
TESTED in Scala REPL with
```
runMain jmvanel.DWCA2RDF /home/jmv/data/Biologie/GBIF.org/Flore_Ain_0039246-200613084148143.zip
```
See also the JSON-LD project for GBIF.org API :
https://github.com/jmvanel/rdf-convert/tree/master/gbif.org
## Current result sample
```turtle
# graph size 34489
# person Map size 28
.
.
"Potentilla crantzii (Crantz) Fritsch" .
"46.15952" .
"5.39777" .
.
.
.
"A.Cl. BOLOMIER -" .
.
```
NOTES
- I suppose that soon :) the GBIF API URL's will be dereferenceable RDF URI; this gbif API is not currently RDF, but it could be made RDF later with JSON-LD : https://api.gbif.org/v1/occurrence/
- same for persons, but here I don't know what is the API for persons, if any ; I just used prefix https://api.gbif.org/v1/person/ for now
- detect if a "recordedBy" value is a person or an organization is not trivial; I made no attempt yet ...
- taxonKey is very important : it is the global GBIF ID for the taxon; a permanent dereferenceable URI has to be defined
- coordinates should be xsd:float's ; same predicates as geo:lat, geo:long
- the collectors do not AFAIK have a GBIF ID
- "modified" key should be used
- "identifier" key should be used; is there an API for this ?
- "eventID" key should be used; is there an API for this ?
- "nameAccordingTo": "TAXREF v12" was taken for granted; should be processed; case of other taxon registries to study
- applies a flat RDF structure (except for persons); it's not fully compliant to DSW ... but simple
DONE
- given "basisOfRecord": "HUMAN_OBSERVATION", the class dwc:HumanObservation should also be assigned
Here is the GBIF API result for this (observation) occurrence
```json
wget -O - https://api.gbif.org/v1/occurrence/2487147795 |jq .
{
"key": 2487147795,
"datasetKey": "85a97a5f-751a-49ca-ad7d-238a80c0d30c",
"publishingOrgKey": "1928bdf0-f5d2-11dc-8c12-b8a03c50a862",
"installationKey": "07ea29ef-e386-4278-ae0f-095778a1b061",
"publishingCountry": "FR",
"protocol": "DWC_ARCHIVE",
"lastCrawled": "2020-06-03T01:11:30.173+0000",
"lastParsed": "2020-06-08T16:39:04.529+0000",
"crawlId": 1,
"extensions": {},
"basisOfRecord": "HUMAN_OBSERVATION",
"taxonKey": 5366215,
"kingdomKey": 6,
"phylumKey": 7707728,
"classKey": 220,
"orderKey": 691,
"familyKey": 5015,
"genusKey": 8079058,
"speciesKey": 8285546,
"acceptedTaxonKey": 8370002,
"scientificName": "Potentilla crantzii (Crantz) Fritsch",
"acceptedScientificName": "Potentilla crantzii subsp. crantzii",
"kingdom": "Plantae",
"phylum": "Tracheophyta",
"order": "Rosales",
"family": "Rosaceae",
"genus": "Potentilla",
"species": "Potentilla crantzii",
"genericName": "Potentilla",
"specificEpithet": "crantzii",
"taxonRank": "SPECIES",
"taxonomicStatus": "SYNONYM",
"decimalLongitude": 5.39777,
"decimalLatitude": 46.15952,
"coordinateUncertaintyInMeters": 5000,
"issues": [
"GEODETIC_DATUM_ASSUMED_WGS84",
"RECORDED_DATE_INVALID"
],
"modified": "2019-02-28T00:00:00.000+0000",
"lastInterpreted": "2020-06-08T16:39:04.529+0000",
"license": "http://creativecommons.org/licenses/by-nc/4.0/legalcode",
"identifiers": [],
"media": [],
"facts": [],
"relations": [],
"geodeticDatum": "WGS84",
"class": "Magnoliopsida",
"countryCode": "FR",
"recordedByIDs": [],
"identifiedByIDs": [],
"country": "France",
"identifier": "82e3ca10-dcbf-24f6-e053-2614a8c008ee",
"eventID": "82e3ca10-dcbf-24f6-e053-2614a8c008ee",
"dataGeneralizations": "Géographie transmise soumise à floutage (grille avec mailles de 10x10km) pour le grand public » en conformité avec les règles de diffusion du SINP | Geographic information generalized during aggregation (grid with 10x10km cells) for the general public, according to SINP communication rules",
"county": "01",
"identificationVerificationStatus": "Control could not be conclusive due to insufficient knowledge",
"gbifID": "2487147795",
"occurrenceID": "82e3ca10-dcbf-24f6-e053-2614a8c008ee",
"taxonID": "139270",
"occurrenceStatus": "Présent",
"recordedBy": "A.Cl. BOLOMIER - (Non renseigné)",
"locationRemarks": "Data isn’t the original geo referenced one, but attached to the nearest 10x10 km grid cell",
"institutionCode": "Non renseigné",
"originalNameUsage": "Potentilla verna",
"datasetID": "4A9DDA1F-B72E-3E13-E053-2614A8C02B7C",
"nameAccordingTo": "TAXREF v12",
"identifiedBy": "Non renseigné (Non renseigné)"
}
```
**Links**
- https://dwc.tdwg.org/rdf/
- https://dwc.tdwg.org/terms/
- http://baskauf.blogspot.com/2019/