{"id":24318246,"url":"https://github.com/rdmpage/ibol-dwca","last_synced_at":"2025-03-10T18:34:59.539Z","repository":{"id":142280193,"uuid":"66028431","full_name":"rdmpage/ibol-dwca","owner":"rdmpage","description":"International Barcode of Life project (iBOL) to Darwin Core","archived":false,"fork":false,"pushed_at":"2017-05-16T11:07:44.000Z","size":660,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-17T14:43:14.481Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rdmpage.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-08-18T20:27:20.000Z","updated_at":"2021-07-25T17:39:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"5d1210f7-1d72-4287-b8f4-a2a553853807","html_url":"https://github.com/rdmpage/ibol-dwca","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdmpage%2Fibol-dwca","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdmpage%2Fibol-dwca/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdmpage%2Fibol-dwca/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdmpage%2Fibol-dwca/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rdmpage","download_url":"https://codeload.github.com/rdmpage/ibol-dwca/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","rep
ositories_count":242903121,"owners_count":20204216,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-17T14:37:39.622Z","updated_at":"2025-03-10T18:34:59.513Z","avatar_url":"https://github.com/rdmpage.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# International Barcode of Life project (iBOL) to Darwin Core\n\n## Interesting examples\n\n### AMSF109-09\n\nBarcode AMSF109-09 has museum id I.44764-003, which is *AM I.44764-003*, which is GBIF http://www.gbif.org/occurrence/1100680435  Has occurrenceID c8baeef3-c4b2-426c-937a-fe0093816a25 and event id urn:australianmuseum.net.au:Events:1114726, can search OZCAM for event: http://ozcam.ala.org.au/occurrences/search?q=\u0026fq=event_id%3A%22urn%3Aaustralianmuseum.net.au%3AEvents%3A1114726%22#tab_recordsView  Can also get another id urn:lsid:ozcam.taxonomy.org.au:AM:Ichthyology:I.44764-003 from http://ozcam.ala.org.au/occurrence/c8baeef3-c4b2-426c-937a-fe0093816a25 (note this URL uses UUID that GBIF has as occurrenceID). Occurrence is identified as _Ophidion genyopus_ (Ogilby, 1897).\n\n## People with ORCIDs\n\nXingyue Liu http://orcid.org/0000-0002-9168-0659 (e.g., ASMEG469-09 )\n\nRodolphe Rougerie http://orcid.org/0000-0003-0937-2815 \n\nBrian Fisher http://orcid.org/0000-0002-4653-3270\n\nDouglas C. Currie http://orcid.org/0000-0002-9217-255X (no pubs)\n\nMateus Pepinelli http://orcid.org/0000-0002-9815-4774\n\nAleksandra Panyutina http://orcid.org/0000-0002-8379-8526\n\nDaniel H. 
Janzen http://orcid.org/0000-0002-7335-5107\n\n\n## Fetch data\n\n\n### Gotchas\n\n#### File encoding\nThe file iBOL_phase_0.50_COI.tsv is not UTF-8 encoded, so we need to convert it. For example:\n\n```\niconv -f iso-8859-1 -t utf-8 iBOL_phase_0.50_COI.tsv \u003e iBOL_phase_0.50_COI.tsv.new\nrm iBOL_phase_0.50_COI.tsv\nmv iBOL_phase_0.50_COI.tsv.new iBOL_phase_0.50_COI.tsv\n```\n\n#### Image URLs have awkward characters\nThe BOLD web site has URLs for images that contain ‘#’ and ‘+’ symbols. These need to be URL-encoded.\n\n## Load into MySQL\n\nThis step is not strictly necessary, but it helps when investigating the data and generating data for extensions.\n\n```\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_0.50_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_0.75_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_1.00_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_1.25_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_1.50_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_1.75_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_2.0_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_2.25_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_2.50_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_2.75_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase3.0_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_3.25_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE 
'/Users/rpage/iBOL_phase_3.50_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_3.75_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_4.00_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_4.25_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_4.50_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_4.75_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_5.00_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_5.25_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_5.50_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_5.75_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_6.00_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_6.25_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\nLOAD DATA LOCAL INFILE '/Users/rpage/iBOL_phase_6.50_COI.tsv' REPLACE INTO TABLE ibol_public IGNORE 1 LINES;\n```\n\n\n### Generate image data\n\n```\nSELECT barcodes.processid, barcodes_api.processid, `barcodes_api`.image_urls, `barcodes_api`.copyright_licenses\nFROM barcodes INNER JOIN barcodes_api ON barcodes.processid = CONCAT(barcodes_api.processid, '.COI-5P')\nWHERE `barcodes_api`.image_urls \u003c\u003e '';\n```\n\n### Generate identifications\n\n```\nSELECT barcodes.processid, barcodes.species_reg, `barcodes_api`.species_name, `barcodes_api`.identification_provided_by\nFROM barcodes INNER JOIN barcodes_api ON barcodes.processid = CONCAT(barcodes_api.processid, '.COI-5P')\nWHERE 
(`barcodes_api`.species_name \u003c\u003e '') AND (barcodes.species_reg \u003c\u003e `barcodes_api`.species_name);\n```\n\n\n## Publishing\n\n## Step 1 Create dataset on GBIF\n\nCreate a dataset on GBIF using the registry API. The **publishingOrganizationKey** is the publisher UUID that you see in the link to the publisher page: http://www.gbif.org/publisher/92f51af1-e917-49bc-a8ed-014ed3a77bec. You also need an **installationKey** provided by GBIF, and you need to authenticate the call using your GBIF portal username and password.\n\nhttp://api.gbif.org/v1/dataset\n\nPOST\n\n```javascript\n{\n\t\"publishingOrganizationKey\":\"92f51af1-e917-49bc-a8ed-014ed3a77bec\",\n\t\"installationKey\":\"645445d5-177a-475d-b2fe-69d3f6c89498\",\n\t\"title\":\"International Barcode of Life project (iBOL)\",\n\t\"type\":\"OCCURRENCE\"\n}\n```\n\nRESPONSE\n\n```javascript\n\"040c5662-da76-4782-a48e-cdea1892d14c\"\n```\n\nWe now have a UUID (040c5662-da76-4782-a48e-cdea1892d14c) for the dataset, which lives here: http://www.gbif.org/dataset/040c5662-da76-4782-a48e-cdea1892d14c\n\n## Step 2 Create and validate Darwin Core archive\n\nNow we need to create the Darwin Core archive.\nI then generated a meta.xml file, and finally the Darwin Core Archive (DwC-A), which is simply a zip file:\n\n```\nzip ibol-dwca.zip eml.xml meta.xml occurrences.tsv media.txt\n```\n\nNext we need to check that the DwC-A file is valid using the [Darwin Core Archive Validator](http://tools.gbif.org/dwca-validator/).\n\n## Step 3 Create endpoint\n\nNow we need to tell GBIF where to get the data. 
In this example, the Darwin Core Archive file is hosted by GitHub (make sure you link to the raw file).\n\nhttp://api.gbif.org/v1/dataset/040c5662-da76-4782-a48e-cdea1892d14c/endpoint\n\nPOST\n\n```javascript\n{\n  \"type\":\"DWC_ARCHIVE\",\n  \"url\":\"https://dl.dropboxusercontent.com/u/639486/ibol-dwca.zip\"\n}\n```\n\nRESPONSE\n\nHTTP 201 Created\n\n```javascript\n131032\n```\n\n## Step 4 Wait\n\nWait for GBIF to index the data; this happens in near real time.\n\n## Step 5 Edit and update\n\nIf the data needs to be tweaked, edit the data, put the new archive where it can be harvested (i.e., the endpoint), and ask GBIF to crawl it again.\n\n```\nhttp://api.gbif.org/v1/dataset/040c5662-da76-4782-a48e-cdea1892d14c/crawl\n\nPOST\n\nResponse\n\nHTTP/1.1 201 Created\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frdmpage%2Fibol-dwca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frdmpage%2Fibol-dwca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frdmpage%2Fibol-dwca/lists"}