{"id":39497676,"url":"https://github.com/clarin-eric/ieee-metadata-hackathon","last_synced_at":"2026-01-18T05:43:47.607Z","repository":{"id":144999122,"uuid":"116280970","full_name":"clarin-eric/ieee-metadata-hackathon","owner":"clarin-eric","description":"CLARIN hackathon track @ IEEE BDGMM workshop, March 2018, Berlin","archived":false,"fork":false,"pushed_at":"2018-02-28T12:53:27.000Z","size":11911,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-09-10T03:14:51.796Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clarin-eric.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2018-01-04T16:17:05.000Z","updated_at":"2018-02-13T14:37:23.000Z","dependencies_parsed_at":"2023-04-24T22:55:18.830Z","dependency_job_id":null,"html_url":"https://github.com/clarin-eric/ieee-metadata-hackathon","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/clarin-eric/ieee-metadata-hackathon","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fieee-metadata-hackathon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fieee-metadata-hackathon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fieee-metadata-hackathon/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fieee-metadata-hackathon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clarin-eric","download_url":"https://codeload.github.com/clarin-eric/ieee-metadata-hackathon/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fieee-metadata-hackathon/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28531368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-18T05:43:47.547Z","updated_at":"2026-01-18T05:43:47.599Z","avatar_url":"https://github.com/clarin-eric.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CLARIN metadata hackathon\n\nPart of the [IEEE Workshop on Big Data Governance and Metadata Management (BDGMM 2018)](https://bigdatawg.nist.gov/bdgmm2018.html)\ntaking place in Berlin, Mar 19-20, 2018 in conjunction with [RDA Plenary 11](https://www.rd-alliance.org/plenaries/rda-eleventh-plenary-meeting-berlin-germany).\nPlease note that registration is required; see the [event page](https://bigdatawg.nist.gov/bdgmm2018.html) for details.\n\n**The information on this page is not yet complete!** More will be 
added soon, so please check back regularly to find more details, examples and resources.\n\n## Background\n\n[CLARIN](https://www.clarin.eu) makes digital language resources available to scholars, researchers, students and citizen scientists from all disciplines, especially in the humanities and social sciences. A few times a week, CLARIN harvests metadata from many sources and converts it to CMDI where necessary. This harvested metadata is the data set for this hackathon. All provided metadata follows the CMDI standard; see [clarin.eu/cmdi](https://www.clarin.eu/cmdi). A post-processed, searchable view of this metadata is available in CLARIN's [Virtual Language Observatory](https://vlo.clarin.eu) (VLO).\n\nMore information about IEEE (the organiser of the workshop that hosts this hackathon) and about big data can be found on the [event page](https://bigdatawg.nist.gov/bdgmm2018.html) and at [bigdata.ieee.org](https://bigdata.ieee.org/).\n\n## Data sets\n\nFor this hackathon, we provide a snapshot of the harvested data at the following location:\n\nhttps://vlo.clarin.eu/resultsets/snapshot/20180105/\n\nThe snapshot consists of several `.tar.bz2` archives that together contain the content presented in the [VLO](https://vlo.clarin.eu). 
The archives are:\n* [clarin.tar.bz2](https://vlo.clarin.eu/resultsets/snapshot/20180105/clarin.tar.bz2): metadata harvested from [CLARIN centres](https://www.clarin.eu/content/clarin-centres)\n* [europeana.tar.bz2](https://vlo.clarin.eu/resultsets/snapshot/20180105/europeana.tar.bz2): metadata harvested from [Europeana](https://www.europeana.eu/), describing selected digital cultural heritage objects\n* [others.tar.bz2](https://vlo.clarin.eu/resultsets/snapshot/20180105/others.tar.bz2): metadata harvested from other selected sources, describing language resources relevant to scholars in the humanities and social sciences\n\nIndividual files can also be browsed and accessed via [alpha-vlo.clarin.eu/data/snapshot/20180105](http://alpha-vlo.clarin.eu/data/snapshot/20180105/).\n\n### API endpoints\n\n* [SPARQL](https://www.w3.org/TR/sparql11-query/) endpoint of CMD2RDF: http://147.228.242.24/cmd2rdf/\n  * currently includes only the CMDI 1.1 records from the CLARIN centres\n* [Solr](https://lucene.apache.org/solr/) index of the Virtual Language Observatory ([hackathon instance](http://hackathon.cmdi.clarin.eu/vlo)): `http://hackathon.cmdi.clarin.eu/solr/vlo-index/select`\n  * An [Extended DisMax query parser](https://lucene.apache.org/solr/guide/7_0/the-extended-dismax-query-parser.html) is registered at this URL\n  * Query example: [`select?q=description:ieee\u0026rows=5`](http://hackathon.cmdi.clarin.eu/solr/vlo-index/select?q=description:ieee\u0026rows=5)\n\n### Other data sources\n* CLARIN Component Registry: https://www.clarin.eu/componentregistry\n* CLARIN Concept Registry: https://concepts.clarin.eu\n* VLO mapping definitions: https://github.com/clarin-eric/VLO-mapping\n\n### Examples\n\nSee the [examples](./examples) folder.\n\n## Docker\n\nWe prepared a docker-compose configuration that lets you quickly set up a private\ninstance of the VLO and import your own data into it. This can be useful for\nexperimentation and may offer better performance. 
See the [docker-compose](./docker-compose)\nfolder for more information.\n\n## Contact\n* [Matej Durco](https://www.oeaw.ac.at/acdh/team/current-team/matej-durco/) (Austrian Centre for Digital Humanities)\n* [Menzo Windhouwer](https://www.clarin.eu/person/menzo-windhouwer) (CLARIN ERIC/Meertens Instituut)\n* [Twan Goosen](https://www.clarin.eu/person/twan-goosen) (CLARIN ERIC)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclarin-eric%2Fieee-metadata-hackathon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclarin-eric%2Fieee-metadata-hackathon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclarin-eric%2Fieee-metadata-hackathon/lists"}