https://github.com/zazuko/xrm-csvw-workflow
All you need to convert CSV files to RDF in a declarative & fully automated way.
- Host: GitHub
- URL: https://github.com/zazuko/xrm-csvw-workflow
- Owner: zazuko
- License: other
- Created: 2020-06-25T09:38:13.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-02-13T02:42:56.000Z (over 2 years ago)
- Last Synced: 2025-04-15T02:09:32.509Z (about 2 months ago)
- Topics: csv, etl-pipeline, pipeline, rdf
- Language: JavaScript
- Homepage:
- Size: 782 KB
- Stars: 5
- Watchers: 4
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# CSV to RDF conversion template project
This repository provides all you need to convert CSV files to RDF. It contains:
- A sample CSV file
- A sample XRM mapping that generates CSVW (CSV on the web) mapping files
- A pipeline that converts the input CSV to RDF
- A default GitHub Actions configuration that runs the pipeline and creates an artifact for download

This is a GitHub template repository: a repository created with the `Use this template` button above is not marked as a fork. Create one, then add your data sources and adjust the XRM mapping accordingly:
1. Create/adjust the XRM files in the `mappings` directory.
2. Copy source CSVs to `input` directory.
3. Execute one of the run scripts to convert your data.

Make sure to commit the `input`, `mappings` and `src-gen` directories if you want to build with GitHub Actions.
See [Further reading](#further-reading) for more information about the XRM mapping language.
## Run the pipeline

The default pipeline can be run with `npm start` or `npm run to-file`. It will:
- Read the CSVW input files
- Convert them to RDF
- Write the result to a file as N-Triples (default: `output/transformed`)

There are additional pipelines configured in `package.json`:
* `file-to-store`: Uploads the generated output file to an RDF store via the SPARQL Graph Store Protocol
* `to-store(-dev)`: Uploads directly to an RDF store via the SPARQL Graph Store Protocol, streaming the data from the pipeline without an intermediate file

To test the upload to an RDF store, a default [Apache Jena Fuseki](https://jena.apache.org/index.html) installation with a dataset named `data` on port `3030` should work out of the box.
Pipelines are configured via environment variables and/or by adjusting the default variables in the pipeline itself. To pass a different value, have a look at the `--variable=XYZ` samples in `package.json` or consult the [barnard59 documentation](https://github.com/zazuko/barnard59#passing-arguments-to-the-pipeline). To change a default in the pipeline itself, open the file [pipelines/main.ttl](pipelines/main.ttl) and edit ` ...`.
## barnard59 RDF pipelines
This template is built on top of [barnard59](https://github.com/zazuko/barnard59), the pipelining system we develop at [Zazuko](https://zazuko.com/). It is a [Node.js](https://nodejs.org)-based, fully configurable pipeline framework for creating RDF data from various data sources. Unlike many other data pipelining systems, barnard59 is configured rather than programmed. If you need pre- or post-processing, you can implement additional pipeline steps in JavaScript.
barnard59 processes data as streams, so it can convert very large data sets with a small memory footprint.
## Other template repositories
We provide additional template repositories:
* [xrm-r2rml-workflow](https://github.com/zazuko/xrm-r2rml-workflow): A template repository for converting complete relational databases to RDF using the R2RML specification and Ontop as mapper.
* xrm-xml-workflow: TODO

## Further reading
* [Expressive RDF Mapping Language (XRM)](https://zazuko.com/products/expressive-rdf-mapper/) and the [documentation](https://github.com/zazuko/expressive-rdf-mapper) for details about the domain-specific language (DSL).
* [CSV on the Web: A Primer](https://www.w3.org/TR/tabular-data-primer/): Introduction to the CSVW mapping language, which is generated by XRM and consumed by barnard59. It is listed for reference only; you do not need to learn CSVW, because XRM generates it for you.
* [SPARQL 1.1 Graph Store HTTP Protocol](https://www.w3.org/TR/sparql11-http-rdf-update/): The SPARQL Graph Store specification used to upload data to an RDF store like Apache Jena Fuseki