{"id":20950404,"url":"https://github.com/atomgraph/csv2rdf","last_synced_at":"2025-05-14T03:32:31.800Z","repository":{"id":43557205,"uuid":"160877689","full_name":"AtomGraph/CSV2RDF","owner":"AtomGraph","description":"Streaming, transforming, SPARQL-based CSV to RDF converter. Apache license.","archived":false,"fork":false,"pushed_at":"2023-08-31T19:12:00.000Z","size":83,"stargazers_count":54,"open_issues_count":2,"forks_count":3,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-11-15T04:36:29.390Z","etag":null,"topics":["csv","csv-converter","csv2rdf","docker-image","knowledge-graph","linked-data","open-data","rdf","semantic-web","sparql","streaming","transformation","transformer"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/r/atomgraph/csv2rdf","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AtomGraph.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-12-07T21:34:04.000Z","updated_at":"2024-09-10T20:23:34.000Z","dependencies_parsed_at":"2024-02-05T23:46:06.287Z","dependency_job_id":"8621ed9c-77f3-40c3-aec1-2522ddfd54e1","html_url":"https://github.com/AtomGraph/CSV2RDF","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AtomGraph%2FCSV2RDF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AtomGraph%2FCSV2RDF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AtomGraph%2FCSV2RDF/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/AtomGraph%2FCSV2RDF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AtomGraph","download_url":"https://codeload.github.com/AtomGraph/CSV2RDF/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225275717,"owners_count":17448387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-converter","csv2rdf","docker-image","knowledge-graph","linked-data","open-data","rdf","semantic-web","sparql","streaming","transformation","transformer"],"created_at":"2024-11-19T00:48:21.504Z","updated_at":"2024-11-19T00:48:22.224Z","avatar_url":"https://github.com/AtomGraph.png","language":"Java","readme":"# CSV2RDF\nStreaming, transforming CSV to RDF converter\n\nReads CSV/TSV data as generic CSV/RDF, transforms each row using SPARQL `CONSTRUCT` or `DESCRIBE`, and streams the output triples.\nThe generic CSV/RDF format is based on the minimal mode of [Generating RDF from Tabular Data on the Web](https://www.w3.org/TR/2015/REC-csv2rdf-20151217/#dfn-minimal-mode).\n\nSuch a transformation-based approach enables:\n* building resource URIs on the fly\n* fixing/remapping datatypes\n* mapping different groups of values to different RDF structures\n\nCSV2RDF differs from [tarql](https://tarql.github.io) in how mapping queries use graph patterns in the `WHERE` clause. tarql queries operate on a table of bindings\n(provided as an implicit `VALUES` block) in which CSV column names become variable names. 
CSV2RDF generates an intermediary RDF graph for each CSV row (using column names as relative-URI properties)\nthat the `WHERE` patterns explicitly match against.\n\nBuild\n-----\n\n    mvn clean install\n\nThat should produce an executable JAR file `target/csv2rdf-2.0.0-jar-with-dependencies.jar` that bundles its dependency libraries.\n\nUsage\n-----\n\nThe CSV data is read from `stdin`, and the resulting RDF data is written to `stdout`.\n\nCSV2RDF is available as a `.jar` as well as a Docker image [atomgraph/csv2rdf](https://hub.docker.com/r/atomgraph/csv2rdf) (recommended).\n\nParameters:\n* `query-file` - a text file containing a SPARQL 1.1 [`CONSTRUCT`](https://www.w3.org/TR/sparql11-query/#construct) query string\n* `base` - the base URI for the data (also becomes the `BASE` URI of the SPARQL query). The property namespace is constructed by appending `#` to the base URI.\n\nOptions:\n* `-d`, `--delimiter` - value delimiter character, by default `,`\n* `--max-chars-per-column` - max characters per column value, by default 4096\n* `--input-charset` - CSV input encoding, by default UTF-8\n* `--output-charset` - RDF output encoding, by default UTF-8\n\n_Note that delimiters might have a [special meaning](https://www.tldp.org/LDP/abs/html/special-chars.html) in the shell._ Therefore, always enclose them in single quotes, e.g. 
`';'` when executing CSV2RDF from shell.\n\nIf you want to retrieve the raw CSV/RDF output, use the [identity transform](https://en.wikipedia.org/wiki/Identity_transform) query `CONSTRUCT WHERE { ?s ?p ?o }`.\n\nExample\n-------\n\nCSV data in `parking-facilities.csv`:\n    \n    postDistrict,roadCode,houseNumber,name,FID,long,lat,address,postcode,parkingSpace,owner,parkingType,information\n    1304 København K,24,5,Adelgade 5 p_hus.0,p_hus.0,12.58228733,55.68268042,Adelgade 5,1304,92,Privat,P-Kælder,\"Adelgade 5-7, Q-park.\"\n\n`CONSTRUCT` query in `parking-facilities.rq`:\n\n```sparql\nPREFIX schema:     \u003chttps://schema.org/\u003e \nPREFIX geo:        \u003chttp://www.w3.org/2003/01/geo/wgs84_pos#\u003e \nPREFIX xsd:        \u003chttp://www.w3.org/2001/XMLSchema#\u003e \nPREFIX rdf:        \u003chttp://www.w3.org/1999/02/22-rdf-syntax-ns#\u003e\n\nCONSTRUCT\n{\n    ?parking a schema:ParkingFacility ;\n        geo:lat ?lat ;\n        geo:long ?long ;\n        schema:name ?name ;\n        schema:streetAddress ?address ;\n        schema:postalCode ?postcode ;\n        schema:maximumAttendeeCapacity ?spaces ;\n        schema:additionalProperty ?parkingType ;\n        schema:comment ?information ;\n        schema:identifier ?id .\n}\nWHERE\n{\n    ?parkingRow \u003c#FID\u003e ?id ;\n        \u003c#name\u003e ?name ;\n        \u003c#address\u003e ?address ;\n        \u003c#lat\u003e ?lat_string ;\n        \u003c#postcode\u003e ?postcode ;\n        \u003c#parkingSpace\u003e ?spaces_string ;\n        \u003c#parkingType\u003e ?parkingType ;\n        \u003c#information\u003e ?information ;\n        \u003c#long\u003e ?long_string . 
\n\n    BIND(URI(CONCAT(STR(\u003c\u003e), ?id)) AS ?parking) # building URI from base URI and ID\n    BIND(xsd:integer(?spaces_string) AS ?spaces)\n    BIND(xsd:float(?lat_string) AS ?lat)\n    BIND(xsd:float(?long_string) AS ?long)\n}\n```\nJava execution from shell:\n\n    cat parking-facilities.csv | java -jar csv2rdf-2.0.0-jar-with-dependencies.jar parking-facilities.rq https://localhost/ \u003e parking-facilities.ttl\n\nAlternatively, Docker execution from shell:\n\n    cat parking-facilities.csv | docker run --rm -i -a stdin -a stdout -a stderr -v \"$(pwd)/parking-facilities.rq\":/tmp/parking-facilities.rq atomgraph/csv2rdf /tmp/parking-facilities.rq https://localhost/ \u003e parking-facilities.ttl\n\nNote that using Docker you need to:\n* [bind](https://docs.docker.com/engine/reference/commandline/run/#attach-to-stdinstdoutstderr--a) `stdin`/`stdout`/`stderr` streams\n* [mount](https://docs.docker.com/storage/volumes/) the query file to the container, and use the filepath from _within the container_ as `query-file`\n\nOutput in `parking-facilities.ttl`:\n\n    \u003chttps://localhost/p_hus.0\u003e \u003chttp://www.w3.org/1999/02/22-rdf-syntax-ns#type\u003e \u003chttps://schema.org/ParkingFacility\u003e .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttp://www.w3.org/2003/01/geo/wgs84_pos#long\u003e \"12.58228733\"^^\u003chttp://www.w3.org/2001/XMLSchema#float\u003e .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttps://schema.org/identifier\u003e \"p_hus.0\" .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttps://schema.org/additionalProperty\u003e \"P-Kælder\" .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttps://schema.org/comment\u003e \"Adelgade 5-7, Q-park.\" .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttps://schema.org/postalCode\u003e \"1304\" .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttp://www.w3.org/2003/01/geo/wgs84_pos#lat\u003e \"55.68268042\"^^\u003chttp://www.w3.org/2001/XMLSchema#float\u003e .\n    
\u003chttps://localhost/p_hus.0\u003e \u003chttps://schema.org/streetAddress\u003e \"Adelgade 5\" .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttps://schema.org/name\u003e \"Adelgade 5 p_hus.0\" .\n    \u003chttps://localhost/p_hus.0\u003e \u003chttps://schema.org/maximumAttendeeCapacity\u003e \"92\"^^\u003chttp://www.w3.org/2001/XMLSchema#integer\u003e .\n\nQuery examples\n--------------\n\nMore mapping query examples can be found under [LinkedDataHub](https://github.com/AtomGraph/LinkedDataHub)'s [`northwind-traders`](https://github.com/AtomGraph/LinkedDataHub-Apps/tree/master/demo/northwind-traders/queries/imports) demo app.\n\nPerformance\n-----------\n\nLargest dataset tested so far: 2.8 GB / 3709725 rows of CSV to 21.7 GB / 151348939 triples in under 27 minutes. Hardware: x64 Windows 10 PC with Intel Core i5-7200U 2.5 GHz CPU and 16 GB RAM.\n\nDependencies\n------------\n\n* [Apache Jena](https://jena.apache.org/)\n* [uniVocity-parsers](https://www.univocity.com/pages/univocity_parsers_tutorial)\n* [picocli](https://picocli.info)","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatomgraph%2Fcsv2rdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fatomgraph%2Fcsv2rdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatomgraph%2Fcsv2rdf/lists"}