https://github.com/bloomberg/pycsvw

A tool to read CSV files with CSVW metadata and transform them into other formats.

csv csvw data-transformation rdf


# pycsvw

A Python implementation of a *variant* of the [W3C CSV on the Web specification](http://w3c.github.io/csvw/), aimed primarily at
efficient RDF and JSON generation from a CSV file and its metadata. The supported variant adds a few features to the recommendation, mostly for emitting RDF values as ordered containers, and imposes some restrictions. Both are listed below.

## Features:
1. A cell can be given an rdf:List-valued object. See [rdf:List valued objects for a cell](../master/docs/RdfListCell.md) for details.
2. For the data types time, date and dateTime, any cell value recognized by the [dateutil parser](https://dateutil.readthedocs.io/en/stable/parser.html) is accepted.
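
To get a feel for what the second feature allows, the sketch below calls `dateutil.parser.parse` directly on a few loosely formatted date strings and prints their ISO 8601 form. It only illustrates what the dateutil parser recognizes. How pycsvw itself invokes the parser and serializes the result is not shown here, and the sample strings are made up for illustration.

```python
from dateutil import parser

# Hypothetical cell values; none of them is in strict xsd:date form.
samples = ["June 1, 2010", "1 Jun 2010", "2010/06/01"]

# dateutil resolves each string to a datetime; .date() drops the time part.
normalized = [parser.parse(s).date().isoformat() for s in samples]

for raw, iso in zip(samples, normalized):
    print(f"{raw!r} -> {iso}")
```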

## Restrictions:
1. CSV metadata can be specified only through a separate JSON file.
2. Only minimal_mode is supported.
3. The CSV file must have exactly one header row.
4. The attribute "format" is ignored for every data type except "boolean". Each cell value must be a valid XSD value for its XSD data type. The exception is date, time and dateTime, where values recognized by the [dateutil parser](https://dateutil.readthedocs.io/en/stable/parser.html) are also accepted.
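
As a rough illustration of restrictions 1, 3 and 4, the sketch below builds a small CSV with one header row and a separate metadata JSON document for it, using only the standard library. The column names, the file name `trees.csv` and the `YES|NO` boolean format are made up for illustration. The property names (`@context`, `url`, `tableSchema`, `columns`, `datatype`, `base`, `format`) come from the W3C CSVW metadata vocabulary. Treat this as a sketch under those assumptions, not as a tested pycsvw input.

```python
import csv
import io
import json

# Hypothetical table: one header row followed by data rows (restriction 3).
csv_text = "id,planted,protected\n1,2010-06-01,YES\n2,2010-06-02,NO\n"

# Metadata lives in its own JSON document (restriction 1).
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "trees.csv",  # hypothetical file name
    "tableSchema": {
        "columns": [
            {"name": "id", "titles": "id", "datatype": "integer"},
            {"name": "planted", "titles": "planted", "datatype": "date"},
            {
                "name": "protected",
                "titles": "protected",
                # "format" is only honoured for boolean (restriction 4).
                "datatype": {"base": "boolean", "format": "YES|NO"},
            },
        ]
    },
}

# Sanity check: the header row matches the declared column titles.
header = next(csv.reader(io.StringIO(csv_text)))
declared = [col["titles"] for col in metadata["tableSchema"]["columns"]]
header_matches = header == declared

# Serialize as JSON text, ready to be written next to the CSV as UTF-8.
metadata_json = json.dumps(metadata, indent=2, ensure_ascii=False)
print(header_matches)
```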

All outputs are generated in UTF-8 encoding.

For implementation details, see [details](../master/docs/Implementation.md).

## Usage

```
$ pycsvw --help
Usage: pycsvw [OPTIONS]

  Command line interface for pycsvw.

Options:
  --csv-url TEXT        URL of the CSVW
  --csv-path TEXT       System path to the CSVW
  --metadata-url TEXT   URL of the CSVW metadata
  --metadata-path TEXT  System path to the CSVW metadata
  --json-dest TEXT      Destination of the JSON file to generate
  --rdf-dest TEXT...    Pair of format and destination path of RDF e.g.
                        'turtle out.ttl'
  --temp-dir TEXT       Use as the temporary folder for (intermediate) nt
                        serialization
  --riot-path TEXT      The path to the riot command e.g.
                        '/usr/bin/jena/bin/riot'
  --help                Show this message and exit.
```
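
When pycsvw is driven from a script, the options in the help text can be assembled into an argument list. The helper below is a hypothetical convenience wrapper (`build_pycsvw_command` is not part of pycsvw). It only builds the argument list from flags shown in the help text and leaves execution to the caller, for example via `subprocess.run(cmd, check=True)`.

```python
from typing import List, Optional, Tuple


def build_pycsvw_command(
    csv_path: str,
    metadata_path: str,
    rdf_dest: Optional[Tuple[str, str]] = None,
    json_dest: Optional[str] = None,
) -> List[str]:
    """Assemble a pycsvw argument list (hypothetical helper, not pycsvw API)."""
    cmd = ["pycsvw", "--csv-path", csv_path, "--metadata-path", metadata_path]
    if rdf_dest is not None:
        # --rdf-dest takes a pair: serialization format, then output path.
        rdf_format, rdf_path = rdf_dest
        cmd += ["--rdf-dest", rdf_format, rdf_path]
    if json_dest is not None:
        cmd += ["--json-dest", json_dest]
    return cmd


cmd = build_pycsvw_command(
    "tests/examples/tree-ops-ext.csv",
    "tests/examples/tree-ops-ext.csv-metadata.json",
    rdf_dest=("turtle", "test.ttl"),
)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.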

## Example run

```
pycsvw --csv-path tests/examples/tree-ops-ext.csv --metadata-path tests/examples/tree-ops-ext.csv-metadata.json --rdf-dest turtle test.ttl
```
generates a `test.ttl` containing:
```
@prefix schema: .
@prefix rr: .
@prefix grddl: .
@prefix wdr: .
@prefix duv: .
@prefix owl: .
@prefix xhv: .
@prefix xsd: .
@prefix dqv: .
@prefix skos: .
@prefix rdfs: .
@prefix rif: .
@prefix sd: .
@prefix qb: .
@prefix oa: .
@prefix ma: .
@prefix xml: .
@prefix og: .
@prefix rdfa: .
@prefix dcterms: .
@prefix dcat: .
@prefix wrds: .
@prefix prov: .
@prefix foaf: .
@prefix csvw: .
@prefix sioc: .
@prefix dctypes: .
@prefix cc: .
@prefix rev: .
@prefix void: .
@prefix skosxl: .
@prefix org: .
@prefix vcard: .
@prefix gr: .
@prefix dc11: .
@prefix as: .
@prefix ical: .
@prefix rdf: .
@prefix v: .
@prefix ldp: .
@prefix ctag: .
@prefix dc: .


" included bark" , "cavity or decay" , " trunk decay" , " root decay" , " codominant leaders" , " large leader or limb decay" , " beware of BEES" , " previous failure root damage" ;

29 ;

"2010-06-01"^^xsd:date ;

"-122.156299,37.441151"^^rdf:XMLLiteral ;

"ADDISON AV" ;

true ;

"Robinia pseudoacacia" ;

"Large Tree Routine Prune"@en .


11 ;

"2010-06-02"^^xsd:date ;

"-122.156749,37.440958"^^rdf:XMLLiteral ;

"EMERSON ST" ;

false ;

"Liquidambar styraciflua" ;

"Large Tree Routine Prune"@en .


11 ;

"2010-10-18"^^xsd:date ;

"-122.156485,37.440963"^^rdf:XMLLiteral ;

"ADDISON AV" ;

false ;

"Celtis australis" ;

"Large Tree Routine Prune"@en .
```