https://github.com/afiore/tripleloop

Simple Ruby utility for extracting RDF statements from hash-like objects

Host: GitHub
URL: https://github.com/afiore/tripleloop
Owner: afiore
License: mit
Created: 2013-02-22T14:39:09.000Z (over 13 years ago)
Default Branch: master
Last Pushed: 2013-03-05T17:19:11.000Z (over 13 years ago)
Last Synced: 2025-07-21T14:04:05.595Z (11 months ago)
Language: Ruby
Homepage:
Size: 188 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Tripleloop

A DSL for extracting data from hash-like objects into RDF statements (i.e. triples or quads).

## Usage

Start by creating some extractor classes. Each extractor maps one or more document fragments to RDF statements.

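```ruby
require "date" # for Date.parse

# RDF::DOI, RDF::NPG, RDF::NPGG, RDF::NPGP and RDF::NPGS are assumed to be
# application-defined vocabularies; they are not part of RDF.rb itself.
class ArticleCoreExtractor < Tripleloop::Extractor
  # `bind` derives a value (here `doi`) from the whole document for use in the mappings below.
  bind(:doi) { |doc| RDF::DOI.send(doc[:doi]) }

  # Each mapping returns [subject, predicate, object, graph].
  map(:title)          { |title|   [doi, RDF::DC11.title, title, RDF::NPGG.articles] }
  map(:published_date) { |date|    [doi, RDF::DC11.date, Date.parse(date), RDF::NPGG.articles] }
  map(:product)        { |product| [doi, RDF::NPG.product, RDF::NPGP.nature, RDF::NPGG.articles] }
end

class SubjectsExtractor < Tripleloop::Extractor
  bind(:doi) { |doc| RDF::DOI.send(doc[:doi]) }

  # A mapping can also return several statements at once.
  map(:subjects) do |subjects|
    subjects.map { |s| [doi, RDF::NPG.hasSubject, RDF::NPGS.send(s)] }
  end
end
```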
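The extractor DSL boils down to two class-level macros: `bind` registers a value computed from the whole document, and `map` registers a block applied to a single field. Below is a minimal, self-contained sketch of how such macros *could* be built in plain Ruby. `MiniExtractor`, `TitleExtractor` and `#extract` are hypothetical names, this is not Tripleloop's source, and plain strings stand in for RDF URIs so that no gems are needed.

```ruby
# Hypothetical, simplified bind/map extractor DSL (NOT Tripleloop's implementation).
class MiniExtractor
  # Each subclass keeps its own registries (class-level instance variables).
  def self.bindings
    @bindings ||= {}
  end

  def self.mappings
    @mappings ||= {}
  end

  # bind(:name) { |doc| ... }: value derived from the whole document,
  # exposed to mapping blocks as a method called `name`.
  def self.bind(name, &block)
    bindings[name] = block
    define_method(name) { @bound.fetch(name) }
  end

  # map(:field) { |value| ... }: returns one statement or an array of statements.
  def self.map(field, &block)
    mappings[field] = block
  end

  def extract(doc)
    @bound = self.class.bindings.transform_values { |b| b.call(doc) }
    self.class.mappings.flat_map do |field, block|
      next [] unless doc.key?(field)
      # instance_exec runs the block with self = this extractor, so `doi` resolves.
      result = instance_exec(doc[field], &block)
      # Wrap a single statement so flat_map always yields a flat list of statements.
      (result.empty? || result.first.is_a?(Array)) ? result : [result]
    end
  end
end

class TitleExtractor < MiniExtractor
  bind(:doi) { |doc| "doi:#{doc[:doi]}" }
  map(:title)    { |title| [doi, "dc:title", title, "graph:articles"] }
  map(:subjects) { |subjects| subjects.map { |s| [doi, "npg:hasSubject", "subject:#{s}"] } }
end

doc = { doi: "10.1038/x", title: "A title", subjects: ["biology", "imaging"] }
TitleExtractor.new.extract(doc)
# => [["doi:10.1038/x", "dc:title", "A title", "graph:articles"],
#     ["doi:10.1038/x", "npg:hasSubject", "subject:biology"],
#     ["doi:10.1038/x", "npg:hasSubject", "subject:imaging"]]
```

Using `instance_exec` is what lets the mapping blocks call `doi` as if it were a local helper, which matches how the README's extractors refer to `doi`.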

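Once defined, extractors can be composed into a `Tripleloop::DocumentProcessor` subclass, referring to each extractor by name (in this example, `:article_core` stands for `ArticleCoreExtractor` and `:subjects` for `SubjectsExtractor`).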

```ruby
class NPGProcessor < Tripleloop::DocumentProcessor
  extractors :article_core, :subjects
end
```
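The example suggests a naming convention where a snake_case name maps to a camel-cased class with an `Extractor` suffix. A sketch of such a lookup follows, assuming that convention (it is inferred from the example, not stated by the source); `extractor_class_for` is a hypothetical helper.

```ruby
# Hypothetical helper: resolve :article_core to a class named ArticleCoreExtractor.
# Raises NameError if no matching constant exists.
def extractor_class_for(name, namespace = Object)
  class_name = name.to_s.split("_").map(&:capitalize).join + "Extractor"
  namespace.const_get(class_name)
end

# Stand-in classes so the sketch runs on its own.
class ArticleCoreExtractor; end
class SubjectsExtractor; end

extractor_class_for(:article_core) # => ArticleCoreExtractor
extractor_class_for(:subjects)     # => SubjectsExtractor
```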

The processor can then be fed a collection of hash-like documents, and returns RDF data grouped by extractor name.

```ruby
data = NPGProcessor.batch_process(documents)
# => { :article_core => [[, , "Developmental biology: Watching cells die in real time"], ...],
#      :subjects => [...] }
```
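Conceptually, batch processing runs every extractor over every document and collects the resulting statements under the extractor's name. The sketch below models this with lambdas standing in for extractor instances; `batch_process` here is a hypothetical free function, not the library's method, and the real implementation may differ (for example in ordering or laziness).

```ruby
# Hypothetical sketch: run each named extractor over every document
# and group the resulting statement arrays by extractor name.
def batch_process(documents, extractors)
  extractors.each_with_object({}) do |(name, extract), result|
    result[name] = documents.flat_map { |doc| extract.call(doc) }
  end
end

# Lambdas standing in for extractors; each returns a list of statements.
extractors = {
  article_core: ->(doc) { [[doc[:doi], "dc:title", doc[:title], "graph:articles"]] },
  subjects:     ->(doc) { doc[:subjects].map { |s| [doc[:doi], "npg:hasSubject", s] } }
}

documents = [
  { doi: "doi:1", title: "First", subjects: ["biology"] },
  { doi: "doi:2", title: "Second", subjects: [] }
]

data = batch_process(documents, extractors)
# data[:article_core] holds two quads; data[:subjects] holds one triple.
```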

Notice that the output returned by `batch_process` is still a plain Ruby data structure, not a collection of `RDF::Statement` instances.

Instantiating RDF statements and writing them to disk is the responsibility of the `Tripleloop::RDFWriter` class, which can be used as follows:

```ruby
require "pathname"

Tripleloop::RDFWriter.new(data, :dataset_path => Pathname.new("my-datasets")).write
```

This will create the following two files:

- `my-datasets/article_core.nq`
- `my-datasets/subjects.nq`

When the `#write` method is called, `RDFWriter` internally builds the RDF statements and delegates serialisation to RDF.rb's [`RDF::Writer`](http://rubydoc.info/github/ruby-rdf/rdf/master/RDF/Writer).
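Turning the plain arrays into statement objects can be pictured as follows: three elements give a triple, and a fourth element, when present, is the graph name. To keep the sketch runnable without RDF.rb, it uses a hypothetical `Statement` struct and `to_statement` helper in place of `RDF::Statement`; it illustrates the idea only and is not how `RDFWriter` is actually written.

```ruby
# Hypothetical stand-in for RDF::Statement, so the sketch needs no gems.
Statement = Struct.new(:subject, :predicate, :object, :graph_name) do
  # A statement with a graph is a quad; without one it is a plain triple.
  def quad?
    !graph_name.nil?
  end
end

# Turn a plain [s, p, o] or [s, p, o, g] array into a Statement.
def to_statement(array)
  unless [3, 4].include?(array.length)
    raise ArgumentError, "expected 3 or 4 elements, got #{array.length}"
  end
  Statement.new(*array) # a missing 4th element leaves graph_name nil
end

to_statement(["doi:1", "dc:title", "First", "graph:articles"]).quad? # => true
to_statement(["doi:1", "npg:hasSubject", "biology"]).quad?           # => false
```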

The only logic in `Tripleloop::RDFWriter#write` concerns choosing the right RDF serialisation format and file extension. When every RDF statement generated by an extractor also specifies a graph (as the statements in `ArticleCoreExtractor` above do), the writer uses `RDF::NQuads::Writer`; otherwise it falls back to `RDF::NTriples::Writer`.
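The format rule above amounts to a single predicate over an extractor's statements. In the sketch below, `serialisation_for` and `output_path` are hypothetical helpers; the `.nq` extension matches the files listed earlier, while `.nt` for the N-Triples fallback is an assumption (it is the conventional N-Triples extension, but the source does not say which one Tripleloop uses). The extractor name `:no_graph` is likewise made up for the example.

```ruby
require "pathname"

# Hypothetical helper: N-Quads only if every statement carries a graph
# (4 elements); otherwise N-Triples. The ".nt" extension is an assumption.
def serialisation_for(statements)
  if statements.all? { |st| st.length == 4 }
    { format: :nquads, extension: "nq" }
  else
    { format: :ntriples, extension: "nt" }
  end
end

# Hypothetical helper: one output file per extractor name.
def output_path(dataset_path, name, statements)
  dataset_path.join("#{name}.#{serialisation_for(statements)[:extension]}")
end

data = {
  article_core: [["doi:1", "dc:title", "First", "graph:articles"]],
  no_graph:     [["doi:1", "npg:hasSubject", "biology"]]
}

data.map { |name, statements| output_path(Pathname.new("my-datasets"), name, statements).to_s }
# => ["my-datasets/article_core.nq", "my-datasets/no_graph.nt"]
```

A single statement without a graph is enough to push the whole extractor's output to N-Triples, since N-Triples cannot carry graph names.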
