https://github.com/afiore/tripleloop
Simple Ruby utility for extracting RDF statements from hash-like objects
https://github.com/afiore/tripleloop
Last synced: 10 months ago
JSON representation
Simple Ruby utility for extracting RDF statements from hash-like objects
- Host: GitHub
- URL: https://github.com/afiore/tripleloop
- Owner: afiore
- License: mit
- Created: 2013-02-22T14:39:09.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2013-03-05T17:19:11.000Z (over 13 years ago)
- Last Synced: 2025-07-21T14:04:05.595Z (11 months ago)
- Language: Ruby
- Homepage:
- Size: 188 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Tripleloop
A DSL for extracting data from hash-like objects into RDF statements (i.e. triples or quads).
## Usage
Start by creating some extractor classes. Each extractor maps one or several document fragments
to RDF statments.
```ruby
class ArticleCoreExtractor < Tripleloop::Extractor
bind(:doi) { |doc| RDF::DOI.send(doc[:doi]) }
map(:title) { |title| [doi, RDF::DC11.title, title, RDF::NPGG.articles] }
map(:published_date) { |date | [doi, RDF::DC11.date, Date.parse(date), RDF::NPGG.articles] }
map(:product) { |product| [doi, RDF::NPG.product, RDF::NPGP.nature, RDF::NPGG.articles] }
end
class SubjectsExtractor < Tripleloop::Extractor
bind(:doi) { |doc| RDF::DOI.send(doc[:doi]) }
map(:subjects) { |subjects|
subjects.map { |s|
[doi, RDF::NPG.hasSubject, RDF::NPGS.send(s) ]
}
}
end
```
Once defined, extractors can be composed into a DocumentProcessor class.
```ruby
class NPGProcessor < Tripleloop::DocumentProcessor
extractors :article_core, :subjects
end
```
The processor can then be fed with a collection of hash like documents and return RDF data grouped by
extractor name.
```ruby
data = NPGProcessor.batch_process(documents)
=> { :article_core => [[,
,
"Developmental biology: Watching cells die in real time"],...],
:subjects => [...] }
```
Notice that the output retuned by the `batch_process` method is still a plain ruby data structure, and not an instance of RDF::Statement.
The actual job of instantiating RDF statements and writing them to disc is in fact responsability of the `Tripleloop::RDFWriter` class, which can be used as follows:
```ruby
Tripleloop::RDFWriter.new(data, :dataset_path => Pathname.new("my-datasets")).write
```
This will create the following two files:
- `my-dataset/article_core.nq`
- `my-dataset/subjects.nq`
When `#write` method is executed, `RDFWriter` will internally generate RDF triples, delegating the RDF serialisation job to RDF.rb's [`RDF::Writer`](http://rubydoc.info/github/ruby-rdf/rdf/master/RDF/Writer).
The only logic involved in the implementation of `Tripleloop::RDFWriter#write` concerns the assignment of the right RDF serialisation format and file extension. When all the RDF statements
generated by an extractor do specify also a graph (as in the example above), the writer will use the `RDF::NQuads::Writer`, falling back to `RDF::NTriples::Writer` otherwise.