{"id":26294605,"url":"https://github.com/afiore/tripleloop","last_synced_at":"2025-08-05T02:06:20.125Z","repository":{"id":7072330,"uuid":"8359027","full_name":"afiore/tripleloop","owner":"afiore","description":"Simple Ruby utility for extracting RDF statements from hash-like objects","archived":false,"fork":false,"pushed_at":"2013-03-05T17:19:11.000Z","size":192,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-21T14:04:05.595Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/afiore.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-02-22T14:39:09.000Z","updated_at":"2013-12-07T01:51:39.000Z","dependencies_parsed_at":"2022-08-20T08:31:06.826Z","dependency_job_id":null,"html_url":"https://github.com/afiore/tripleloop","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/afiore/tripleloop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Ftripleloop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Ftripleloop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Ftripleloop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Ftripleloop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/afiore","download_url":"https://codeload.github.com/afiore/tripleloop/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/afiore%2Ftripleloop/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268820501,"owners_count":24312402,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-05T02:00:12.334Z","response_time":2576,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-15T03:31:10.384Z","updated_at":"2025-08-05T02:06:20.077Z","avatar_url":"https://github.com/afiore.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tripleloop\n\nA DSL for extracting data from hash-like objects into RDF statements (i.e. triples or quads).\n\n## Usage\n\nStart by creating some extractor classes. 
Each extractor maps one or several document fragments\nto RDF statements.\n\n```ruby\nclass ArticleCoreExtractor \u003c Tripleloop::Extractor\n  bind(:doi) { |doc| RDF::DOI.send(doc[:doi]) }\n\n  map(:title)          { |title|   [doi, RDF::DC11.title, title, RDF::NPGG.articles] }\n  map(:published_date) { |date |   [doi, RDF::DC11.date, Date.parse(date), RDF::NPGG.articles] }\n  map(:product)        { |product| [doi, RDF::NPG.product, RDF::NPGP.nature, RDF::NPGG.articles] }\nend\n\nclass SubjectsExtractor \u003c Tripleloop::Extractor\n  bind(:doi) { |doc| RDF::DOI.send(doc[:doi]) }\n\n  map(:subjects) { |subjects|\n    subjects.map { |s|\n      [doi, RDF::NPG.hasSubject, RDF::NPGS.send(s) ]\n    }\n  }\nend\n```\n\nOnce defined, extractors can be composed into a DocumentProcessor class.\n\n```ruby\nclass NPGProcessor \u003c Tripleloop::DocumentProcessor\n  extractors :article_core, :subjects\nend\n```\n\nThe processor can then be fed a collection of hash-like documents and returns RDF data grouped by\nextractor name.\n\n```ruby\ndata = NPGProcessor.batch_process(documents)\n=\u003e { :article_core =\u003e [[\u003cRDF::URI:0x00000002651ce0(http://dx.doi.org/10.1038/481241e)\u003e, \n                        \u003cRDF::URI:0x1b0c060(http://purl.org/dc/elements/1.1/title)\u003e, \n                       \"Developmental biology: Watching cells die in real time\"],...], \n     :subjects =\u003e [...] 
}\n```\n\nNotice that the output returned by the `batch_process` method is still a plain Ruby data structure, and not an instance of RDF::Statement.\nThe actual job of instantiating RDF statements and writing them to disc is in fact the responsibility of the `Tripleloop::RDFWriter` class, which can be used as follows:\n\n```ruby\nTripleloop::RDFWriter.new(data, :dataset_path =\u003e Pathname.new(\"my-datasets\")).write\n```\n\nThis will create the following two files:\n\n- `my-datasets/article_core.nq`\n- `my-datasets/subjects.nq`\n\nWhen the `#write` method is executed, `RDFWriter` will internally generate RDF triples, delegating the RDF serialisation job to RDF.rb's [`RDF::Writer`](http://rubydoc.info/github/ruby-rdf/rdf/master/RDF/Writer).\nThe only logic involved in the implementation of `Tripleloop::RDFWriter#write` concerns the assignment of the right RDF serialisation format and file extension. When all the RDF statements\ngenerated by an extractor also specify a graph (as in the example above), the writer will use the `RDF::NQuads::Writer`, falling back to `RDF::NTriples::Writer` otherwise.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafiore%2Ftripleloop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fafiore%2Ftripleloop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafiore%2Ftripleloop/lists"}