{"id":15647320,"url":"https://github.com/cmungall/sparqlprog","last_synced_at":"2026-01-07T20:45:19.273Z","repository":{"id":29928241,"uuid":"119122686","full_name":"cmungall/sparqlprog","owner":"cmungall","description":"logic programming with SPARQL","archived":false,"fork":false,"pushed_at":"2023-01-16T13:38:09.000Z","size":481,"stargazers_count":47,"open_issues_count":5,"forks_count":7,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-02-05T00:46:12.641Z","etag":null,"topics":["bioinformatics","datalog","ontology","prolog","rdf","semantic-web","sparql","swi-prolog"],"latest_commit_sha":null,"homepage":"http://www.swi-prolog.org/pack/list?p=sparqlprog","language":"Prolog","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmungall.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-27T01:47:27.000Z","updated_at":"2024-11-01T20:59:25.000Z","dependencies_parsed_at":"2023-02-10T03:45:13.704Z","dependency_job_id":null,"html_url":"https://github.com/cmungall/sparqlprog","commit_stats":null,"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fsparqlprog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fsparqlprog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fsparqlprog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmungall%2Fsparqlprog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmungall","download_url":"https://codeload.github.com/cmungall/s
parqlprog/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246257755,"owners_count":20748448,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","datalog","ontology","prolog","rdf","semantic-web","sparql","swi-prolog"],"created_at":"2024-10-03T12:18:26.465Z","updated_at":"2026-01-07T20:45:19.246Z","avatar_url":"https://github.com/cmungall.png","language":"Prolog","funding_links":[],"categories":["SPARQL","Benchmarks"],"sub_categories":["SPARQL Applications"],"readme":"# sparqlprog - programming with SPARQL\n\n[![Build Status](https://travis-ci.org/cmungall/sparqlprog.svg?branch=master)](https://travis-ci.org/cmungall/sparqlprog)\n[![Join the chat at https://gitter.im/sparqlprog/Lobby](https://badges.gitter.im/sparqlprog/Lobby.svg)](https://gitter.im/sparqlprog/Lobby?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[**pack**](http://www.swi-prolog.org/pack/list?p=sparqlprog)\n\nsparqlprog is a programming language and environment that can be used\nto write composable modular building blocks that can be executed as\nfederated SPARQL queries.\n\nExample of use (command line):\n\n```\npl2sparql  -u sparqlprog/ontologies/ebi -u sparqlprog/ontologies/faldo  -s ebi \"\\\n  protein_coding_gene(G), \\\n  location(G,L,B,E,grcm38:'11'), \\\n  B \u003e= 101100523,E =\u003c 101190725, \\\n  orthologous_to(G,H),in_taxon(H,taxon:'9606')\" \\\n  \"h(G,H)\"\n```\n\nThe command passes a *logic program query* to sparqlprog. 
In this\ncase, the query is a conjunction of conditions involving different\nvariables (each indicated with a leading upper-case letter):\n\n 1. `G` is a *protein coding gene*\n 2. `G` is located on mouse chromosome 11, with an interval bounded by `B` (begin) and `E` (end)\n 3. The interval is within a certain range\n 4. `G` is *homologous to* `H`\n 5. `H` is a human gene (indicated by taxon ID 9606)\n 6. The results are bound to tuples `h(G,H)` (i.e. a two-column table)\n\nThis logic query compiles down to a SPARQL query for fetching G and\nH. The query is then executed on the [EBI RDF\nPlatform](https://www.ebi.ac.uk/rdf/services/sparql), giving:\n\n|Mouse Gene|Human Gene|\n|---|---|\n|ensembl:ENSMUSG00000035198|ensembl:ENSG00000131462|\n|ensembl:ENSMUSG00000017167|ensembl:ENSG00000108797|\n|ensembl:ENSMUSG00000044052|ensembl:ENSG00000184451|\n|ensembl:ENSMUSG00000017802|ensembl:ENSG00000141699|\n|ensembl:ENSMUSG00000045007|ensembl:ENSG00000037042|\n|ensembl:ENSMUSG00000035172|ensembl:ENSG00000068137|\n\nHow does this work? The query compilation makes use of pre-defined\nn-ary predicates, such as this one defined in the [faldo\nmodule](https://www.swi-prolog.org/pack/file_details/sparqlprog/prolog/sparqlprog/ontologies/faldo.pl):\n\n```\nlocation(F,L,B,E,R) :-\n  rdf(F,faldo:location,L),\n  begin(L,PB),position(PB,B),reference(PB,R),\n  end(L,PE),position(PE,E),reference(PE,R).\n```\n\nThe `:-` connects a rule head to a rule body. In this case the body is\na conjunction of goals. Each of these may be defined in its own\nrules. Typically everything bottoms out at a call over a 3-ary\npredicate `rdf(S,P,O)` which maps to a single triple. In this case the vocabulary used for genomic locations is [faldo](https://github.com/OBF/FALDO).\n\nThis approach allows for *composability* of queries. 
Rather than\nrepeating the same verbose SPARQL each time in different queries,\nreusable modules can be defined.\n\nIn addition to providing a composable language that compiles to\nSPARQL, this package provides a Turing-complete environment\nfor mixing code and queries in a relational/logic programming\nparadigm. See below for examples.\n\n## Quick Start (for prolog hackers)\n\nSee the [sparqlprog module docs](https://www.swi-prolog.org/pack/file_details/sparqlprog/prolog/sparqlprog.pl)\n\nSee also the [specification](SPECIFICATION.md)\n\n## Quick Start (for Python hackers)\n\nSee the [sparqlprog-python](https://github.com/cmungall/sparqlprog-python) package\n\nThis provides a Python interface to a sparqlprog service\n\nYou can also see demonstration notebooks:\n\n * [Basic SPARQLProg](https://nbviewer.jupyter.org/github/cmungall/sparqlprog-python/blob/master/Notebook_01_Basics.ipynb)\n * [sending programs over the wire](https://nbviewer.jupyter.org/github/cmungall/sparqlprog-python/blob/master/Notebook_02_Programs.ipynb)\n\n## Quick Start (for everyone else)\n\nThere are a variety of ways to use this framework:\n\n * Executing queries on remote services via command line\n * Compiling logic queries to SPARQL queries, for use in another framework\n * Programmatically within a logic program (interleaving remote and local operations)\n * Programmatically from a language like python/javascript, using a __sparqlprog service__\n\nConsult the appropriate section below:\n\n### Running queries from the command line\n\nSee the [examples](./examples/) directory for all command line examples\n\nFirst [install](INSTALL.md), making sure the [bin](bin) directory is\nin your path. This will give you access to the pl2sparql script.\n\nFor full options, run:\n\n```\npl2sparql --help\n```\n\nNote you should also have a number of convenience scripts in your\npath. 
For example the `pq-wd` script is simply a shortcut for\n\n```\npl2sparql -s wikidata -u sparqlprog/ontologies/wikidata  ARGS\n```\n\nThis will give you access to a number of convenience predicates such\nas positive_therapeutic_predictor/2 (for drug queries). The `-u`\noption uses the wikidata module, and the `-s` option sets the service\nto the one with handle `wikidata` (the mapping from a handle to the\nfull service URL is defined in the wikidata module).\n\nThe best way to learn is to look at the [examples/](examples),\ntogether with the corresponding set of rules in\n[prolog/sparqlprog/ontologies](prolog/sparqlprog/ontologies).\n\nFor example [examples/monarch-examples.sh](examples/monarch-examples.sh) has:\n\n```\npq-mi  'label(D,DN),literal_exact_match(DN,\"peroxisome biogenesis disorder\"),\\\n   rdfs_subclass_of(D,C),owl_equivalent_class(C,E),has_phenotype(E,Z)'\\\n   'x(C,CN,E,Z)'\n```\n\nThis finds a disease with a given name, finds equivalent classes of\ntransitive reflexive subclasses, and then finds phenotypes for each.\n\n### Compiling logic programs to SPARQL\n\nYou can use pl2sparql (see above for installation) to compile a\nprogram with bindings to a SPARQL query by using the `-C` option. The\nSPARQL query can then be used without any dependence on\nsparqlprog. E.g.\n\n```\npq-ebi -C \"\\\n  protein_coding_gene(G), \\\n  location(G,L,B,E,grcm38:'11'), \\\n  B \u003e= 101100523,E =\u003c 101190725, \\\n  homologous_to(G,H),in_taxon(H,taxon:'9606')\" \\\n  \"h(G,H)\"\n```\n\nwill generate the following SPARQL:\n\n```\nSELECT ?g ?h WHERE {?g \u003chttp://www.w3.org/1999/02/22-rdf-syntax-ns#type\u003e \u003chttp://purl.obolibrary.org/obo/SO_0001217\u003e . ?g \u003chttp://biohackathon.org/resource/faldo#location\u003e ?l . ?l \u003chttp://biohackathon.org/resource/faldo#begin\u003e ?v0 . ?v0 \u003chttp://biohackathon.org/resource/faldo#position\u003e ?b . 
?v0 \u003chttp://biohackathon.org/resource/faldo#reference\u003e \u003chttp://rdf.ebi.ac.uk/resource/ensembl/90/mus_musculus/GRCm38/11\u003e . ?l \u003chttp://biohackathon.org/resource/faldo#end\u003e ?v1 . ?v1 \u003chttp://biohackathon.org/resource/faldo#position\u003e ?e . ?v1 \u003chttp://biohackathon.org/resource/faldo#reference\u003e \u003chttp://rdf.ebi.ac.uk/resource/ensembl/90/mus_musculus/GRCm38/11\u003e . FILTER (?b \u003e= 101100523) . FILTER (?e \u003c= 101190725) . ?g \u003chttp://semanticscience.org/resource/SIO_000558\u003e ?h . ?h \u003chttp://purl.obolibrary.org/obo/RO_0002162\u003e \u003chttp://identifiers.org/taxonomy/9606\u003e}\n```\n\nwithout executing it remotely.\n\nNote: indentation and URI shortening are on the cards for future releases.\n\n### Using a public sparqlprog service\n\nPublic pengines service: https://evening-falls-87315.herokuapp.com/pengine\n\n[Pengines](http://pengines.swi-prolog.org/) is a framework for running logic program environments as a\nweb service. It can be used by clients in any language (the client\nlibraries for python and javascript seem to be mature, and there are separate\nprolog clients as well).\n\nSee the docs on the [pengines framework](http://pengines.swi-prolog.org/).\n\nThere is an example of how to contact this service in javascript in\n[bin/sprog-client.js](bin/sprog-client.js). You will need to run `npm\ninstall pengines`, and change the server URL.\n\nPengines allows the client to send logic programs to the server, and\nthen to invoke them. For example:\n\n```\npengines = require('pengines');\n\npeng = pengines({\n    server: \"https://evening-falls-87315.herokuapp.com/pengine\",\n    ask: \"q(X)\",\n    chunk: 100,\n    sourceText: \"q(X):- (wd ?? 
continent(X)).\\n\"\n}\n).on('success', handleSuccess).on('error', handleError);\nfunction handleSuccess(result) {\n    console.log('# Results: '+ result.data.length);\n    for (var i = 0; i \u003c result.data.length; i++) {\n        console.log(result.data[i])\n    }\n    if (result.data.length == 0) {\n        console.log(\"No results!\")\n    }\n}\nfunction handleError(result) {\n    console.error(result)\n}\n```\n\nNote that *any* safe subset of prolog can be passed as a program. In\nthis case we are passing a small program:\n\n`q(X):- (wd ?? continent(X))`\n\nThis trivially defines a unary predicate `q/1`. The argument is bound\nto any continent. The `??` is a special infix binary predicate, the\nleft side is the service name and the right side is the query to be\ncompiled.\n\nThe `ask` portion of the javascript will simply pass the query to the\nserver.\n\n### Using a local sparqlprog service\n\nYou can start a sparqlprog service running locally:\n\n    docker run -p 9083:9083 cmungall/sparqlprog\n\n(requires docker)\n\nThis creates a pengines service at http://localhost:9083/pengine\n\nThere is an example of how to contact this service in javascript in\n[sprog-client.js](bin/sprog-client.js). You will need to do:\n\n    npm install pengines\n\n### SWISH\n\nTODO\n\n### Use within logic programs\n\nFor this example, consider writing a music band recommender, based on\nsimilarity of genres. 
dbpedia has triples linking bands to genres, so\nwe will use that.\n\nWe will write a program\n[dbpedia_rules.pl](examples/dbpedia/dbpedia_rules.pl) that contains\ndefinitions of predicates we will use.\n\nFirst we define a binary predicate that counts the number of bands per genre:\n\n```\ngenre_num_bands(G,Count) :-\n        aggregate_group(count(distinct(B)),[G],(rdf(B,dbont:genre,G),band(B)),Count).\n```\n\nyou can try this with:\n\n`pq-dbpedia -c examples/dbpedia/dbpedia_rules.pl \"genre_num_bands(G,Count)\"`\n\nthis will give results like:\n\n```\nhttp://dbpedia.org/resource/Independent_music,184\nhttp://dbpedia.org/resource/Funky_Club_Music,1\nhttp://dbpedia.org/resource/Ghettotech,2\nhttp://dbpedia.org/resource/Indian_folk_music,1\nhttp://dbpedia.org/resource/Bakersfield_Sound,1\nhttp://dbpedia.org/resource/Punk_Rawk,1\nhttp://dbpedia.org/resource/Go-go,6\nhttp://dbpedia.org/resource/Jazz_pop,3\nhttp://dbpedia.org/resource/Dubstep,74\nhttp://dbpedia.org/resource/Alt.folk,1\nhttp://dbpedia.org/resource/AfroHouse,1\nhttp://dbpedia.org/resource/Electro-disco,1\nhttp://dbpedia.org/resource/Math_Rock,15\n```\n\nwe are doing this because we want to weight band similarity according\nto how rare a genre is. 
If two bands share the genre of 'independent\nmusic' it is not remarkable, but if two bands share a rarer genre like\n'Ghettotech' then we will weight that higher.\n\nWe can explicitly bind this to dbpedia using `??/2`:\n\n```\nget_genre_num_bands(G,Count) :-\n        ??(dbpedia,genre_num_bands(G,Count)).\n```\n\nWe can define the Information Content (IC) of a genre `G` as `-log2(Pr(G))`:\n\n```\ngenre_ic(G,IC) :-\n        get_genre_num_bands(G,Count),\n        get_num_bands(Total),\n        seval(-log(Count/Total)/log(2), IC).\n```\n\nThis makes use of:\n\n```\n:- table get_num_bands/1.\nget_num_bands(Count) :-\n        ??(dbpedia,num_bands(Count)).\nnum_bands(Count) :-\n        aggregate(count(distinct(B)),band(B),Count).\n```\n\nNote we are tabling (memoizing) the call to fetch the total number of\nbands. This means it will only be called once per sparqlprog session.\n\nFinally we can define a 3-ary predicate that compares any two bands\nand binds the 3rd arg to a similarity score that is the sum of the\nICs of all genres held in common (for simplicity, we do not penalize\nunmatched genres, or try to use sub/super genre categories yet):\n\n```\npair_genre_ic(A,B,SumIC) :-\n        get_all_genres(A,SA),\n        get_all_genres(B,SB),\n        ord_intersection(SA,SB,I),\n        aggregate(sum(IC),G^(member(G,I),genre_ic(G,IC)),SumIC).\n```\n\nThis is a normal prolog goal and can be executed in a normal prolog context, or from the command line:\n\n`pq-dbpedia -c examples/dbpedia/dbpedia_rules.pl -e  \"pair_genre_ic(dbr:'Metallica',dbr:'Megadeth',IC)\"`\n\nThe `-e` option tells the script to execute the query directly rather\nthan try to compile everything to a single SPARQL query (this may be\npossible, but could be highly inefficient). 
It is only when the prolog\nengine executes the `??` goals that a remote SPARQL will be executed.\n\nIf we want to adapt this program to search rather than compare two\ngiven bands, we can modify it slightly so that it does not waste\ncycles querying on bands that have no genres in common:\n\n```\npair_genre_ic(A,B,SumIC) :-\n        get_all_genres(A,SA),\n        ??(dbpedia,has_shared_genre(A,B,_)),\n        get_all_genres(B,SB),\n        ord_intersection(SA,SB,I),\n        aggregate(sum(IC),G^(member(G,I),genre_ic(G,IC)),SumIC).\n```\n\nExample of running this:\n\n`pq-dbpedia -c examples/dbpedia/dbpedia_rules.pl -e  \"pair_genre_ic(dbr:'Voivod_(band)',B,IC),IC\u003e=10\"`\n\nNote this is slow, as it will iterate across each band performing\nqueries to gather stats. There are various approaches to optimizing\nthis, but the core idea here is that the logic can be shuffled back\nand forth between the portion that is compiled to SPARQL and executed\nremotely, and the portion that is executed locally by a logic engine.\n\n### Using a local triplestore\n\nYou can use sparqlprog with any local or remote triplestore that\nsupports the SPARQL protocol. If you have RDF files and want to get\nstarted, here is one quick route (assuming you have docker):\n\n 1. Place your files in [data](examples/data)\n 2. Run `make bg-run`\n\nThis will run blazegraph within a docker container\n\n## Discussion\n\n\nSPARQL provides a declarative way of querying a triplestore. One of\nits limitations is the lack of ability to *compose* queries and reuse\nrepeated patterns across multiple queries. Sparqlprog is an extension\nof SPARQL and a subset of Prolog for relational rule-oriented\nprogramming using SPARQL endpoints.\n\n## Prolog programmers guide\n\nThis package provides a more natural (from a Prolog point of view) interface\nto SPARQL endpoints. There are two layers. The first, lower layer, defines a\nDCG for generating SPARQL queries from a structured term. 
The second provides\na translation from a representation that looks more or less like a Prolog goal\nbuilt from rdf/3 goals (with conjunction, disjunction etc) to a term in the\nterm language understood by the SPARQL DCG.\n\nIn addition, the library provides a mechanism to register known SPARQL endpoints\nso that they can be referred to by a short name, or to enable a query to be\nrun against all registered endpoints.\n\nThe library is based on the idea implemented in Yves Raimond's swic package,\nbut the code has been completely re-implemented.\n\nYou just need SWI Prolog with its Semantic Web libraries.\n\n## Simple usage\n\nThe `(??)/2`  and `(??)/1` operators have a high precedence so that conjunctive and disjunctive\nqueries can be written to the right of them without parentheses:\n\n```\n?- rdf_register_prefix(foaf,'http://xmlns.com/foaf/0.1/').\n?- rdf_register_prefix(dbont,'http://dbpedia.org/ontology/').\n?- sparql_endpoint( dbp, 'http://dbpedia.org/sparql/').\n?- debug(sparkle).  % to show queries\n\n?- dbp ?? rdf(Class,rdf:type,owl:'Class'), rdf(Instance,rdf:type,Class).\n?- dbp ?? rdf(Person,rdf:type,foaf:'Person'),\n          rdf(Person,foaf:name,Name),\n          filter(regex('Colt.*',Name)).\n?- dbp ?? rdf(A,rdf:type,dbont:'Photographer'); rdf(A, rdf:type, dbont:'MusicalArtist').\n```\n\n\n## Clause expansion\n\nIf the following clause is defined:\n\n```\ncls(Class) :-\n        rdf(Class,rdf:type,owl:'Class').\n```\n\nThen cls/1 can be used in queries, e.g.\n\n```\n?-  dbp ?? cls(X).\n```\n\nThe cls/1 goal will be expanded.\n\nMore complex goals can be defined; for example, this queries for existential restrictions:\n\n```\nsubclass_of(C,D) :- rdf(C,rdfs:subClassOf,D).\nsvf_edge(C,P,D) :-\n        subclass_of(C,Restr),\n        rdf(Restr,owl:onProperty,P),\n        rdf(Restr,owl:someValuesFrom,D).\n```\n\nOnly a subset of prolog can be expanded in this way. Conjunction,\ndisjunction (or multiple clauses), and negation are supported. 
Terminals\nrdf/3, rdf/4, and some predicates from the rdfs library are\nsupported. In future a wider set of constructs may be supported,\ne.g. setof/3.\n\nIt is also possible to use create_sparql_construct/3 and\ncreate_sparql_construct/4 to generate SPARQL queries for a\nlimited subset of pure prolog that can be executed outside\nthe prolog environment - effectively a limited prolog to SPARQL\ncompiler.\n\n## Comparison with SPIN\n\nTODO https://spinrdf.org/\n\n## Credits\n\nThe majority of code in this repo was developed by Samer Abdallah, as\npart of the [sparkle\npackage](http://www.swi-prolog.org/pack/list?p=sparkle). Some of this\ncode came from Yves Raimond's swic package.\n\nExtensions were implemented by Chris Mungall. In particular:\n\n - goal rewriting\n - DCG extensions: aggregates, filter operators\n - predicate definitions for vocabularies used by various triplestores (faldo, ebi, wikidata, dbpedia, go, monarch)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmungall%2Fsparqlprog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmungall%2Fsparqlprog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmungall%2Fsparqlprog/lists"}