{"id":25833315,"url":"https://github.com/swirrl/matcha","last_synced_at":"2025-03-17T15:12:15.302Z","repository":{"id":37665612,"uuid":"147434136","full_name":"Swirrl/matcha","owner":"Swirrl","description":":tea: An in memory graph database with SPARQL-like DSL for querying Linked Data Models","archived":false,"fork":false,"pushed_at":"2025-02-01T02:25:10.000Z","size":236,"stargazers_count":22,"open_issues_count":32,"forks_count":0,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-01T00:13:28.894Z","etag":null,"topics":["clojure","datalog","dsl","linked-data","query-engine","rdf","sparql"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Swirrl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-04T23:55:30.000Z","updated_at":"2024-11-19T18:02:48.000Z","dependencies_parsed_at":"2024-01-18T12:55:35.286Z","dependency_job_id":"894c1649-7fa0-4c1e-b989-53ee3cf8f131","html_url":"https://github.com/Swirrl/matcha","commit_stats":{"total_commits":139,"total_committers":5,"mean_commits":27.8,"dds":"0.28057553956834536","last_synced_commit":"77237a2e4760120fa87ca4b19b028a812dab5ee7"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Swirrl%2Fmatcha","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Swirrl%2Fmatcha/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Swirrl%2Fmatcha/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Swirrl%2Fmatcha/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Swirrl","download_url":"https://codeload.github.com/Swirrl/matcha/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244056425,"owners_count":20390719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","datalog","dsl","linked-data","query-engine","rdf","sparql"],"created_at":"2025-02-28T22:47:45.890Z","updated_at":"2025-03-17T15:12:15.279Z","avatar_url":"https://github.com/Swirrl.png","language":"Clojure","readme":"# Matcha\n\n[![Clojars Project](https://img.shields.io/clojars/v/grafter/matcha.alpha.svg)](https://clojars.org/grafter/matcha.alpha) [![cljdoc badge](https://cljdoc.org/badge/grafter/matcha.alpha)](https://cljdoc.org/d/grafter/matcha.alpha)\n\nA Clojure DSL to query in memory triple models with a SPARQL like\nlanguage.  Matcha provides simple BGP (Basic Graph Pattern) style\nqueries on in memory graphs of linked data triples.\n\n![Matcha](https://raw.githubusercontent.com/Swirrl/matcha/master/doc/matcha.jpg \"Matcha\")\n\nWhilst Matcha is intended to query RDF models it can also be used to\nquery arbitrary clojure data, so long as it consists of Clojure values\nstored in 3/tuple vectors, each entity of the triple is assumed to\nfollow Clojure value equality semantics.\n\nThe primary use cases for Matcha are to make handling graphs of RDF\ndata easy by querying data with SPARQL-like queries.  
A typical\nworkflow is to `CONSTRUCT` data from a backend SPARQL query, and then\nuse Matcha to query this graph locally.\n\n## Features\n\n- SPARQL-like BGP queries across multiple triple patterns.\n- Parameterised queries using just clojure `let`.\n- Ability to index your database, with `index-triples`.  In order to\n  be queried Matcha needs to have indexed the data; if your data is\n  unindexed it will index it before running the query, and then\n  dispose of the index.  This can lead to poor performance when you\n  want to query the same set of data multiple times.\n- Construct graph query results directly into clojure datastructures.\n- Support for `VALUES` clauses (unlike in SPARQL we do not yet support\n  binding arbitrary tuples/tables).  So we only support the\n  `VALUES ?x { ... }` form.\n- Support for `OPTIONAL`s with SPARQL-like semantics.\n\n## Limitations\n\nThe initial implementation is macro heavy.  This means use cases where\nyou want to dynamically create in memory queries may be more awkward.\n\nCurrently there is no support for the following SPARQL-like features:\n\n1. Reasoning on in memory vocabularies with RDFS (maybe OWL)\n2. 
ClojureScript support (planned)\n\n## Usage\n\nMatcha defines some primary query functions `select`, `select-1`,\n`build`, `build-1`, `construct`, `construct-1` and `ask`.\n\nFirst let's define an in-memory database of triples. In reality this\ncould come from a SPARQL query `CONSTRUCT`, but here we'll just define\nsome RDF-like data inline.\n\nTriples can be vectors of Clojure values or any datastructure that\nsupports positional destructuring via `clojure.lang.Indexed`. This\nallows Matcha to work with `grafter.rdf.protocols.Statement` records.\nMatcha works with any Clojure values in the triples, be they Java\nURIs or Clojure keywords.\n\n```clojure\n(def friends-db [[:rick :rdfs/label \"Rick\"]\n                 [:martin :rdfs/label \"Martin\"]\n                 [:katie :rdfs/label \"Katie\"]\n                 [:julie :rdfs/label \"Julie\"]\n\n                 [:rick :foaf/knows :martin]\n                 [:rick :foaf/knows :katie]\n                 [:katie :foaf/knows :julie]\n\n                 [:rick :a :foaf/Person]\n                 [:katie :a :foaf/Person]\n                 [:martin :a :foaf/Person]])\n```\n\nNow we can build our query functions:\n\n### General Query Semantics\n\nThere are two main concepts to Matcha queries.  They typically define:\n\n1. a projection, which states what variables to return to your Clojure\n   program, and the datastructure they should be returned in.\n2. a Basic Graph Pattern (BGP), that defines the pattern of the graph\n   traversal.\n\nBGPs have some semantics you need to be aware of:\n\n- Clojure symbols beginning with a `?` are treated specially as query\n  variables.\n- Other symbols are resolved to their values.\n\n### `build`\n\n`build` always groups returned solutions into a sequence of clojure\nmaps, where the subjects are grouped into maps, and the maps are\ngrouped by their properties. 
If a property has multiple values they\nwill be rolled up into a set, otherwise they will be a scalar value.\n\nEach map returned by `build` typically represents a resource in the\nbuilt graph, which is projected into a sequence of maps, with\npotentially multi-valued keys.\n\nIt takes a binding for `?subject` of the map, a map form specifying\nthe projection of other property/value bindings, a `bgp` and a\ndatabase.\n\n```clojure\n(build ?person\n       {:foaf/knows ?friends}\n       [[?person :foaf/knows ?friends]]\n       friends-db)\n\n;; =\u003e ({:grafter.rdf/uri :rick, :foaf/knows #{:martin :katie}}\n;;     {:grafter.rdf/uri :katie, :foaf/knows :julie})\n```\n\nNOTE: `:foaf/knows` is projected into a set of values for `:rick`, but\na single scalar value for `:katie`.\n\nThe `?subject` is by default associated with the key\n`:grafter.rdf/uri`. If you wish to specify this key yourself you can\nby providing a key/value pair as the subject: e.g. substituting\n`?person` with `[:id ?person]` changes the return values like so:\n\n```clojure\n(build [:id ?person]\n       {:foaf/knows ?friends}\n       [[?person :foaf/knows ?friends]]\n       friends-db)\n;; =\u003e ({:id :rick, :foaf/knows #{:martin :katie}}\n;;     {:id :katie, :foaf/knows :julie})\n```\n\nBecause `build` knows it is always returning a sequence of maps, it\nwill remove any keys corresponding to unbound variables introduced\nthrough optionals.  This is unlike `construct`.\n\n### `select`\n\n`select` compiles a query from your arguments that returns results as a\nsequence of tuples. It is directly analogous to SPARQL's `SELECT` query.\n\nThe `bgp` argument is analogous to a SPARQL `WHERE` clause and should be\na BGP.\n\nWhen called with one argument, `select` projects all `?qvar`s used in the\nquery.  
This is analogous to `SELECT *` in SPARQL:\n\n```clojure\n(def rick-knows\n  (select\n    [[:rick :foaf/knows ?p2]\n     [?p2 :rdfs/label ?name]]))\n\n(rick-knows friends-db)\n;; =\u003e [\"Martin\" \"Katie\"]\n```\n\nWhen called with two arguments, `select` expects the first argument to be a\nvector of variables to project into the solution sequence.\n\n```clojure\n(def rick-knows (select [?name]\n                  [[:rick :foaf/knows ?p2]\n                   [?p2 :rdfs/label ?name]]))\n\n(rick-knows friends-db)\n;; =\u003e [\"Martin\" \"Katie\"]\n```\n\nThere is also `select-1` which is just like `select` but returns just\nthe first solution.\n\n### `construct`\n\nNOTE: if you're using `construct` to return maps, you should first\nconsider using `build` which fixes some issues present in common\n`construct` usage.\n\n`CONSTRUCT`s allow you to construct arbitrary clojure data structures\ndirectly from your query results, and position the projected query\nvariables wherever you want within the projected datastructure\ntemplate.\n\nArgs:\n * `construct-pattern`: an arbitrary clojure data structure. Results\n   will be projected into the `?qvar` \"holes\".\n * `bgps`: this argument is analogous to a SPARQL `WHERE` clause and should be\n   a BGP.\n * `db-or-idx`: A matcha \"database\".\n\nWhen called with two arguments, `construct` returns a query function\nthat accepts a `db-or-idx` as its only argument. 
When called, the\nfunction returns a sequence of matching tuples in the form of the\n`construct-pattern`.\n\n```clojure\n(construct {:grafter.rdf/uri :rick\n            :foaf/knows {:grafter.rdf/uri ?p\n                         :rdfs/label ?name}}\n  [[:rick :foaf/knows ?p]\n   [?p :rdfs/label ?name]])\n\n;; =\u003e (fn [db-or-idx] ...)\n```\n\nWhen called with three arguments, `construct` queries the `db-or-idx` directly, returning a\nsequence of results in the form of the `construct-pattern`.\n\n```clojure\n(construct {:grafter.rdf/uri :rick\n            :foaf/knows {:grafter.rdf/uri ?p\n                         :rdfs/label ?name}}\n  [[:rick :foaf/knows ?p]\n   [?p :rdfs/label ?name]]\n  friends-db)\n\n;; =\u003e {:grafter.rdf/uri :rick\n;;     :foaf/knows #{{:grafter.rdf/uri :martin, :rdfs/label \"Martin\"}\n;;                   {:grafter.rdf/uri :katie, :rdfs/label \"Katie\"}}}\n```\n\nMaps in a projection that contain the special key of\n`:grafter.rdf/uri` trigger extra behaviour, and cause the query\nengine to group solutions by subject, and merge values into clojure\nsets.  For example in the above query you'll notice that `:foaf/knows`\ngroups its solutions.  If you don't want these maps to be grouped,\ndon't include the magic key `:grafter.rdf/uri` in the top level\nprojection.\n\nThere is also `construct-1` which is just like `construct` but returns\nonly the first solution.\n\nSee the [unit\ntests](https://github.com/Swirrl/matcha/blob/ae2449483d5a7849ac60a3e5b6a29e459d74ad8e/test/grafter/matcha/alpha_test.clj#L113)\nfor more examples, including examples that use Matcha with Grafter\nStatements and vocabularies.\n\n### `ask`\n\n`ask` is the only query that doesn't specify an explicit projection.\nIt accepts a BGP, like the other query types, and returns a boolean\nindicating whether any matches were found.\n\n```clojure\n(def any-triples? (ask [[?s ?p ?o]]))\n\n(any-triples? 
friends-db) ;; =\u003e true\n```\n\n### Parameterising queries\n\nYou can parameterise Matcha queries simply by adding a lexical binding or wrapping a function call over your Matcha query.  For example:\n\n```clojure\n(defn lookup-friends [person-id database]\n  (-\u003e\u003e database\n       (construct {:grafter.rdf/uri ?friend\n                   :name ?name}\n                  [[person-id :foaf/knows ?friend]\n                   [?friend :rdfs/label ?name]])))\n\n(lookup-friends :rick friends-db)\n\n;; =\u003e [{:grafter.rdf/uri :martin, :name \"Martin\"}\n;;     {:grafter.rdf/uri :katie, :name \"Katie\"}]\n```\n\n### OPTIONALs\n\nWe support SPARQL-like `OPTIONAL`s in all query types with the following syntax:\n\n```clojure\n(defn lookup-name [person-id database]\n  ;; the two-argument `select` returns a query function, applied here to `database`\n  ((select [?name]\n     [[person-id :a :foaf/Person]\n      (optional [[person-id :rdfs/label ?name]])\n      (optional [[person-id :foaf/name ?name]])])\n   database))\n```\n\n### VALUES\n\nWe support dynamic `VALUES` clauses in all query types like so:\n\n```clojure\n(defn lookup-names [person-ids database]\n  ((select [?name]\n     [(values ?person-id person-ids)\n      [?person-id :rdfs/label ?name]])\n   database))\n\n(lookup-names [:rick :katie] friends-db) ;; =\u003e [\"Rick\", \"Katie\"]\n```\n\nYou can also hardcode the values into the query:\n\n```clojure\n(defn lookup-names [person-ids database]\n  ((select [?name]\n     [(values ?person-id [:rick :katie])\n      [?person-id :rdfs/label ?name]])\n   database))\n```\n\nAny \"flat collection\" (i.e. a `sequential?` or a `set?`) is valid\non the right hand side of a `values` binding.\n\n## Performance\n\nMatcha is intended to be used on modest sizes of data, typically\nthousands of triples, and usually no more than a few hundred thousand\ntriples.  Proper benchmarking hasn't yet been done but finding all\nsolutions on a database of a million triples can be done on a laptop\nin less than 10 seconds.  
Query time scaling seems to be roughly\nlinear with the database size.\n\n## Avoiding clj-kondo lint errors with matcha macros\n\nMatcha exports some clj-kondo configuration which prevents clj-kondo\nfrom warning about unbound variables when using the matcha query macros.\n\nYou can import these configs into your project with the following\ncommand:\n\n```\n$ clj-kondo --copy-configs --dependencies --lint \"$(clojure -Spath)\"\nImported config to .clj-kondo/grafter/matcha.alpha. To activate, add \"grafter/matcha.alpha\" to :config-paths in .clj-kondo/config.edn.\n```\n\nThen simply add the following to `.clj-kondo/config.edn`:\n\n```\n{:config-paths [\"grafter/matcha.alpha\"]}\n```\n\n## Developing Matcha\n\nMatcha uses [`tools.build`](https://clojure.org/guides/tools_build) and\n[`tools.deps`](https://clojure.org/guides/deps_and_cli) for builds,\ndevelopment and testing.\n\nThe command:\n\n```\n$ clojure -T:build test\n```\n\nwill run the tests, whilst\n\n```\n$ clojure -T:build build\n$ clojure -T:build install\n```\n\ncan be used to build and install a jar into your local Maven repository.\n\nHowever, for consuming local Matcha changes in local projects you are\nusually better off using `tools.deps` `:classpath-overrides`, or creating\na branch and consuming via a `:git/url`.\n\n## Deploying to Clojars\n\nFor [deployments CircleCI is set up](https://github.com/Swirrl/matcha/blob/fafe7478ae605c4cb2a0253714c3bd286e1ca185/.circleci/config.yml#L46-L55)\nto automatically deploy tags of the form `vX.Y.Z` where `X.Y.Z` are\n`major.minor.patch` numbers.  If you have permissions (i.e. 
you are\na Swirrl developer), the recommended workflow is to create a new\nrelease of the `main` branch on GitHub with a tag that bumps the\nversion number appropriately.\n\n_NOTE_: For this step to work you will need appropriate deployment\nprivileges on clojars.org.\n\n## License\n\nCopyright © Swirrl IT Ltd 2018\n\nDistributed under the Eclipse Public License either version 1.0 or (at\nyour option) any later version.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fswirrl%2Fmatcha","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fswirrl%2Fmatcha","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fswirrl%2Fmatcha/lists"}