{"id":27389252,"url":"https://github.com/vvvvalvalval/xml-pull","last_synced_at":"2025-10-18T01:13:32.816Z","repository":{"id":75693663,"uuid":"188970494","full_name":"vvvvalvalval/xml-pull","owner":"vvvvalvalval","description":"Pulling nested Clojure data structures from XML documents, declaratively and efficiently.","archived":false,"fork":false,"pushed_at":"2019-05-28T09:21:54.000Z","size":15,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-03T19:47:49.281Z","etag":null,"topics":["clojure","xml"],"latest_commit_sha":null,"homepage":null,"language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vvvvalvalval.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-05-28T06:47:21.000Z","updated_at":"2024-07-10T15:01:30.000Z","dependencies_parsed_at":"2023-06-07T09:34:48.382Z","dependency_job_id":null,"html_url":"https://github.com/vvvvalvalval/xml-pull","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vvvvalvalval/xml-pull","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vvvvalvalval%2Fxml-pull","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vvvvalvalval%2Fxml-pull/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vvvvalvalval%2Fxml-pull/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vvvvalvalval%2Fxml-pull/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/vvvvalvalval","download_url":"https://codeload.github.com/vvvvalvalval/xml-pull/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vvvvalvalval%2Fxml-pull/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278160418,"owners_count":25940190,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-03T02:00:06.070Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","xml"],"created_at":"2025-04-13T19:13:32.132Z","updated_at":"2025-10-03T12:23:25.684Z","avatar_url":"https://github.com/vvvvalvalval.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# xml-pull\n\nA declarative, data-based query language and efficient query engine for pulling Clojure data structures out of XML documents.\n\n**Status:** alpha quality - breaking changes are possible.\n\n## Rationale\n\nAt the time of writing, extracting information from XML documents in Clojure is impractical on several counts.\n There's a semantic mismatch between XML trees and the EDN-ish associative data structures favoured by Clojure's standard library.\n [clojure.zip](https://clojure.github.io/clojure/clojure.zip-api.html) is the most common way of querying XML in Clojure,\n but it's not very efficient, and Zippers don't address all 
information extraction needs very well (they're good at diving\n deep into an XML document to fetch one type of information at a time, not so much at extracting and re-bundling several\n types of information); Zippers are also not very declarative, as they're predicate-based rather than data-based.\n\nOn the other hand, Clojure and its ecosystem offer very good facilities for dealing with EDN-style data.\n This suggests making a library whose sole purpose is to bridge the gap between XML documents and associative data structures.\n\n`xml-pull` does so by:\n\n1. providing a high-level query language, inspired by [Datomic Pull](https://docs.datomic.com/on-prem/pull.html),\n for declaring how to traverse an XML tree and what data to collect along the way;\n2. making this query language based on data structures, for programmability;\n3. providing an efficient execution engine for this language, in particular via ahead-of-time compilation;\n4. limiting itself to very basic capabilities for validating / re-shaping the data: once you have your data in EDN-style data structures,\n you have a wealth of existing solutions for that, so `xml-pull` will do no more than give you the opportunity\n to use them without compromising on performance;\n5. 
not taking care of text parsing - there's already [clojure.xml](https://clojuredocs.org/clojure.xml/parse) and [clojure.data.xml](https://github.com/clojure/data.xml) for that,\n so `xml-pull` deals with the output of those.\n\n\n## Usage\n\nImagine you start from the following input:\n\n```xml\n\u003c?xml version=\"1.0\" encoding=\"UTF-8\"?\u003e\n\u003cperson id=\"a9b051d2-4499-484d-a0eb-f4f672f88180\"\u003e\n  \u003cfirst-name\u003eJohn\u003c/first-name\u003e\n  \u003clast-name\u003eDoe\u003c/last-name\u003e\n  \u003ccontact-infos\u003e\n    \u003ccontact-info type=\"email\"\u003ejohn.doe@yahoo.com\u003c/contact-info\u003e\n    \u003ccontact-info type=\"phone-number\"\u003e+33 4 56 09 12 88\u003c/contact-info\u003e\n  \u003c/contact-infos\u003e\n  \u003chobbies\u003e\n    \u003chobby\u003eGuitar\u003c/hobby\u003e\n    \u003chobby\u003eRock-climbing\u003c/hobby\u003e\n    \u003chobby\u003eGardening\u003c/hobby\u003e\n  \u003c/hobbies\u003e\n  \u003caddress\u003e\n    \u003ccity\u003eBabylon\u003c/city\u003e\n  \u003c/address\u003e\n\u003c/person\u003e\n```\n\nYou would like the following output:\n\n```clojure\n{:person/id \"a9b051d2-4499-484d-a0eb-f4f672f88180\"\n :person/first-name \"John\"\n :person/last-name \"Doe\"\n :person/email \"john.doe@yahoo.com\"\n :person/phone-number \"+33 4 56 09 12 88\"\n :person/hobbies [\"Guitar\" \"Rock-climbing\" \"Gardening\"]}\n```\n\nYou do so by defining a _query_ for extracting the desired information.\n An xml-pull query is a data structure declaring how to traverse an XML document and what data to collect along the way.\n xml-pull queries are a bit verbose as data structures, so for convenience we'll define this one with the small functional\n DSL provided by xml-pull:\n\n```clojure\n(require '[xml-pull.query-dsl :as xpd])\n\n(def person-info-query\n  (xpd/query\n    [(xpd/to-attr \"id\" (xpd/as-key :person/id))\n     (xpd/to-tag-content-1 \"first-name\" :person/first-name)\n     (xpd/to-tag-content-1 \"last-name\" 
:person/last-name)\n     (xpd/to-tag \"contact-infos\" xpd/no-key\n       [(xpd/to-tag-with-attr \"contact-info\" \"type\" \"email\" xpd/no-key\n          [(xpd/to-content-1 (xpd/as-key :person/email))])\n        (xpd/to-tag-with-attr \"contact-info\" \"type\" \"phone-number\" xpd/no-key\n          [(xpd/to-content-1 (xpd/as-key :person/phone-number))])])\n     (xpd/to-tag \"hobbies\" xpd/no-key\n       [(xpd/to-tag \"hobby\" xpd/tag-many (xpd/as-key :person/hobbies)\n          [(xpd/to-content-1 (xpd/as-key :hobby/name))]\n          (xpd/post-process #(mapv :hobby/name %)))])]))\n```\n\nWe can now _compile_ that query into a function:\n\n```clojure\n(require '[xml-pull.engines.jvm-default :as xpe])\n\n(def pull-person-info\n  (xpe/compile {} person-info-query))\n```\n\nThis function can now be called on the output of e.g `clojure.data.xml/parse-str`:\n\n```clojure\n(require '[clojure.data.xml :as xml])\n\n(pull-person-info\n  (xml/parse-str\n    (slurp \"person_john-doe.xml\")))\n=\u003e {:person/id \"a9b051d2-4499-484d-a0eb-f4f672f88180\"\n    :person/first-name \"John\"\n    :person/last-name \"Doe\"\n    :person/email \"john.doe@yahoo.com\"\n    :person/phone-number \"+33 4 56 09 12 88\"\n    :person/hobbies [\"Guitar\" \"Rock-climbing\" \"Gardening\"]\n    :xml-pull.result/errors []}\n```\n\nNotice the `:xml-pull.result/errors` key in the result.\n By default, if an Exception is thrown during processing, `xml-pull` will put it here\n instead of aborting the query; this allows you to deal with partial failure.\n\nFor transparency, here's the `person-info-query` query we defined with the DSL:\n\n```clojure\nperson-info-query\n=\u003e\n#:xml-pull.query{:paths [#:xml-pull.path{:type :xml-pull.path-type/attr, :attr \"id\", :key :person/id}\n                         #:xml-pull.path{:type :xml-pull.path-type/content-tag,\n                                         :tag \"first-name\",\n                                         :no-key true,\n                             
            :query #:xml-pull.query{:paths [#:xml-pull.path{:type :xml-pull.path-type/content-1,\n                                                                                         :key :person/first-name}]}}\n                         #:xml-pull.path{:type :xml-pull.path-type/content-tag,\n                                         :tag \"last-name\",\n                                         :no-key true,\n                                         :query #:xml-pull.query{:paths [#:xml-pull.path{:type :xml-pull.path-type/content-1,\n                                                                                         :key :person/last-name}]}}\n                         #:xml-pull.path{:type :xml-pull.path-type/content-tag,\n                                         :tag \"contact-infos\",\n                                         :no-key true,\n                                         :query #:xml-pull.query{:paths [#:xml-pull.path{:type :xml-pull.path-type/content-tag-with-attr,\n                                                                                         :tag \"contact-info\",\n                                                                                         :attr \"type\",\n                                                                                         :attr-value \"email\",\n                                                                                         :no-key true,\n                                                                                         :query #:xml-pull.query{:paths [#:xml-pull.path{:type :xml-pull.path-type/content-1,\n                                                                                                                                         :key :person/email}]}}\n                                                                         #:xml-pull.path{:type :xml-pull.path-type/content-tag-with-attr,\n                                                                                        
 :tag \"contact-info\",\n                                                                                         :attr \"type\",\n                                                                                         :attr-value \"phone-number\",\n                                                                                         :no-key true,\n                                                                                         :query #:xml-pull.query{:paths [#:xml-pull.path{:type :xml-pull.path-type/content-1,\n                                                                                                                                         :key :person/phone-number}]}}]}}\n                         #:xml-pull.path{:type :xml-pull.path-type/content-tag,\n                                         :tag \"hobbies\",\n                                         :no-key true,\n                                         :query #:xml-pull.query{:paths [{:xml-pull.path/type :xml-pull.path-type/content-tag,\n                                                                          :xml-pull.path/tag \"hobby\",\n                                                                          :xml-pull.tag/cardinality :tag.cardinality/many,\n                                                                          :xml-pull.path/key :person/hobbies,\n                                                                          :xml-pull.path/query #:xml-pull.query{:paths [#:xml-pull.path{:type :xml-pull.path-type/content-1,\n                                                                                                                                        :key :hobby/name}]},\n                                                                          :xml-pull/post-process-fn #object[xml_pull.query_test$fn__2449\n                                                                                                            0x51731b6a\n                                            
                                                                \"xml_pull.query_test$fn__2449@51731b6a\"]}]}}]}\n```\n\n## License\n\nCopyright © 2019 Valentin Waeselynck and contributors\n\nDistributed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvvvvalvalval%2Fxml-pull","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvvvvalvalval%2Fxml-pull","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvvvvalvalval%2Fxml-pull/lists"}