{"id":13836528,"url":"https://github.com/igrishaev/remus","last_synced_at":"2025-07-13T10:41:30.222Z","repository":{"id":32869383,"uuid":"144404564","full_name":"igrishaev/remus","owner":"igrishaev","description":"Attentive RSS/Atom feed parser for Clojure","archived":false,"fork":false,"pushed_at":"2023-02-17T10:19:43.000Z","size":408,"stargazers_count":62,"open_issues_count":2,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-17T05:10:50.027Z","etag":null,"topics":["atom","clojure","feed","http","rss"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/igrishaev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-08-11T17:18:47.000Z","updated_at":"2025-04-04T07:36:12.000Z","dependencies_parsed_at":"2024-01-13T17:00:01.511Z","dependency_job_id":"59628dfb-ce5b-4982-88a6-1fa8a1003da6","html_url":"https://github.com/igrishaev/remus","commit_stats":{"total_commits":45,"total_committers":7,"mean_commits":6.428571428571429,"dds":"0.24444444444444446","last_synced_commit":"97807daf3c05f247915d728e3509fe4588fee145"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fremus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fremus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fremus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fremus
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/igrishaev","download_url":"https://codeload.github.com/igrishaev/remus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250195054,"owners_count":21390230,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atom","clojure","feed","http","rss"],"created_at":"2024-08-04T15:00:48.535Z","updated_at":"2025-07-13T10:41:30.209Z","avatar_url":"https://github.com/igrishaev.png","language":"Clojure","readme":"# Remus\n\n[rome-site]: https://rometools.github.io/rome/\n\n[http-client]: https://github.com/babashka/http-client\n\nAn attentive RSS and Atom feed parser for Clojure. It's built on top of\nwell-known and powerful [ROME Tools][rome-site] Java library. Remus deals with\nweird encoding and non-standard XML tags. 
The library fetches as much\ninformation from a feed as possible.\n\n![](art/romulus-remus.jpg)\n\n# Table of Contents\n\n\u003c!-- toc --\u003e\n\n- [Benefits](#benefits)\n- [Installation](#installation)\n- [Usage](#usage)\n  * [Parsing a URL](#parsing-a-url)\n  * [Parsing a source](#parsing-a-source)\n- [HTTP tweaks](#http-tweaks)\n  * [Error cases](#error-cases)\n  * [Saving extra data](#saving-extra-data)\n- [Non-standard tags](#non-standard-tags)\n- [Encoding issues](#encoding-issues)\n- [Misc](#misc)\n\n\u003c!-- tocstop --\u003e\n\n## Benefits\n\n- Gets all the known fields from a feed and turns them into plain Clojure data\n  structures;\n- relies on the built-in Java HTTP client (via the [babashka-http][http-client]\n  library);\n- supports the HTTP/2 protocol;\n- preserves non-standard XML tags for further processing (see an example below).\n\n## Installation\n\nLeiningen/Boot:\n\n```clojure\n[remus \"0.2.5\"]\n```\n\nClojure CLI/deps.edn:\n\n```clojure\nremus/remus {:mvn/version \"0.2.5\"}\n```\n\n## Usage\n\nThe library provides a one-word top namespace `remus` so it's easier to\nremember.\n\n```clojure\n(ns your.project\n  (:require [remus]))\n```\n\n### Parsing a URL\n\nLet's parse [Planet Clojure](http://planet.clojure.in/):\n\n```clojure\n(def result (remus/parse-url \"http://planet.clojure.in/atom.xml\"))\n```\n\nThe variable `result` is a map with two keys: `:response` and `:feed`. These are\nan HTTP response and a parsed feed. 
Below, there is a truncated version of a\nfeed:\n\n```clojure\n(def feed (:feed result))\n\n(println feed)\n\n;;;;\n;; just a small subset\n;;;;\n\n{:description nil,\n :feed-type \"atom_1.0\"\n :entries\n [{:description nil\n   :updated-date #inst \"2018-08-13T10:00:00.000-00:00\"\n   :extra {:tag :extra, :attrs nil, :content ()}\n   :title\n   \"PurelyFunctional.tv Newsletter 287: DataScript, GraphQL, CRDTs\"\n   :author \"Eric Normand\"\n   :link\n   \"https://purelyfunctional.tv/issues/purelyfunctional-tv-newsletter-287-datascript-graphql-crdts/\"\n   :uri \"https://purelyfunctional.tv/?p=28660\"\n   :contents\n   ({:type \"html\"\n     :mode nil\n     :value\n     \"\u003cdiv class=\\\" reset\\\"\u003e\\n\u003cp\u003e\u003cem\u003eIssue 287 August 13, 2018 \u003ca href=\\\"https://purelyfunctional.tv/newsletter-archives/\\\"\u003eArchives\u003c/a\u003e \u003ca href=\\\"https://purelyfunctional.tv/newsletter/\\\" title=\\\"Thanks, Jeff!\\\"\u003eSubscribe\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\\n\u003cp\u003eHi Clojurationists,\u003c/p\u003e\\n\u003cp\u003eI've just been digging \u003ca href=\\\"https://twitter.com/puredanger/status/1028103654241443840\\\" title=\\\"\\\"\u003ethis lovely tweet from Alex Miller\u003c/a\u003e.\u003c/p\u003e\\n\u003cp\u003eRock on!\u003cbr /\u003e\u003ca href=\\\"http://twitter.com/ericnormand\\\"\u003eEric Normand\u003c/a\u003e \u0026lt;\u003ca href=\\\"mailto: ... \"}),\n :published-date #inst \"2018-08-13T11:59:11.000-00:00\"\n :entry-links\n ({:rel \"alternate\"\n   :href \"http://planet.clojure.in/\"\n   :length 0}\n  {:rel \"self\"\n   :href \"http://planet.clojure.in/atom.xml\",\n   :length 0})\n :title \"Planet Clojure\"\n :language nil\n :link \"http://planet.clojure.in/\"\n :uri \"http://planet.clojure.in/atom.xml\"\n :authors ()}\n```\n\n\u003c/details\u003e\n\nAs for HTTP response, it's a data structure returned by an HTTP client. 
You\nmight need it to save some of the HTTP headers for further requests (see below).\n\n### Parsing a source\n\nThe function `parse` accepts any kind of source that can be coerced to an\ninput stream: a file, a reader, and so on:\n\n~~~clojure\n(remus/parse \"/path/to/file/xml\")\n(remus/parse (get-some-input-stream...))\n~~~\n\nThere are a couple of deprecated functions called `parse-file` and `parse-stream`\nthat act like `parse` (left for compatibility).\n\nAll these functions return a parsed feed.\n\n## HTTP tweaks\n\nSince `Remus` relies on HTTP interaction, sometimes you need to tweak it:\ncontrol redirects, security validation, authentication, etc. When calling\n`parse-url`, specify an optional map with HTTP parameters:\n\n```clojure\n;; Do not check an untrusted SSL certificate.\n(remus/parse-url \"http://planet.clojure.in/atom.xml\"\n                 {:insecure true})\n\n\n;; Parse a user/pass protected HTTP resource.\n(remus/parse-url \"http://planet.clojure.in/atom.xml\"\n                 {:basic-auth [\"username\" \"password\"]})\n\n\n;; Pretend to be a browser. Some sites restrict access by the \"User-Agent\" header.\n(remus/parse-url \"http://planet.clojure.in/atom.xml\"\n                 {:headers {\"User-Agent\" \"Mozilla/5.0 (Macintosh; Intel Mac....\"}})\n\n;; Setting a timeout\n(remus/parse-url \"...\" {:timeout 5000}) ;; wait up to 5 seconds\n```\n\nRemus overrides the following HTTP options:\n- `:as` is always `:stream`;\n- `:throw` is false. It prevents the HTTP layer from throwing exceptions\n  immediately when a non-200 status is met. 
Later on, an exception with\n  a detailed message is thrown.\n- The `accept-encoding` HTTP header is set to `gzip` and `deflate`.\n\n### Error cases\n\nThe library will complain about non-200 HTTP responses:\n\n~~~clojure\n;; 404\n(remus/parse-url \"http://planet.clojure.in/dunno\")\n\nExecution error at remus/parse-http-resp (remus.clj:108).\nNon-200 status code, status: 404, url: http://planet.clojure.in/dunno, content-type: text/html\n~~~\n\nThe same applies to non-XML Content-Type header values:\n\n~~~clojure\n;; 200 but not XML\n(remus/parse-url \"http://planet.clojure.in/\")\n\nExecution error at remus/parse-http-resp (remus.clj:106).\nNon-XML response, status: 200, url: http://planet.clojure.in/, content-type: text/html\n~~~\n\n### Saving extra data\n\n[cond-get]: https://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers\n\nWhen parsing a URL, a good option would be to pass the `If-None-Match` and\n`If-Modified-Since` headers with the values from the `Etag` and `Last-Modified`\nones from the previous response. This trick is known as [conditional\nGET][cond-get]. 
It might prevent the server from sending the data you've already\nreceived before:\n\n```clojure\n;; returns the whole feed\n(def result (remus/parse-url \"http://planet.lisp.org/rss20.xml\"))\n\n;; split the result\n(def feed (:feed result))\n(def response (:response result))\n\n;; ensure we got the data\n(:length response)\n48082\n\n;; save the headers\n(def etag (-\u003e response :headers :etag))\n;; \"5b71766f-2f597\"\n\n(def last-modified (-\u003e response :headers :last-modified))\n;; Mon, 19 Oct 2020 12:15:27 GMT\n\n;;;;\n;; Now, try to fetch data passing conditional headers:\n;;;;\n\n(def result-new\n  (remus/parse-url \"http://planet.lisp.org/rss20.xml\"\n                   {:headers {\"If-None-Match\" etag\n                              \"If-Modified-Since\" last-modified}}))\n\n(-\u003e result-new :response :status)\n304\n\n(-\u003e result-new :response :length)\n0\n\n(-\u003e result-new :feed)\nnil\n```\n\nSince the server returned a non-200 but non-error status code (304 in our case), we\ndon't parse the response at all. So the `:feed` field in the `result-new`\nvariable will be `nil`.\n\n## Non-standard tags\n\n[youtube-rss]: https://www.youtube.com/feeds/videos.xml?channel_id=UCaLlzGqiPE2QRj6sSOawJRg\n\nSometimes, a feed ships additional data with non-standard tags. A good example\nmight be a typical [YouTube feed][youtube-rss]. 
Let's examine one of its\nentries:\n\n```xml\n\u003centry\u003e\n  \u003cid\u003eyt:video:TbthtdBw93w\u003c/id\u003e\n  \u003cyt:videoId\u003eTbthtdBw93w\u003c/yt:videoId\u003e\n  \u003cyt:channelId\u003eUCaLlzGqiPE2QRj6sSOawJRg\u003c/yt:channelId\u003e\n  \u003ctitle\u003eDatomic Ions in Seven Minutes\u003c/title\u003e\n  \u003clink rel=\"alternate\" href=\"https://www.youtube.com/watch?v=TbthtdBw93w\"/\u003e\n  \u003cauthor\u003e\n    \u003cname\u003eClojureTV\u003c/name\u003e\n    \u003curi\u003e\n      https://www.youtube.com/channel/UCaLlzGqiPE2QRj6sSOawJRg\n    \u003c/uri\u003e\n  \u003c/author\u003e\n  \u003cpublished\u003e2018-07-03T21:16:16+00:00\u003c/published\u003e\n  \u003cupdated\u003e2018-08-09T16:29:51+00:00\u003c/updated\u003e\n  \u003cmedia:group\u003e\n    \u003cmedia:title\u003eDatomic Ions in Seven Minutes\u003c/media:title\u003e\n    \u003cmedia:content url=\"https://www.youtube.com/v/TbthtdBw93w?version=3\" type=\"application/x-shockwave-flash\" width=\"640\" height=\"390\"/\u003e\n    \u003cmedia:thumbnail url=\"https://i1.ytimg.com/vi/TbthtdBw93w/hqdefault.jpg\" width=\"480\" height=\"360\"/\u003e\n    \u003cmedia:description\u003e\n      Stuart Halloway introduces Ions for Datomic Cloud on AWS.\n    \u003c/media:description\u003e\n    \u003cmedia:community\u003e\n      \u003cmedia:starRating count=\"67\" average=\"5.00\" min=\"1\" max=\"5\"/\u003e\n      \u003cmedia:statistics views=\"1977\"/\u003e\n    \u003c/media:community\u003e\n  \u003c/media:group\u003e\n\u003c/entry\u003e\n```\n\nIn addition to the standard fields, the feed carries information about the video\nID, channel ID and statistics: view count, the number of times the video was\nstarred and its average rating. You would probably want to use that data.\n\nAlternatively, if you parse a geo-related feed, you'll get lat/lon coordinates,\nlocation names, tracks, etc.\n\nOther RSS parsers either drop this data or require you to write a custom\nextension. 
`Remus` provides all the non-standard tags as a parsed XML\nstructure. It puts that data into an `:extra` field for each entry and on the\ntop level of a feed. This is how you can reach it:\n\n```clojure\n(def result (remus/parse-url \"https://www.youtube.com/feeds/videos.xml?channel_id=UCaLlzGqiPE2QRj6sSOawJRg\"))\n\n(def feed (:feed result))\n\n;;;;\n;; Get entry-specific custom data\n;;;;\n\n;; Extra data from the first entry:\n(-\u003e feed :entries first :extra)\n\n{:tag :rome/extra\n :attrs nil\n :content\n ({:tag :yt/videoId :attrs nil :content [\"faoXSarGgEI\"]}\n  {:tag :yt/channelId :attrs nil :content [\"UCaLlzGqiPE2QRj6sSOawJRg\"]}\n  {:tag :media/group\n   :attrs nil\n   :content\n   ({:tag :media/title :attrs nil :content [\"Datomic Cloud - Datoms\"]}\n    {:tag :media/content\n     :attrs\n     {:url \"https://www.youtube.com/v/faoXSarGgEI?version=3\"\n      :type \"application/x-shockwave-flash\"\n      :width \"640\"\n      :height \"390\"}\n     :content nil}\n    {:tag :media/thumbnail\n     :attrs\n     {:url \"https://i3.ytimg.com/vi/faoXSarGgEI/hqdefault.jpg\"\n      :width \"480\"\n      :height \"360\"}\n     :content nil}\n    {:tag :media/description\n     :attrs nil\n     :content\n     [\"Check out the live animated tutorial: https://docs.datomic.com/cloud/livetutorial/datoms.html\\n\\nYour Datomic database consists of datoms. 
What are Datoms?\"]}\n    {:tag :media/community\n     :attrs nil\n     :content\n     ({:tag :media/starRating\n       :attrs {:count \"72\" :average \"5.00\" :min \"1\" :max \"5\"}\n       :content nil}\n      {:tag :media/statistics :attrs {:views \"2014\"} :content nil})})})}\n\n\n;;;;\n;; Get feed-specific extra:\n;;;;\n\n(-\u003e feed :extra)\n\n{:tag :rome/extra\n :attrs nil\n :content\n ({:tag :yt/channelId :attrs nil :content [\"UCaLlzGqiPE2QRj6sSOawJRg\"]})}\n```\n\nThe `:extra` fields follow the standard XML-friendly structure so they can be\nprocessed with any XML-related techniques like walking, zippers, etc.\n\n## Encoding issues\n\nAll the parsing functions above take additional ROME-related options. Use them\nto solve XML-decoding issues when dealing with weird or unset HTTP\nheaders. ROME's got a solid algorithm to guess encoding, but sometimes it might\nneed your help.\n\nAt the moment, Remus supports the `:lenient`, `:encoding` and `:content-type` options,\nwhich have the following meanings:\n\n- `lenient`: a boolean flag which makes ROME more tolerant of some mistakes\n  in XML markup;\n\n- `encoding`: a string which represents the encoding of the feed.  When parsing\n  a URL, it comes from the `Content-Encoding` HTTP header.  Possible values are\n  listed here: https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html\n\n- `content-type`: a string meaning the MIME type of the feed,\n  e.g. `application/rss` or similar. 
When parsing a URL, it comes from the\n  `Content-Type` header.\n\nDealing with Windows encoding and unset `Content-type` or `Content-Encoding`\nheaders:\n\n```clojure\n(remus/parse-url \"https://some/rss.xml\"\n                 nil ;; skip http options\n                 {:lenient true :encoding \"cp1251\"})\n```\n\nThe same options work for parsing a file or a stream:\n\n```clojure\n(remus/parse-file \"https://another/atom.xml\" {:lenient true :encoding \"cp1251\"})\n\n(remus/parse-stream in-source {:lenient true :encoding \"cp1251\"})\n```\n\n## Misc\n\n~~~\n©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©\nIvan Grishaev, 2025. © UNLICENSE ©\n©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©\n~~~\n","funding_links":[],"categories":["Clojure"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Figrishaev%2Fremus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Figrishaev%2Fremus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Figrishaev%2Fremus/lists"}