{"id":40935304,"url":"https://github.com/gsauthof/feed-util","last_synced_at":"2026-01-22T04:15:24.456Z","repository":{"id":45660369,"uuid":"78292576","full_name":"gsauthof/feed-util","owner":"gsauthof","description":"e.g. for generating a better heise.de feed","archived":false,"fork":false,"pushed_at":"2025-10-19T21:49:31.000Z","size":134,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-20T03:57:49.399Z","etag":null,"topics":["atom-feed-parser","minimal-feed-readers","rss-feed-parser"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gsauthof.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-01-07T17:17:26.000Z","updated_at":"2025-10-19T21:49:34.000Z","dependencies_parsed_at":"2023-01-30T01:30:40.199Z","dependency_job_id":"c480ac5c-ce74-41f8-9487-3e9da85163ee","html_url":"https://github.com/gsauthof/feed-util","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gsauthof/feed-util","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Ffeed-util","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Ffeed-util/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Ffeed-util/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Ffeed-util/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/gsauthof","download_url":"https://codeload.github.com/gsauthof/feed-util/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Ffeed-util/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28653970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T01:17:37.254Z","status":"online","status_checked_at":"2026-01-22T02:00:07.137Z","response_time":144,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atom-feed-parser","minimal-feed-readers","rss-feed-parser"],"created_at":"2026-01-22T04:15:23.775Z","updated_at":"2026-01-22T04:15:24.449Z","avatar_url":"https://github.com/gsauthof.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"This repository contains news feed related utilities.\n\n- [`heiser.py`](#heiserpy) - a program that augments\n  the heise.de news feed with content of the referenced articles\n- [`lwn.py`](#lwnpy) - create an atom feed with content for lwn.net\n  articles\n- `rss2atom.py` - convert an RSS 2 feed into an Atom one\n  and deep copy the entry links as Atom content\n- `cast.py` - create a minimal audio-cast Atom feed via\n  extracting the information from some HTML pages\n- [`betterflix.py`](#betterflixpy) - create an atom feed of newly added Netflix/Prime\n  movies that don't have a poor IMDB score\n- [`castproxy.py`](#castproxypy) - 
aggregate multiple audiocasts (podcasts) into\n  a single Atom feed, optionally filtering each episode\n- [`estr.py`](#estr) - fetch the daily Euro short-term rate (€STR/ESTR)\n\n2017, Georg Sauthoff \u003cmail@gms.tf\u003e\n\n## `heiser.py`\n\nExample:\n\n    $ ./heiser.py -o /srv/website/heiser.xml\n\nOr running it periodically via a [crontab][crontab] entry:\n\n    */20 * * * * /path/to/heiser.py -o /srv/website/heiser.xml\n\n(runs it every 20 minutes)\n\nSay an HTTP daemon serves `/srv/website` as `https://example.org/`,\nthen you can retrieve the augmented feed by subscribing to\n`https://example.org/heiser.xml`.\n\n\n### Motivation\n\nThe standard heise.de news feed just contains a summary of each\narticle, so it is poorly suited for minimal feed readers and\nusage on mobile devices.\n\n### What it does\n\n`heiser.py` fetches the heise.de feed and edits it such that in\nthe end each entry contains proper author information and the\ncomplete article content. It also converts relative URL\nreferences into absolute ones.\n\n### How it works\n\nThe program is written in Python 3. [html5lib][html5lib] is used\nfor parsing the real-world HTML articles into XML trees (e.g.\nlibxml2 fails on those inputs - even in HTML mode). For parsing\nXML and other XML massaging (i.e. the input and output feeds are\nencoded in [Atom][atom]) the [ElementTree-API][et] that is part\nof the Python standard library is used. Referenced articles are\nlocally cached for a few days to avoid redundant retrievals. The\nconvenient [requests][requests] library is used for all HTTP\noperations. They go through a `requests.Session` object such that\na connection is reused for multiple HTTP GET operations that\ntarget the same server.\n\n## `lwn.py`\n\nSimilar to `heiser.py` this program creates a rich [atom][atom] feed of\nthe latest [LWN.net][lwn] articles. 
Such a feed is optimal for\nminimal feed readers and has value even without a proper internet\nconnection.\n\nIn contrast to the heise situation, since lwn.net only provides\nRSS feeds, the `lwn.py` program doesn't augment anything.\nInstead, it creates the atom feed from scratch.\n\nBesides being in a weird format and missing the article contents,\nthe RSS feeds published on lwn.net usually rotate too soon, that\nis, before all the latest articles are de-embargoed (lwn.net\nhas a time-limited paywall for new articles).\n\n## `betterflix.py`\n\nNetflix more and more develops into a dumping ground for low\nquality movies and series. In addition, it comes with a horrible\nUI that doesn't allow filtering the content in any meaningful\nway. For example, it isn't even possible to filter for just\nmovies, to exclude movies one has downvoted, one has already\nwatched or to exclude movies made by certain directors one\ndislikes.\n\nIt also doesn't help that Netflix completely fails to come up\nwith a good recommendation system. FWIW, I rated hundreds of\nmovies on Netflix (good and bad ones) and it doesn't make a\ndifference. The Netflix system still stubbornly suggests the\nmost garbage movies to me and includes negatively rated movies in all\nof their automatically 'curated' categories.\n\nI thus created `betterflix.py`, a small script that fetches a\nlist of movies newly added by Netflix (or Prime) from the German\n[wer-streamt.es][wse] service, fetches the [IMDB][imdb] score of\neach movie and only selects movies with a score above a threshold\n(say 6.5) into an [Atom][atom] feed.\n\nOf course, using the IMDB score isn't perfect, but in my\nexperience it works surprisingly well - at least as a first\nfilter. 
Perhaps there is less fake reviewing happening on IMDB\nthan - say - on Amazon (for products), although IMDB is nowadays\nalso owned by Amazon.\n\nOne exception I noticed is when a popular Hollywood actor\nswitches the usual pattern and participates in an independent art\nhouse production. In those cases a high quality movie might get\nan unusually bad IMDB score because suddenly a crowd of fans that\nare used to mainstream genres watch something completely\ndifferent (because their favourite actor stars in it) and might\nbe easily frustrated.\n\n### Example Usage\n\nCreate an atom feed for new releases on Netflix:\n\n    ./betterflix.py -o net-flix.xml\n\nCreate a similar feed for movies recently added to Amazon Prime:\n\n    ./betterflix.py --prime -o prime-flix.xml\n\nUse a different filter threshold (greater or equal):\n\n    ./betterflix.py --thresh 7.1 -o net-flix.xml\n\nAs always, one can add such a call to a crontab on your private\nweb server such that your private feed is updated once a day,\nfor consumption by a mobile device.\n\n\n## `castproxy.py`\n\nCastproxy aggregates multiple audiocasts (podcasts) into a single\nAtom feed, based on a TOML feed configuration file.\n\nIts killer feature is being able to configure a filter command\nthat is applied to each episode.\n\nSuch a filter can be used to convert weird audio formats,\nnormalize the volume or cut certain crap out of the audio file.\n\nFor example, integrating [cutbynoise][cutbynoise] - an\naudiocast/podcast [ad blocker][adblock], may look like this:\n\n```\n[[feed]]\nurl = 'https://feeds.lagedernation.org/feeds/ldn-mp3.xml'\nname = 'Lage der Nation'\nshort = 'ldn'\nlimit = 3\nfilter = [ 'cutbynoise', '-w', '-b', 'ldn-end.flac', '-v', '%src', '-o', '%dst' ]\n\n[[feed]]\nurl = 'https://minkorrekt.podigee.io/feed/mp3'\nname = 'Methodisch Inkorrekt'\nshort = 'mi'\nlimit = 3\nfilter = [ 'cutbynoise', '-w', '-b', 'mi-begin.flac', '-e', 'mi-end.flac', '-v', '%src', '-o', '%dst' ]\n```\n\nSuch filtering 
is optional. Also, in case a filter fails, the\noriginal episode file is delivered.\n\nBesides the filtering, the aggregation can be useful, in its own\nright. For example, when the target client is running on a mobile\ndevice, only requesting an aggregated feed may save battery, save\nmobile bandwidth and increase feed refresh speed, in comparison\nto having to request each feed separately, from their sources.\n\nAlso, such an aggregated feed may improve your privacy, as it\nlimits tracking of your (mobile) IP connection and may block\nadditional tracking and advertisement links in the HTML included\nin a feed.\n\n\n### Setup\n\nCastproxy is intended to run periodically, i.e. as a cron job or\na systemd timer.\n\nSince it uses the fine [Configargparse][cargparse] package, its\noptions can be placed in a configuration file, for clarity.\nA somewhat minimal example of such a configuration:\n\n```\n# NB: work directory needs to be on the same filesystem as the media directory\nwork   = /path/to/tmp/dir\nfeeds  = /path/to/feedcfg/castproxy.toml\nurl    = https://example.org/somebase\noutput = /srv/example.org/somebase/feed.xml\nmedia  = /srv/example.org/somebase/media\n```\n\nCron job call:\n\n```\n/path/to/castproxy -c /path/to/castproxy.ini\n```\n\n### How it works\n\nCastproxy goes to some lengths to eliminate superfluous HTTP\nrequests.  Thus, it keeps some state in its work directory (in\nJSON files) to store [ETag][etag] and last-modified header values for\nthe next follow-up request.\nIn that way, when a feed hasn't changed since the last request,\nthe server can simply respond with HTTP 304 Not Modified and\ncastproxy is saved from fetching and processing that feed,\nneedlessly.\n\nSimilarly, the aggregated feed (cf. `--output`) is only written\nwhen at least one of the sources did change. 
Hence, a downstream\nclient that properly implements this protocol, also only ever\nupdates the aggregated feed on real changes.\n\nTo simplify the parsing of audiocast (podcast) feeds, which can\nbe quite diverse due to the wild growth of RSS versions and\npodcast format extensions, castproxy relies on\n[feedparser][feedparser] for this task.\n\nIn contrast, the aggregated output is just a minimal Atom\nconforming feed, generated directly using the Python\n[ElementTree][et] API.\n\nFor all HTTP needs castproxy uses [pycurl][pycurl] and to\nnormalize dates it relies on [dateutil][dateutil].\n\n\n## ESTR\n\nThe utility `estr.py` fetches the current [Euro short-term\nrate](https://www.ecb.europa.eu/stats/financial_markets_and_interest_rates/euro_short-term_rate/html/index.en.html)\n([€STR a.k.a. ESTR](https://en.wikipedia.org/wiki/%E2%82%ACSTR))\nfrom the European Central Bank.\n\nThe ESTR is based on real money-market transactions between\nfinancial institutions. Building a local time-series over time can\nbe handy for a couple of use cases:\n\n- Determining the spread between the ESTR and your [money market\n  account](https://en.wikipedia.org/wiki/Money_market_account)\n  interest rate.\n  In case it's too high it may be a sign that you want to\n  complain to your bank and look for alternatives.\n- Checking that an investment in an ETF that tracks the ESTR\n  still makes sense.\n\nAn ETF that tracks the ESTR (such as\n[FR0010510800](https://isin.toolforge.org/?isin=FR0010510800) or\n[LU0290358497](https://isin.toolforge.org/?isin=LU0290358497))\ncan be seen as an alternative to a traditional money market\naccount (MMA, German: Tagesgeldkonto) your bank offers.\nWhere banks often calculate relatively high costs for MMAs (e.g. 
0.8 %\nto 2.9 % when the ESTR is at 3.4 %), the example ETFs\nonly have a cost rate of 0.1 %.\n\nWhile European MMAs are part of the mandatory and\nstate-guaranteed [deposit insurance\nsystem](https://en.wikipedia.org/wiki/Deposit_insurance) (up to\n100k € per customer per bank, basically), the fund assets have\nspecial protection (that is not limited to 100k €!) if the issuer\ncollapses.\n\nTechnically, with an ETF, there is a tiny risk that a fraudulent\nbroker scams you out of your shares, and then the German bank\nregulation safety net might limit compensation to [20k € per\nperson](https://de.wikipedia.org/wiki/Anlegerentsch%C3%A4digungsgesetz).\nAlso, although the kind of\n[swaps](https://en.wikipedia.org/wiki/Swap_(finance)) used by an\nESTR tracking ETF is considered low risk by many, incompetent\nfund management could still screw up big time, internal and\nexternal control could fail at the same time and/or fund\nmanagement could deceive the public and regulators, given enough\ncriminal energy is available.\nHowever, the probability of something like this may be relatively\ntiny in comparison to other bad things that might happen to you.\n\nAs the ESTR approaches and crosses the inflation rate one may\nwant to reevaluate and reconsider any investments into ESTR\ntracking ETFs.\n\n\n### Usage Examples\n\nFetch last ESTR and print in CSV format:\n\n```\n./estr.py --csv\nref_date,pub_date,rate,initial_volume,number_banks,number_trnx,sh_vol_top_banks,pub_mode,vol_dist_25,vol_dist_75,pub_type\n2024-10-03,2024-10-04,3.407,37573,31,497,58,Normal,3.39,3.43,Standard\n```\n\nGenerate create statement for a table of ESTR data:\n\n```\n./estr.py --create\nCREATE TABLE estr (\n    ref_date         timestamp PRIMARY KEY,\n    pub_date         timestamp,\n    rate             decimal,\n    initial_volume   bigint,\n    number_banks     bigint,\n    number_trnx      bigint,\n    sh_vol_top_banks bigint,\n    pub_mode         varchar(14),\n    vol_dist_25      decimal,\n   
 vol_dist_75      decimal,\n    pub_type         varchar(14)\n);\n```\n\nCommand you can put into a cron job that imports the last ESTR on\neach work day:\n\n```\npsql -d mydb --no-psqlrc --quiet --echo-errors -c \"$(/usr/local/bin/estr --sql)\"\n```\n\n\n[atom]: https://en.wikipedia.org/wiki/Atom_(standard)\n[et]: https://docs.python.org/3/library/xml.etree.elementtree.html\n[html5lib]: https://github.com/html5lib/html5lib-python\n[requests]: http://docs.python-requests.org/en/master/\n[crontab]: https://en.wikipedia.org/wiki/Cron\n[lwn]: https://lwn.net/\n[rss]: https://en.wikipedia.org/wiki/RSS\n[wse]: https://www.werstreamt.es\n[imdb]: https://en.wikipedia.org/wiki/IMDb\n[cutbynoise]: https://github.com/gsauthof/cutbynoise\n[adblock]: https://en.wikipedia.org/wiki/Ad_blocking\n[cargparse]: https://github.com/bw2/ConfigArgParse\n[etag]: https://en.wikipedia.org/wiki/HTTP_ETag\n[feedparser]: https://github.com/kurtmckee/feedparser\n[pycurl]: http://pycurl.io/\n[dateutil]: https://github.com/dateutil/dateutil\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsauthof%2Ffeed-util","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgsauthof%2Ffeed-util","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsauthof%2Ffeed-util/lists"}