{"id":13600854,"url":"https://github.com/impredicative/irc-rss-feed-bot","last_synced_at":"2025-10-24T06:42:32.552Z","repository":{"id":48989738,"uuid":"174647959","full_name":"impredicative/irc-rss-feed-bot","owner":"impredicative","description":"Dockerized IRC bot to post RSS/Atom and scraped HTML/JSON/CSV feeds to channels","archived":false,"fork":false,"pushed_at":"2023-07-25T20:41:52.000Z","size":1370,"stargazers_count":28,"open_issues_count":22,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-07T03:42:20.377Z","etag":null,"topics":["irc-bot","irc-rss-bot","rss"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/r/ascensive/irc-rss-feed-bot","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/impredicative.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-09T04:28:28.000Z","updated_at":"2024-06-20T20:52:03.000Z","dependencies_parsed_at":"2024-11-07T03:34:01.123Z","dependency_job_id":"62ee9a96-11f0-473e-b070-b6d816390d01","html_url":"https://github.com/impredicative/irc-rss-feed-bot","commit_stats":null,"previous_names":[],"tags_count":122,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/impredicative%2Firc-rss-feed-bot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/impredicative%2Firc-rss-feed-bot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/impredicative%2Firc-rss-feed-bot/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/impredicative%2Firc-rss-feed-bot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/impredicative","download_url":"https://codeload.github.com/impredicative/irc-rss-feed-bot/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248324950,"owners_count":21084837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["irc-bot","irc-rss-bot","rss"],"created_at":"2024-08-01T18:00:49.865Z","updated_at":"2025-09-19T17:39:28.936Z","avatar_url":"https://github.com/impredicative.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# irc-rss-feed-bot\n**irc-rss-feed-bot** is a dockerized Python 3.11 and IRC based RSS/Atom and scraped HTML/JSON/CSV feed posting bot.\nIt essentially posts the entries of feeds in IRC channels, one entry per message.\nMore specifically, it posts the titles and shortened URLs of entries.\n\n## Contents\n- [Features](#features)\n- [Links](#links)\n- [Examples](#examples)\n- [Development](#development)\n- [Usage](#usage)\n  * [Configuration: secret](#configuration-secret)\n  * [Configuration: non-secret](#configuration-non-secret)\n    + [Global settings](#global-settings)\n      - [Mandatory](#mandatory)\n      - [Recommended](#recommended)\n      - [Developer](#developer)\n    + [Feed-specific settings](#feed-specific-settings)\n      - [Mandatory](#mandatory-1)\n      - [Optional](#optional)\n      - [Parser](#parser)\n      - [Conditional](#conditional)\n    + [Feed default 
settings](#feed-default-settings)\n  * [Commands](#commands)\n    + [Administrative](#administrative)\n- [Deployment](#deployment)\n- [Maintenance](#maintenance)\n  * [Service](#service)\n  * [Config](#config)\n  * [Database](#database)\n  * [Disk cache](#disk-cache)\n\n## Features\n* Multiple channels on an IRC server are supported, with each channel having its own set of feeds.\nFor use with multiple servers, a separate instance of the bot process can be run for each server.\n* Entries are posted only if the channel has not had any conversation for a certain minimum amount of time, \nthereby avoiding the interruption of any preexisting conversations.\nThis amount of time is 15 minutes for any feed which has a polling period greater than 12 minutes.\nThere is however no delay for any feed which has a polling period less than or equal to 12 minutes as such a feed is\nconsidered urgent.\n* A SQLite database file records hashes of the entries that have been posted, thereby preventing them from being\nreposted.\n* Posted URLs are shortened using the [da.gd](https://da.gd/) service.\n* The [`hext`](https://pypi.org/project/hext/), [`jmespath`](https://pypi.org/project/jmespath/), and \n[`pandas`](https://pandas.pydata.org/) DSLs are supported for flexibly parsing arbitrary HTML, JSON, and CSV content \nrespectively. These parsers also support configurable recursive crawling.\n* Entry titles are formatted for neatness.\nAny HTML tags and excessive whitespace are stripped, all-caps are replaced,\nand excessively long titles are sanely truncated. \n* A TTL and ETag based compressed disk cache of URL content is used for preventing unnecessary URL reads.\nAny websites with a mismatched _strong_ ETag are probabilistically detected, and this caching is then disabled for them\nfor the duration of the process. 
Note that this detection is skipped for a _weak_ ETag.\n* Encoded Google News and FeedBurner URLs are decoded.\n\nFor several more features, see the customizable [global](#global-settings) and [feed-specific](#feed-specific-settings) settings, and [commands](#commands).\n\n## Links\n| Caption   | Link                                                       |\n|-----------|------------------------------------------------------------|\n| Repo      | https://github.com/impredicative/irc-rss-feed-bot          |\n| Changelog | https://github.com/impredicative/irc-rss-feed-bot/releases |\n| Image     | https://hub.docker.com/r/ascensive/irc-rss-feed-bot        |\n\n## Examples\n```text\n\u003cFeedBot\u003e [ArXiv:cs.AI] Concurrent Meta Reinforcement Learning → https://arxiv.org/abs/1903.02710v1\n\u003cFeedBot\u003e [ArXiv:cs.AI] Attack Graph Obfuscation → https://arxiv.org/abs/1903.02601v1\n\u003cFeedBot\u003e [InfoWorld] What is a devops engineer? And how do you become one? → https://da.gd/dvXh9\n\u003cFeedBot\u003e [InfoWorld] What is Jupyter Notebook? Data analysis made easier → https://da.gd/yrCi\n\u003cFeedBot\u003e [AWS:OpenData] COVID-19 Open Research Dataset (CORD-19): Full-text and metadata dataset of\n            COVID-19 research articles. → https://registry.opendata.aws/cord-19\n```\n![](images/sample_posts.png)\n\n## Development\nFor software development purposes only, the project can be set up on Ubuntu as below.\n```bash\nmake setup-ppa\nmake install-py\nmake setup-venv\nmake shell\nmake install\nmake test\nmake build\n```\n\n## Usage\n### Configuration: secret\nPrepare a private `secrets.env` environment file using the sample below.\n```ini\nIRC_PASSWORD=YourActualPassword\nGITHUB_TOKEN=c81a62ca23caa140715bbfc175997c02d0fdd768\n```\n\n#### GITHUB_TOKEN\nThis is optional. 
Refer to the [`publish.github`](#optional) feature.\n\n### Configuration: non-secret\nPrepare a version-controlled `config.yaml` file using the sample below.\nA full-fledged real-world example is also\n[available](https://github.com/impredicative/irc-bots/blob/master/libera/news-bot/config.yaml).\n```yaml\nhost: irc.libera.chat\nssl_port: 6697\n#ssl_verify: true\nnick: MyFeedBot\nadmin: mynick!myident@myhost\nalerts_channel: '#mybot-alerts'\nmode:\n#mirror: '#mybot-mirror'\n#publish:\n#  github: MyGithubServiceAccountUsername/IrcServerName-MyBotName-live\n#defaults:\n#  new: all\nfeeds:\n  \"##mybot-alerts\":\n    irc-rss-feed-bot:\n      url: https://github.com/impredicative/irc-rss-feed-bot/releases.atom\n      period: 12\n      shorten: false\n  \"#some_chan1\":\n    AWS:OpenData:\n      url: https://registry.opendata.aws/rss.xml\n      message:\n        summary: true\n    CDC:FoodSafety:\n      url: https://tools.cdc.gov/api/v2/resources/media/316422.rss\n      redirect: true\n    j:AJCN:\n      url: https://academic.oup.com/rss/site_6122/3981.xml\n      mirror: false\n      period: 12\n      blacklist:\n        title:\n          - ^Calendar\\ of\\ Events$\n    LitCovid:\n      url: https://www.ncbi.nlm.nih.gov/research/coronavirus-api/export\n      pandas: |-\n        read_csv(file, comment=\"#\", sep=\"\\t\") \\\n        .assign(link=lambda r: \"https://pubmed.ncbi.nlm.nih.gov/\" + r[\"pmid\"].astype(\"str\")) \\\n        .convert_dtypes()\n    MedicalXpress:nutrition:\n      url: https://medicalxpress.com/rss-feed/search/?search=nutrition\n    r/FoodNerds:\n      url: https://www.reddit.com/r/FoodNerds/new/.rss\n      shorten: false\n      sub:\n        url:\n          pattern: ^https://www\\.reddit\\.com/r/.+?/comments/(?P\u003cid\u003e.+?)/.+$\n          repl: https://redd.it/\\g\u003cid\u003e\n  \"##some_chan2\":\n    ArXiv:cs.AI: \u0026ArXiv\n      url: http://export.arxiv.org/rss/cs.AI\n      period: 1.5\n      https: true\n      shorten: false\n      
group: ArXiv:cs\n      alerts:\n        empty: false\n      format:\n        re:\n          title: '^(?P\u003cname\u003e.+?)\\.?\\ \\(arXiv:.+(?P\u003cver\u003ev\\d+)\\ '\n        str:\n          title: '{name}'\n          url: '{url}{ver}'\n    ArXiv:cs.NE:\n      \u003c\u003c: *ArXiv\n      url: http://export.arxiv.org/rss/cs.NE\n    ArXiv:stat.ML:\n      \u003c\u003c: *ArXiv\n      url: http://export.arxiv.org/rss/stat.ML\n      group: null\n    AWS:status:\n      url: https://status.aws.amazon.com/rss/all.rss\n      period: .2\n      https: true\n      new: none\n      sub:\n        title:\n          pattern: ^(?:Informational\\ message|Service\\ is\\ operating\\ normally):\\ \\[RESOLVED\\]\n          repl: '[RESOLVED]'\n      format:\n        re:\n          id: /\\#(?P\u003cservice\u003e[^_]+)\n        str:\n          title: '[{service}] {title} | {summary}'\n          url: '{id}'\n    Fb:Research:\n      url: https://research.fb.com/publications/\n      hext: |-\n        \u003cdiv\u003e\n            \u003ca href:link\u003e\u003ch3 @text:title/\u003e\u003c/a\u003e\n            \u003cdiv class=\"areas-wrapper\"\u003e\u003ca href @text:category/\u003e\u003c/div\u003e\n        \u003c/div\u003e\n        \u003cdiv\u003e\u003cform class=\"download-form\" action/\u003e\u003c/div\u003e\n      whitelist:\n        category:\n          - ^(?:Facebook\\ AI\\ Research|Machine\\ Learning|Natural\\ Language\\ Processing\\ \\\u0026\\ Speech)$\n    InfoWorld:\n      url: https://www.infoworld.com/index.rss\n      order: reverse\n    j:MDPI:N:  # https://www.mdpi.com/journal/nutrients (open access)\n      url: https://www.mdpi.com/rss/journal/nutrients\n      www: false\n    KDnuggets:\n      url: https://us-east1-ml-feeds.cloudfunctions.net/kdnuggets\n      new: some\n    libraries.io/pypi/scikit-learn:\n      url: https://libraries.io/pypi/scikit-learn/versions.atom\n      new: none\n      period: 8\n      shorten: false\n    MedRxiv:\n      url:\n        - 
https://connect.medrxiv.org/medrxiv_xml.php?subject=Health_Informatics\n        - https://connect.medrxiv.org/medrxiv_xml.php?subject=Nutrition\n      alerts:\n        read: false\n      https: true\n    r/MachineLearning:100+:\n      url: https://www.reddit.com/r/MachineLearning/hot/.json?limit=50\n      jmespath: 'data.children[*].data | [?score \u003e= `100`].{title: title, link: join(``, [`https://redd.it/`, id])}'\n      shorten: false\n    r/wallstreetbets:50+:\n      url: https://www.reddit.com/r/wallstreetbets/hot/.json?limit=98\n      jmespath: 'data.children[*].data | [?(not_null(link_flair_text) \u0026\u0026 score \u003e= `50`)].{title: join(``, [`[`, link_flair_text, `] `, title]), link: join(``, [`https://redd.it/`, id]), category: link_flair_text}'\n      emoji: false\n      shorten: false\n      blacklist:\n        category:\n          - ^(?:Daily\\ Discussion|Gain|Loss|Meme|Weekend\\ Discussion|YOLO)$\n    PwC:Latest:\n      url: https://us-east1-ml-feeds.cloudfunctions.net/pwc/latest\n      period: 0.5\n      dedup: channel\n    PwC:Trending:\n      url: https://us-east1-ml-feeds.cloudfunctions.net/pwc/trending\n      period: 0.5\n      dedup: channel\n    SeekingAlpha:\n      period: 0.2\n      sub:\n        url:\n          pattern: ^(?P\u003cmain_url\u003ehttps://seekingalpha\\.com/[a-z]+/[0-9]+).*$\n          repl: \\g\u003cmain_url\u003e\n      shorten: false\n      topic:\n        \"Daily calendar\": \\b(?i:economic\\ calendar)\\b\n        \"Daily prep\": '^Wall\\ Street\\ Breakfast:\\ '\n        \"Hourly status\": ^On\\ the\\ hour$\n      url:\n        - https://seekingalpha.com/market_currents.xml\n        - https://seekingalpha.com/feed.xml\n        - https://seekingalpha.com/tag/etf-portfolio-strategy.xml\n        - https://seekingalpha.com/tag/wall-st-breakfast.xml\n    SSRN:\n      url: https://papers.ssrn.com/sol3/Jeljour_results.cfm?form_name=journalBrowse\u0026journal_id=3526423\u0026Network=no\u0026lim=false\u0026npage=1\n      
hext:\n        select: \u003ca href:link href^=\"https://ssrn.com/abstract=\" @text:title /\u003e\n        follow: \u003ca class=\"jeljour_pagination_number\" @text:prepend(\"https://papers.ssrn.com/sol3/Jeljour_results.cfm?form_name=journalBrowse\u0026journal_id=3526423\u0026Network=no\u0026lim=false\u0026npage=\"):url/\u003e\n      period: 6\n    TalkRL:\n      url: https://www.talkrl.com/feed\n      period: 8\n      message:\n        title: false\n        summary: true\n    YT:3Blue1Brown: \u0026YT\n      url: https://www.youtube.com/feeds/videos.xml?channel_id=UCYO_jab_esuFRV4b17AJtAw\n      period: 12\n      shorten: false\n      style:\n        name:\n          bg: red\n          fg: white\n          bold: true\n      sub:\n        url:\n          pattern: ^https://www\\.youtube\\.com/watch\\?v=(?P\u003cid\u003e.+?)$\n          repl: https://youtu.be/\\g\u003cid\u003e\n    YT:AGI:\n      url: https://www.youtube.com/results?search_query=%22artificial+general+intelligence%22\u0026sp=CAISBBABGAI%253D\n      hext: \u003ca href:filter(\"/watch\\?v=(.+)\"):prepend(\"https://youtu.be/\"):link href^=\"/watch?v=\" title:title/\u003e\n      period: 12\n      shorten: false\n      alerts:\n        emptied: true\n      blacklist:\n        title:\n          - \\bWikipedia\\ audio\\ article\\b\n    YT:LexFridman:\n      \u003c\u003c: *YT\n      url: https://www.youtube.com/feeds/videos.xml?channel_id=UCSHZKyawb77ixDdsGog4iWA\n      whitelist:\n        title:\n          - \\bAGI\\b\n```\n\n#### Global settings\n\n##### Mandatory\n* **`host`**: IRC server address.\n* **`ssl_port`**: IRC server SSL port.\n* **`ssl_verify`**: If `false`, the TLS/SSL certificate is not verified. Its default is `true`.\n* **`nick`**: This is a registered IRC nick. 
If the nick is in use, it will be regained.\nEnsure that the email verification of the registered nick, as applicable to many IRC servers, is complete.\nWithout this email verification being completed, the bot can fail to receive the required event 900 and therefore fail to function.\n\n##### Recommended\n* **`admin`**: Administrative commands by this user pattern are accepted and executed.\nIts format is `nick!ident@host`. An example is `JDoe11!sid654321@gateway/web/irccloud.com/x-*`.\nA case-insensitive pattern match is tested for using [`fnmatch`](https://docs.python.org/3/library/fnmatch.html).\n* **`alerts_channel`**: Some but not all warning and error alerts are sent to this channel.\nIts default value is `##{nick}-alerts`. The key `{nick}`, if present in the value, is formatted with the actual nick.\nFor example, if the nick is `MyFeedBot`, alerts will by default be sent to `##MyFeedBot-alerts`.\nSince a channel name starts with #, the name if provided **must be quoted**.\nIt is recommended that the alerts channel be registered and monitored.\n* **`mode`**: This can for example be `+igR` for [Libera](https://libera.chat/guides/usermodes) \nand `+igpR` for [Rizon](https://wiki.rizon.net/index.php?title=User_Modes).\n  \n##### Optional\n* **`mirror`**: If specified as a channel name, all posts across all channels are mirrored to this channel.\nThis however doubles the time between consecutive posts in any given channel.\nMirroring can however individually be disabled for a feed by setting `\u003cfeed\u003e.mirror`.\n* **`publish.github`**: This is the username and repo name of a GitHub repo, e.g. [`feedarchive/libera-feedbot-live`](https://github.com/feedarchive/libera-feedbot-live).\nAll posts are published to the repo, thereby providing a basic option to archive them.\nA new CSV file is written to the repo for each posted feed having one or more new posts.\nThe following requirements apply:\n  * The repo must exist; it is not created by the bot. 
It is recommended that an empty new repo is used.\nIf the repo is of public interest, it can be requested to be moved into the [`feedarchive`](https://github.com/feedarchive) organization by filing an issue.\n  * The GitHub user must have access to write to the repo. It is recommended that a dedicated new service account be used, not your primary user account.\n  * A GitHub [personal access token](https://github.com/settings/tokens) is required with access to the entire `repo` scope.\nThe `repo` scope is used for making commits.\nThe token is provisioned for the bot via the `GITHUB_TOKEN` secret environment variable.\n\n##### Developer\n* **`log.irc`**: If `true`, low level IRC events are logged by `miniirc`. These are quite noisy. Its default is `false`.\n* **`once`**: If `true`, each feed is queued only once. It is for testing purposes. Its default is `false`.\n* **`tracemalloc`**: If `true`, memory allocation tracing is enabled. The top usage and positive-diff statistics are then logged hourly.\nIt is for diagnostic purposes. Its default is `false`.\n\n#### Feed-specific settings\nA feed is defined under a channel as in the sample configuration. The feed's key represents its name.\n\nThe order of execution of the interacting operations is:\n`redirect`, `blacklist`, `whitelist`, `https`, `www`, `emoji`, `sub`, `format`, `shorten`.\nRefer to the sample configuration for usage examples.\n\nYAML [anchors and references](https://en.wikipedia.org/wiki/YAML#Advanced_components) can be used to reuse nodes.\nExamples of this are in the sample.\n\n##### Mandatory\n* **`\u003cfeed\u003e.url`**: This is either a single URL or a list of URLs of the feed.\nIf a list, the URLs are read in sequence with an interval of one second between them.\n\n##### Optional\nThese are optional and are independent of each other:\n* **`\u003cfeed\u003e.alerts.empty`**: If `true`, an alert is sent if any source URL of the feed has no entries before their validation. 
\nIf `false`, such an alert is not sent. Its default value is `true`.\n* **`\u003cfeed\u003e.alerts.emptied`**: If `true`, an alert is sent if the feed has entries before but not after their validation.\nIf `false`, such an alert is not sent. Its default value is `false`.\n* **`\u003cfeed\u003e.alerts.read`**: If `true`, an alert is sent if an error occurs three or more consecutive times \nwhen reading or processing the feed, but no more than once every 15 minutes.\nIf `false`, such an alert is not sent. Its default value is `true`.\n* **`\u003cfeed\u003e.blacklist.category`**: This is an arbitrarily nested dictionary or list or their mix of regular \nexpression patterns that result in an entry being skipped if a \n[search](https://docs.python.org/3/library/re.html#re.search) finds any of the patterns in any of the categories of the \nentry.\nThe nesting permits lists to be creatively reused between feeds via YAML anchors and references.\n* **`\u003cfeed\u003e.blacklist.title`**: This is an arbitrarily nested dictionary or list or their mix of regular \nexpression patterns that result in an entry being skipped if a \n[search](https://docs.python.org/3/library/re.html#re.search) finds any of the patterns in the title.\nThe nesting permits lists to be creatively reused between feeds via YAML anchors and references.\n* **`\u003cfeed\u003e.blacklist.url`**: Similar to `\u003cfeed\u003e.blacklist.title`.\n* **`\u003cfeed\u003e.dedup`**: This indicates how to deduplicate posts for the feed, thereby preventing them from being \nreposted.\nThe default value is `feed` (per-feed per-channel), and an alternate possible value is `channel` (per-channel).\n* **`\u003cfeed\u003e.emoji`**: If `false`, emojis in entry titles are removed. 
Its default value is `null`.\n* **`\u003cfeed\u003e.group`**: If a string, this delays the processing of a feed that has just been read until all \nother feeds having the same group are also read.\nThis encourages multiple feeds having the same group to be posted in succession, except if interrupted by\nconversation.\nIt is however possible that unrelated feeds of any channel get posted between ones having the same group.\nTo explicitly specify the absence of a group when using a YAML reference, the value can be specified as `null`.\nIt is recommended that feeds in the same group have the same `period`.\n* **`\u003cfeed\u003e.https`**: If `true`, entry links that start with `http://` are changed to start with `https://` \ninstead. Its default value is `false`.\n* **`\u003cfeed\u003e.message.summary`**: If `true`, the entry summary (description) is included in its message.\nThe entry title, if included, is then formatted bold.\nThis is applied using IRC formatting if a `style` is defined for the feed, otherwise using unicode formatting.\nThe default value is `false`.\n* **`\u003cfeed\u003e.message.title`**: If `false`, the entry title is not included in its message.\nIts default value is `true`.\n* **`\u003cfeed\u003e.mirror`**: If `false`, mirroring is disabled for this feed. 
\n  Its default value is `true`, subject to the global-setting for mirroring.\n* **`\u003cfeed\u003e.new`**: This indicates up to how many entries of a new feed to post.\nA new feed is defined as one with no prior posts in its channel.\nThe default value is `some` which is interpreted as 3.\nThe default is intended to limit flooding a channel when one or more new feeds are added.\nA string value of `none` is interpreted as 0 and will skip all entries for a new feed.\nA value of `all` will skip no entries for a new feed; it is not recommended and should be used sparingly if at all.\nIn any case, future entries in the feed are not affected by this option on subsequent reads,\nand they are all forwarded without a limit.\n* **`\u003cfeed\u003e.order`**: If `reverse`, the order of the entries is reversed.\n* **`\u003cfeed\u003e.period`**: This indicates how frequently to read the feed in hours on an average.\nIts default value is 1.\nConservative polling is recommended. Any value below 0.2 is changed to a minimum of 0.2.\nNote that 0.2 hours is equal to 12 minutes.\nTo make service restarts safer by preventing excessive reads, the first read is delayed by half the period.\nTo better distribute the load of reading multiple feeds, a uniformly distributed random ±5% is applied to the period for\neach read.\n* **`\u003cfeed\u003e.redirect`**: This indicates whether to substitute each entry URL with its redirect target.\nThe default value is `false`.\n* **`\u003cfeed\u003e.shorten`**: This indicates whether to post shortened URLs for the feed.\nThe default value is `true`.\nThe alternative value `false` is recommended if the URL is naturally small, or if `sub` or `format` can be used to make\nit small.\nIf a \"Blacklisted long URL\" error is experienced for a reasonable website which should not be blacklisted, it can be reported [here](https://github.com/dagd/dagd/issues), using [this](https://github.com/dagd/dagd/issues/50) issue as an example.\n* 
**`\u003cfeed\u003e.style.name.bg`**: This is a string representing the name of a background color applied to the \nfeed's name.\nIt can be one of: white, black, blue, green, red, brown, purple, orange, yellow, lime, teal, aqua, royal, pink, grey,\nsilver. The channel modes must allow formatting for this option to be effective.\n* **`\u003cfeed\u003e.style.name.bold`**: If `true`, bold formatting is applied to the feed's name. \nIts default value is `false`.\nThe channel modes must allow formatting for this option to be effective.\n* **`\u003cfeed\u003e.style.name.fg`**: Foreground color similar to `\u003cfeed\u003e.style.name.bg`.\n* **`\u003cfeed\u003e.topic`**: This updates the channel topic with the short URL of a matching entry.\nIt requires auto-op (+O) to allow the topic to be updated.\nThe topic is divided into logical sections separated by ` | ` (`\u003cspace\u003e\u003cpipe\u003e\u003cspace\u003e`).\nFor any matching entry, only its matching section in the topic is updated.\nIts value can be a dictionary in which each key is a section name and each value is a regular expression pattern.\nIf a regular expression [search](https://docs.python.org/3/library/re.html#re.search) matches an entry's title,\nthe section in the topic is updated with the entry's short URL.\nThe topic's length is not checked.\n* **`\u003cfeed\u003e.whitelist.category`**: This is an arbitrarily nested dictionary or list or their mix of regular \nexpression patterns that result in an entry being skipped unless a \n[search](https://docs.python.org/3/library/re.html#re.search) finds any of the patterns in any of the categories of the \nentry.\nThe nesting permits lists to be creatively reused between feeds via YAML anchors and references.\n* **`\u003cfeed\u003e.whitelist.explain`**: This applies only to `\u003cfeed\u003e.whitelist.title`.\nIt can be useful for understanding which portion of a post's title matched the whitelist.\nIf `true`, the first match of each posted title is 
italicized.\nThis is applied using IRC formatting if a `style` is defined for the feed, otherwise using unicode formatting.\nFor example, \"This is a _matching sample_ title\".\nThe default value is `false`.\n* **`\u003cfeed\u003e.whitelist.title`**: This is an arbitrarily nested dictionary or list from which all leaf values are used.\nThe leaf values are regular expression patterns.\nThese result in an entry being skipped unless a [search](https://docs.python.org/3/library/re.html#re.search) finds any \nof the patterns in the title.\nThe nesting permits lists to be creatively reused between feeds via YAML anchors and references.\n* **`\u003cfeed\u003e.whitelist.url`**: Similar to `\u003cfeed\u003e.whitelist.title`.\n* **`\u003cfeed\u003e.www`**: If `false`, entry links that contain the `www.` prefix are changed to remove this prefix.\nIts default value is `null`.\n\n##### Parser\nFor a non-XML feed, one of the following non-default parsers can be used.\nMultiple parsers cannot be used for a feed.\nThe parsers are searched for in the alphabetical order listed below, and the first to be found is used.\nEach parsed entry must at a minimum return a `title`, a `link`, an optional `summary` (description),\nand zero or more values for `category`.\nThe `title` can be a string or a list of strings.\n\n* **`\u003cfeed\u003e.hext`**: This is a string representing the [hext](https://hext.thomastrapp.com/documentation) DSL \nfor parsing a list of entry [dictionaries](https://en.wikipedia.org/wiki/Associative_array#Example) from an HTML web \npage. 
\nBefore using, it can be tested in the form [here](https://hext.thomastrapp.com/).\nNote that `max_searches` is set to 100_000 to protect against resource exhaustion.\n* **`\u003cfeed\u003e.jmespath`**: This is a string representing the [jmespath](http://jmespath.org/examples.html) DSL \nfor parsing a list of entry [dictionaries](https://en.wikipedia.org/wiki/Associative_array#Example) from JSON.\nBefore using, it can be tested in the form [here](http://jmespath.org/).\n* **`\u003cfeed\u003e.pandas`**: This is a string command evaluated using [pandas](https://pandas.pydata.org/) for \nparsing a dataframe of entries. The raw content is made available to the parser as a file-like object named `file`.\nThis parser uses [`eval`](https://docs.python.org/3/library/functions.html?#eval) which is unsafe, and so its\nuse must be confirmed to be safe.\nThe provisioned packages are `json`, `numpy` (as `np`), and `pandas` (as `pd`).\nThe value requires compatibility with the versions of `pandas` and `numpy` defined in \n[`requirements.txt`](requirements.txt), noting that these version requirements are expected to be routinely updated.\n\nFor recursive crawling, the value of a parser can alternatively be:\n* **`\u003cfeed\u003e.\u003cparser\u003e.select`**: This is the string which was hitherto documented as the value for \n`\u003cfeed\u003e.\u003cparser\u003e`. The parser uses it to return the entries to post.\n* **`\u003cfeed\u003e.\u003cparser\u003e.follow`**: This is an optional string which the parser uses to return zero or more \nadditional URLs to read.\nThe returned URLs can be a list of strings or a list of dictionaries with the key `url`.\nCrawling applies recursively to each returned URL. 
Each unique URL is read once.\nThere is an interval of at least one second between the end of a read and the start of the next read.\nCare should nevertheless be taken to avoid crawling a large number of URLs.\n\nSome sites require a custom user agent or other custom headers for successful scraping; such a customization can be\nrequested by creating an issue.\n\n##### Conditional\nThe sample configuration above contains examples of these:\n* **`\u003cfeed\u003e.format.re.title`**: This is a single regular expression pattern that is\n[searched](https://docs.python.org/3/library/re.html#re.search) for in the title.\nIt is used to collect named [key-value pairs](https://docs.python.org/3/library/re.html#re.Match.groupdict) from the\nmatch if there is one.\n* **`\u003cfeed\u003e.format.re.url`**: Similar to `\u003cfeed\u003e.format.re.title`.\n* **`\u003cfeed\u003e.format.str.title`**: The key-value pairs collected using `\u003cfeed\u003e.format.re.title` and \n`\u003cfeed\u003e.format.re.url`, \nboth of which are optional, are combined along with the default additions of `title`, `url`, `categories`, and `feed.url` as keys.\nAny additional keys returned by the parser are also available.\nThe key-value pairs are used to [format](https://docs.python.org/3/library/stdtypes.html#str.format_map) the provided\nquoted title string.\nIf the title formatting fails for any reason, a warning is logged, and the title remains unchanged.\nThe default value is `{title}`.\n* **`\u003cfeed\u003e.format.str.url`**: Similar to `\u003cfeed\u003e.format.str.title`. 
The default value is `{url}`.\nIf this is specified, it can sometimes be relevant to set `shorten` to `false` for the feed.\n* **`\u003cfeed\u003e.sub.summary.pattern`**: This is a single regular expression pattern that if found results in the \nentry summary being [substituted](https://docs.python.org/3/library/re.html#re.sub).\n* **`\u003cfeed\u003e.sub.summary.repl`**: If `\u003cfeed\u003e.sub.summary.pattern` is found, the entry summary is replaced \nwith this replacement, otherwise it is forwarded unchanged.\n* **`\u003cfeed\u003e.sub.title.pattern`**: Similar to `\u003cfeed\u003e.sub.summary.pattern`.\n* **`\u003cfeed\u003e.sub.title.repl`**: Similar to `\u003cfeed\u003e.sub.summary.repl`.\n* **`\u003cfeed\u003e.sub.url.pattern`**: Similar to `\u003cfeed\u003e.sub.summary.pattern`.\nIf a pattern is specified, it can sometimes be relevant to set `shorten` to `false` for the feed.\n* **`\u003cfeed\u003e.sub.url.repl`**: Similar to `\u003cfeed\u003e.sub.summary.repl`.\n\n#### Feed default settings\nA global default value can optionally be set under `defaults` for some feed-specific settings, \nnamely `new` and `shorten`.\nThis value overrides its internal default.\nIt facilitates not having to set the same value individually for many feeds.\n\nRefer to \"Feed-specific settings\" for the possible values and internal defaults of these settings.\nRefer to the embedded sample configuration for a usage example.\n\n### Commands\nCommands can be sent to the bot either as a private message or as a directed public message.\nPrivate messages may however be prohibited for security purposes using the `mode` configuration.\nPublic messages to the bot must be directed as `MyBotNick: my_command`.\n#### Administrative\nAdministrative commands are accepted from the configured `admin`. 
If `admin` is not configured, the commands are not processed.\nIt is expected but not required that administrative commands to the bot will typically be sent in the `alerts_channel`.\nThe supported commands are:\n* **`exit`**: Gracefully exit with code 0. The exit is delayed until any feeds that are currently being posted finish posting and being written to the database.\nIf running the bot as a Docker Compose service, using this command with `restart: on-failure` will (due to code 0) prevent the bot from automatically restarting.\nNote that a repeated invocation of this command has no effect.\n* **`fail`**: Similar to `exit` but with code 1.\nIf running the bot as a Docker Compose service, using this command with `restart: on-failure` will (due to a nonzero code) cause the bot to automatically be restarted.\n* **`quit`**: Alias of `exit`.\n\n## Deployment\n* As a reminder, it is recommended that the alerts channel be registered and monitored.\n\n* It is recommended that the bot be auto-voiced (+V) in each channel.\nFailing this, messages from the bot risk being silently dropped by the server.\nThis is despite the bot-enforced limit of two seconds per message across the server.\n\n* It is recommended that the bot be run as a Docker container using Docker ≥18.09.2, possibly with\nDocker Compose ≥1.24.0.\nTo run the bot using Docker Compose, create or add to a version-controlled `docker-compose.yml` file such as:\n```yaml\nversion: '3.7'\nservices:\n  irc-rss-feed-bot:\n    container_name: irc-rss-feed-bot\n    image: ascensive/irc-rss-feed-bot:\u003cVERSION\u003e\n    network_mode: host  # Often required for da.gd URL shortener, also if having DNS name resolution issues.\n#    restart: on-failure\n    restart: always\n    logging:\n      options:\n        max-size: 2m\n        max-file: \"5\"\n    volumes:\n      - ./irc-rss-feed-bot:/config\n    env_file:\n      - ./irc-rss-feed-bot/secrets.env\n    environment:\n      TZ: America/New_York  # Select TZ 
database name from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones\n```\n\n* In the above service definition in `docker-compose.yml`:\n  * `image`: Use a specific\n  [versioned tag](https://hub.docker.com/r/ascensive/irc-rss-feed-bot/tags?ordering=last_updated), e.g. `0.12.0`.\n  * `volumes`: Customize the relative path to the directory containing the previously created `config.yaml` file, e.g. `./irc-rss-feed-bot`.\n  This volume source directory must be writable by the container using the UID defined in the Dockerfile; it is 999.\n  A simple way to ensure it is writable is to run a command such as `chmod -R a+w ./irc-rss-feed-bot` once on the host.\n  * `env_file`: Customize the relative path to `secrets.env`.\n  * `environment`: Optionally customize the environment variable `TZ` to the preferred time zone \n  as represented by a [TZ database name](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List).\n  Note that the date and time are prefixed to each log message.\n\n* From the directory containing `docker-compose.yml`, run `docker-compose up -d irc-rss-feed-bot`.\nUse `docker logs -f irc-rss-feed-bot` to see and follow informational logs.\n\n## Maintenance\n### Service\nIt is recommended that the supported administrative commands be used together with Docker Compose or a comparable container service manager to shut down or restart the service.\n### Config\n* If `config.yaml` is updated, the container must be restarted to use the updated file.\n* If `secrets.env` or the service definition in `docker-compose.yml` is updated, the container must be recreated\n(and not merely restarted) to use the updated file.\n### Database\n* A `posts.v2.db` database file is written by the bot in the same directory as `config.yaml`.\nThis database file must be preserved with routine backups. 
After restoring a backup, before starting the container,\nensure the database file is writable by running a command such as `chmod a+w ./irc-rss-feed-bot/posts.v2.db`.\n* The database file grows as new posts are made. For the most part this indefinite growth can be ignored.\nCurrently, the standard approach for handling this, if necessary, is to stop the bot and delete the\ndatabase file if it has grown unacceptably large.\nRestarting the bot after deleting the database will then create a new database file, and all configured feeds will be\nhandled as new.\nThis deletion is however discouraged as a routine measure.\n### Disk cache\n* An ephemeral directory `/app/.ircrssfeedbot_cache` is written by the bot in the container.\nIt contains one or more independent disk caches.\nThe size of each independent disk cache in this directory is limited to approximately 2 GiB.\nIf needed, this directory can optionally be mounted as an external volume.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpredicative%2Firc-rss-feed-bot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimpredicative%2Firc-rss-feed-bot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpredicative%2Firc-rss-feed-bot/lists"}