{"id":17184458,"url":"https://github.com/randombit/grab-rss","last_synced_at":"2025-02-23T21:31:39.330Z","repository":{"id":4099737,"uuid":"5208334","full_name":"randombit/grab-rss","owner":"randombit","description":"RSS to email script","archived":true,"fork":false,"pushed_at":"2012-07-27T19:08:30.000Z","size":126,"stargazers_count":11,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-12T11:17:46.112Z","etag":null,"topics":["email","python","rss"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/randombit.png","metadata":{"files":{"readme":"readme.txt","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-07-27T19:05:13.000Z","updated_at":"2024-09-06T15:50:42.000Z","dependencies_parsed_at":"2022-09-12T17:41:00.313Z","dependency_job_id":null,"html_url":"https://github.com/randombit/grab-rss","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/randombit%2Fgrab-rss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/randombit%2Fgrab-rss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/randombit%2Fgrab-rss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/randombit%2Fgrab-rss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/randombit","download_url":"https://codeload.github.com/randombit/grab-rss/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240384371,"owner
s_count":19792970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["email","python","rss"],"created_at":"2024-10-15T00:43:21.430Z","updated_at":"2025-02-23T21:31:38.912Z","avatar_url":"https://github.com/randombit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\ngrab_rss is a simple RSS to email gateway; it downloads feeds (RSS,\nAtom, or anything else Universal Feed Parser understands), formats\nthem into plaintext emails, and sends them to you. The idea would be\nto run it once an hour or once a day or whatever in a cron job.\n\n* Why write this?\n\nI had been using Google Reader. Mutt is a better information organizer\nthan Reader, and since I run my own mail server there are fewer\nprivacy concerns (I expect at some point Google will begin analyzing\nwhat people look at in their Readers, what gets the most clicks, how\nlong you spend looking at particular items, etc so as to enhance the\nadvertising experience - assuming they don't do this already of\ncourse). I initially was going to replace it with a standalone GUI RSS\nreader, but the ones I evaluated for Linux were buggy, incomplete,\nvery slow or a combination of the three. So I ended up writing this\nbrand new buggy, incomplete and very slow program instead. Your own\nugly babies are always the cutest ones.\n\n* Why not just use rss2email?\n\nUsing pickle for the state is obnoxious - hard to read, edit, or\nsave/merge using version control. 
I strongly prefer plain text for\neverything that's even remotely important, because it lets me use\nexisting high quality diff/merge tools built into DVCSes to save my\ndata, and quality editors to modify them as necessary.\n\nIn short: die, binary data, die in a fire.\n\n(And then I realized it was far better to save the seen items state in\na sqlite db than a plain text file in case an exception is thrown and\nwe revert state. Hypocrisy rocks!)\n\n* What's there?\n\nAll the basic functionality works: it reads RSS feeds, sends you\nemails about them, remembers which ones it has already told you about.\nI'm using it as my only RSS source, and I'm happy with it.\n\n* What's missing?\n\nNo provision for HTML mail. Everything is converted to text/plain,\nstripping out everything except links. I don't like HTML mail.\n\nIf you want to read anything that's not written in mostly ASCII you're\nout of luck; all input text is forced to 7-bit ASCII.\n\nProbably other things that haven't even occurred to me. 
Suggestions\nwelcome, patches better.\n\n* What do I need?\n\nCurrently you need Python 2.6 (2.4/2.5 might work, haven't tried) plus\nthe following dependencies:\n\nRequired:\n  feedparser      - http://www.feedparser.org\n  stripogram      - http://www.zope.org/Members/chrisw/StripOGram\n\nOptional:\n  dateutil.parser - http://labix.org/python-dateutil\n  multiprocessing - http://code.google.com/p/python-multiprocessing/\n\nThe multiprocessing module is included in Python 2.6\n\nPatches to reduce dependencies while improving or maintaining\nfunctionality happily accepted.\n\nAll the dependencies are included in Gentoo:\n\n  emerge dev-python/python-dateutil \\\n         dev-python/feedparser \\\n         dev-python/stripogram\n\nand in Debian:\n\n  apt-get install zope-stripogram \\\n                  python-dateutil \\\n                  python-feedparser\n\nand probably in most other reasonably sane Linux distros.\n\n* How do I use it?\n\ngrab_rss uses 3 files, which are located in either $GRAB_RSS_DIR or ~/.grab_rss:\n\n  - grab_rss.conf: A configuration file (see more about that below)\n\n  - feeds.txt: A list of feed locations, one per line. 
Like so:\n\n\"\"\"\nhttp://globalguerrillas.typepad.com/globalguerrillas/atom.xml\nhttp://randombit.net/bitbashing/index.atom\nhttp://taint.org/feed\n\"\"\"\n\n  - seen.db: A sqlite database listing already seen/sent posts\n    (you shouldn't need to ever look at this)\n\n* What do I put in grab_rss.conf?\n\nThe only required item is what email address you want the output sent to:\n\n\"\"\"\n[GrabRSS]\nto = user@example.com\n\"\"\"\n\nUsing the full set of options:\n\n\"\"\"\n[GrabRSS]\nto = user@example.com\nfrom = grab-rss@example.com\nsmtp_host = mail.example.net\nsocket_timeout = 10\npool_size = 4\nuser_agent = Lynx/2.8.7\n\"\"\"\n\nThe pool_size specifies how many processes to use for downloading\nfeeds (using the multiprocessing module, which means you have to have\nPython 2.6 or have installed it specially for this to work).\n\nRunning several downloads in parallel can substantially speed up how\nfast grab_rss runs (for my 60-something feeds, from just under a\nminute with 1 process to under 10 seconds with 8 procs; further\nincreases in pool size didn't decrease runtimes). The optimal size\nwill depend a lot on your local hardware and network as well as how\nmany feeds you are trying to get (and all of their hardware and\nnetworks). The default pool size is 0, which means don't use the\nmultiprocessing module at all. 
Play with the --pool-size option if you\nwant to experiment.\n\nThe default socket timeout is 30 seconds, which is probably fine\nunless you have a very wonky network or are running only a single\nprocess (in which case a single down server can hang your entire run\nby the full socket timeout - with multiple processes useful work will\nstill be happening while one process waits around for the timeout).\nBe warned this timeout applies to both pulling down the feeds and to\nthe SMTP timeout, though I haven't encountered any problems with that.\n\nThe SMTP host defaults to localhost which is probably the right thing\nto do at least a third of the time.\n\nNo provisions for SMTP authentication currently, might be useful but I\ndon't need it so I haven't written it.\n\n* How do I filter this?\n\nYou can filter by the sender (set in grab_rss.conf) and/or the\nexistence of the header X-GrabRSS-Feed. The value of X-GrabRSS-Feed is\nset to the URL of the feed that this post was from, if you want to\nfilter posts to different mboxes based on source.\n\nIn procmail speak:\n\n:0:\n* ^X-GrabRSS-Feed:\nRSSFeeds\n\n* Can I reuse this?\n\nSure. License is stock GPLv2. If you need it under some other license\nfor some reason, contact me.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frandombit%2Fgrab-rss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frandombit%2Fgrab-rss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frandombit%2Fgrab-rss/lists"}