{"id":18628566,"url":"https://github.com/sr-murthy/web-feeds-scraper","last_synced_at":"2025-11-04T02:30:31.949Z","repository":{"id":92607749,"uuid":"75238488","full_name":"sr-murthy/web-feeds-scraper","owner":"sr-murthy","description":"Command line client to scrape, tag and save RSS feed content to a local SQLite3 database.","archived":false,"fork":false,"pushed_at":"2017-12-31T19:11:16.000Z","size":34,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-27T06:43:56.303Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/sr-murthy/web-feeds-scraper","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sr-murthy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-12-01T00:07:21.000Z","updated_at":"2020-02-08T22:22:10.000Z","dependencies_parsed_at":"2023-06-18T12:23:26.705Z","dependency_job_id":null,"html_url":"https://github.com/sr-murthy/web-feeds-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sr-murthy%2Fweb-feeds-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sr-murthy%2Fweb-feeds-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sr-murthy%2Fweb-feeds-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sr-murthy%2Fweb-feeds-scraper/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/sr-murthy","download_url":"https://codeload.github.com/sr-murthy/web-feeds-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239425338,"owners_count":19636346,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T04:48:17.642Z","updated_at":"2025-02-18T06:44:24.249Z","avatar_url":"https://github.com/sr-murthy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Feeds Scraper\n\nA command-line client to scrape web feeds (initially RSS only, but extensible to supported feeds) and save their article HTML content (and other article attributes) to a local SQLite3 database (this could be replaced by any relational database with a suitable Python binding, with minimal change of code in the database module). 
The current version only mocks tagging: it tags articles with dummy tag objects and saves them to the database. This will be replaced by a fully functional tag extraction and save feature.\n\nThe database has the following simple schema:\n\n    create table article (\n        uuid        text primary key not null,\n        feed_url    text not null,\n        url         text not null,\n        title       text,\n        description text,\n        pub_date    date,\n        image_url   text,\n        html        text\n    );\n\n    create table tag (\n        uuid         text primary key not null,\n        type         text not null,\n        tags         text not null,\n        feed_url     text not null,\n        article_uuid text not null,\n        foreign key(feed_url) references article(feed_url) on update cascade on delete cascade,\n        foreign key(article_uuid) references article(uuid) on update cascade on delete cascade\n    );\n\nUsage:\n\n    $ ./scraper.py\n\n    Welcome to the RSS feed scraper!\n\n    DB does not exist, creating DB feeds.db ... 
\n    creating DB schema \n\n    Enter a comma-separated list of RSS feed URLs to scrape (the scraper saves all articles in the feed to a local database), or type \"Q\" to exit.\n\n    \u003e\u003e http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml, http://feeds.bbci.co.uk/news/england/rss.xml?edition=uk, http://feeds.skynews.com/feeds/rss/uk.xml, http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/in_depth/uk/2001/uk_and_the_euro/rss.xml, http://www.telegraph.co.uk/sport/rss.xml, https://www.theguardian.com/uk/rss\n\n\n    SCRAPER: getting article urls for feed http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml\n    SCRAPER: getting article urls for feed http://feeds.bbci.co.uk/news/england/rss.xml?edition=uk\n    SCRAPER: getting article urls for feed http://feeds.skynews.com/feeds/rss/uk.xml\n    SCRAPER: getting article urls for feed http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/in_depth/uk/2001/uk_and_the_euro/rss.xml\n    SCRAPER: getting article urls for feed http://www.telegraph.co.uk/sport/rss.xml\n    SCRAPER: getting article urls for feed https://www.theguardian.com/uk/rss\n\n    SCRAPER: 384 articles to be scraped from 6 RSS feeds.\n\n    SCRAPER: Scraped 384/384 articles from 6 RSS feeds in 13.553 seconds (@ 28.334 articles per second). 0 errors encountered.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsr-murthy%2Fweb-feeds-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsr-murthy%2Fweb-feeds-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsr-murthy%2Fweb-feeds-scraper/lists"}