https://github.com/conoro/flat-rss-example
Generate usable RSS feeds from web-pages using GitHub OCTO Flat Data
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/conoro/flat-rss-example
- Owner: conoro
- License: apache-2.0
- Created: 2021-07-08T11:00:59.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2022-01-19T04:33:18.000Z (over 4 years ago)
- Last Synced: 2025-02-06T09:48:27.781Z (over 1 year ago)
- Language: HTML
- Size: 20.8 MB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# flat-rss-example
Generate usable RSS feeds from web-pages using GitHub OCTO Flat Data
This is really just an example and is based on my other RSS feed generators which run on AWS Lambda.
To use it for a different page, set the URL in flat.yaml and then write your own scraping code in postprocess.ts. Luckily Deno is close enough to Node.js that Cheerio works almost out of the box.
Most news-style pages are easy to scrape. You're just looking for a repeated div that wraps each article, which you can then use to grab that article's title, URL, image and summary.
In this example, the latest copy of the source HTML is saved to the repo as tokyo-marathon.html (this step is not necessary), and the generated RSS is then saved to the repo as tokyo-marathon.xml. But to be accessible to feed readers like [Feedly](https://feedly.com), the output needs to be served with content-type application/xml. I do this by using [jsDelivr](https://www.jsdelivr.com/).
So the RSS feed for the [Tokyo Marathon news page](https://www.marathon.tokyo/en/news/) can be found at https://cdn.jsdelivr.net/gh/conoro/flat-rss-example/tokyo-marathon.xml
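As a rough illustration of the "generated RSS" step, here is a minimal sketch of turning already-scraped articles into an RSS 2.0 document. It is not the code from postprocess.ts: the `FeedItem` and `FeedChannel` types and the `escapeXml`, `renderItem` and `buildRss` functions are hypothetical names chosen for this example, and the image field is left out to keep it short.

```typescript
// Hypothetical shape of one scraped article (not taken from the repo).
interface FeedItem {
  title: string;
  url: string;
  summary: string;
  published?: Date;
}

// Hypothetical shape of the feed-level metadata.
interface FeedChannel {
  title: string;
  link: string;
  description: string;
}

// Escape the five characters that are special in XML text.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render a single <item>; the article URL doubles as its guid.
function renderItem(item: FeedItem): string {
  const lines = [
    "    <item>",
    `      <title>${escapeXml(item.title)}</title>`,
    `      <link>${escapeXml(item.url)}</link>`,
    `      <guid>${escapeXml(item.url)}</guid>`,
    `      <description>${escapeXml(item.summary)}</description>`,
  ];
  if (item.published) {
    // toUTCString() gives an RFC 822-style date ending in "GMT".
    lines.push(`      <pubDate>${item.published.toUTCString()}</pubDate>`);
  }
  lines.push("    </item>");
  return lines.join("\n");
}

// Assemble the whole RSS 2.0 document as a string.
function buildRss(channel: FeedChannel, items: FeedItem[]): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    `    <title>${escapeXml(channel.title)}</title>`,
    `    <link>${escapeXml(channel.link)}</link>`,
    `    <description>${escapeXml(channel.description)}</description>`,
    ...items.map(renderItem),
    "  </channel>",
    "</rss>",
    "",
  ].join("\n");
}
```

In a Deno postprocess script, the resulting string could then be written out with something like `await Deno.writeTextFile("tokyo-marathon.xml", xml)`, assuming the items have already been extracted from the downloaded HTML.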
# TO-DO
* Let one repo do the scraping for many sites
* Persist data over time for historical purposes
* Error handling
* More examples based on previous scrapers I've done like South China Morning Post
Copyright Conor O'Neill, 2021 (conor@conoroneill.com)
License Apache 2.0