https://github.com/pyrmont/feedstock
A Ruby library for data extraction that can be used to make RSS feeds from webpages
- Host: GitHub
- URL: https://github.com/pyrmont/feedstock
- Owner: pyrmont
- License: unlicense
- Created: 2019-09-18T05:17:39.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2025-02-21T06:57:42.000Z (over 1 year ago)
- Last Synced: 2025-07-29T01:54:39.933Z (11 months ago)
- Topics: atom-feed, data-scraping, erb-template, rss-feed, ruby
- Language: Ruby
- Homepage:
- Size: 75.2 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Feedstock
[![Gem Version][gem-badge]][gem-link]
[gem-badge]: https://badge.fury.io/rb/feedstock.svg
[gem-link]: https://rubygems.org/gems/feedstock
Feedstock is a Ruby library for extracting information from an HTML/XML document
and inserting it into an ERB template. Its primary purpose is to create a feed
for a webpage that doesn't offer one.
## Rationale
I love RSS feeds.
That's why I think it's a shame not every website has a feed. However, even when
a website does have a feed, sometimes it doesn't include quite the mix of
information that I want. I made Feedstock to solve those two problems.
Feedstock is a Ruby library that you can use to create an Atom or RSS feed. It
requires a URL to a document and a hash of rules. The rules tell Feedstock how
to extract and transform the data found on the webpage. That data is stuffed
into a hash and then run through an ERB template. Feedstock comes with a
template but you can use your own, too.
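The last step of that pipeline, a hash of data run through an ERB template, can be sketched with Ruby's standard library alone. This is a minimal illustration and not Feedstock's own code: the `data` hash, its key names (`info`, `entries`) and the template are assumptions made up for the example, and a real Atom feed needs more elements than are shown here.

```ruby
require "erb"

# Hypothetical data, shaped like what an extraction step might produce.
# The key names are illustrative and are not taken from Feedstock itself.
data = {
  info: { title: "Example News", id: "https://example.org" },
  entries: [
    { title: "First story",  link: "https://example.org/1" },
    { title: "Second story", link: "https://example.org/2" }
  ]
}

# A deliberately small Atom-like template. Each top-level key of the hash
# becomes a local variable inside the template.
template = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title><%= ERB::Util.html_escape(info[:title]) %></title>
    <id><%= ERB::Util.html_escape(info[:id]) %></id>
    <%- entries.each do |entry| -%>
    <entry>
      <title><%= ERB::Util.html_escape(entry[:title]) %></title>
      <link href="<%= ERB::Util.html_escape(entry[:link]) %>"/>
    </entry>
    <%- end -%>
  </feed>
XML

# trim_mode "-" lets <%- and -%> strip the whitespace around control tags.
feed = ERB.new(template, trim_mode: "-").result_with_hash(data)
puts feed
```

Escaping every interpolated value with `ERB::Util.html_escape` keeps the output well-formed when scraped text contains characters such as `&` or `<`.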
## Example
The [feeds.inqk.net repository][example] includes an example of how the
Feedstock library can be used in practice.
[example]: https://github.com/pyrmont/feeds.inqk.net/ "An example of using the Feedstock library"
## Installation
Feedstock is available as a gem:
```shell
$ gem install feedstock
```
## Usage
Feedstock extracts information from a document at a given _URL_ using a
collection of _rules_. The feed is generated by calling `Feedstock.feed` as
below:
```ruby
require "feedstock"

# Define the URL
url = "https://example.org"

# Define the rules
rules = { info: { id: url,
                  title: Feedstock::Extract.new(selector: "div.title"),
                  updated: Feedstock::Extract.new(selector: "span.date") },
          entry: { id: Feedstock::Extract.new(selector: "a", content: { attribute: "href" }),
                   title: Feedstock::Extract.new(selector: "h2"),
                   updated: Feedstock::Extract.new(selector: "span.date"),
                   author: Feedstock::Extract.new(selector: "span.byline"),
                   link: Feedstock::Extract.new(selector: "a", content: { attribute: "href" }),
                   summary: Feedstock::Extract.new(selector: "div.summary") },
          entries: Feedstock::Extract.new(selector: "div.story") }

# Using the default format and template
Feedstock.feed url, rules

# Using the XML format and a user-specified template
Feedstock.feed url, rules, :xml, "podcast.xml"
```
More information is available in [api.md].
[api.md]: https://github.com/pyrmont/feedstock/blob/master/api.md
## Bugs
Found a bug? I'd love to know about it. The best way is to report it in the
[Issues section][ghi] on GitHub.
[ghi]: https://github.com/pyrmont/feedstock/issues
## Versioning
Feedstock uses [Semantic Versioning 2.0.0][sv2].
[sv2]: http://semver.org/
## Licence
Feedstock is released into the public domain. See [LICENSE][] for more details.
[LICENSE]: https://github.com/pyrmont/feedstock/blob/master/LICENSE