https://github.com/pyrmont/feedstock

A Ruby library for data extraction that can be used to make RSS feeds from webpages

atom-feed data-scraping erb-template rss-feed ruby


# Feedstock

[![Gem Version][gem-badge]][gem-link]

[gem-badge]: https://badge.fury.io/rb/feedstock.svg
[gem-link]: https://rubygems.org/gems/feedstock

Feedstock is a Ruby library for extracting information from an HTML/XML document
and inserting it into an ERB template. Its primary purpose is to create a feed
for a webpage that doesn't offer one.

## Rationale

I love RSS feeds.

That's why I think it's a shame that not every website has one. Even when a
website does have a feed, it sometimes doesn't include quite the mix of
information that I want. I made Feedstock to solve those two problems.

Feedstock is a Ruby library that you can use to create an Atom or RSS feed. It
requires a URL to a document and a hash of rules. The rules tell Feedstock how
to extract and transform the data found on the webpage. That data is stuffed
into a hash and then run through an ERB template. Feedstock comes with a
template but you can use your own, too.
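
The "hash through an ERB template" step can be sketched with nothing but Ruby's
standard `erb` library. This is not Feedstock's own code or its bundled
template. The `atom_template` and `render_feed` helpers, the sample data and
the exact shape of the hash are all assumptions made for illustration.

```ruby
require "erb"

# Hypothetical template: a minimal Atom feed. Every extracted value is passed
# through ERB::Util.html_escape so characters like "&" stay valid in XML.
def atom_template
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <id><%= ERB::Util.html_escape(info[:id]) %></id>
      <title><%= ERB::Util.html_escape(info[:title]) %></title>
      <updated><%= ERB::Util.html_escape(info[:updated]) %></updated>
    <%- entries.each do |entry| -%>
      <entry>
        <id><%= ERB::Util.html_escape(entry[:id]) %></id>
        <title><%= ERB::Util.html_escape(entry[:title]) %></title>
        <updated><%= ERB::Util.html_escape(entry[:updated]) %></updated>
        <link href="<%= ERB::Util.html_escape(entry[:link]) %>"/>
      </entry>
    <%- end -%>
    </feed>
  XML
end

# Hypothetical helper: exposes the :info and :entries keys of the data hash
# to the template as local variables and returns the rendered feed.
def render_feed(data, template = atom_template)
  ERB.new(template, trim_mode: "-")
     .result_with_hash(info: data[:info], entries: data[:entries])
end

# Sample data standing in for what the rules might extract (assumed shape).
data = {
  info: { id: "https://example.org",
          title: "Example News",
          updated: "2024-01-01T00:00:00Z" },
  entries: [
    { id: "https://example.org/first",
      title: "Cats & Dogs",
      updated: "2024-01-01T00:00:00Z",
      link: "https://example.org/first" }
  ]
}

puts render_feed(data)
```

Swapping in a different template string is all it takes to change the output
format, which is the general idea behind using your own template.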

## Example

The [feeds.inqk.net repository][example] includes an example of how the
Feedstock library can be used in practice.

[example]: https://github.com/pyrmont/feeds.inqk.net/
"An example of using the Feedstock library"

## Installation

Feedstock is available as a gem:

```shell
$ gem install feedstock
```

## Usage

Feedstock extracts information from a document at a given _URL_ using a
collection of _rules_. The feed is generated by calling `Feedstock.feed` as
below:

```ruby
require "feedstock"

# Define the URL
url = "https://example.org"

# Define the rules
rules = { info: { id: url,
                  title: Feedstock::Extract.new(selector: "div.title"),
                  updated: Feedstock::Extract.new(selector: "span.date") },

          entry: { id: Feedstock::Extract.new(selector: "a", content: { attribute: "href" }),
                   title: Feedstock::Extract.new(selector: "h2"),
                   updated: Feedstock::Extract.new(selector: "span.date"),
                   author: Feedstock::Extract.new(selector: "span.byline"),
                   link: Feedstock::Extract.new(selector: "a", content: { attribute: "href" }),
                   summary: Feedstock::Extract.new(selector: "div.summary") },

          entries: Feedstock::Extract.new(selector: "div.story") }

# Using the default format and template
Feedstock.feed url, rules

# Using the XML format and a user-specified template
Feedstock.feed url, rules, :xml, "podcast.xml"
```

More information is available in [api.md].

[api.md]: https://github.com/pyrmont/feedstock/blob/master/api.md

## Bugs

Found a bug? I'd love to know about it. The best way is to report it in the
[Issues section][ghi] on GitHub.

[ghi]: https://github.com/pyrmont/feedstock/issues

## Versioning

Feedstock uses [Semantic Versioning 2.0.0][sv2].

[sv2]: http://semver.org/

## Licence

Feedstock is released into the public domain. See [LICENSE][] for more details.

[LICENSE]: https://github.com/pyrmont/feedstock/blob/master/LICENSE