Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/gromnitsky/grepfeed

Filters out RSS/Atom feeds, returning articles that match a specified pattern. The output is another valid XML feed.
https://github.com/gromnitsky/grepfeed

grep rss xml

Last synced: about 2 months ago
JSON representation

Filters out RSS/Atom feeds, returning articles that match a specified pattern. The output is another valid XML feed.

Awesome Lists containing this project

README

        

Filters out RSS/Atom feeds, returning articles that match a specified
pattern. The output is another valid XML feed.

## What's included

* a cli util;
* a standalone http server that shares the same engine w/ the cli util.
* a web client that uses the included server as an intermediary and
acts as a gui version of the cli util.

## Requirements

* node >= 20

## Setup

$ npm i -g grepfeed
$ grepfeed-server

Open http://127.0.0.0:3000 in a browser.

## How it works

`lib/feed.js` contains all the code that parses & transforms xml
feeds. Its core is `Grep` class--a Transform stream:

readable_stream.pipe().pipe(writable_stream)

### cli

`cli/grepfeed.js` extends `Grep` to override several methods where
it's convenient to write the output in any format one wants. 3
interfaces are included: text-only (the default), json, xml. The
latter produces a valid rss 2.0 feed. E.g.

$ curl http://example.com/rss | cli/grepfeed.js apple -d=2016 -x

parses the input feed, selects only articles written in 2016 or newer
that match the regexp pattern `/apple/`. `-x` means xml output.

~~~
Usage: grepfeed.js [opt] [PATTERN] < xml

-e print only articles w/ enclosures
-n NUM number of articles to print
-x xml output
-j json output
-m print only meta
-V program version

Filter by:

-d [-]date[,date]
-c categories

Or/and search for a regexp PATTERN in each rss article & print the
matching ones. The internal order of the search: title, summary,
description, author.

-v invert match
~~~

### server

Acts as a proxy: downloads a requested feed & returns the filtered
xml. Query params match `cli/grepfeed.js` command line interface. To
start a server, run

$ make
$ server/index.js

(For a different host/port combination, use `HOST` & `PORT` env vars.)

This following example yields the same xml as in the `cli/grepfeed.js`
case, only does it through http:

$ curl '127.0.0.1:3000/api/?_=apple&d=2016&url=http%3A%2F%2Fexample.com%2Frss'

Notice `d` means `-d` in the `cli/grepfeed.js` example, `-x` doesn't make
sense here, `_` means the 1st command line arg, `apple` in this
case. The server doesn't invoke `cli/grepfeed.js` program; they both use
minimist to parse command options, thus the perceived similarity in
the behaviour.

### caveats

A URL you'd like to filter must be reachable from within the machine
`server/index.js` is running on. This could pose a security risk or be
inconvenient if you want to filter XML from *your* LAN. In the latter
case run `grepfeed-server` on your local machine.

## Bugs

* All html tags in article titles are removed, even if a title is in
plain text.
* This should've been written in Rust or something similar, as Node is
slow and memory hungry for this kind of tasks.

## See also

[itunesrss](https://github.com/gromnitsky/itunesrss),
[rss2mail](https://github.com/gromnitsky/rss2mail)

## License

MIT.