Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gromnitsky/grepfeed
Filters out RSS/Atom feeds, returning articles that match a specified pattern. The output is another valid XML feed.
- Host: GitHub
- URL: https://github.com/gromnitsky/grepfeed
- Owner: gromnitsky
- Created: 2016-01-18T01:08:06.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-06-16T10:48:18.000Z (7 months ago)
- Last Synced: 2024-11-09T04:09:27.460Z (2 months ago)
- Topics: grep, rss, xml
- Language: JavaScript
- Homepage: https://grepfeed.sigwait.org/
- Size: 1.56 MB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
Filters RSS/Atom feeds, returning only the articles that match a
specified pattern. The output is another valid XML feed.

## What's included
* a cli util;
* a standalone http server that shares the same engine w/ the cli util;
* a web client that uses the included server as an intermediary and
  acts as a gui version of the cli util.

## Requirements
* node >= 20
## Setup

    $ npm i -g grepfeed
    $ grepfeed-server

Open http://127.0.0.1:3000 in a browser.
## How it works
`lib/feed.js` contains all the code that parses & transforms xml
feeds. Its core is the `Grep` class, a Transform stream:

    readable_stream.pipe(<Grep instance>).pipe(writable_stream)
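
To illustrate the idea of a filter built as a Transform stream, here is a minimal sketch. It is not the real `Grep` class: `ArticleFilter`, the object-mode input and the `title`/`summary` fields are assumptions made for the example, and the actual code in `lib/feed.js` may be organised differently.

```javascript
const { Transform, Readable } = require('node:stream')

// Hypothetical stand-in for a Grep-like filter (NOT the real `Grep` class).
// It receives article objects and forwards only those matching a pattern.
class ArticleFilter extends Transform {
    constructor(pattern) {
        super({ objectMode: true })
        this.pattern = pattern  // a RegExp without the `g` flag
    }

    _transform(article, _encoding, callback) {
        // assumed article shape: { title, summary }
        const text = [article.title, article.summary].join('\n')
        if (this.pattern.test(text)) this.push(article)
        callback()
    }
}

const articles = [
    { title: 'Apple pie recipes', summary: 'Baking at home' },
    { title: 'Weather report', summary: 'Rain all week' },
]

// readable -> filter; only the 1st article passes through
Readable.from(articles)
    .pipe(new ArticleFilter(/apple/i))
    .on('data', article => console.log(article.title))
```

Non-matching articles are simply never pushed, so whatever sits downstream of the filter sees a shorter, but otherwise ordinary, stream.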
### cli
`cli/grepfeed.js` extends `Grep` to override several methods where
it's convenient to write the output in any format one wants. 3
interfaces are included: text-only (the default), json, xml. The
latter produces a valid rss 2.0 feed. E.g.

    $ curl http://example.com/rss | cli/grepfeed.js apple -d=2016 -x

parses the input feed and selects only articles written in 2016 or newer
that match the regexp pattern `/apple/`. `-x` means xml output.

~~~
Usage: grepfeed.js [opt] [PATTERN] < xml

  -e      print only articles w/ enclosures
  -n NUM  number of articles to print
  -x      xml output
  -j      json output
  -m      print only meta
  -V      program version

Filter by:

  -d [-]date[,date]
  -c categories

Or/and search for a regexp PATTERN in each rss article & print the
matching ones. The internal order of the search: title, summary,
description, author.

  -v      invert match
~~~

### server
Acts as a proxy: downloads a requested feed & returns the filtered
xml. Query params match `cli/grepfeed.js` command line interface. To
start a server, run

    $ make
    $ server/index.js

(For a different host/port combination, use `HOST` & `PORT` env vars.)
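
As a rough sketch of how a Node http server can pick up such env vars: the names `listenAddress` and `start` are hypothetical, and the defaults (`127.0.0.1`, `3000`) are an assumption based on the address used elsewhere in this README, so the real `server/index.js` may behave differently.

```javascript
const http = require('node:http')

// Resolve the listen address from env vars.
// The fallback values are ASSUMED, not taken from server/index.js.
function listenAddress(env = process.env) {
    return {
        host: env.HOST || '127.0.0.1',
        port: Number(env.PORT) || 3000,
    }
}

// Hypothetical entry point: a placeholder server bound to that address.
function start(env = process.env) {
    const { host, port } = listenAddress(env)
    const server = http.createServer((req, res) => {
        res.writeHead(200, { 'Content-Type': 'text/plain' })
        res.end('ok\n')
    })
    return server.listen(port, host)
}

// e.g. start({ HOST: '0.0.0.0', PORT: '8080' })
```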
The following example yields the same xml as in the `cli/grepfeed.js`
case, only through http:

    $ curl '127.0.0.1:3000/api/?_=apple&d=2016&url=http%3A%2F%2Fexample.com%2Frss'

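
The same request URL can be assembled in code, which takes care of percent-encoding the feed address. This is only a sketch: `apiUrl` is a hypothetical helper, not part of grepfeed; it relies on the standard `URL` and `URLSearchParams` classes available in node.

```javascript
// Hypothetical helper: builds a query for the server's /api/ endpoint.
//   base     where the server listens
//   feedUrl  the feed to filter (gets percent-encoded)
//   pattern  the regexp pattern, sent as `_`
//   opts     other cli-style options, e.g. { d: 2016 }
function apiUrl(base, feedUrl, pattern, opts = {}) {
    const url = new URL('/api/', base)
    if (pattern) url.searchParams.set('_', pattern)
    for (const [name, value] of Object.entries(opts)) {
        url.searchParams.set(name, value)
    }
    url.searchParams.set('url', feedUrl)
    return url.href
}

const request = apiUrl('http://127.0.0.1:3000', 'http://example.com/rss',
                       'apple', { d: 2016 })
console.log(request)
// http://127.0.0.1:3000/api/?_=apple&d=2016&url=http%3A%2F%2Fexample.com%2Frss
```

With a server running, the resulting string could be passed to `fetch()` to obtain the filtered xml.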
Notice that `d` corresponds to `-d` in the `cli/grepfeed.js` example and
`_` to the 1st command line arg (`apple` in this case); `-x` doesn't
make sense here, for the server returns xml anyway. The server doesn't
invoke the `cli/grepfeed.js` program; they both use minimist to parse
command options, thus the perceived similarity in the behaviour.

### caveats
A URL you'd like to filter must be reachable from within the machine
`server/index.js` is running on. This could pose a security risk or be
inconvenient if you want to filter XML from *your* LAN. In the latter
case run `grepfeed-server` on your local machine.

## Bugs
* All html tags in article titles are removed, even if a title is in
plain text.
* This should've been written in Rust or something similar, as Node is
  slow and memory hungry for this kind of task.

## See also
[itunesrss](https://github.com/gromnitsky/itunesrss),
[rss2mail](https://github.com/gromnitsky/rss2mail)

## License
MIT.