https://github.com/vijinho/feedex

PHP CLI and REST tool to search URLs for RSS/ATOM feeds and extract them, optionally as JSON or OPML.
https://github.com/vijinho/feedex

atom-feed atom-feed-scraper cli json json-api opml opml-to-json php php-cli rest-api rss rss-feed rss-feed-parsing rss-feed-scraper rss-feeds webservice

Last synced: 2 months ago
JSON representation

PHP CLI and REST tool to search URLs for RSS/ATOM feeds and extract them, optionally as JSON or OPML.

Host: GitHub
URL: https://github.com/vijinho/feedex
Owner: vijinho
Created: 2018-10-17T04:34:01.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2018-10-19T16:53:51.000Z (about 7 years ago)
Last Synced: 2024-12-22T06:22:14.500Z (about 1 year ago)
Topics: atom-feed, atom-feed-scraper, cli, json, json-api, opml, opml-to-json, php, php-cli, rest-api, rss, rss-feed, rss-feed-parsing, rss-feed-scraper, rss-feeds, webservice
Language: PHP
Homepage:
Size: 141 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# feedex

PHP CLI and WWW tool to extract and save feeds from URL(s)

- Full-resolves URL before attempting to extract feed URLs
- Extracts the URLs of RSS (1.0 and 2.0) and ATOM feeds associated to a page, as well as OPML outline documents with [nicolus/picofeed](https://github.com/nicolus/picoFeed)
- Runs on the command-line
- Can be called as a stand-alone webservice using the php command line built-in server
- All messages when running with `--debug` or `--verbose` are to *stderr* to avoid interference with *stdout*
- Can output the result if successful to *stdout*
- Output file can be used as input when using options 'txt' and '[json](https://json.org/)', 'opml' and URLs can be re-checked with --force option.
- Can output found feeds as an [opml](http://dev.opml.org/) file.
- Can output data to markdown .md for easy sharing of podcast information (output-only --format)
- Errors are output in JSON as 'errors' with just a bunch of strings

```
{
"errors": [
"Unable to parse --date: next sunsaday"
]
}
```

## Installation

- `composer require nicolus/picofeed`
- `composer dump`

## Command-line options

```
Usage: php feedex.php
Extract and save feeds from URL(s)
(Specifying any other unknown argument options will be ignored.)

-h, --help Display this help and exit
-v, --verbose Run in verbose mode
-d, --debug Run in debug mode (implies also -v, --verbose)
-u, --url= (Required or -i) URL to check for feeds)
-i --input={filename} (Required or -u) Text file of URLs, one-per-line to read in and process.
-c, --clear (Optional) Clear-out URLs which have no feeds before writing output file.
-e, --echo (Optional) Echo/output the result to stdout if successful
-f --format={txt|json|php|opml|md} (Optional) Output format for screen and filename: txt (default)|json|php(serialized)|opml|markdown
--filename={output} (Optional) Filename for output data from operation
--force-check (Optional) Forcibly check URLs, even for those which already have feeds in the input file.
--skip-opml (Optional) Skip opml processing, just go to output, e.g. for generating markdown
```

## Example output Format

The script can take-in previous txt format output and re-use it as input, checking only urls where there are no existing feeds.

### txt

Only stand-alone URL lines without following feed lines are searched for feeds, unless `--force-check` is used which forces all URLs to be checked in the text file.

```
http://campaigntoabolishthebbc.blogspot.com/
http://campaigntoabolishthebbc.blogspot.com/feeds/posts/default
http://campaigntoabolishthebbc.blogspot.com/feeds/posts/default?alt=rss
https://www.blogger.com/feeds/7418166530762317285/posts/default

http://cash-is-cool.com/
http://cash-is-cool.com/rss/articles.php
http://cash-is-cool.com/rss/twitter.php

http://datastori.es/
http://datastori.es/comments/feed/
http://datastori.es/feed/
http://datastori.es/feed/m4a/
http://datastori.es/feed/mp3/
http://datastori.es/feed/podcast/

http://eurofolkradio.com/
http://eurofolkradio.com/comments/feed/
http://eurofolkradio.com/feed/
http://eurofolkradio.com/feed/podcast

http://grahamhancock.com/blog/
https://grahamhancock.com/blog/feed/
```

### json

```
{
"http:\/\/campaigntoabolishthebbc.blogspot.com\/": [
"http:\/\/campaigntoabolishthebbc.blogspot.com\/feeds\/posts\/default",
"http:\/\/campaigntoabolishthebbc.blogspot.com\/feeds\/posts\/default?alt=rss",
"https:\/\/www.blogger.com\/feeds\/7418166530762317285\/posts\/default"
],
"http:\/\/cash-is-cool.com\/": [
"http:\/\/cash-is-cool.com\/rss\/articles.php",
"http:\/\/cash-is-cool.com\/rss\/twitter.php"
],
"http:\/\/datastori.es\/": [
"http:\/\/datastori.es\/comments\/feed\/",
"http:\/\/datastori.es\/feed\/",
"http:\/\/datastori.es\/feed\/m4a\/",
"http:\/\/datastori.es\/feed\/mp3\/",
"http:\/\/datastori.es\/feed\/podcast\/"
],
"http:\/\/eurofolkradio.com\/": [
"http:\/\/eurofolkradio.com\/comments\/feed\/",
"http:\/\/eurofolkradio.com\/feed\/",
"http:\/\/eurofolkradio.com\/feed\/podcast"
],
"http:\/\/grahamhancock.com\/blog\/": [
"https:\/\/grahamhancock.com\/blog\/feed\/"
],
"http:\/\/greatgameindia.com\/": [
"http:\/\/greatgameindia.com\/comments\/feed\/",
"http:\/\/greatgameindia.com\/feed\/",
"http:\/\/greatgameindia.com\/homepage\/feed\/"
],
}
```

### OPML file to Markdown

`php feedex.php --input=audio.opml -e -fmd --filename=podcasts.md`

Example output, slightly modified: [urunu.com/blog/2018-10-18-podcasts](http://www.urunu.com/blog/2018-10-18-podcasts)

# Examples

Find feeds for URL http://example.com/blog/public outputting as 'txt'.

`php feedex.php --url=http://example.com/blog/public --echo --format=txt --debug`

output:

```
http://example.com/blog/public
http://example.com/feed/
http://example.com/comments/feed/
```

output as json, saving to a filename 'urls.json' and debugging enabled:

`php feedex.php --url=http://example.com/blog/public --filename=urls.json --format=json`

output with full-debugging:

```
[D 1/1] OPTIONS:
Array
(
[debug] => 1
[echo] => 1
[help] => 0
[input] => 0
[url] => 1
[verbose] => 1
)
[V 1/1] OUTPUT_FORMAT: txt
[D 1/1] Using dir: /Users/vijay/tmp
[D 1/1] Using input filename:
[D 1/1] Checking URL:
http://example.com/blog/public
[D 1/2] Feeds found for URL:
http://example.com/blog/public/
Array
(
[0] => http://example.com/feed/
[1] => http://example.com/comments/feed/
)
http://example.com/blog/public
http://example.com/feed/
http://example.com/comments/feed/
[D 1/2] Memory used (1/2) MB (current/peak).
```

## Example of reading in file of URLs

Read in file of URLs and process, outputting txt, piping screen output (stderr and stdout) to less

`php feedex.php --input=urls.txt --filename=results.txt --verbose --echo --clear 2>&1 | less`

```
[V] OUTPUT_FORMAT: txt
[V] Found 1 valid URL(s) in input file:
urls.txt
Array
(
[0] => http://example.com/blog/public/

)
http://example.com/blog/public/
http://example.com/feed/
http://example.com/comments/feed/
```

## Example of creating OPML file from extracted subscriptions

Reads in 'urls.txt', extracts feeds and writes to 'results.opml', debug mode piping all output including *stderr* to *stdout* into *less* viewer

`php feedex.php --input=urls.txt --filename=results.opml --echo --format=opml --debug 2>&1 | less`

Contents of 'results.opml':

```

FeedEx

```

## Running as a webservice

### Starting the service

1. Start the PHP webserver with `php -S 127.0.0.1:12312`
2. Browse the URL: http://127.0.0.1:12312/feedex.php with GET/POST parameters

Accepted request input parameters: 'url=', 'format='

### Webservice Example

This will only check a single URL.

`http://127.0.0.1:12312/feedex.php?url=http://example.com/blog/public&format=json`

Result:

```
{
"http:\/\/example.com\/blog\/public": [
"http:\/\/example.com\/feed\/",
"http:\/\/example.com\/comments\/feed\/"
]
}
```

Using `format=text`

`http://127.0.0.1:12312/feedex.php?url=http://example.com/blog/public&format=txt`

```
http://example.com.com/
http://example.com.com/comments/feed/
http://example.com.com/feed/
http://example.com.com/home/feed/
```

----
vijay@yoyo.org

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/vijinho/feedex

Awesome Lists containing this project

README