{"id":13411940,"url":"https://github.com/html2rss/html2rss-web","last_synced_at":"2025-03-14T17:31:19.994Z","repository":{"id":37892906,"uuid":"135929870","full_name":"html2rss/html2rss-web","owner":"html2rss","description":"🕸 Generates and delivers RSS feeds via HTTP. Docker image available! Create your own feeds or get started quickly with the included configs.","archived":false,"fork":false,"pushed_at":"2024-05-21T19:20:38.000Z","size":611,"stargazers_count":79,"open_issues_count":2,"forks_count":11,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-05-22T00:57:59.882Z","etag":null,"topics":["builder","docker","feed","feed-configs","html2rss","html2rss-configs","roda","rolling-release","rss","rss-aggregator","rss-feed","rss-feed-scraper","ruby","scraper","serves","webfeed","webfeeds","website-scraper"],"latest_commit_sha":null,"homepage":"https://html2rss.github.io/components/html2rss-web","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/html2rss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"gildesmarais"}},"created_at":"2018-06-03T18:28:51.000Z","updated_at":"2024-06-13T04:12:25.498Z","dependencies_parsed_at":"2023-10-04T21:15:48.140Z","dependency_job_id":"0dc04f50-8237-44a3-86bf-aac772409b99","html_url":"https://github.com/html2rss/html2rss-web","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss-web","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss-web/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss-web/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss-web/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/html2rss","download_url":"https://codeload.github.com/html2rss/html2rss-web/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243618701,"owners_count":20320279,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["builder","docker","feed","feed-configs","html2rss","html2rss-configs","roda","rolling-release","rss","rss-aggregator","rss-feed","rss-feed-scraper","ruby","scraper","serves","webfeed","webfeeds","website-scraper"],"created_at":"2024-07-30T20:01:18.917Z","updated_at":"2025-03-14T17:31:19.983Z","avatar_url":"https://github.com/html2rss.png","language":"Ruby","readme":"![html2rss logo](https://github.com/html2rss/html2rss/raw/master/support/logo.png)\n\n# html2rss-web\n\nThis web application scrapes websites to build and deliver RSS 2.0 feeds.\n\n**Features:**\n\n- Provides stable URLs for feeds generated by automatic sourcing.\n- [Create your custom feeds](#how-to-build-your-rss-feeds)!\n- Comes with plenty of [included configs](https://github.com/html2rss/html2rss-configs) out of the box.\n- Handles request caching.\n- Sets caching-related HTTP headers.\n\nThe functionality of scraping websites and building the RSS feeds is provided by the Ruby gem 
[`html2rss`](https://github.com/html2rss/html2rss).\n\n## Get started\n\nThis application should be used with Docker. It is designed to require as little maintenance as possible. See [Versioning and Releases](#versioning-and-releases) and [consider automatic updates](#docker-automatically-keep-the-html2rss-web-image-up-to-date).\n\n### With Docker\n\n```sh\ndocker run -p 3000:3000 gilcreator/html2rss-web\n```\n\nThen open \u003chttp://127.0.0.1:3000/\u003e in your browser and click the example feed link.\n\nThis is the quickest way to get started. However, it's also the option with the least flexibility: it doesn't allow you to use custom feed configs and doesn't update automatically.\n\nIf you want more flexibility and automatic updates sound good to you, read on to get started _with docker compose_…\n\n### With `docker compose`\n\nCreate a `docker-compose.yml` file and paste the following into it:\n\n```yaml\nservices:\n  html2rss-web:\n    image: gilcreator/html2rss-web\n    ports:\n      - \"3000:3000\"\n    volumes:\n      - type: bind\n        source: ./feeds.yml\n        target: /app/config/feeds.yml\n        read_only: true\n    environment:\n      RACK_ENV: production\n      HEALTH_CHECK_USERNAME: health\n      HEALTH_CHECK_PASSWORD: please-set-YOUR-OWN-veeeeeery-l0ng-aNd-h4rd-to-gue55-Passw0rd!\n      # AUTO_SOURCE_ENABLED: 'true'\n      # AUTO_SOURCE_USERNAME: foobar\n      # AUTO_SOURCE_PASSWORD: A-Unique-And-Long-Password-For-Your-Own-Instance\n      ## to allow just requests originating from the local host\n      # AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000\n      ## to allow multiple origins, seperate those via comma:\n      # AUTO_SOURCE_ALLOWED_ORIGINS: example.com,h2r.host.tld\n      BROWSERLESS_IO_WEBSOCKET_URL: ws://browserless:3001\n      BROWSERLESS_IO_API_TOKEN: 6R0W53R135510\n\n  watchtower:\n    image: containrrr/watchtower\n    volumes:\n      - /var/run/docker.sock:/var/run/docker.sock\n      - \"~/.docker/config.json:/config.json\"\n    
command: --cleanup --interval 7200\n\n  browserless:\n    image: \"ghcr.io/browserless/chromium\"\n    ports:\n      - \"3001:3001\"\n    environment:\n      PORT: 3001\n      CONCURRENT: 10\n      TOKEN: 6R0W53R135510\n```\n\nStart it up with: `docker compose up`.\n\nIf you have not created your `feeds.yml` yet, download [this `feeds.yml` as a blueprint](https://raw.githubusercontent.com/html2rss/html2rss-web/master/config/feeds.yml) into the directory containing the `docker-compose.yml`.\n\n## Docker: Automatically keep the html2rss-web image up-to-date\n\nThe [watchtower](https://containrrr.dev/watchtower/) service automatically pulls running Docker images and checks for updates. If an update is available, it will automatically start the updated image with the same configuration as the running one. Please read its manual.\n\nThe `docker-compose.yml` above contains a service description for watchtower.\n\n## How to use automatic feed generation\n\n\u003e [!NOTE]\n\u003e This feature is disabled by default.\n\nTo enable the `auto_source` feature, uncomment the env variables in the `docker-compose.yml` file from above and change the values accordingly:\n\n```yaml\nenvironment:\n  ## … snip ✁\n  AUTO_SOURCE_ENABLED: \"true\"\n  AUTO_SOURCE_USERNAME: foobar\n  AUTO_SOURCE_PASSWORD: A-Unique-And-Long-Password-For-Your-Own-Instance\n  ## to allow just requests originating from the local host\n  AUTO_SOURCE_ALLOWED_ORIGINS: 127.0.0.1:3000\n  ## to allow multiple origins, separate those via comma:\n  # AUTO_SOURCE_ALLOWED_ORIGINS: example.com,h2r.host.tld\n  ## … snap ✃\n```\n\nRestart the container and open \u003chttp://127.0.0.1:3000/auto_source/\u003e.\nWhen asked, enter your username and password.\n\nThen enter the URL of a website and click on the _Generate_ button.\n\n## How to use the included configs\n\nhtml2rss-web comes with many feed configs out of the box. 
[See the file list of all configs.](https://github.com/html2rss/html2rss-configs/tree/master/lib/html2rss/configs)\n\nTo use a config from there, build the URL like this:\n\n|                          |                               |\n| ------------------------ | ----------------------------- |\n| `lib/html2rss/configs/`  | `domainname.tld/whatever.yml` |\n| Would become this URL:   |                               |\n| `http://localhost:3000/` | `domainname.tld/whatever.rss` |\n|                          | `^^^^^^^^^^^^^^^^^^^^^^^^^^^` |\n\n## How to build your RSS feeds\n\nTo build your own RSS feed, you need to create a _feed config_.\\\nThat _feed config_ goes into the file `feeds.yml`.\\\nCheck out the [`example` feed config](https://github.com/html2rss/html2rss-web/blob/master/config/feeds.yml#L9).\n\nPlease refer to [html2rss' README for a description of _the feed config and its options_](https://github.com/html2rss/html2rss#the-feed-config-and-its-options). html2rss-web is just a small web application that builds on html2rss.\n\n## Versioning and releases\n\nThis web application is distributed in a [rolling release](https://en.wikipedia.org/wiki/Rolling_release) fashion from the `master` branch.\n\nFor the latest commit passing GitHub CI/CD on the master branch, an updated Docker image will be pushed to [Docker Hub: `gilcreator/html2rss-web`](https://hub.docker.com/r/gilcreator/html2rss-web).\n\nGitHub's @dependabot is enabled for dependency updates and they are automatically merged to the `master` branch when the CI gives the green light.\n\nIf you use Docker, you should update to the latest image automatically by [setting up _watchtower_ as described](#get-started).\n\n## Use in production\n\nThis app is published on Docker Hub and therefore easy to use with Docker.\\\nThe above `docker-compose.yml` is a good starting point.\n\nIf you're going to host a public instance, _please, please, please_:\n\n- Put the application behind a reverse proxy.\n- Allow 
outside connections only via HTTPS.\n- Have an auto-update strategy (e.g., watchtower).\n- Monitor your `/health_check.txt` endpoint.\n- [Let the world know and add your instance to the wiki](https://github.com/html2rss/html2rss-web/wiki/Instances) -- thank you!\n\n### Supported ENV variables\n\n| Name                           | Description                        |\n| ------------------------------ | ---------------------------------- |\n| `BASE_URL`                     | default: '\u003chttp://localhost:3000\u003e' |\n| `LOG_LEVEL`                    | default: 'warn'                    |\n| `HEALTH_CHECK_USERNAME`        | default: auto-generated on start   |\n| `HEALTH_CHECK_PASSWORD`        | default: auto-generated on start   |\n|                                |                                    |\n| `AUTO_SOURCE_ENABLED`          | default: false                     |\n| `AUTO_SOURCE_USERNAME`         | no default.                        |\n| `AUTO_SOURCE_PASSWORD`         | no default.                        |\n| `AUTO_SOURCE_ALLOWED_ORIGINS`  | no default.                        |\n|                                |                                    |\n| `PORT`                         | default: 3000                      |\n| `RACK_ENV`                     | default: 'development'             |\n| `RACK_TIMEOUT_SERVICE_TIMEOUT` | default: 15                        |\n| `WEB_CONCURRENCY`              | default: 2                         |\n| `WEB_MAX_THREADS`              | default: 5                         |\n|                                |                                    |\n| `SENTRY_DSN`                   | no default.                        |\n\n### Runtime monitoring via `GET /health_check.txt`\n\nIt is recommended to set up monitoring of the `/health_check.txt` endpoint. With that, you can find out when one of _your own_ configs breaks. 
The endpoint uses HTTP Basic authentication.\n\nFirst, set the username and password via these environment variables: `HEALTH_CHECK_USERNAME` and `HEALTH_CHECK_PASSWORD`. If these are not set, html2rss-web will generate a new random username and password on _each_ start.\n\nAn authenticated `GET /health_check.txt` request will respond with:\n\n- If the feeds are generatable: `success`.\n- Otherwise: the names of the broken configs.\n\nTo get notified when one of your configs breaks, set up monitoring of this endpoint.\n\n[UptimeRobot's free plan](https://uptimerobot.com/) is sufficient for basic monitoring (every 5 minutes).\\\nCreate a monitor of type _Keyword_ with this information and make it aware of your username and password:\n\n![A screenshot showing the Keyword Monitor: a name, the instance's URL to /health_check.txt, and an interval.](docs/uptimerobot_monitor.jpg)\n\n### Application Performance Monitoring using Sentry\n\nWhen you specify `SENTRY_DSN` in your environment variables, the application will be set up to use Sentry.\n\n## Setup for development\n\nCheck out the git repository and…\n\n### Using Docker\n\nThis approach allows you to experiment without installing Ruby on your machine.\nAll you need to do is install and run Docker.\n\n```sh\n# Build image from Dockerfile and name/tag it as html2rss-web:\ndocker build -t html2rss-web -f Dockerfile .\n\n# Run the image and name it html2rss-web-dev:\ndocker run \\\n  --detach \\\n  --mount type=bind,source=$(pwd)/config,target=/app/config \\\n  --name html2rss-web-dev \\\n  html2rss-web\n\n# Open an interactive TTY with the shell `sh`:\ndocker exec -ti html2rss-web-dev sh\n\n# Stop and clean up the container\ndocker stop html2rss-web-dev\ndocker rm html2rss-web-dev\n\n# Remove the image\ndocker rmi html2rss-web\n```\n\n### Using installed Ruby\n\nIf you're comfortable with installing Ruby directly on your machine, follow these instructions:\n\n1. Install Ruby `\u003e= 3.2`\n2. 
`gem install bundler foreman`\n3. `bundle`\n4. `foreman start`\n\n_html2rss-web_ now listens on port **3000** for requests.\n\n## Contribute\n\nContributions are welcome!\n\nOpen a pull request with your changes,\\\nopen an issue, or\\\n[join discussions on html2rss](https://github.com/orgs/html2rss/discussions).\n","funding_links":["https://github.com/sponsors/gildesmarais"],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhtml2rss%2Fhtml2rss-web","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhtml2rss%2Fhtml2rss-web","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhtml2rss%2Fhtml2rss-web/lists"}