{"id":13411882,"url":"https://github.com/html2rss/html2rss","last_synced_at":"2025-03-14T17:31:12.656Z","repository":{"id":38357551,"uuid":"135059452","full_name":"html2rss/html2rss","owner":"html2rss","description":"📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.","archived":false,"fork":false,"pushed_at":"2024-10-08T10:31:46.000Z","size":1070,"stargazers_count":114,"open_issues_count":13,"forks_count":9,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-08T13:43:54.840Z","etag":null,"topics":["atom-feed","extract","feed","feed-configs","html","html2rss","json","rss","rss-aggregator","rss-bridge","rss-builder","rss-feed","rss-feed-scraper","rss-generator","ruby","scrape","scraper","scraping","scraping-websites","yahoo-pipes"],"latest_commit_sha":null,"homepage":"https://html2rss.github.io/components/html2rss","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/html2rss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"support/logo.png","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"gildesmarais"}},"created_at":"2018-05-27T15:37:40.000Z","updated_at":"2024-10-08T10:29:54.000Z","dependencies_parsed_at":"2024-04-30T17:24:25.345Z","dependency_job_id":"649af776-5da4-4fe8-ab6a-5889eba63c25","html_url":"https://github.com/html2rss/html2rss","commit_stats":{"total_commits":213,"total_committers":6,"mean_commits":35.5,"dds":0.5774647887323944,"last_synced_commit":"2caf3696998a88a5105b6a3325401faf2d0bbf8d"},"previous_names":["gildesmarais/html2rss"],"tags_count":25,"template":false,"temp
late_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html2rss%2Fhtml2rss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/html2rss","download_url":"https://codeload.github.com/html2rss/html2rss/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243618677,"owners_count":20320274,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atom-feed","extract","feed","feed-configs","html","html2rss","json","rss","rss-aggregator","rss-bridge","rss-builder","rss-feed","rss-feed-scraper","rss-generator","ruby","scrape","scraper","scraping","scraping-websites","yahoo-pipes"],"created_at":"2024-07-30T20:01:17.945Z","updated_at":"2025-03-14T17:31:12.648Z","avatar_url":"https://github.com/html2rss.png","language":"Ruby","readme":"![html2rss logo](https://github.com/html2rss/html2rss/raw/master/support/logo.png)\n\n[![Gem Version](https://badge.fury.io/rb/html2rss.svg)](http://rubygems.org/gems/html2rss/) [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://www.rubydoc.info/gems/html2rss) ![Retro Badge: valid RSS](https://validator.w3.org/feed/images/valid-rss-rogers.png)\n\n`html2rss` is a Ruby gem that generates RSS 2.0 feeds from websites.\n\nIts 
`auto_source` scraper finds items for the RSS feed automatically. 🧙🏼\n\nAdditionally, you can use the `selectors` scraper and control the information extraction.\nIt takes plain old CSS selectors and extracts the information with help from\n[Extractors](#using-extractors) and chainable [post processors](#using-post-processors).\nIt supports [scraping JSON](#scraping-and-handling-json-responses) responses.\n\nTo scrape websites that require JavaScript, html2rss can request these using a headless browser (Puppeteer / browserless.io).\nIndependently of the used request strategy, you can [set HTTP request headers](#the-headers-set-any-http-request-header).\n\n|                |                |\n| -------------- | -------------- |\n| 🤩 Like it?    | Star it! ⭐️   |\n| 😍 Endorse it? | Sponsor it! 💓 |\n\n\u003e [!TIP]\n\u003e Want to retrieve your RSS feeds via HTTP?\n\u003e [Check out `html2rss-web`](https://github.com/html2rss/html2rss-web)!\n\n## Getting started\n\n[Install Ruby](https://www.ruby-lang.org/en/documentation/installation/) (latest version is recommended) on your machine and run `gem install html2rss` in your terminal.\n\nAfter the installation has finished, `html2rss help` will print usage information.\n\n### use automatic generation\n\nhtml2rss offers an automatic RSS generation feature. 
Try it on CLI with:\n\n`html2rss auto https://unmatchedstyle.com/`\n\n### creating a feed config file and using it\n\nIf the results are not to your satisfaction, you can create a feed config file.\n\nCreate a file called `my_config_file.yml` with this sample content:\n\n```yml\nchannel:\n  url: https://unmatchedstyle.com\nselectors:\n  items:\n    selector: \"article[id^='post-']\"\n    enhance: true\n# auto_source: {} # Enables auto_source additionally when uncommented\n```\n\nBuild the feed from this config with: `html2rss feed ./my_config_file.yml`.\n\n## The _feed config_ and its options\n\nHtml2rss is configured using `channel`, `selectors`, `strategy`, `headers`, `stylesheets` and `auto_source`.\nThe possible options of each are explained below.\n\nGood to know:\n\n- You'll find extensive example feed configs at [`spec/*.test.yml`](https://github.com/html2rss/html2rss/tree/master/spec).\n- See [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) for ready-made feed configs!\n- If you've created feed configs, you're invited to send a PR to [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) to make your config available to the public.\n\nAlright, let's dive in.\n\n### The `channel`\n\n| attribute     |              | type    | default        | remark                                             |\n| ------------- | ------------ | ------- | -------------- | -------------------------------------------------- |\n| `url`         | **required** | String  |                |                                                    |\n| `title`       | optional     | String  | auto-generated |                                                    |\n| `description` | optional     | String  | auto-generated | Retrieved from meta description tags               |\n| `author`      | optional     | String  | blank          | Format: `email (Name)`                             |\n| `ttl`         | optional     | Integer | auto-generated | Responses 
max-age, falls back to `360` (_minutes_) |\n| `language`    | optional     | String  | auto-generated | Determined by `lang` attribute                     |\n| `time_zone`   | optional     | String  | `'UTC'`        | TimeZone name                                      |\n\n### The scraper `auto_source`: automatically find the items\n\nThe `auto_source` scraper finds items automatically. To find them, its scrapers search for:\n\n1. `schema`: parses `\u003cscript type=\"application/ld+json\"\u003e` tags which contain Schema.org objects like [Article](https://schema.org/Article).\n2. `semantic_html`: looks for [semantic HTML tags](https://developer.mozilla.org/en-US/docs/Learn_web_development/Core/Accessibility/HTML).\n3. `html`: tries to find articles by selecting frequently occurring selectors.\n\nIt's a good idea to give `auto_source` a try before starting to configure the `selectors` scraper.\n\nYou can fine-tune the scraper settings like this:\n\n```yml\nchannel:\n  url: https://example.com\nauto_source:\n  scraper:\n    schema:\n      enabled: false # default: true\n    semantic_html:\n      enabled: false # default: true\n    html:\n      enabled: true\n      minimum_selector_frequency: 3 # default: 2\n  cleanup:\n    keep_different_domain: false # default: true\n```\n\n### The scraper `selectors`: more control\n\n\u003e [!INFO]\n\u003e To build a [valid RSS 2.0 item](http://www.rssboard.org/rss-profile#element-channel-item), you need at least a `title` **or** a `description` in your item. You can, of course, have both.\n\nThe `selectors` scraper lets you specify CSS selectors, giving you full control over the extraction.\n\nYou must give an **`items`** selector hash, which contains the CSS selector. The items selector selects a collection of HTML tags from which the RSS feed items are built. 
Except for the `items` selector, all other keys are scoped to each item of the collection.\n\n**Having an `items` and a `title` selector is enough** to build a simple feed:\n\n```yml\nchannel:\n  url: \"https://example.com\"\nselectors:\n  items:\n    selector: \".article\"\n  title:\n    selector: \"h1\"\n```\n\n#### Automatically enhance items\n\nSpecifying the `title`, `url` or `image` selector in every config quickly becomes cumbersome.\nhtml2rss enhances every item automatically.\nHowever, if you specify a selector, its value will be used.\n\n```yml\nchannel:\n  url: \"https://example.com\"\nselectors:\n  items:\n    selector: \".article\"\n    enhance: true # default: true\n```\n\n#### Selectors which will be included in the RSS feed\n\nYour `selectors` hash can contain arbitrary named selectors, but only a few will make it into the RSS feed (due to the RSS 2.0 specification):\n\n| RSS 2.0 tag   | name in `html2rss` | remark                                    |\n| ------------- | ------------------ | ----------------------------------------- |\n| `title`       | `title`            |                                           |\n| `description` | `description`      | Sanitized if it contains HTML.            |\n| `link`        | `url`              | A URL.                                    |\n| `author`      | `author`           |                                           |\n| `category`    | `categories`       | See notes below.                          |\n| `guid`        | `guid`             | Generated automatically. See notes below. |\n| `enclosure`   | `enclosure`        | See notes below.                          |\n| `pubDate`     | `published_at`     | An instance of `Time`.                    |\n| `comments`    | `comments`         | A URL.                                    |\n| `source`      | ~~source~~         | Not yet supported.                        |\n\n#### A selector and its options\n\nEvery named selector (e.g. 
`title`, `description`, see above) in your `selectors` can have these attributes:\n\n| name           | value                                                    |\n| -------------- | -------------------------------------------------------- |\n| `selector`     | The CSS selector to select the tag with the information. |\n| `extractor`    | Name of the extractor. See notes below.                  |\n| `post_process` | An array. See notes below.                               |\n\n##### Using extractors\n\nExtractors help with extracting the information from the selected HTML tag.\n\n- The default extractor is `text`, which returns the tag's inner text.\n- The `html` extractor returns the tag's outer HTML.\n- The `href` extractor returns a URL from the tag's `href` attribute and corrects relative ones to absolute ones.\n- The `attribute` extractor returns the value of that tag's attribute.\n- The `static` extractor returns the configured static value (it doesn't extract anything).\n- [See file list of extractors](https://github.com/html2rss/html2rss/tree/master/lib/html2rss/selectors/extractors).\n\nExtractors might need extra attributes on the selector hash. 👉 [Read their docs for usage examples](https://www.rubydoc.info/gems/html2rss/Html2rss/Selectors/Extractors).\n\n\u003cdetails\u003e\u003csummary\u003eSee a Ruby example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(\n  channel: {},\n  selectors: {\n    url: { selector: 'a', extractor: 'href' }\n  }\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003eSee a YAML feed config example\u003c/summary\u003e\n\n```yml\nchannel:\n  # ... omitted\nselectors:\n  # ... 
omitted\n  url:\n    selector: \"a\"\n    extractor: \"href\"\n```\n\n\u003c/details\u003e\n\n##### Using post processors\n\nExtracted information can be further manipulated with post processors.\nYou can specify one or more post processors; they run in the order given.\n\n| name               |                                                                                       |\n| ------------------ | ------------------------------------------------------------------------------------- |\n| `gsub`             | Allows global substitution operations on Strings (Regexp or simple pattern).          |\n| `html_to_markdown` | HTML to Markdown, using [reverse_markdown](https://github.com/xijo/reverse_markdown). |\n| `markdown_to_html` | Converts Markdown to HTML, using [kramdown](https://github.com/gettalong/kramdown).   |\n| `parse_time`       | Parses a String containing a time in a time zone.                                     |\n| `parse_uri`        | Parses a String as a URL.                                                             |\n| `sanitize_html`    | Strips unsafe and unneeded HTML and adds security-related attributes.                 |\n| `substring`        | Cuts a part off of a String, starting at a position.                                  |\n| `template`         | Based on a template, it creates a new String filled with other selectors' values.     |\n\n⚠️ Always make use of the `sanitize_html` post processor for HTML content. _Never trust the internet!_ ⚠️\n\nIf the `description` contains HTML, it will be sanitized automatically.\n\n\u003cdetails\u003e\u003csummary\u003eYAML example: build the description from a template String (in Markdown) and convert that Markdown to HTML\u003c/summary\u003e\n\n```yml\nchannel:\n  # ... omitted\nselectors:\n  # ... 
omitted\n  price:\n    selector: '.price'\n  description:\n    selector: '.section'\n    post_process:\n      - name: template\n        string: |\n          # %{self}\n\n          Price: %{price}\n      - name: markdown_to_html\n```\n\n\u003c/details\u003e\n\n###### Post processor `gsub`\n\nThe post processor `gsub` makes use of Ruby's [`gsub`](https://apidock.com/ruby/String/gsub) method.\n\n| key           | type   | required | note                     |\n| ------------- | ------ | -------- | ------------------------ |\n| `pattern`     | String | yes      | Can be Regexp or String. |\n| `replacement` | String | yes      | Can be a backreference.  |\n\n\u003cdetails\u003e\u003csummary\u003eSee a Ruby example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(\n  channel: {},\n  selectors: {\n    title: { selector: 'a', post_process: [{ name: 'gsub', pattern: 'foo', replacement: 'bar' }] }\n  }\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003eSee a YAML feed config example\u003c/summary\u003e\n\n```yml\nchannel:\n  # ... omitted\nselectors:\n  # ... omitted\n  title:\n    selector: \"a\"\n    post_process:\n      - name: \"gsub\"\n        pattern: \"foo\"\n        replacement: \"bar\"\n```\n\n\u003c/details\u003e\n\n##### Adding `\u003ccategory\u003e` tags to an item\n\nThe `categories` selector takes an array of selector names. Each value of those\nselectors will become a `\u003ccategory\u003e` on the RSS item.\n\n\u003cdetails\u003e\n  \u003csummary\u003eSee a Ruby example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(\n  channel: {},\n  selectors: {\n    genre: {\n      # ... omitted\n      selector: '.genre'\n    },\n    branch: { selector: '.branch' },\n    categories: %i[genre branch]\n  }\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eSee a YAML feed config example\u003c/summary\u003e\n\n```yml\nchannel:\n  # ... omitted\nselectors:\n  # ... 
omitted\n  genre:\n    selector: \".genre\"\n  branch:\n    selector: \".branch\"\n  categories:\n    - genre\n    - branch\n```\n\n\u003c/details\u003e\n\n##### Custom item GUID\n\nBy default, html2rss generates a stable GUID automatically, based on the item's URL, falling back to `title` or `description`.\n\nIf this is not stable (i.e. your RSS reader frequently shows already read articles as new/unread),\nyou can choose from which attributes the GUID will be built.\nThe principle is the same as for the categories: pass an array of selector names.\n\n\u003cdetails\u003e\u003csummary\u003eSee a Ruby example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(\n  channel: {},\n  selectors: {\n    title: {\n      # ... omitted\n      selector: 'h1'\n    },\n    url: { selector: 'a', extractor: 'href' },\n    guid: %i[url]\n  }\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003eSee a YAML feed config example\u003c/summary\u003e\n\n```yml\nchannel:\n  # ... omitted\nselectors:\n  # ... omitted\n  title:\n    selector: \"h1\"\n  url:\n    selector: \"a\"\n    extractor: \"href\"\n  guid:\n    - url\n```\n\nIn all cases, the GUID is eventually encoded as a base-36 CRC32 checksum.\n\n\u003c/details\u003e\n\n##### Adding an `\u003cenclosure\u003e` tag to an item\n\nAn enclosure can be any file, e.g. an image, audio or video file (think podcasts).\n\nThe `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.\n\nSince `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:\n\n1. The content-type is guessed from the file extension of the URL, unless one is specified in `content_type`.\n2. If the content-type guessing fails, it will default to `application/octet-stream`.\n3. 
The content-length will always be undetermined and therefore stated as `0` bytes.\n\nRead the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.\n\n\u003cdetails\u003e\n  \u003csummary\u003eSee a Ruby example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(\n  channel: {},\n  selectors: {\n    enclosure: {\n      selector: 'audio',\n      extractor: 'attribute',\n      attribute: 'src',\n      content_type: 'audio/mp3'\n    }\n  }\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eSee a YAML feed config example\u003c/summary\u003e\n\n```yml\nchannel:\n  # ... omitted\nselectors:\n  # ... omitted\n  enclosure:\n    selector: \"audio\"\n    extractor: \"attribute\"\n    attribute: \"src\"\n    content_type: \"audio/mp3\"\n```\n\n\u003c/details\u003e\n\nSee the more complex formatting options of the [`sprintf` method](https://ruby-doc.org/core/Kernel.html#method-i-sprintf).\n\n#### Scraping and handling JSON responses\n\nWhen the requested website returns an `application/json` response (e.g. because you sent an `Accept: application/json` header with the request), the `selectors` scraper naively converts that JSON to XML. 
That XML you can query using CSS selectors.\n\n\u003e [!NOTE]\n\u003e The JSON response must be an Array or Hash for this to work.\n\n\u003cdetails\u003e\u003csummary\u003eSee example of a converted JSON object\u003c/summary\u003e\n\nThis JSON object:\n\n```json\n{\n  \"data\": [{ \"title\": \"Headline\", \"url\": \"https://example.com\" }]\n}\n```\n\nconverts to:\n\n```xml\n\u003cobject\u003e\n  \u003cdata\u003e\n    \u003carray\u003e\n      \u003cobject\u003e\n        \u003ctitle\u003eHeadline\u003c/title\u003e\n        \u003curl\u003ehttps://example.com\u003c/url\u003e\n      \u003c/object\u003e\n    \u003c/array\u003e\n  \u003c/data\u003e\n\u003c/object\u003e\n```\n\nYour items selector would be `array \u003e object`, the item's URL selector would be `url`.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eSee example of a converted JSON array\u003c/summary\u003e\n\nThis JSON array:\n\n```json\n[{ \"title\": \"Headline\", \"url\": \"https://example.com\" }]\n```\n\nconverts to:\n\n```xml\n\u003carray\u003e\n  \u003cobject\u003e\n    \u003ctitle\u003eHeadline\u003c/title\u003e\n    \u003curl\u003ehttps://example.com\u003c/url\u003e\n  \u003c/object\u003e\n\u003c/array\u003e\n```\n\nYour items selector would be `array \u003e object`, the item's URL selector would be `url`.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003eSee a Ruby example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(\n  headers: {\n    Accept: 'application/json'\n  },\n  channel: {\n    url: 'http://domainname.tld/whatever.json'\n  },\n  selectors: {\n    title: { selector: 'foo' }\n  }\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003eSee a YAML feed config example\u003c/summary\u003e\n\n```yml\nchannel:\n  url: \"http://domainname.tld/whatever.json\"\n  headers:\n    Accept: application/json\nselectors:\n  title:\n    selector: \"foo\"\n```\n\n\u003c/details\u003e\n\n### The `strategy`: customization of how requests to the channel URL 
are sent\n\nBy default, html2rss issues a plain HTTP request and extracts information from the response. This is fast and works for many websites. Under the hood, the [faraday gem](https://rubygems.org/gems/faraday) is used, which gives the default _strategy_ its name: `faraday`.\n\nModern websites often do not render much HTML on the server, but evaluate JavaScript on the client to create the HTML. Because the default `faraday` strategy does not execute any JavaScript, it will not find the \"juicy content\". For this scenario, try the `browserless` strategy.\n\nYou can also write your own strategy and register it. Consult the docs of `Html2rss::RequestService.register_strategy()`.\n\n#### `strategy: browserless`: Browserless.io\n\nYou can use _Browserless.io_ to run a headless Chrome browser and return the website's source code after the website generated it.\nFor this, you can either run your own Browserless.io instance (Docker image available -- [read their license](https://github.com/browserless/browserless/pkgs/container/chromium#licensing)!) 
or pay them for a hosted instance.\n\nTo run a local Browserless.io instance, you can use the following Docker command:\n\n```sh\ndocker run \\\n  --rm \\\n  -p 3000:3000 \\\n  -e \"CONCURRENT=10\" \\\n  -e \"TOKEN=6R0W53R135510\" \\\n  ghcr.io/browserless/chromium\n```\n\nTo make html2rss use your instance, specify the `browserless` strategy.\n\n```sh\n# auto:\nBROWSERLESS_IO_WEBSOCKET_URL=\"ws://127.0.0.1:3000\" BROWSERLESS_IO_API_TOKEN=\"6R0W53R135510\" \\\n  html2rss auto --strategy=browserless https://example.com\n\n# feed:\nBROWSERLESS_IO_WEBSOCKET_URL=\"ws://127.0.0.1:3000\" BROWSERLESS_IO_API_TOKEN=\"6R0W53R135510\" \\\n  html2rss feed --strategy=browserless the_feed_config.yml\n```\n\n\u003e [!TIP]\n\u003e When running locally with the commands above, you can skip setting the environment variables, because the values used in the example match the defaults.\n\nIn your config, set `strategy: browserless`.\n\n\u003cdetails\u003e\u003csummary\u003eSee a YAML feed config example\u003c/summary\u003e\n\n```yml\nstrategy: browserless\nheaders:\n  User-Agent: \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\"\nchannel:\n  url: https://www.imdb.com/user/ur67728460/ratings\n  ttl: 1440\nselectors:\n  items:\n    selector: \"li.ipc-metadata-list-summary-item\"\n  title:\n    selector: \".ipc-title__text\"\n    post_process:\n      - name: gsub\n        pattern: \"/^(\\\\d+.)\\\\s/\"\n        replacement: \"\"\n      - name: template\n        string: \"%{self} rated with: %{user_rating}\"\n  url:\n    selector: \"a.ipc-title-link-wrapper\"\n    extractor: \"href\"\n  user_rating:\n    selector: \"[data-testid='ratingGroup--other-user-rating'] \u003e .ipc-rating-star--rating\"\n```\n\n\u003c/details\u003e\n\n### The `headers`: Set any HTTP request header\n\nTo set HTTP request headers, you can add them to `headers`. This is useful for e.g. 
APIs that require an `Authorization` header, or when you'd like to send `Accept: application/json`.\n\n```yml\nheaders:\n  Authorization: \"Bearer YOUR_TOKEN\"\n  Accept: application/json\nchannel:\n  url: \"https://example.com/api/resource\"\nselectors:\n  # ... omitted\n```\n\nOr for setting a User-Agent:\n\n```yml\nheaders:\n  User-Agent: \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\"\nchannel:\n  url: \"https://example.com\"\nselectors:\n  # ... omitted\nauto_source: {}\n```\n\n### Dynamic parameters in `channel` and `headers` attributes\n\nSometimes there are structurally similar pages with different URLs, or you need to pass some values into the headers.\nIn such cases, you can add _dynamic parameters_ to the `channel` and `headers` values.\n\nExample of a dynamic parameter `id` in the channel URL and a dynamic parameter `foo` in a header:\n\n```yml\nchannel:\n  url: \"http://domainname.tld/whatever/%\u003cid\u003es.html\"\nheaders:\n  X-Something: \"%\u003cfoo\u003es\"\n```\n\nCommand line usage example:\n\n```sh\nhtml2rss feed the_feed_config.yml --params id:42 foo:bar\n```\n\n\u003cdetails\u003e\u003csummary\u003eSee a Ruby example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(channel: { url: 'http://domainname.tld/whatever/%\u003cid\u003es.html' },\n              headers: { 'X-Something': '%\u003cfoo\u003es' },\n              params: { id: 42, foo: 'bar' })\n```\n\n\u003c/details\u003e\n\n### The `stylesheets`: Display the RSS feed nicely in a web browser\n\nTo display RSS feeds nicely in a web browser, you can:\n\n- add a plain old CSS stylesheet, or\n- use XSLT (e**X**tensible **S**tylesheet **L**anguage **T**ransformations).\n\nA web browser will apply these stylesheets and show the contents as described.\n\nIn a CSS stylesheet, you'd use `element` selectors to apply styles.\n\nIf you want to do more, then you need to create an XSLT. 
XSLT allows you\nto use a HTML template and to freely design the information of the RSS,\nincluding using JavaScript and external resources.\n\nYou can add as many stylesheets and types as you like. Just add them to your global configuration.\n\n\u003cdetails\u003e\u003csummary\u003eRuby: a stylesheet config example\u003c/summary\u003e\n\n```ruby\nHtml2rss.feed(\n  stylesheets: [\n    {\n      href: '/relative/base/path/to/style.xls', media: :all, type: 'text/xsl'\n    },\n    {\n      href: 'http://example.com/rss.css', media: :all, type: 'text/css'\n    }\n  ],\n  channel: {},\n  selectors: {}\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003eYAML: a stylesheet config example\u003c/summary\u003e\n\n```yml\nstylesheets:\n  - href: \"/relative/base/path/to/style.xls\"\n    media: \"all\"\n    type: \"text/xsl\"\n  - href: \"http://example.com/rss.css\"\n    media: \"all\"\n    type: \"text/css\"\nfeeds:\n  # ... omitted\n```\n\n\u003c/details\u003e\n\nRecommended further readings:\n\n- [How to format RSS with CSS on lifewire.com](https://www.lifewire.com/how-to-format-rss-3469302)\n- [XSLT: Extensible Stylesheet Language Transformations on MDN](https://developer.mozilla.org/en-US/docs/Web/XSLT)\n- [The XSLT used by html2rss-web](https://github.com/html2rss/html2rss-web/blob/master/public/rss.xsl)\n\n## Store feed configuration in YAML file\n\nThis step is not required to work with this gem, but is helpful when you plan to use the CLI or [`html2rss-web`](https://github.com/html2rss/html2rss-web).\n\nFirst, create a YAML file, e.g. `feeds.yml`. This file will contain your multiple feed configs under the key `feeds`. 
Everything you specify outside of `feeds` will be applied to every feed you're building.\n\nExample:\n\n```yml\nheaders:\n  \"User-Agent\": \"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1\"\n  \"Accept\": \"text/html\"\nfeeds:\n  myfeed:\n    channel:\n    selectors:\n    auto_source:\n  myotherfeed:\n    headers:\n    strategy:\n    channel:\n    selectors:\n```\n\nYour feed configs go below `feeds`.\n\nFind a full example of a `feeds.yml` at [`spec/fixtures/feeds.test.yml`](https://github.com/html2rss/html2rss/blob/master/spec/fixtures/feeds.test.yml).\n\nIf you prefer to have a single feed defined in a YAML file, just omit the `feeds` key. [Check out the `single.test.yml`](https://github.com/html2rss/html2rss/blob/master/spec/fixtures/single.test.yml).\nNow you can build your feeds like this:\n\n\u003cdetails\u003e\u003csummary\u003eBuild feeds in Ruby\u003c/summary\u003e\n\n```ruby\nrequire 'html2rss'\n\nmyfeed = Html2rss.config_from_yaml_file('feeds.yml', 'myfeed')\nHtml2rss.feed(myfeed)\n\nmyotherfeed = Html2rss.config_from_yaml_file('feeds.yml', 'myotherfeed')\nHtml2rss.feed(myotherfeed)\n\nsingle = Html2rss.config_from_yaml_file('single.test.yml')\nHtml2rss.feed(single)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003eBuild feeds on the command line\u003c/summary\u003e\n\n```sh\nhtml2rss feed feeds.yml myfeed\nhtml2rss feed feeds.yml myotherfeed\nhtml2rss feed single.test.yml\n```\n\n\u003c/details\u003e\n\n## Generating a feed with Ruby\n\nYou can also install it as a dependency in your Ruby project:\n\n|                      🤩 Like it? | Star it! 
⭐️         |\n| -------------------------------: | -------------------- |\n| Add this line to your `Gemfile`: | `gem 'html2rss'`     |\n|                    Then execute: | `bundle`             |\n|                    In your code: | `require 'html2rss'` |\n\nHere's a minimal working example using Ruby:\n\n```ruby\nrequire 'html2rss'\n\nrss = Html2rss.feed(\n  channel: { url: 'https://stackoverflow.com/questions' },\n  auto_source: {}\n)\n\nputs rss\n```\n\nAlternatively, instead of `auto_source`, provide `selectors` (you can also use both simultaneously):\n\n```ruby\nrequire 'html2rss'\n\nrss = Html2rss.feed(\n  channel: { url: 'https://stackoverflow.com/questions' },\n  selectors: {\n    items: { selector: '#hot-network-questions \u003e ul \u003e li' },\n    title: { selector: 'a' },\n    url: { selector: 'a', extractor: 'href' }\n  }\n)\n\nputs rss\n```\n\n## Gotchas and tips \u0026 tricks\n\n- Check that the channel URL does not redirect to a mobile page with a different markup structure.\n- Do not rely on your web browser's developer console when using the standard strategy. The standard strategy does not execute JavaScript, whereas your browser does.\n  In such cases, fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).\n- [CSS selectors are versatile. Here's an overview.](https://www.w3.org/TR/selectors-4/#overview)\n\n## Contributing\n\nFind ideas for what to contribute in:\n\n1. \u003chttps://github.com/orgs/html2rss/discussions\u003e\n2. the issues tracker: \u003chttps://github.com/html2rss/html2rss/issues\u003e\n\nTo submit changes:\n\n1. Fork this repo ( \u003chttps://github.com/html2rss/html2rss/fork\u003e )\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Implement and commit your changes (`git commit -am 'feat: add XYZ'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request using the GitHub web UI\n\n## Development Helpers\n\n1. 
`bin/setup`: installs dependencies and sets up the development environment.\n2. For a modern Ruby development experience: install [`ruby-lsp`](https://github.com/Shopify/ruby-lsp) and integrate it into your IDE.\n\nFor example: [Ruby in Visual Studio Code](https://code.visualstudio.com/docs/languages/ruby).\n","funding_links":["https://github.com/sponsors/gildesmarais"],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhtml2rss%2Fhtml2rss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhtml2rss%2Fhtml2rss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhtml2rss%2Fhtml2rss/lists"}