{"id":14155188,"url":"https://github.com/extractus/feed-extractor","last_synced_at":"2025-05-14T21:10:27.775Z","repository":{"id":2359408,"uuid":"46329174","full_name":"extractus/feed-extractor","owner":"extractus","description":"Simplest way to read \u0026 normalize RSS/ATOM/JSON feed data","archived":false,"fork":false,"pushed_at":"2025-02-09T03:32:01.000Z","size":1167,"stargazers_count":173,"open_issues_count":5,"forks_count":34,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-02T11:04:15.697Z","etag":null,"topics":["atom-feed","deno","feed-reader","jsonfeed","nodejs","rss"],"latest_commit_sha":null,"homepage":"https://extractor-demos.pages.dev/feed-extractor","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/extractus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-17T07:00:18.000Z","updated_at":"2025-03-13T07:57:35.000Z","dependencies_parsed_at":"2023-11-06T10:43:34.125Z","dependency_job_id":"e6cb0407-748d-4524-a57d-4d2033f5c734","html_url":"https://github.com/extractus/feed-extractor","commit_stats":{"total_commits":143,"total_committers":8,"mean_commits":17.875,"dds":"0.15384615384615385","last_synced_commit":"b81646a23dbec4f9224cb1b19ea10f106d28d27f"},"previous_names":["ndaidong/feed-reader"],"tags_count":51,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/extractus%2Ffeed-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/extractus%2Ffeed-extractor/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/extractus%2Ffeed-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/extractus%2Ffeed-extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/extractus","download_url":"https://codeload.github.com/extractus/feed-extractor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248674679,"owners_count":21143760,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atom-feed","deno","feed-reader","jsonfeed","nodejs","rss"],"created_at":"2024-08-17T08:02:25.606Z","updated_at":"2025-04-14T00:52:59.926Z","avatar_url":"https://github.com/extractus.png","language":"JavaScript","readme":"# feed-extractor\n\nTo read \u0026 normalize RSS/ATOM/JSON feed data.\n\n[![npm version](https://badge.fury.io/js/@extractus%2Ffeed-extractor.svg)](https://badge.fury.io/js/@extractus%2Ffeed-extractor)\n![CodeQL](https://github.com/extractus/feed-extractor/workflows/CodeQL/badge.svg)\n![CI test](https://github.com/extractus/feed-extractor/workflows/ci-test/badge.svg)\n[![Coverage Status](https://img.shields.io/coveralls/github/extractus/feed-extractor)](https://coveralls.io/github/extractus/feed-extractor?branch=main)\n\n(This library was formerly published as [feed-reader](https://www.npmjs.com/package/feed-reader) and has been renamed.)\n\n## Demo\n\n- [Give it a try!](https://extractor-demos.pages.dev/feed-extractor)\n- [Example 
FaaS](https://extractus.deno.dev/extract?apikey=rn0wbHos2e73W6ghQf705bdF\u0026type=feed\u0026url=https://news.google.com/rss)\n\n## Install \u0026 Usage\n\n### Node.js\n\n```bash\nnpm i @extractus/feed-extractor\n```\n\n```ts\nimport { extract } from '@extractus/feed-extractor'\n\n// extract an RSS feed\nconst result = await extract('https://news.google.com/rss')\nconsole.log(result)\n```\n\n### Deno\n\n```ts\nimport { extract } from 'npm:@extractus/feed-extractor'\n```\n\n### Browser\n\n```ts\nimport { extract } from 'https://esm.sh/@extractus/feed-extractor'\n```\n\nPlease check [the examples](https://github.com/extractus/feed-extractor/tree/main/examples) for reference.\n\n\n## Automate RSS feed extraction with GitHub Actions\n\n[RSS Feed Fetch Action](https://github.com/Promptly-Technologies-LLC/rss-fetch-action) is a GitHub Action designed to automate the fetching of RSS feeds.\nIt fetches an RSS feed from a given URL and saves it to a specified file in your GitHub repository.\nThis action is particularly useful for populating content on GitHub Pages websites or other static site generators.\n\n\n## CJS Deprecated\n\nCJS is deprecated for this package. When calling `require('@extractus/feed-extractor')`, a deprecation warning is now logged. You should update your code to use the ESM export.\n\n- You can ignore this warning via the environment variable `FEED_EXTRACTOR_CJS_IGNORE_WARNING=true`\n- To see where the warning is coming from, you can set the environment variable `FEED_EXTRACTOR_CJS_TRACE_WARNING=true`\n\n\n## APIs\n\n- [extract()](#extract)\n- [extractFromJson()](#extractfromjson)\n- [extractFromXml()](#extractfromxml)\n\n#### Note:\n\n- *The old method `read()` has been marked as deprecated and will be removed in the next major release.*\n\n---\n\n### `extract()`\n\nLoad and extract feed data from a given RSS/ATOM/JSON source. 
Return a Promise object.\n\n#### Syntax\n\n```ts\nextract(String url)\nextract(String url, Object parserOptions)\nextract(String url, Object parserOptions, Object fetchOptions)\n```\n\nExample:\n\n```js\nimport { extract } from '@extractus/feed-extractor'\n\nconst result = await extract('https://news.google.com/atom')\nconsole.log(result)\n```\n\nWithout any options, the result should have the following structure:\n\n```ts\n{\n  title: String,\n  link: String,\n  description: String,\n  generator: String,\n  language: String,\n  published: ISO Datetime String,\n  entries: Array[\n    {\n      id: String,\n      title: String,\n      link: String,\n      description: String,\n      published: ISO Datetime String\n    },\n    // ...\n  ]\n}\n```\n\n#### Parameters\n\n##### `url` *required*\n\nURL of a valid feed source.\n\nFeed content must be accessible and conform to one of the following standards:\n\n  - [RSS Feed](https://www.rssboard.org/rss-specification)\n    - [RDF Feed](https://web.resource.org/rss/1.0/spec)\n  - [ATOM Feed](https://datatracker.ietf.org/doc/html/rfc5023)\n  - [JSON Feed](https://www.jsonfeed.org/version/1.1/)\n\n##### `parserOptions` *optional*\n\nObject with all or several of the following properties:\n\n  - `normalization`: Boolean, normalize feed data or keep the original. Default `true`.\n  - `useISODateFormat`: Boolean, convert datetime to ISO format. Default `true`.\n  - `descriptionMaxLen`: Number, to truncate the description. Default `250` characters. 
Set to `0` to disable truncation.\n  - `xmlParserOptions`: Object, passed to the XML parser; see [fast-xml-parser's docs](https://github.com/NaturalIntelligence/fast-xml-parser/blob/master/docs/v4/2.XMLparseOptions.md)\n  - `getExtraFeedFields`: Function, to get more fields from feed data\n  - `getExtraEntryFields`: Function, to get more fields from feed entry data\n  - `baseUrl`: URL string, to make the links within feed content absolute\n\nFor example:\n\n```ts\nimport { extract } from '@extractus/feed-extractor'\n\nawait extract('https://news.google.com/atom', {\n  useISODateFormat: false\n})\n\nawait extract('https://news.google.com/rss', {\n  useISODateFormat: false,\n  getExtraFeedFields: (feedData) =\u003e {\n    return {\n      subtitle: feedData.subtitle || ''\n    }\n  },\n  getExtraEntryFields: (feedEntry) =\u003e {\n    const {\n      enclosure,\n      category\n    } = feedEntry\n    return {\n      enclosure: {\n        url: enclosure['@_url'],\n        type: enclosure['@_type'],\n        length: enclosure['@_length']\n      },\n      category: typeof category === 'string' ? 
category : {\n        text: category['@_text'],\n        domain: category['@_domain']\n      }\n    }\n  }\n})\n```\n\n##### `fetchOptions` *optional*\n\n`fetchOptions` is an object that can have the following properties:\n\n- `headers`: to set request headers\n- `proxy`: another endpoint to forward the request to\n- `agent`: an HTTP proxy agent\n- `signal`: AbortController signal or AbortSignal timeout to terminate the request\n\nFor example, you can use this parameter to set request headers for the fetch, as below:\n\n```js\nimport { extract } from '@extractus/feed-extractor'\n\nconst url = 'https://news.google.com/rss'\nawait extract(url, null, {\n  headers: {\n    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'\n  }\n})\n```\n\nYou can also specify a proxy endpoint to load remote content, instead of fetching directly.\n\nFor example:\n\n```js\nimport { extract } from '@extractus/feed-extractor'\n\nconst url = 'https://news.google.com/rss'\n\nawait extract(url, null, {\n  headers: {\n    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'\n  },\n  proxy: {\n    target: 'https://your-secret-proxy.io/loadXml?url=',\n    headers: {\n      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'\n    }\n  }\n})\n```\n\nPassing requests through a proxy is useful when running `@extractus/feed-extractor` in the browser.\nView `examples/browser-feed-reader` as a reference example.\n\nAnother way to work with a proxy is to use the `agent` option instead of `proxy`, as below:\n\n```js\nimport { extract } from '@extractus/feed-extractor'\n\nimport { HttpsProxyAgent } from 'https-proxy-agent'\n\nconst proxy = 'http://abc:RaNdoMpasswORd_country-France@proxy.packetstream.io:31113'\n\nconst url = 'https://news.google.com/rss'\n\nconst feed = await extract(url, null, {\n  agent: new HttpsProxyAgent(proxy),\n})\nconsole.log('Run feed-extractor with proxy:', proxy)\nconsole.log(feed)\n```\n\nFor more info about [https-proxy-agent](https://www.npmjs.com/package/https-proxy-agent), 
check [its repo](https://github.com/TooTallNate/proxy-agents).\n\nBy default, there is no request timeout. You can use the option `signal` to cancel the request when needed.\n\nThe common way is to use an AbortController:\n\n```js\nconst controller = new AbortController()\n\n// stop after 5 seconds\nsetTimeout(() =\u003e {\n  controller.abort()\n}, 5000)\n\nconst data = await extract(url, null, {\n  signal: controller.signal,\n})\n```\n\nA newer solution is AbortSignal's `timeout()` static method:\n\n```js\n// stop after 5 seconds\nconst data = await extract(url, null, {\n  signal: AbortSignal.timeout(5000),\n})\n```\n\nFor more info:\n\n- [AbortController constructor](https://developer.mozilla.org/en-US/docs/Web/API/AbortController)\n- [AbortSignal: timeout() static method](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/timeout_static)\n\n\n### `extractFromJson()`\n\nExtract feed data from a JSON string.\nReturn an object which contains the feed data.\n\n#### Syntax\n\n```ts\nextractFromJson(String json)\nextractFromJson(String json, Object parserOptions)\n```\n\nExample:\n\n```js\nimport { extractFromJson } from '@extractus/feed-extractor'\n\nconst url = 'https://www.jsonfeed.org/feed.json'\n// this resource provides data in JSON feed format\n// so we fetch the remote content as json\n// then pass it to feed-extractor\nconst res = await fetch(url)\nconst json = await res.json()\n\nconst feed = extractFromJson(json)\nconsole.log(feed)\n```\n\n#### Parameters\n\n##### `json` *required*\n\nJSON string loaded from a JSON feed resource.\n\n##### `parserOptions` *optional*\n\nSee [parserOptions](#parseroptions-optional) above.\n\n\n### `extractFromXml()`\n\nExtract feed data from an XML string.\nReturn an object which contains the feed data.\n\n#### Syntax\n\n```ts\nextractFromXml(String xml)\nextractFromXml(String xml, Object parserOptions)\n```\n\nExample:\n\n```js\nimport { extractFromXml } from '@extractus/feed-extractor'\n\nconst url = 'https://news.google.com/atom'\n// 
this resource provides data in ATOM feed format\n// so we fetch the remote content as text\n// then pass it to feed-extractor\nconst res = await fetch(url)\nconst xml = await res.text()\n\nconst feed = extractFromXml(xml)\nconsole.log(feed)\n```\n\n#### Parameters\n\n##### `xml` *required*\n\nXML string loaded from an RSS/ATOM feed resource.\n\n##### `parserOptions` *optional*\n\nSee [parserOptions](#parseroptions-optional) above.\n\n\n## Test\n\n```bash\ngit clone https://github.com/extractus/feed-extractor.git\ncd feed-extractor\npnpm i\npnpm test\n```\n\n![feed-extractor-test.png](https://i.imgur.com/2b5xt6S.png)\n\n\n## Quick evaluation\n\n```bash\ngit clone https://github.com/extractus/feed-extractor.git\ncd feed-extractor\npnpm i\npnpm eval https://news.google.com/rss\n```\n\n## License\n\nThe MIT License (MIT)\n\n## Support the project\n\nIf you find value in this open source project, you can support it in the following ways:\n\n- Give it a star ⭐\n- Buy me a coffee: https://paypal.me/ndaidong 🍵\n- Subscribe to the [Feed Reader service](https://rapidapi.com/pwshub-pwshub-default/api/feed-reader1/) on RapidAPI 😉\n\nThank you.\n\n---\n","funding_links":["https://paypal.me/ndaidong"],"categories":["rss"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fextractus%2Ffeed-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fextractus%2Ffeed-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fextractus%2Ffeed-extractor/lists"}