{"id":32118376,"url":"https://github.com/unistudents/saffron","last_synced_at":"2026-02-21T01:02:31.349Z","repository":{"id":42232723,"uuid":"352291209","full_name":"UniStudents/Saffron","owner":"UniStudents","description":"A fairly intuitive \u0026 powerful framework that enables you to collect \u0026 save articles and news from all over the web.","archived":false,"fork":false,"pushed_at":"2024-09-14T09:08:17.000Z","size":924,"stargazers_count":11,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-10-03T01:27:20.201Z","etag":null,"topics":["aggregator","announcements","api-scraper","articles","crawler","crawler-framework","dynamic-scraping","html-scraping","javascript","news","parser","rss","rss-aggregator","rss-feed","rss-parser","saffron","scraping","typescript","wordpress-api"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UniStudents.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-28T09:43:30.000Z","updated_at":"2025-02-06T15:13:22.000Z","dependencies_parsed_at":"2023-12-28T19:34:59.561Z","dependency_job_id":"c8153954-1e73-480a-b472-a899cd6743e1","html_url":"https://github.com/UniStudents/Saffron","commit_stats":{"total_commits":308,"total_committers":7,"mean_commits":44.0,"dds":"0.32792207792207795","last_synced_commit":"a9f66e03987020acfa35723003db2a79a6ef1685"},"previous_names":["poiw-org/saffron"],"tags_count":45,"template":false,"template_full_name":null,"purl":"pkg:github/UniStudents/Saffron","reposi
tory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniStudents%2FSaffron","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniStudents%2FSaffron/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniStudents%2FSaffron/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniStudents%2FSaffron/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UniStudents","download_url":"https://codeload.github.com/UniStudents/Saffron/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniStudents%2FSaffron/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279537375,"owners_count":26187073,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-18T02:00:06.492Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aggregator","announcements","api-scraper","articles","crawler","crawler-framework","dynamic-scraping","html-scraping","javascript","news","parser","rss","rss-aggregator","rss-feed","rss-parser","saffron","scraping","typescript","wordpress-api"],"created_at":"2025-10-20T17:07:34.895Z","updated_at":"2025-10-20T17:07:36.521Z","avatar_url":"https://github.com/UniStudents.png","language":"HTML","readme":"# Saffron | News \u0026amp; announcements aggregation 
framework.\n\n## Table of Contents\n\n- [What is Saffron?](#what-is-saffron)\n- [Architecture](#architecture)\n- [Installation](#installation)\n- [Initialization](#initialization)\n- [Configuration](#configuration)\n- [Parsers](#parsers)\n    - [WordPress V2](#wordpress-v2)\n    - [RSS](#rss)\n    - [HTML](#html)\n    - [JSON / XML](#json--xml)\n    - [Dynamic](#dynamic)\n    - [Which to choose](#which-to-choose)\n- [Article](#article)\n- [Source files](#source-files)\n    - [What is a source file?](#what-is-a-source-file)\n    - [Creating a source file](#creating-a-source-file)\n- [Middleware](#middleware)\n    - [Register a middleware](#register-a-middleware)\n    - [Format article](#format-article)\n    - [Articles](#articles)\n- [Listeners](#listeners)\n- [Standalone](#standalone)\n\n## What is Saffron?\n\nSaffron stands for **S**imple **A**bstract **F**ramework **F**or the **R**etrieval **O**f **N**ews.\n\nSaffron is a framework: an abstraction engine that helps you collect news and\nannouncements from websites in a uniform way.\n\nIt supports different ways of data collection, such as API endpoints and web scraping.\nIt eases the integration of data sources by abstracting data collection into a few simple\nand powerful functions.\n\n## Architecture\n\nSaffron's architecture is based on a `main` node that issues scraping instructions and several `worker` nodes\nthat do the scraping \u0026 upload the data to the database.\n\nThe nodes communicate through the `Grid`. The grid generates events to communicate\nwith other classes. 
Saffron supports remote nodes by using a [`socket.io`](https://socket.io) server and clients\nas middleware to connect to the `main` node.\n\n## Installation\n\nTo install the latest release:\n\n```shell\nnpm install @unistudents/saffron\n```\n\nTo install a specific version:\n\n```shell\nnpm install @unistudents/saffron@version\n```\n\n## Initialization\n\nOnce you have installed the library and created your [configuration](./docs/configuration.md):\n\n```ts\nimport Saffron from \"@unistudents/saffron\";\n\nconst saffron = new Saffron();\n\n// Initialize saffron\nsaffron.initialize({/* configuration */});\n\n// Start the scheduler and workers.\nsaffron.start();\n```\n\n## Configuration\n\nRead the [configuration](./docs/configuration.md) file for more information.\n\n## Parsers\n\nTo retrieve the desired information from the websites we use parsers.\nThere are five available parser types: `wordpress-v2`, `rss`, `html`, `json` (or `xml`) and `dynamic`.\n\n### WordPress V2\n\nParser type: `wordpress-v2`\n\nBy default, [`WordPress`](https://wordpress.com/) based websites have an open API for news retrieval.\nWe make use of that to access the articles and categories of the website.\n\nTo quickly check if a website supports the WordPress API, open your browser and\ntype `\u003cwebsite-root-link\u003e/wp-json/wp/v2/posts/`.\nIf a valid JSON file containing the website's articles is displayed in the browser (or downloaded to your computer),\nthen you can safely use the `wordpress-v2` parser.\n\n### RSS\n\nParser type: `rss`\n\nMany websites support an [`RSS`](https://en.wikipedia.org/wiki/RSS) feed. RSS allows users and applications to access updates\nto websites in a standardized, computer-readable format. 
You can tell that a website supports RSS if it displays this\nicon \u003cimg src=\"docs/rss.png\" width=\"15\" height=\"15\" /\u003e.\n\n### JSON / XML\n\nParser type: `json` (or `xml`)\n\nThis parser is best used for pages that load data through API requests (e.g. lazy loading).\nThe only prerequisite for this parser is that the response of the API requests is in a structured JSON or XML format.\n\n### HTML\n\nParser type: `html`\n\nThis parser uses scraping tools like [CheerioJS](https://cheerio.js.org/) to scrape the website content and receive\nthe displayed news. This parser is best used when the HTML of the website is structured. Websites where the HTML\nand CSS are not structured will be very difficult to scrape.\n\n### Dynamic\n\nParser type: `dynamic`\n\nUnlike the other parsers, this parser uses JavaScript/TypeScript code to parse a website. All the logic for the scraping is\ndecided by the user by extending the class `DynamicSourceFile`.\n\n### Which to choose\n\nWe recommend a specific order for using the available parsers.\n\n* If the desired website is based on [`WordPress`](https://wordpress.com/) and the WordPress articles API is enabled, then choose the `wordpress-v2` parser.\n* If the desired website supports an [`RSS`](https://en.wikipedia.org/wiki/RSS) feed, then choose the `rss` parser.\n* If the desired website loads data using API requests with structured responses (e.g. 
lazy loading), then choose the `json` or `xml` parser.\n* If the desired website has a structured form, then use the `html` parser.\n* If none of the above is possible (bad HTML or custom API), then fall back to the `dynamic` parser as a last resort.\n\n## Article\n\nWe have created a universal format for the parsed news, and we named it `Article`.\n\nRead the [article](./docs/article.md) file for more information.\n\n## Source files\n\n### What is a source file?\n\nA source file is a `json` or `javascript` file that represents a website.\nThese files are created by the user and tell Saffron how to parse a website.\n\n### Creating a source file\n\nRead the [source](./docs/source_files/source_file.md) file for the common options, or the parser-specific files\n[WordPress V2](./docs/source_files/wordpress_v2.md), [RSS](./docs/source_files/rss.md), [JSON / XML](./docs/source_files/json.md), [HTML](./docs/source_files/html.md) or [Dynamic](./docs/source_files/dynamic.md) for the scrape options.\n\n## Middleware\n\nA middleware is a function that gets executed before the articles are passed to the `newArticles` function.\nMiddleware functions can be useful for logging, article formatting or sorting.\n\nMiddleware functions are executed in the order in which they were registered.\nEach middleware function can be called more than once.\n\n### Register a middleware\n\n```typescript\nsaffron.use(\"name\", (...args: any) =\u003e {\n    //...\n});\n```\n\n### Format article\n\nUse this middleware to change the contents of the articles.\nIt receives every article found by the parsers as a parameter and must return the same object after changing it.\n\n```typescript\nsaffron.use(\"article.format\", (article: Article) =\u003e {\n    // If possible, store pubDate as milliseconds since the epoch.\n    const ms = new Date(article.pubDate).getTime();\n    if (!isNaN(ms)) article.pubDate = ms;\n\n    // Prepend the source name to the title of every article.\n    article.title = `[${article.getSource(saffron).name}] ${article.title}`;\n\n    
// Return the changed article.\n    return article;\n});\n```\n\nYou can also access the source class of the article by calling `article.getSource()`.\nNote that any changes made to the source class will also affect the saved source.\n\n### Articles\n\nThis middleware can be used to edit the articles in bulk. You can sort or filter them as you want.\nThe only requirement is to return an array (empty or not) of articles.\n\n```ts\nsaffron.use(\"articles\", (articles: Article[]) =\u003e {\n    // `sort` is assumed to be your own in-place sorting helper.\n    sort(articles);\n    return articles.filter(\n        (article) =\u003e article.title != null \u0026\u0026 article.title !== \"\"\n    );\n});\n```\n\n## Listeners\n\nSaffron supports listeners for various events. Listeners can be used for logging or creating analytics.\n\nRead the [listeners](./docs/listeners.md) file for more information.\n\n## Standalone\n\nSaffron supports immediate parsing using the static function `parse`.\n\n```ts\nimport {Saffron} from \"@unistudents/saffron\";\n\ntry {\n    const result = Saffron.parse({\n        name: \"source-name\",\n        url: [\"Category 1\", \"https://example.com\"],\n        type: \"html\",\n        // ...\n        scrape: {\n            // ...\n        },\n    }, null); // or pass a config\n\n    console.log(\"Result:\", result);\n} catch (e) {\n    console.log(\"Encountered an error during parsing:\", e);\n}\n```\n\nThe result of the `parse` function is an array with one object for each URL passed in the source file:\n\n```ts\n[\n    {\n        url: \"https://example.com\",\n        aliases: [\"Category 1\"],\n        articles: [/*Article*/, /*Article*/, /*Article*/, /*...*/]\n    },\n];\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funistudents%2Fsaffron","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funistudents%2Fsaffron","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funistudents%2Fsaffron/lists"}