{"id":13758604,"url":"https://github.com/baraveli/rss-scraper","last_synced_at":"2025-09-20T10:32:57.485Z","repository":{"id":62491782,"uuid":"225413754","full_name":"baraveli/rss-scraper","owner":"baraveli","description":"Rss Scraper to scrap rss feed from news websites.","archived":false,"fork":false,"pushed_at":"2020-09-30T06:10:06.000Z","size":58,"stargazers_count":7,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-04-30T13:34:44.827Z","etag":null,"topics":["composer","php","rss","rss-scraper","rssreader","scrap-rss-feed","scraper"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/baraveli.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-02T15:56:35.000Z","updated_at":"2024-04-24T19:01:50.000Z","dependencies_parsed_at":"2022-11-02T11:31:11.108Z","dependency_job_id":null,"html_url":"https://github.com/baraveli/rss-scraper","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baraveli%2Frss-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baraveli%2Frss-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baraveli%2Frss-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baraveli%2Frss-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/baraveli","download_url":"https://codeload.github.com/baraveli/rss-scraper/tar.gz/refs/heads/master","host":{"nam
e":"GitHub","url":"https://github.com","kind":"github","repositories_count":233606224,"owners_count":18701610,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["composer","php","rss","rss-scraper","rssreader","scrap-rss-feed","scraper"],"created_at":"2024-08-03T13:00:33.490Z","updated_at":"2025-09-20T10:32:57.058Z","avatar_url":"https://github.com/baraveli.png","language":"PHP","funding_links":[],"categories":["Table of Contents"],"sub_categories":["PHP Projects"],"readme":"# Rss scraper\r\n\r\n[![Build Status](https://travis-ci.org/baraveli/rss-scraper.svg?branch=master)](https://travis-ci.org/baraveli/rss-scraper)\r\n[![Latest Stable Version](https://poser.pugx.org/baraveli/rss-scraper/v/stable)](https://packagist.org/packages/baraveli/rss-scraper)\r\n[![License](https://poser.pugx.org/baraveli/rss-scraper/license)](https://packagist.org/packages/baraveli/rss-scraper)\r\n\r\n![Rss Scraper logo](https://jinas.me/images/baravelirssgithub.jpg)\r\nRss Scraper scrapes RSS feeds from news websites.\r\n\r\n## :rocket: Installation\r\n\r\n```shell\r\ncomposer require baraveli/rss-scraper\r\n```\r\n\r\n## Usage\r\n\r\nAfter installing the package, create a config.json file inside your application and list the sites you want to index.\r\n\r\n## :satellite: Rss Scraper Specs\r\n\r\nThis documentation describes the structure and usage of the RSS scraper and how the individual components of the library work.\r\n\r\n## :crystal_ball: General Explanation\r\n\r\nThe RSS scraper reads the feed URLs from the configuration, fetches the items of each feed, and returns the data as a 
JSON response or an array.\r\n\r\n- ### :hammer: Config loader\r\n\r\nThe Rss scraper configuration is stored in the configs directory as a \u003ccode\u003econfig.json\u003c/code\u003e file. The config file maps a name for each news site to the URL of the RSS feed that the scraper fetches.\r\n\r\nExample config:\r\n\r\n```json\r\n{\r\n    \"mihaaru\": \"https://mihaaru.com/rss\",\r\n    \"vaguthu\": \"https://vaguthu.mv/feed\"\r\n}\r\n```\r\n\r\nThis configuration file loads the RSS feeds of [mihaaru](https://mihaaru.com) and [vaguthu](https://vaguthu.mv).\r\n\r\nThat's pretty much it for the configuration file. Rss scraper has a utility \u003ccode\u003eConfigLoader\u003c/code\u003e class that loads the configuration data from the configs directory and returns the RSS feed URLs as an array.\r\n\r\nThe ConfigLoader class has one static load method, which takes a \u003ccode\u003efilename\u003c/code\u003e string as its argument. filename is the name of the JSON file inside the configs directory, without the .json extension. In this case the file name is config. If the file is not found in the configs directory, the load method looks for it in the current working directory. If the file cannot be read or is empty, the load method throws an exception saying \"Error reading the config file or it is empty.\"\r\n\r\nConfig loader class is shown below:\r\n\r\n```php\r\n\u003c?php\r\n\r\nnamespace Baraveli\\RssScraper\\Util;\r\n\r\nuse Baraveli\\RssScraper\\Interfaces\\IConfigLoader;\r\n\r\nclass ConfigLoader implements IConfigLoader\r\n{\r\n    /**\r\n     * load\r\n     *\r\n     * @param  string $filename\r\n     *\r\n     * This static method loads configuration files from the configs directory\r\n     *\r\n     * @return array\r\n     */\r\n    public static function load(string $filename): array\r\n    {\r\n        $path = IConfigLoader::DIRECTORY_PATH . $filename .  '.json';\r\n        if (!file_exists($path)) {\r\n            // Fall back to the current working directory.\r\n            $path = getcwd() . '/'. $filename .  
'.json';\r\n        }\r\n\r\n        $file = file_get_contents($path, FILE_USE_INCLUDE_PATH);\r\n        $urls = json_decode($file, true);\r\n\r\n        if (!isset($file, $urls)) {\r\n            throw new \\Exception(\"Error reading the config file or it it is empty\");\r\n        }\r\n\r\n        return $urls;\r\n    }\r\n}\r\n\r\n```\r\n\r\n- ### :flashlight: Http Client\r\n\r\nThe Client class inside the Http directory of the RSS scraper sends an HTTP request to the RSS feed URL specified in the config to get its content. The class's get method fetches the content of the RSS URL and checks whether the returned data is valid XML. \u003ccode\u003eisValidXml()\u003c/code\u003e is a helper method provided by the Helper trait. If the check fails, the get method throws an exception. If the check passes, the XML is passed to the \u003ccode\u003esimplexml_load_string()\u003c/code\u003e function that is built into PHP. The loaded XML object is then passed to the \u003ccode\u003eparseXML\u003c/code\u003e method, which converts it to a PHP array. 
The data is then returned.\r\n\r\nThis class uses Guzzle to make the HTTP request.\r\n\r\nClient class is shown below:\r\n\r\n```php\r\n\u003c?php\r\n\r\nnamespace Baraveli\\RssScraper\\Http;\r\n\r\nuse GuzzleHttp\\Client as GuzzleClient;\r\nuse Baraveli\\RssScraper\\Util\\Helper;\r\n\r\nclass Client\r\n{\r\n    use Helper;\r\n\r\n    private $client;\r\n\r\n    public function __construct()\r\n    {\r\n        $this-\u003eclient = new GuzzleClient();\r\n    }\r\n\r\n    /**\r\n     * get\r\n     *\r\n     * Method to get the rss feed.\r\n     *\r\n     * This method validates the response and parses the XML into a PHP array before returning the data.\r\n     *\r\n     * @param  string $link\r\n     *\r\n     * @return array\r\n     */\r\n    public function get($link)\r\n    {\r\n        $response = $this-\u003eclient-\u003erequest('GET', $link);\r\n\r\n        $responseBody = $response-\u003egetBody();\r\n        if (!$this-\u003eisValidXml($responseBody)) {\r\n            throw new \\Exception(\"The file doesn't contain valid XML\");\r\n        }\r\n\r\n        $xmlfile = simplexml_load_string($responseBody);\r\n        $data = $this-\u003eparseXML($xmlfile);\r\n\r\n        return $data;\r\n    }\r\n\r\n    /**\r\n     * parseXML\r\n     *\r\n     * This method decodes the XML data into a PHP array\r\n     *\r\n     * @param  \\SimpleXMLElement $xmlfile\r\n     *\r\n     * @return array\r\n     */\r\n    protected function parseXML($xmlfile)\r\n    {\r\n        $json = json_encode($xmlfile);\r\n        $data = json_decode($json, true);\r\n\r\n        return $data;\r\n    }\r\n}\r\n```\r\n\r\n- ### :page_facing_up: Article Collection\r\n\r\nArticleCollection is the class responsible for gathering the scraped items into a collection that can easily be manipulated as an array or as JSON.\r\nThe class has an items array that holds all the items. Items are added through the add method, which takes a value. 
The class also has a jsonify() method, which converts the collection to JSON, and a toArray() method, which returns it as an array. The count method returns the number of items in the items array.\r\n\r\nArticle Collection class is shown below:\r\n\r\n```php\r\n\u003c?php\r\n\r\nnamespace Baraveli\\RssScraper\\Collections;\r\n\r\nuse Countable;\r\n\r\nclass ArticleCollection implements Countable\r\n{\r\n\r\n    protected $items = [];\r\n\r\n    /**\r\n     * __toString\r\n     *\r\n     * Jsonify the collection automatically when it is output as a string.\r\n     *\r\n     * @return string\r\n     */\r\n    public function __toString()\r\n    {\r\n        return $this-\u003ejsonify();\r\n    }\r\n\r\n\r\n    /**\r\n     * add\r\n     *\r\n     * @param  mixed $value\r\n     *\r\n     * Method to add items to the collection array.\r\n     *\r\n     * @return void\r\n     */\r\n    public function add($value)\r\n    {\r\n        $this-\u003eitems[] = $value;\r\n    }\r\n\r\n    /**\r\n     * get\r\n     *\r\n     * @param  mixed $key\r\n     *\r\n     * Method to get an item from the collection array given an (int) key value.\r\n     * Returns null if the key does not exist.\r\n     *\r\n     * @return mixed\r\n     */\r\n    public function get($key)\r\n    {\r\n        return array_key_exists($key, $this-\u003eitems) ? 
$this-\u003eitems[$key] : null;\r\n    }\r\n\r\n    /**\r\n     * jsonify\r\n     *\r\n     * Method to convert the response to JSON\r\n     *\r\n     * This method is chainable with the getrss() function.\r\n     *\r\n     * @return string\r\n     */\r\n    public function jsonify()\r\n    {\r\n        return json_encode($this-\u003eitems);\r\n    }\r\n\r\n    /**\r\n     * toArray\r\n     *\r\n     * Method to return the response as an array\r\n     *\r\n     * This method is chainable with the getrss() function.\r\n     *\r\n     * @return array\r\n     */\r\n    public function toArray()\r\n    {\r\n        return $this-\u003eitems;\r\n    }\r\n\r\n    /**\r\n     * count\r\n     *\r\n     * Method to count how many items are in the article collection array\r\n     *\r\n     * @return int\r\n     */\r\n    public function count()\r\n    {\r\n        return count($this-\u003eitems);\r\n    }\r\n}\r\n\r\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaraveli%2Frss-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbaraveli%2Frss-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaraveli%2Frss-scraper/lists"}