{"id":36572794,"url":"https://github.com/rafaelglikis/sinama","last_synced_at":"2026-01-12T07:19:50.146Z","repository":{"id":62532735,"uuid":"144781773","full_name":"rafaelglikis/sinama","owner":"rafaelglikis","description":"Web scraping library","archived":false,"fork":false,"pushed_at":"2018-08-23T00:16:15.000Z","size":47,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-26T16:52:28.374Z","etag":null,"topics":["crawler","crawling","scraper","scraping"],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rafaelglikis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-08-14T23:22:46.000Z","updated_at":"2019-03-16T14:55:10.000Z","dependencies_parsed_at":"2022-11-02T14:45:48.433Z","dependency_job_id":null,"html_url":"https://github.com/rafaelglikis/sinama","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/rafaelglikis/sinama","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaelglikis%2Fsinama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaelglikis%2Fsinama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaelglikis%2Fsinama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaelglikis%2Fsinama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rafaelglikis","download_url":"https://codeload.github.com/rafaelglikis/sinama/tar.gz/refs/heads/master","sbom_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/rafaelglikis%2Fsinama/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28336494,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T06:09:07.588Z","status":"ssl_error","status_checked_at":"2026-01-12T06:05:18.301Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","crawling","scraper","scraping"],"created_at":"2026-01-12T07:19:50.092Z","updated_at":"2026-01-12T07:19:50.139Z","avatar_url":"https://github.com/rafaelglikis.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sinama\n[![Build Status](https://travis-ci.org/rafaelglikis/sinama.svg?branch=master)](https://travis-ci.org/rafaelglikis/sinama)\n\nSinama is a simple web scraping library.\n\n## Requirements\n* PHP 7.0\n\n## Installation\n```shell\ncomposer require rafaelglikis/sinama\n```\n\n## Usage\nCreate a Sinama Client (which extends Goutte\\Client):\n\n```php\nuse  Sinama\\Client;\n$client = new Client();\n```    \nMake requests with the request() method:\n\n```php\n// Go to the motherfuckingwebsite.com website\n$crawler = $client-\u003erequest('GET', 'https://motherfuckingwebsite.com/');\n```\n    \nThe method returns a Crawler object (which extends 
[Symfony/Component/DomCrawler/Crawler](https://api.symfony.com/4.1/Symfony/Component/DomCrawler/Crawler.html)).\n\nTo use your own Guzzle settings, you may create and pass a new Guzzle 6 instance to Sinama Client. For example, to add a 60-second request timeout:\n\n```php\nuse Sinama\\Client;\nuse GuzzleHttp\\Client as GuzzleClient;\n\n$client = new Client(new GuzzleClient([\n    'timeout' =\u003e 60\n]));\n$crawler = $client-\u003erequest('GET', 'https://github.com/trending');\n```\nFor more options visit [Guzzle Documentation](http://docs.guzzlephp.org/en/stable/request-options.html).\n\nClick on links:\n\n```php\n$link = $crawler-\u003eselectLink('PHP')-\u003elink();\n$crawler = $client-\u003eclick($link);\necho $crawler-\u003egetUri().\"\\n\";\n```\n    \nExtract data the Symfony way:\n\n```php\n$crawler-\u003efilter('h3 \u003e a')-\u003eeach(function ($node) {\n    print trim($node-\u003etext()).\"\\n\";\n});\n```\n    \nOr use Sinama special methods:\n    \n```php\n$crawler = $client-\u003erequest('GET', 'https://github.com/trending');\necho '\u003chtml\u003e';\necho '\u003chead\u003e';\necho '\u003ctitle\u003e'.$crawler-\u003efindTitle().'\u003c/title\u003e';\necho '\u003c/head\u003e';\necho '\u003cbody\u003e';\necho '\u003ch1\u003e'.$crawler-\u003efindTitle().'\u003c/h1\u003e';\necho '\u003cp\u003eMain Image: '.$crawler-\u003efindMainImage().'\u003c/p\u003e';\necho $crawler-\u003efindMainContent();\necho '\u003cpre\u003e';\necho 'Links: ';\nprint_r($crawler-\u003efindLinks());\necho 'Emails: ';\nprint_r($crawler-\u003efindEmails());\necho 'Images: ';\nprint_r($crawler-\u003efindImages());\necho '\u003c/pre\u003e';\necho '\u003c/body\u003e';\necho '\u003c/html\u003e';\n```\n    \nSubmit forms:\n\n```php\n$crawler = $client-\u003erequest('GET', 'https://www.google.com/');\n$form = $crawler-\u003eselectButton('Google Search')-\u003eform();\n$crawler = $client-\u003esubmit($form, ['q' =\u003e 'rafaelglikis/sinama']);\n$crawler-\u003efilter('h3 \u003e 
a')-\u003eeach(function ($node) {\n    print trim($node-\u003etext()).\"\\n\";\n});\n```\n\nNow that we have learned enough let's scrape a site with Sinama Spider:\n\n```php\nuse Sinama\\Crawler;\nuse Sinama\\Spider as BaseSpider;\n\nclass Spider extends BaseSpider\n{\n    public function parse(Crawler $crawler)\n    {\n        $crawler-\u003efilter('div.read-more \u003e a')-\u003eeach(function (Crawler $node) {\n            $this-\u003escrape($node-\u003eattr('href'));\n        });\n\n        $crawler-\u003efilter('div.blog-pagination \u003e a')-\u003eeach(function ($node) {\n            $this-\u003efollow($node-\u003eattr('href'));\n        });\n    }\n\n    public function scrape($url)\n    {\n        echo \"*************************************************** \".$url.\"\\n\";\n        $crawler = $this-\u003eclient-\u003erequest('GET', $url);\n        echo \"Title: \" . $crawler-\u003efindTitle() . \"\\n\";\n        echo \"Main Image: \" . $crawler-\u003efindMainImage().\"\\n\";\n        echo \"Main Content: \\n\" . $crawler-\u003efindMainContent().\"\\n\";\n        echo \"Emails: \\n\";\n        print_r($crawler-\u003efindEmails());\n        echo \"Links: \\n\";\n        print_r($crawler-\u003efindLinks());\n    }\n\n    public function getStartUrls(): array\n    {\n        return [\n            'https://blog.scrapinghub.com'\n        ];\n    }\n}\n\n$spider = new Spider([\n    'start_urls' =\u003e [ 'https://blog.scrapinghub.com' ],\n    'max_depth' =\u003e 2,\n    'verbose' =\u003e true\n]);\n$spider-\u003erun();\n```\n\n## TODO\n* Crawler::findTags()\n\n    ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frafaelglikis%2Fsinama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frafaelglikis%2Fsinama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frafaelglikis%2Fsinama/lists"}