{"id":20865105,"url":"https://github.com/antheta/falcon-php","last_synced_at":"2025-08-01T10:36:56.270Z","repository":{"id":207958062,"uuid":"571040650","full_name":"Antheta/falcon-php","owner":"Antheta","description":"🌎 An intermediary for web scrapers with built-in parsers. ","archived":false,"fork":false,"pushed_at":"2023-11-18T17:02:24.000Z","size":554,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-01-19T08:21:00.233Z","etag":null,"topics":["dynamic","gateway","scraper","scraper-gateway","scrapers","web-scraper"],"latest_commit_sha":null,"homepage":"https://falcon-docs-v2.vercel.app","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Antheta.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-11-27T00:30:28.000Z","updated_at":"2024-06-06T11:01:34.000Z","dependencies_parsed_at":"2023-11-18T18:36:49.500Z","dependency_job_id":null,"html_url":"https://github.com/Antheta/falcon-php","commit_stats":null,"previous_names":["antheta/falcon-php"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antheta%2Ffalcon-php","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antheta%2Ffalcon-php/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antheta%2Ffalcon-php/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antheta%2Ffalcon-php/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Antheta","download_url":"https://codeload.gith
ub.com/Antheta/falcon-php/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243238970,"owners_count":20259126,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dynamic","gateway","scraper","scraper-gateway","scrapers","web-scraper"],"created_at":"2024-11-18T05:47:01.908Z","updated_at":"2025-03-12T15:13:35.433Z","avatar_url":"https://github.com/Antheta.png","language":"PHP","readme":"\u003cp align=\"center\"\u003e\n \u003ca href=\"https://antheta.com\" target=\"_blank\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"./assets/falcon.png\"\u003e\n    \u003cimg align=\"center\" src=\"./assets/falcon.png\" height=\"150\"\u003e\n  \u003c/picture\u003e\n  \u003ccenter\u003e\n    \u003ca href=\"https://github.com/Antheta/falcon-php/actions\"\u003e\n      \u003cimg src=\"https://github.com/Antheta/falcon-php/actions/workflows/run-tests.yml/badge.svg\"\u003e\n    \u003c/a\u003e\n  \u003c/center\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\nFalcon is an open-source (MIT licensed) high-performance PHP web scraper with built-in parsers and extendability.\n\nPlease notice that this library is not intended to be used to gather emails or any other personal data for spam.\n\n## Documentation\n\n[Documentation](http://docs.antheta.com/)\n\n## Features\n- Many different built-in parsers.\n- Near-endless extendability\n  - Custom parser support.\n  - Custom regex support.\n  - Custom driver (scraper) support.\n\n## Installation\n\n### Composer\n```bash\ncomposer require 
antheta/falcon\n```\n\n## Usage\nRunning the scraper:\n```php\n$falcon = Falcon::getInstance()-\u003erun(\"https://example.com/\");\n$result = $falcon-\u003eparse()-\u003eresults(); // use all available parsers and get all results\n```\nThe example above scrapes the URL and returns an array.\n\n### Use specific parsers\nTo get only specific resources from the results, pass the parser names to `parse`:\n```php\n$falcon = Falcon::getInstance()-\u003erun(\"https://example.com/\");\n// run only the email and ip parsers, then return just the emails\n$emails = $falcon-\u003eparse([\"email\", \"ip\"])-\u003eemails();\n```\n\n## Methods\n\nHelper methods for returning the results:\n| Name |\n| - |\n| results |\n| emails |\n| phonenumbers |\n| ipaddresses |\n| forms |\n| links |\n| images |\n| stylesheets |\n| scripts |\n| fonts |\n\n## Custom regexes\nTo add your own regexes to a parser, use the `addRegexes` helper:\n\n```php\n// this will attempt to parse emails with the given regex\n$falcon = Falcon::getInstance()\n          -\u003eaddRegexes(\"email\", [\"/[\\._a-zA-Z0-9-]+@[\\._a-zA-Z0-9-ddd]+/i\"])\n          -\u003erun(\"https://example.com/\")-\u003eparse()-\u003eemails();\n\n// you can extend this to other parsers as well and add as many regexes as needed\n$falcon = Falcon::getInstance()\n            // regexes for emails\n            -\u003eaddRegexes(\"email\", [\n              \"/[\\._a-zA-Z0-9-]+@[\\._a-zA-Z0-9-ddd]+/i\",\n              \"/[\\._a-zA-Z0-9-]+\\(at\\)[\\._a-zA-Z0-9-]+/i\",\n            ])\n            // regexes for phone numbers\n            -\u003eaddRegexes(\"phonenumber\", [\n              \"/([\\+]?[(]?[0-9]{3}[)]?[-\\s\\.]?[0-9]{3}[-\\s\\.]?[0-9]{4,12})/\",\n            ])\n            -\u003erun(\"https://example.com/\")\n            -\u003eparse()\n            -\u003eresults();\n```\n\n## Custom parsers\n\nWith custom parsers, you control what kind of data the parser returns:\n\n```php\n$falcon = Falcon::getInstance();\n\n$falcon-\u003eaddParser(\"myCustomParser\", fn ($payload) 
=\u003e MyParser($payload));\n\nfunction MyParser($payload) {\n  // your custom logic here\n}\n\n// or\n$falcon-\u003eaddParser(\"myCustomParser\", function($payload) {\n  // your custom logic here\n});\n\n// result from your parser\n$falcon-\u003eparse(\"myCustomParser\")-\u003eresults()[\"myCustomParser\"];\n```\n\n## Custom drivers\n\nDrivers scrape sites and return the HTML to Falcon. Drivers can also be used to write completely custom logic and save its output to Falcon for later use. Start by creating your own driver class that implements the `DriverInterface` interface, and put the driver-specific logic inside that class.\n\n```php\n$falcon = Falcon::getInstance();\n\n$falcon-\u003eaddDrivers([\n  \"myDriver\" =\u003e MyDriver::class\n]);\n```\n\n## Scraping dynamic content\nYou can switch from hQuery to a headless JavaScript browser such as CasperJS \u0026 PhantomJS to load dynamic content. This way you can also scrape data that is loaded dynamically (after the initial page load).\n\nCheck out [Falcon Drivers](https://github.com/Antheta/falcon-drivers) to get started.\n\n## License\n\nThe MIT License (MIT). Please see [License File](LICENSE) for more information.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantheta%2Ffalcon-php","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantheta%2Ffalcon-php","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantheta%2Ffalcon-php/lists"}