{"id":21321899,"url":"https://github.com/hstanleycrow/easyphparticleextractor","last_synced_at":"2025-07-31T13:34:14.796Z","repository":{"id":167069139,"uuid":"642640806","full_name":"hstanleycrow/EasyPHPArticleExtractor","owner":"hstanleycrow","description":"Free PHP library to extract the main content from an article post or news post, including images and HTML","archived":false,"fork":false,"pushed_at":"2023-05-19T03:00:58.000Z","size":40,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-12T05:38:49.464Z","etag":null,"topics":["extract-article","extract-content","extract-text","extract-website","extraction","extractor","php","php-library","website"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hstanleycrow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-19T02:55:49.000Z","updated_at":"2024-03-06T02:08:11.000Z","dependencies_parsed_at":"2023-07-11T07:33:24.922Z","dependency_job_id":null,"html_url":"https://github.com/hstanleycrow/EasyPHPArticleExtractor","commit_stats":null,"previous_names":["hstanleycrow/easyphparticleextractor"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hstanleycrow/EasyPHPArticleExtractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hstanleycrow%2FEasyPHPArticleExtractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hstanleycrow%2FEasyPHPArticleExtractor/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hstanleycrow%2FEasyPHPArticleExtractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hstanleycrow%2FEasyPHPArticleExtractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hstanleycrow","download_url":"https://codeload.github.com/hstanleycrow/EasyPHPArticleExtractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hstanleycrow%2FEasyPHPArticleExtractor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268047997,"owners_count":24187210,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-31T02:00:08.723Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extract-article","extract-content","extract-text","extract-website","extraction","extractor","php","php-library","website"],"created_at":"2024-11-21T20:11:31.390Z","updated_at":"2025-07-31T13:34:14.768Z","avatar_url":"https://github.com/hstanleycrow.png","language":"PHP","funding_links":["https://www.buymeacoffee.com/haroldcrow"],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\r\n  \u003cbr\u003e\r\n   Easy PHP Article Extractor\r\n  \u003cbr\u003e\r\n\u003c/h1\u003e\r\n\u003cp\u003eThis is a post article and news posts extractor library for PHP. 
This library detects where the main content sits in the HTML and extracts it from the page, keeping all the useful HTML so the result is suitable for translating and republishing in other languages.\r\n\r\nYou can also extract only the plain text. The library can remove internal links so they do not end up in the extracted content, and it processes images to strip the links around them. It extracts images from every supported HTML tag and from the different ways an image can be embedded in a post.\r\n\r\nThe library also detects YouTube videos and injects the video URL at the right place in the content, so you do not have to do it by hand. It does the same for embedded Tweets.\r\n\r\nI tried many existing open source solutions for one of my projects, but each one fell short in some way for my use case. You can use this library together with another library of mine that translates text with the Google Translator API: \u003ca href=\"https://github.com/hstanleycrow/EasyPHPGoogleTranslate\" target=\"_blank\"\u003eEasyPHPGoogleTranslate\u003c/a\u003e.\r\n\r\nYou can also combine it with another of my libraries that publishes content from PHP to WordPress: \u003ca href=\"https://github.com/hstanleycrow/EasyPHPToWordpress\" target=\"_blank\"\u003eEasyPHPToWordpress\u003c/a\u003e. That way, you can extract, translate, and publish to WordPress. 
I have also developed another library, \u003ca href=\"https://github.com/hstanleycrow/EasyPHPOpenAI\" target=\"_blank\"\u003eEasyPHPOpenAI\u003c/a\u003e, which you can use to run the extracted content through the OpenAI API.\r\n\u003c/p\u003e\r\n\r\n\u003ch4 align=\"center\"\u003eFree PHP library to extract the main content from an article post or news post, including images and HTML\u003c/h4\u003e\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003ca href=\"#how-to-use\"\u003eHow To Use\u003c/a\u003e •\r\n  \u003ca href=\"#download\"\u003eDownload\u003c/a\u003e •\r\n  \u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\r\n\u003c/p\u003e\r\n\r\n\r\n## How To Use\r\n\r\n```bash\r\n# Clone this repository\r\n$ git clone https://github.com/hstanleycrow/EasyPHPArticleExtractor/\r\n\r\n# Move into the cloned directory and install the dependencies\r\n$ cd EasyPHPArticleExtractor\r\n$ composer update\r\n```\r\nor\r\n```bash\r\n# Install using composer\r\n$ composer require hstanleycrow/easyphparticleextractor\r\n```\r\n\r\n### Usage Examples\r\nCreate an instance of the main class with the URL of the page you want to extract. From that instance you can get the article content with its HTML, the content as plain text, and the article title.\r\n\r\nNote: I use the library to extract content with its HTML, so plain text output is not my priority. 
On the other hand, detecting the main content is hard, so the library sometimes extracts unrelated content along with the main post. It was developed to feed the extracted content into a text editor, so some leftover garbage is not a problem for me: the user can clean it up in the editor.\r\n\r\n```php\r\n// Load Composer's autoloader; add a `use` statement for ArticleExtractor\r\n// if the library declares the class inside a namespace.\r\nrequire __DIR__ . '/vendor/autoload.php';\r\n\r\n$url = 'https://nftplazas.com/zed-run-airdrop/';\r\n$articleExtractor = new ArticleExtractor($url);\r\n$article = $articleExtractor-\u003earticle();     // main content, HTML kept\r\n$title = $articleExtractor-\u003etitle();         // article title\r\n$plaintext = $articleExtractor-\u003eplainText(); // content as plain text\r\n\r\n// Create a new instance for each URL you want to extract.\r\n$url = 'https://www.seroundtable.com/google-search-algorithm-ranking-volatility-35414.html';\r\n$articleExtractor = new ArticleExtractor($url);\r\n$article = $articleExtractor-\u003earticle();\r\necho $articleExtractor-\u003etitle() . PHP_EOL;\r\n```\r\n\r\n## Download\r\n\r\nYou can [download](https://github.com/hstanleycrow/EasyPHPArticleExtractor/) the latest version here.\r\n\r\n## PHP Versions\r\nI have tested this class only on the PHP versions below. If you have an older version and it does not work, let me know.\r\n\r\n| PHP Version |\r\n| ------------- |\r\n| PHP 8.0 |\r\n| PHP 8.1 |\r\n| PHP 8.2 |\r\n\r\n## Support\r\n\r\n\u003ca href=\"https://www.buymeacoffee.com/haroldcrow\" target=\"_blank\"\u003e\u003cimg src=\"https://www.buymeacoffee.com/assets/img/custom_images/purple_img.png\" alt=\"Buy Me A Coffee\" style=\"height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;\" \u003e\u003c/a\u003e\r\n\r\n## License\r\n\r\nMIT\r\n\r\n---\r\n\r\n\u003e [www.hablemosdeseo.net](https://www.hablemosdeseo.net) \u0026nbsp;\u0026middot;\u0026nbsp;\r\n\u003e GitHub [@hstanleycrow](https://github.com/hstanleycrow) \u0026nbsp;\u0026middot;\u0026nbsp;\r\n\u003e Twitter 
[@harold_crow](https://twitter.com/harold_crow)\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhstanleycrow%2Feasyphparticleextractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhstanleycrow%2Feasyphparticleextractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhstanleycrow%2Feasyphparticleextractor/lists"}