{"id":17274699,"url":"https://github.com/stevebauman/hypertext","last_synced_at":"2025-05-15T20:05:51.657Z","repository":{"id":202803947,"uuid":"708171954","full_name":"stevebauman/hypertext","owner":"stevebauman","description":"A PHP HTML to pure text transformer.","archived":false,"fork":false,"pushed_at":"2024-10-12T16:28:40.000Z","size":2782,"stargazers_count":162,"open_issues_count":1,"forks_count":5,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-12T05:18:54.189Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stevebauman.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-21T18:30:54.000Z","updated_at":"2025-04-26T04:19:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"15de0555-29ad-4c3d-8f11-0db7aa7569d6","html_url":"https://github.com/stevebauman/hypertext","commit_stats":null,"previous_names":["stevebauman/hypertext"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevebauman%2Fhypertext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevebauman%2Fhypertext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevebauman%2Fhypertext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevebauman%2Fhypertext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stevebauman","download_url":"https://codeload.git
hub.com/stevebauman/hypertext/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254414499,"owners_count":22067272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T08:54:32.394Z","updated_at":"2025-05-15T20:05:46.554Z","avatar_url":"https://github.com/stevebauman.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eHypertext\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\nA PHP HTML to pure text transformer that beautifully handles various and malformed HTML.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/stevebauman/hypertext/actions\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/stevebauman/hypertext/run-tests.yml?branch=master\u0026style=flat-square\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://packagist.org/packages/stevebauman/hypertext\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/packagist/v/stevebauman/hypertext.svg?style=flat-square\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://packagist.org/packages/stevebauman/hypertext\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/packagist/dt/stevebauman/hypertext.svg?style=flat-square\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://packagist.org/packages/stevebauman/hypertext\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/packagist/l/stevebauman/hypertext.svg?style=flat-square\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\nHypertext 
is excellent at pulling text content out of any HTML-based document and automatically:\n\n- Removes CSS\n- Removes scripts\n- Removes headers\n- Removes non-HTML-based content\n- Preserves spacing\n- Preserves links (optional)\n- Preserves new lines (optional)\n\nIt is designed to produce output for LLM-related tasks, such as prompts and embeddings.\n\n## Installation\n\n```bash\ncomposer require stevebauman/hypertext\n```\n\n## Usage\n\n```php\nuse Stevebauman\\Hypertext\\Transformer;\n\n$transformer = new Transformer();\n\n// (Optional) Filter out specific elements by their XPath.\n$transformer-\u003efilter(\"//*[@id='some-element']\");\n\n// (Optional) Retain new line characters.\n$transformer-\u003ekeepNewLines();\n\n// (Optional) Retain anchor tags and their href attributes.\n$transformer-\u003ekeepLinks();\n\n$text = $transformer-\u003etoText($html);\n```\n\n## Example\n\n\u003e For larger examples, please view the [tests/Fixtures](https://github.com/stevebauman/hypertext/tree/master/tests/Fixtures) directory.\n\n**Input**:\n\n```html\n\u003c!DOCTYPE html\u003e\n\u003chtml lang=\"en\"\u003e\n\u003chead\u003e\n    \u003cmeta charset=\"UTF-8\"\u003e\n    \u003cmeta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"\u003e\n    \u003ctitle\u003eMy Blog\u003c/title\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n    \u003ch1\u003eWelcome to My Blog\u003c/h1\u003e\n    \u003cp\u003eThis is a paragraph of text on my webpage.\u003c/p\u003e\n    \u003ca href=\"https://blog.com/posts\"\u003eClick here\u003c/a\u003e to view my posts.\n\u003c/body\u003e\n\u003c/html\u003e\n```\n\n**Output (Pure Text)**:\n\n```php\necho (new Transformer)-\u003etoText($html);\n```\n\n```text\nWelcome to My Blog This is a paragraph of text on my webpage. Click here to view my posts.\n```\n\n**Output (Keep New Lines)**:\n\n```php\necho (new Transformer)-\u003ekeepNewLines()-\u003etoText($html);\n```\n\n```text\nWelcome to My Blog\nThis is a paragraph of text on my webpage.\nClick here to view my posts.\n```\n\n**Output (Keep Links)**:\n\n```php\necho (new Transformer)-\u003ekeepLinks()-\u003etoText($html);\n```\n\n```text\nWelcome to My Blog This is a paragraph of text on my webpage. \u003ca href=\"https://blog.com/posts\"\u003eClick here\u003c/a\u003e to view my posts.\n```\n\n**Output (Keep Both)**:\n\n```php\necho (new Transformer)\n    -\u003ekeepLinks()\n    -\u003ekeepNewLines()\n    -\u003etoText($html);\n```\n\n```text\nWelcome to My Blog\nThis is a paragraph of text on my webpage.\n\u003ca href=\"https://blog.com/posts\"\u003eClick here\u003c/a\u003e to view my posts.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevebauman%2Fhypertext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevebauman%2Fhypertext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevebauman%2Fhypertext/lists"}