{"id":13467947,"url":"https://github.com/Alir3z4/html2text","last_synced_at":"2025-03-26T03:31:16.037Z","repository":{"id":14293282,"uuid":"17001672","full_name":"Alir3z4/html2text","owner":"Alir3z4","description":"Convert HTML to Markdown-formatted text.","archived":false,"fork":false,"pushed_at":"2024-07-25T15:51:38.000Z","size":1268,"stargazers_count":1919,"open_issues_count":103,"forks_count":286,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-03-25T16:16:22.491Z","etag":null,"topics":["markdown","markdown-parser","python"],"latest_commit_sha":null,"homepage":"alir3z4.github.io/html2text/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"aaronsw/html2text","license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Alir3z4.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog.rst","contributing":"docs/contributing.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-02-19T22:41:11.000Z","updated_at":"2025-03-25T08:05:28.000Z","dependencies_parsed_at":"2024-06-18T11:12:19.667Z","dependency_job_id":"20f54392-22fa-46c5-b770-215aa473aaa0","html_url":"https://github.com/Alir3z4/html2text","commit_stats":{"total_commits":615,"total_committers":80,"mean_commits":7.6875,"dds":0.7365853658536585,"last_synced_commit":"8917f5c83d8cf013110124a6b37331b2c29a0fff"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alir3z4%2Fhtml2text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alir3z4%2Fhtml2text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alir3
z4%2Fhtml2text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alir3z4%2Fhtml2text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Alir3z4","download_url":"https://codeload.github.com/Alir3z4/html2text/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245584739,"owners_count":20639619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["markdown","markdown-parser","python"],"created_at":"2024-07-31T15:01:02.957Z","updated_at":"2025-03-26T03:31:16.006Z","avatar_url":"https://github.com/Alir3z4.png","language":"Python","readme":"# html2text\n\n[![CI](https://github.com/Alir3z4/html2text/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/Alir3z4/html2text/actions/workflows/main.yml)\n[![codecov](https://codecov.io/gh/Alir3z4/html2text/graph/badge.svg?token=OoxiyymjgU)](https://codecov.io/gh/Alir3z4/html2text)\n\n\n\nhtml2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. 
Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).\n\n\nUsage: `html2text [filename [encoding]]`\n\n| Option                                                 | Description\n|--------------------------------------------------------|---------------------------------------------------\n| `--version`                                            | Show program's version number and exit\n| `-h`, `--help`                                         | Show this help message and exit\n| `--ignore-links`                                       | Don't include any formatting for links\n| `--escape-all`                                         | Escape all special characters. Output is less readable, but avoids corner-case formatting issues.\n| `--reference-links`                                    | Use reference links instead of inline links to create Markdown\n| `--mark-code`                                          | Mark preformatted and code blocks with [code]...[/code]\n\nFor a complete list of options, see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).\n\n\nOr you can use it from within `Python`:\n\n```\n\u003e\u003e\u003e import html2text\n\u003e\u003e\u003e\n\u003e\u003e\u003e print(html2text.html2text(\"\u003cp\u003e\u003cstrong\u003eZed's\u003c/strong\u003e dead baby, \u003cem\u003eZed's\u003c/em\u003e dead.\u003c/p\u003e\"))\n**Zed's** dead baby, _Zed's_ dead.\n\n```\n\n\nOr with some configuration options:\n```\n\u003e\u003e\u003e import html2text\n\u003e\u003e\u003e\n\u003e\u003e\u003e h = html2text.HTML2Text()\n\u003e\u003e\u003e # Ignore converting links from HTML\n\u003e\u003e\u003e h.ignore_links = True\n\u003e\u003e\u003e print(h.handle(\"\u003cp\u003eHello, \u003ca href='https://www.google.com/earth/'\u003eworld\u003c/a\u003e!\"))\nHello, world!\n\n\u003e\u003e\u003e 
# Don't ignore links anymore; I like links\n\u003e\u003e\u003e h.ignore_links = False\n\u003e\u003e\u003e print(h.handle(\"\u003cp\u003eHello, \u003ca href='https://www.google.com/earth/'\u003eworld\u003c/a\u003e!\"))\nHello, [world](https://www.google.com/earth/)!\n\n```\n\n*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*\n\n\n## How to install\n\n`html2text` is available on PyPI:\nhttps://pypi.org/project/html2text/\n\n```\n$ pip install html2text\n```\n\n\n## How to run unit tests\n\n    tox\n\nTo see the coverage results:\n\n    coverage html\n\nthen open the `./htmlcov/index.html` file in your browser.\n\n## Documentation\n\nDocumentation lives [here](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md)\n","funding_links":[],"categories":["Web Content Extracting","Python","资源列表","📝 Content \u0026 Text Extraction","Web内容提取","HarmonyOS","Markdown","网络","Web Content Extracting [🔝](#readme)","Awesome Python"],"sub_categories":["网页内容提取","Ruby","Windows Manager","Web Content Extracting"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlir3z4%2Fhtml2text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAlir3z4%2Fhtml2text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlir3z4%2Fhtml2text/lists"}