{"id":46449187,"url":"https://github.com/dfface/xml2epub","last_synced_at":"2026-03-06T00:03:27.642Z","repository":{"id":38161633,"uuid":"401755012","full_name":"dfface/xml2epub","owner":"dfface","description":"Batch convert multiple web pages, html files or images into one e-book.","archived":false,"fork":false,"pushed_at":"2026-02-12T15:04:27.000Z","size":97529,"stargazers_count":20,"open_issues_count":6,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-03-03T08:51:33.979Z","etag":null,"topics":["clipper","ebook","epub","file","html","image","images","local","python","url","web","xml"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dfface.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-08-31T15:33:19.000Z","updated_at":"2026-02-18T15:53:44.000Z","dependencies_parsed_at":"2024-03-13T17:56:31.479Z","dependency_job_id":null,"html_url":"https://github.com/dfface/xml2epub","commit_stats":{"total_commits":42,"total_committers":4,"mean_commits":10.5,"dds":0.2857142857142857,"last_synced_commit":"b43d8b3905e8dc85106ae7374e2ae5aa64701149"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/dfface/xml2epub","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfface%2Fxml2epub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfface%2Fxml2epub/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfface%2Fxml2epub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfface%2Fxml2epub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dfface","download_url":"https://codeload.github.com/dfface/xml2epub/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfface%2Fxml2epub/sbom","scorecard":{"id":338977,"data":{"date":"2025-08-11","repo":{"name":"github.com/dfface/xml2epub","commit":"22b047a681dc3c13ffed68ae25d9e40209098a2a"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.7,"checks":[{"name":"Code-Review","score":0,"reason":"Found 2/22 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 
0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/pypi.yml:1","Warn: no topLevel permission defined: .github/workflows/testpypi.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations 
found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: License.txt:0","Info: FSF or OSI recognized license: MIT License: License.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pypi.yml:23: update your workflow using https://app.stepsecurity.io/secureworkflow/dfface/xml2epub/pypi.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pypi.yml:26: update your workflow using https://app.stepsecurity.io/secureworkflow/dfface/xml2epub/pypi.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/testpypi.yml:24: update your workflow using 
https://app.stepsecurity.io/secureworkflow/dfface/xml2epub/testpypi.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/testpypi.yml:27: update your workflow using https://app.stepsecurity.io/secureworkflow/dfface/xml2epub/testpypi.yml/main?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/pypi.yml:33","Warn: pipCommand not pinned by hash: .github/workflows/pypi.yml:34","Warn: pipCommand not pinned by hash: .github/workflows/testpypi.yml:35","Warn: pipCommand not pinned by hash: .github/workflows/testpypi.yml:37","Warn: pipCommand not pinned by hash: .github/workflows/testpypi.yml:39","Info:   0 out of   4 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   5 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 10 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":2,"reason":"8 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-cpwx-vrp4-4pq7","Warn: Project is vulnerable to: GHSA-gmj6-6f8f-6699","Warn: Project is vulnerable to: GHSA-h5c8-rqwp-cp95","Warn: Project is vulnerable to: GHSA-h75v-3vvj-5mfj","Warn: Project is vulnerable to: GHSA-q2x7-8rv6-6q7h","Warn: Project is vulnerable to: GHSA-9hjg-9r4m-mvj7","Warn: Project is vulnerable to: GHSA-9wx4-h78v-vm56","Warn: Project is vulnerable to: PYSEC-2023-74 / GHSA-j8r2-6x86-q33q"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T05:21:11.415Z","repository_id":38161633,"created_at":"2025-08-18T05:21:11.415Z","updated_at":"2025-08-18T05:21:11.415Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30156253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T22:39:40.138Z","status":"ssl_error","status_checked_at":"2026-03-05T22:39:24.771Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clipper","ebook","epub","file","html","image","images","local","python","url","web","xml"],"created_at":"2026-03-06T00:03:24.783Z","updated_at":"2026-03-06T00:03:27.627Z","avatar_url":"https://github.com/dfface.png","language":"Python","readme":"# xml2epub\n\n![GitHub Repo stars](https://img.shields.io/github/stars/dfface/xml2epub)\n![GitHub Workflow 
Status](https://img.shields.io/github/actions/workflow/status/dfface/xml2epub/.github/workflows/pypi.yml)\n[![python](https://img.shields.io/pypi/pyversions/xml2epub)](https://pypi.org/project/xml2epub/)\n[![pypi](https://img.shields.io/pypi/v/xml2epub)](https://pypi.org/project/xml2epub/)\n[![wheel](https://img.shields.io/pypi/wheel/xml2epub)](https://pypi.org/project/xml2epub/)\n[![license](https://img.shields.io/github/license/dfface/xml2epub)](https://pypi.org/project/xml2epub/)\n![PyPI - Downloads](https://img.shields.io/pypi/dd/xml2epub)\n\nBatch convert web pages, HTML files or images to a single e-book.\n\nFeatures:\n\n* Auto-generate cover: Uses matching `\u003ctitle\u003e` text (per [COVER_TITLE_LIST](./xml2epub/constants.py)) or a random generated cover default.\n* Auto-extract core content: Filters HTML to retain key elements (see [SUPPORTED_TAGS](./xml2epub/constants.py)).\n\n## ToC\n\n- [xml2epub](#xml2epub)\n  - [ToC](#toc)\n  - [How to install](#how-to-install)\n  - [Basic Usage](#basic-usage)\n  - [API](#api)\n    - [Epub object](#epub-object)\n      - [`Epub(title)`](#epubtitle)\n      - [`Epub.add_chapter(chapter_object)`](#epubadd_chapterchapter_object)\n      - [`Epub.create_epub(output_directory)`](#epubcreate_epuboutput_directory)\n    - [`create_chapter_from_file(path_to_file)`](#create_chapter_from_filepath_to_file)\n    - [`create_chapter_from_url(url)`](#create_chapter_from_urlurl)\n    - [`create_chapter_from_string(html_string)`](#create_chapter_from_stringhtml_string)\n    - [`html_clean(input_string)`](#html_cleaninput_string)\n  - [Tips](#tips)\n  - [FAQ](#faq)\n\n## How to install\n\n`xml2epub` is available on pypi: https://pypi.org/project/xml2epub/\n\n```bash\npip3 install xml2epub\n```\n\n## Basic Usage\n\n```python\nimport xml2epub\n\n## create an empty eBook, with toc located after cover\nbook = xml2epub.Epub(\"My New E-book Name\", toc_location=\"afterFirstChapter\")\n## create chapters by url\n#### custom your own cover 
image\nchapter0 = xml2epub.create_chapter_from_string(\"https://cdn.jsdelivr.net/gh/dfface/img0@master/2022/02-10-0R7kll.png\", title='cover', strict=False)\n#### create chapter objects\nchapter1 = xml2epub.create_chapter_from_url(\"https://dev.to/devteam/top-7-featured-dev-posts-from-the-past-week-h6h\")\nchapter2 = xml2epub.create_chapter_from_url(\"https://dev.to/ks1912/getting-started-with-docker-34g6\")\n## add chapters to your eBook\nbook.add_chapter(chapter0)\nbook.add_chapter(chapter1)\nbook.add_chapter(chapter2)\n## generate epub file\nbook.create_epub(\"Your Output Directory\")\n```\n\nAfter a short wait (no errors), \"My New E-book Name.epub\" will be generated in \"Your Output Directory\":\n\n![The generated epub file](https://cdn.jsdelivr.net/gh/dfface/img0@master/2022/02-09-Guz0bl.png)\n\nFor more **examples**, check the [examples](./examples) directory.\n\nIf no cover is inferred from the HTML, a random cover is generated.\n\n![The generated cover image](https://fastly.jsdelivr.net/gh/dfface/img0@master/2025/11-30-uLU9Bg-SiVVbc.jpg)\n\n## API\n\n### Epub object\n\n#### `Epub(title)`\n\n`Epub(title, creator='dfface', language='en', rights='', publisher='dfface/xml2epub', epub_dir=None, toc_location='end')`\n\nCreates Epub object (adds book info/chapters, generates EPUB file).\n\n* title (str): EPUB [title](http://kb.daisy.org/publishing/docs/epub/title.html) (per spec).\n* creator (Optional[str]): EPUB [author](http://kb.daisy.org/publishing/docs/html/dpub-aria/doc-credit.html) (per spec).\n* owner (Optional[str]): The owner of this file—yes, that's you! 
This affects the text in the top banner if you use our generated cover.\n* language (Optional[str]): EPUB [language](http://kb.daisy.org/publishing/docs/epub/language.html) (per spec).\n* rights (Optional[str]): EPUB [copyright](http://kb.daisy.org/publishing/docs/html/dpub-aria/doc-credit.html) (per spec).\n* publisher (Optional[str]): EPUB [publisher](http://kb.daisy.org/publishing/docs/html/dpub-aria/doc-credit.html) (per spec).\n* epub_dir (Optional[str]): Intermediate file path (default: system temp path).\n* toc_location (Optional[str]): ToC position (default: end; options: beginning/afterFirstChapter/end):\n  * beginning: ToC → chapters\n  * afterFirstChapter: Chapter1 (cover) → ToC → chapters\n  * end: Chapters → ToC\n\n#### `Epub.add_chapter(chapter_object)`\n\nAdd Chapter object (Created via 3 chapter creation methods) to EPUB.\n\n#### `Epub.create_epub(output_directory)`\n\n`Epub.create_epub(output_directory, epub_name=None, absolute_location=None)`\n\nGenerate EPUB file.\n\n* `output_directory` (str): Output directory for EPUB.\n* `epub_name` (Optional[str]): EPUB filename (no `.epub` suffix; printable chars only, defaults to `title`).\n* `absolute_location` (Optional[str]): Absolute path/name (no `.epub` suffix; overrides default `${cwd}/${output_directory}/${epub_name}`.epub; requires write permissions).\n\n### `create_chapter_from_file(path_to_file)`\n\n`create_chapter_from_file(file_name, url=None, title=None, strict=True, local=False)`\n\nCreate Chapter from HTML/XHTML file.\n\n* `file_name` (string): HTML/XHTML file path.\n* `url` (Optional[string]): Infers title; recommended for relative links.\n* `title` (Optional[string]): Chapter name (uses HTML `\u003ctitle\u003e` if None).\n* `strict` (Optional[boolean]): Strict cleaning (removes inline styles, trivial attrs); default True.\n* `local` (Optional[boolean]): Use local resources (copy images/CSS via paths, no online fetch).\n\n### `create_chapter_from_url(url)`\n\n`create_chapter_from_url(url, 
title=None, strict=True, local=False)`\n\nCreate Chapter by extracting webpage from URL.\n\n* `url` (string): Website link (recommended for resolving relative links).\n* `title` (Optional[string]): Chapter name (uses HTML `\u003ctitle\u003e` if None).\n* `strict` (Optional[boolean]): Strict page cleaning (removes inline styles/attrs; default True). False allows image links for custom covers.\n* `local` (Optional[boolean]): Use local resources (copy images/CSS via paths, no online fetch).\n\n### `create_chapter_from_string(html_string)`\n\n`create_chapter_from_string(html_string, url=None, title=None, strict=True, local=False)`\n\nCreate Chapter from string (base method for URL/file variants).\n\n* `html_string` (string): HTML/XHTML string; or image URL (strict=False) / image path (strict=False + local=True). The image is used as the cover if `title` is None or is in [COVER_TITLE_LIST] (e.g., cover).\n* `url` (Optional[string]): Infers title; recommended for relative links.\n* `title` (Optional[string]): Chapter name (uses HTML \u003ctitle\u003e if None).\n* `strict` (Optional[boolean]): Strict page cleaning (removes inline styles/attrs; default True).\n* `local` (Optional[boolean]): Use local resources (copy images/CSS via paths, no online fetch).\n\n### `html_clean(input_string)`\n\n`html_clean(input_string, help_url=None, tag_clean_list=constants.TAG_DELETE_LIST, class_list=constants.CLASS_INCLUDE_LIST, tag_dictionary=constants.SUPPORTED_TAGS)`\n\nExposed internal default clean method for easy customization.\n\n* `input_string` (str): HTML/XML string.\n* `help_url` (Optional[str]): Current chapter URL (resolves relative links).\n* `tag_dictionary` (Optional[dict]): Tags/classes to retain (default: [SUPPORTED_TAGS](./xml2epub/constants.py), can be `None`: **retain all tags** except those specified in `tag_clean_list`).\n* `tag_clean_list` (Optional[list]): Tags to delete (full tag + subtags; default: [TAG_DELETE_LIST](./xml2epub/constants.py)). 
Preferably set `tag_dictionary` to `None`.\n* `class_clean_list` (Optional[list]): Tags to delete (class matches list; full tag + subtags; default: [CLASS_DELETE_LIST](./xml2epub/constants.py)).\n\n## Tips\n\n* Custom cover: Use `create_chapter_from_string` – set `html_string` to image URL (with `strict=False`) or local path (with `local=True` + `strict=False`). Recommend adding `title='Cover'`.\n* Custom web content cleaning: Fetch HTML via crawler → use exposed `html_clean` (recommend `tag_clean_list`, `class_clean_list`, url) → pass output to `create_chapter_from_string`'s `html_string` (keep `strict=False`).\n* For `create_chapter_*` + `strict=False`: Recommend `url` (resolves relative links).\n* For `html_clean`: Recommend `help_url` (resolves relative links).\n* Post-EPUB generation: Use [Calibre](https://calibre-ebook.com/) to convert to standard EPUB/mobi/azw3 (fix compatibility) or edit/adjust styles.\n* If the reading effect of the generated EPUB e-books is unsatisfactory on traditional readers such as Calibre, you can consider using [epub-browser](https://github.com/dfface/epub-browser) to read the generated EPUB e-books in your browser.\n* Local images/CSS/resources: Set `local=True` in `create_chapter_*` – program copies local resources instead of fetching online.\n\n## FAQ\n\n1. Generated EPUB has no content?\n\nEnsure the target URL is a static page accessible without login. If empty, fetch the HTML string (via crawler) and use `create_chapter_from_string` to generate EPUB.\n\n2. Generated EPUB has unwanted content?\n\nOur default HTML filtering may not cover all cases. Filter the HTML string yourself before using `create_chapter_from_string`.\n\n3. Generate EPUB from HTML string without content sanitization?\n\nSet `strict=False` in `create_chapter_from_string` to skip internal cleaning.\n\n4. Self-fetch \u0026 clean HTML string (steps):\n   1. Get HTML string via crawler (e.g., `requests.get(url).text`).\n   2. 
Clean it with exposed `html_clean` (e.g., `html_clean(html_string, tag_clean_list=['sidebar'])`) or custom methods.\n   3. Generate Chapter via `create_chapter_from_string(html_string, strict=False)` (set `strict=False` to skip internal cleaning).\n   4. Generate EPUB per basic usage (see example: [hugo2epub.py](examples/hugo2epub/hugo2epub.py)).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfface%2Fxml2epub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdfface%2Fxml2epub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfface%2Fxml2epub/lists"}