[![Build Status](https://travis-ci.org/miquido/parsepub.svg?branch=master)](https://travis-ci.org/miquido/parsepub)  [![Download](https://api.bintray.com/packages/miquido/maven-repo/parsepub/images/download.svg) ](https://bintray.com/miquido/maven-repo/parsepub/_latestVersion)

# parsepub

---

## **Overview**

**parsepub** is a universal tool written in Kotlin that converts an EPUB publication into a data model that a reader can use later. It also validates the publication and reports inconsistencies in its format.
The project was made by [Miquido](https://www.miquido.com/).


---

## **Features**

* converting a publication into a model that contains all of its resources and the necessary information
* support for EPUB versions 2.0 and 3.0, covering all major tags
* handling of inconsistency errors and of missing required elements in the publication structure
* reporting of missing attributes in structural elements

---

## Restrictions
For the parser to work properly, the EPUB file must be created in accordance with the format requirements.  
Spec for [EPUB 3.0](http://idpf.org/epub/30)  
Spec for [EPUB 2.0.1](http://idpf.org/epub/201)

---
## Base model - description
The `EpubBook` class contains all information from an uncompressed EPUB publication.
Each property corresponds to a set of information parsed from the elements of the publication structure.
```kotlin
data class EpubBook(
    val epubOpfFilePath: String? = null,
    val epubTocFilePath: String? = null,
    val epubCoverImage: EpubResourceModel? = null,
    val epubMetadataModel: EpubMetadataModel? = null,
    val epubManifestModel: EpubManifestModel? = null,
    val epubSpineModel: EpubSpineModel? = null,
    val epubTableOfContentsModel: EpubTableOfContentsModel? = null
)
```
*epubOpfFilePath* - contains the absolute path to the .opf file.  
*epubTocFilePath* - contains the absolute path to the table of contents file.  
*epubCoverImage* - contains all information about the publication's cover image.  
*epubMetadataModel* - contains the basic information about the publication (its metadata).  
*epubManifestModel* - contains all publication resources.  
*epubSpineModel* - contains the list of references in reading order.  
*epubTableOfContentsModel* - contains the table of contents of the publication.

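
Every property of `EpubBook` is nullable, so code that reads the parsed model should expect that any part may be absent. The sketch below is illustrative only. It re-declares `EpubBook` with hypothetical, empty stand-ins for the nested model classes so that it compiles without the library. The real classes carry data. The `missingParts` helper is also a hypothetical name and is not part of the parsepub API.

```kotlin
// Hypothetical empty stand-ins for parsepub's nested model classes,
// declared only so that this sketch compiles without the library.
class EpubResourceModel
class EpubMetadataModel
class EpubManifestModel
class EpubSpineModel
class EpubTableOfContentsModel

// Same shape as the EpubBook class documented above.
data class EpubBook(
    val epubOpfFilePath: String? = null,
    val epubTocFilePath: String? = null,
    val epubCoverImage: EpubResourceModel? = null,
    val epubMetadataModel: EpubMetadataModel? = null,
    val epubManifestModel: EpubManifestModel? = null,
    val epubSpineModel: EpubSpineModel? = null,
    val epubTableOfContentsModel: EpubTableOfContentsModel? = null
)

// Hypothetical helper: names of the main structural elements whose model is null.
fun missingParts(book: EpubBook): List<String> = listOfNotNull(
    "metadata".takeIf { book.epubMetadataModel == null },
    "manifest".takeIf { book.epubManifestModel == null },
    "spine".takeIf { book.epubSpineModel == null },
    "table of contents".takeIf { book.epubTableOfContentsModel == null }
)

fun main() {
    // Hypothetical partially parsed book: no manifest, TOC or cover image.
    val partial = EpubBook(
        epubOpfFilePath = "/tmp/book/OEBPS/content.opf",
        epubMetadataModel = EpubMetadataModel(),
        epubSpineModel = EpubSpineModel()
    )
    println(missingParts(partial)) // [manifest, table of contents]

    // Safe-call access: the block runs only when the value is non-null.
    partial.epubOpfFilePath?.let { println("OPF file: $it") }
    if (partial.epubCoverImage == null) println("no cover image")
}
```

Null checks like these protect the code that reads the model later. They complement the validation listeners, which report structural problems during parsing.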

More information about the elements of a publication is available in the  
**"Information about epub format for non-developers"** section.

## Quick start
To convert an EPUB publication, create an instance of the `EpubParser` class:
```kotlin
val epubParser = EpubParser()
```
Then call the `parse` method on it:
```kotlin
val epubBook = epubParser.parse(inputPath, decompressPath)
```
This method returns an *EpubBook* object and takes two parameters:  
*inputPath* - the path to the EPUB file,  
*decompressPath* - the path where the file should be unpacked

### Error handling in the structure of the publication
The structure of a converted file is usually incorrect for one main reason: required elements of the publication are missing, such as **Metadata, Manifest, Spine, Table of Contents**.

**Solution - ValidationListeners**  
To limit the unexpected effects of an incorrect structure, you can implement listeners that alert you when the format is wrong.  
Call the `setValidationListeners` method on the previously created `EpubParser` instance, and implement your listeners inside its body.  
Each listener is assigned to a specific element.
```kotlin
// Log is Android's android.util.Log; ERROR_TAG is a tag constant defined in your own code.
epubParser.setValidationListeners {
   setOnMetadataMissing { Log.e(ERROR_TAG, "Metadata missing") }
   setOnManifestMissing { Log.e(ERROR_TAG, "Manifest missing") }
   setOnSpineMissing { Log.e(ERROR_TAG, "Spine missing") }
   setOnTableOfContentsMissing { Log.e(ERROR_TAG, "Table of contents missing") }
}
```

### Displaying information about missing attributes
The parsing method can also return unexpected results when the set of attributes of an element in the file structure is incomplete,  
e.g. when the **language** attribute is missing from the **Metadata** element.

**Solution - onAttributeMissing**  
This mechanism addresses the problem described above, and it is part of the validation listeners.
When a required attribute is missing or invalid, the listener reports the name of that attribute and the name of its parent element.  
The listener receives two values:  
*parentElement* - the name of the main element in which the error occurs  
*attributeName* - the name of the missing attribute

```kotlin
setOnAttributeMissing { parentElement, attributeName ->
    Log.e("$parentElement warn", "missing $attributeName attribute")
}
```

## Information about epub format for non-developers
**EPUB** is an e-book file format that uses the ".epub" file extension.
Its structure is based on these main elements: **Metadata, Manifest, Spine, Table of Contents**.

**Metadata** - contains all metadata for a specific EPUB file. Three metadata attributes are required, and many more are available:  
*title* - the title of the book,  
*language* - the language of the book,  
*identifier* - the unique identifier of the book.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title id="title">Title of the book</dc:title>
   <dc:language>en</dc:language>
   <dc:identifier id="pub-id">id-identifier</dc:identifier>
</metadata>
```
**Manifest** - lists all the files of the publication. Each file is represented by an `item` element that has these required attributes:  
*id* - the identifier of the resource  
*href* - the location of the resource  
*media-type* - the type and format (MIME type) of the resource

**Spine** - lists the XHTML content documents in their linear reading order. It refers to each document by its manifest `id`.

**Table of contents** - contains the hierarchical table of contents for the EPUB file.  
A description of the full TOC specification can be found here:  
TOC spec for [EPUB 2.0](http://www.idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.4.1)  
TOC spec for [EPUB 3.0](https://www.idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-nav)
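
The Manifest and the Spine work together. The manifest records where each resource lives, and the spine records the order in which the content documents are read. The Kotlin sketch below illustrates this relationship. It is not parsepub's own implementation. It parses an OPF package document with the JDK's built-in DOM parser (`javax.xml.parsers`), reads the required Dublin Core metadata, and resolves the spine's `itemref`/`idref` entries against the manifest `item` elements. The sample OPF and the helper names `parseOpf`, `dcValue` and `readingOrder` are hypothetical. A real reader would first unzip the EPUB and then locate the .opf file through `META-INF/container.xml`.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.w3c.dom.Element

// Namespaces of the OPF package format and of Dublin Core (used inside <metadata>).
val OPF_NS = "http://www.idpf.org/2007/opf"
val DC_NS = "http://purl.org/dc/elements/1.1/"

// Hypothetical, trimmed OPF package document.
val sampleOpf = """
    <package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title id="title">Title of the book</dc:title>
        <dc:language>en</dc:language>
        <dc:identifier id="pub-id">id-identifier</dc:identifier>
      </metadata>
      <manifest>
        <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
        <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
        <item id="cover" href="cover.jpg" media-type="image/jpeg"/>
      </manifest>
      <spine>
        <itemref idref="ch2"/>
        <itemref idref="ch1"/>
      </spine>
    </package>
""".trimIndent()

// Parses the OPF text into a namespace-aware DOM document.
fun parseOpf(xml: String): Document {
    val factory = DocumentBuilderFactory.newInstance().apply { isNamespaceAware = true }
    return factory.newDocumentBuilder().parse(ByteArrayInputStream(xml.toByteArray(Charsets.UTF_8)))
}

// Text of the first Dublin Core element with this name, or null if it is absent.
fun dcValue(doc: Document, name: String): String? =
    doc.getElementsByTagNameNS(DC_NS, name).item(0)?.textContent?.trim()

// Resolves spine itemrefs against the manifest, returning hrefs in reading order.
fun readingOrder(doc: Document): List<String> {
    val items = doc.getElementsByTagNameNS(OPF_NS, "item")
    val hrefById = (0 until items.length)
        .map { items.item(it) as Element }
        .associate { it.getAttribute("id") to it.getAttribute("href") }

    val refs = doc.getElementsByTagNameNS(OPF_NS, "itemref")
    return (0 until refs.length)
        .map { (refs.item(it) as Element).getAttribute("idref") }
        .mapNotNull { hrefById[it] } // an idref with no matching manifest item is skipped
}

fun main() {
    val doc = parseOpf(sampleOpf)
    println(dcValue(doc, "title"))    // Title of the book
    println(dcValue(doc, "language")) // en
    println(readingOrder(doc))        // [chapter2.xhtml, chapter1.xhtml]
}
```

The reading order follows the spine (`ch2` before `ch1`), not the order of the manifest. The cover image is listed in the manifest but not in the spine, so it does not appear in the reading order.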