Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/paulocheque/epub-meta

Small Python library to read metadata information from an ePub (2 and 3) file.
https://github.com/paulocheque/epub-meta

Last synced: 3 days ago
JSON representation

Small Python library to read metadata information from an ePub (2 and 3) file.

Host: GitHub
URL: https://github.com/paulocheque/epub-meta
Owner: paulocheque
License: agpl-3.0
Created: 2016-08-22T03:59:52.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2024-07-26T15:06:03.000Z (5 months ago)
Last Synced: 2024-12-17T12:05:19.143Z (10 days ago)
Language: Python
Homepage:
Size: 11.3 MB
Stars: 44
Watchers: 7
Forks: 17
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

README

        # epub-meta

[![Build Status](https://travis-ci.org/paulocheque/epub-meta.png?branch=master)](https://travis-ci.org/paulocheque/epub-meta)

[![Coverage Status](https://coveralls.io/repos/github/paulocheque/epub-meta/badge.svg?ts=1)](https://coveralls.io/github/paulocheque/epub-meta)

[![Code Status](https://landscape.io/github/paulocheque/epub-meta/master/landscape.png)](https://landscape.io/github/paulocheque/epub-meta/)

**Latest version: 0.0.7 (2018/09)**

Small **Python** library to read **metadata** information from an **ePub** (2 and 3) file. 

It does not depends on any library and run on Python 3 and 2.

## Installation

    pip install epub_meta

## Usage

    import epub_meta

Discover the main metadata of the ePub file

    >>> metadata = epub_meta.get_epub_metadata('/path/to/my_epub_file.epub')

    >>> type(metadata)

    

    >>> metadata

    { ... }

Example:

    >>> data = epub_meta.get_epub_metadata('/path/to/pro_git.epub', read_cover_image=True, read_toc=True)

    >>> print(data)

    {

        'authors': [u'Scott Chacon'],

        'epub_version': u'2.0',

        # ISBN, uuids etc

        'identifiers': [u'bf50c6e1-eb0a-4a1c-a2cd-ea8809ae086a', u'9781430218333'],

        'language': u'en',

        'publication_date': u'2009-08-19T00:00:00+00:00',

        'publisher': u'Springer',

        'subject': u'Software Development',

        'title': u'Pro Git',

        # import base64 ; base64.b64decode(data.cover_image_content)

        'cover_image_content': [base64 string],

        'cover_image_extension': '.jpg',

        'toc': [

            {'index': 0, 'title': 'Getting Started', 'src': 'progit_split_000.html', 'level': 0},

            {'index': 1, 'title': 'Git Basics', 'level': 0, 'src': 'progit_split_008.html'},

            {'index': 2, 'title': 'Git Branching', 'level': 0, 'src': 'progit_split_017.html'},

            {'index': 3, 'title': 'Git on the Server', 'src': 'progit_split_025.html', 'level': 0},

            {'index': 4, 'title': 'Distributed Git', 'src': 'progit_split_037.html', 'level': 0},

            {'index': 5, 'title': 'Git Tools', 'src': 'progit_split_042.html', 'index': 5, 'level': 0},

            {'index': 6, 'title': 'Customizing Git', 'src': 'progit_split_051.html', 'level': 0},

            {'index': 7, 'title': 'Git and Other Systems', 'src': 'progit_split_057.html', 'level': 0},

            {'index': 8, 'title': 'Git Internals', 'src': 'progit_split_061.html', 'level': 0}

        ],

        'file_size_in_bytes': 4346158

    }

You can access the dict keys using *dot* notation:

    data.authors

    data.epub_version

    ...

You should check for invalid ePub files or for unknown ePub conventions:

    try:

        epub_meta.get_epub_metadata('/path/to/my_epub_file.epub')    

    except epub_meta.EPubException as e:

        print(e)

To discover and parse yourself the ePub OPF file, you can get the content of the *OPF - XML* file:

    print(epub_meta.get_epub_opf_xml('/path/to/my_epub_file.epub'))

## Change Log

##### 0.0.7 (2016-09-08)

- Fixed url encoded strings

- Accepting relative paths

- Discover description if available

##### 0.0.6 (2016-12-12)

- Parsing and reading authors in pr02.html file if available

- Parsing and reading the publish date in pr01.html if available

##### 0.0.5 (2016-11-02)

- No more duplicate authors (preserving the order)

- Improvements in the ToC parser/reader

- Avoid infinite loop for bad/unknown epub files

##### 0.0.4 (2016-11-02)

- Backward incompatibility: Returning ToC as a list of objects instead of a list of strings

- The ToC information includes the source of the section: property `src`

- The ToC is hierarchical, so we include a `level` property to identify the depth of the toc item in the list

- The ToC order is important, so we include a `index` property to keep the order explicit

- Trimming some string values

##### 0.0.3 (2016-08-23)

- Added the file size into the metadata dict

##### 0.0.2 (2016-08-22)

- Fixed TOC discovering for ePub v3 files

##### 0.0.1 (2016-08-19)

- `get_epub_metadata(path, read_cover_image=True, read_toc=True)` function

- `get_epub_opf_xml(path)` function

- Read cover image content in base64

- Read TOC contents as an list of strings

## Development

Useful commands:

    # Create a virtual env

    make prepare

    # Install al dependencies

    make deps

    # Run tests

    make test

    # Run tests with Tox (for all Python compatible versions)

    make test_all

    # Run coverage

    make coverage

    # Useful command for running tests before pushing to Git

    make push