Simple Python parser for Kindle's My Clippings.txt file
https://github.com/tswicegood/pykindle

Host: GitHub
URL: https://github.com/tswicegood/pykindle
Owner: tswicegood
License: other
Created: 2010-05-07T03:50:47.000Z (over 14 years ago)
Default Branch: master
Last Pushed: 2010-06-08T03:23:57.000Z (over 14 years ago)
Last Synced: 2024-04-14T14:50:17.564Z (7 months ago)
Language: Python
Homepage:
Size: 97.7 KB
Stars: 9
Watchers: 4
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE.cddl

README

PyKindle
========

Python library for parsing Kindle clippings

Usage
-----

Pretty straightforward::

    >>> from kindle.parser import parse
    >>> path_to_clippings = "/Volumes/Kindle/documents/My Clippings.txt"
    >>> clippings = parse(path_to_clippings)

Now you have a list of dictionaries containing all of the data that Kindle
tracks inside its ``My Clippings.txt`` file.

For example, here's the structure of the last clipping out of my Kindle::

    >>> clippings[-1]
    {'author': 'Bryan Lawson',
     'date': datetime.datetime(2010, 5, 6, 21, 26, 0, 3),
     'location': '624-25',
     'notes': 'Clients often seem to find it easier to  communicate their wishes by reacting to and criticising a proposed  design, than by trying to draw up an abstract comprehensive performance   specification.',
     'title': 'How Designers Think, Fourth Edition: The Design Process Demystified',
     'type': 'Highlight'}
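
Since ``parse`` returns plain dictionaries, ordinary Python is enough to slice
the results. Here is a minimal sketch, assuming every clipping carries the keys
shown above. The sample entries, the ``Note`` type, and the helper name
``highlights_by_title`` are made up for illustration and are not part of
PyKindle:

```python
import datetime
from collections import defaultdict

# Hypothetical sample data shaped like the dictionaries shown above.
# In real use this list would come from kindle.parser.parse().
SAMPLE_CLIPPINGS = [
    {"author": "A. Author", "title": "Example Book A", "type": "Highlight",
     "location": "624-25", "notes": "second highlight",
     "date": datetime.datetime(2010, 5, 6, 21, 26)},
    {"author": "B. Author", "title": "Example Book B", "type": "Note",
     "location": "40", "notes": "a note, not a highlight",
     "date": datetime.datetime(2010, 5, 7, 8, 0)},
    {"author": "A. Author", "title": "Example Book A", "type": "Highlight",
     "location": "10-12", "notes": "first highlight",
     "date": datetime.datetime(2010, 5, 5, 9, 15)},
]


def highlights_by_title(clippings):
    """Group clippings of type 'Highlight' by title, oldest first."""
    grouped = defaultdict(list)
    for clip in clippings:
        if clip["type"] == "Highlight":
            grouped[clip["title"]].append(clip)
    # Sort each book's highlights chronologically.
    for clips in grouped.values():
        clips.sort(key=lambda clip: clip["date"])
    return dict(grouped)


grouped = highlights_by_title(SAMPLE_CLIPPINGS)
```

Swap ``SAMPLE_CLIPPINGS`` for the list returned by ``parse`` to run the same
grouping over your own clippings.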

Handling Pragmatic Bookshelf books

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is one oddity here that I feel obliged to mention (seeing as how I write
for them and you've all bought `my book`_ and want to use PyKindle to parse
your notes from it, right?): Pragmatic Bookshelf.

Each digital version that Pragmatic Bookshelf creates is watermarked with the
name of the original purchaser.  This is generally a good thing, but if you're
syncing to a site such as `Readernaut`_, you need to be able to search by the
original book name.

You can provide filters (see the source code for examples) for all of the
fields as they come out of the Kindle.  Here's an example one-liner for
adjusting the title field to account for the Pragmatic Bookshelf issue::

    >>> from kindle.parser import filter_title, parse
    >>> title_filter = lambda x: filter_title(x).replace(" (Travis Swicegood)", '')
    >>> clippings = parse(path_to_clippings, title_filter=title_filter)

Dig into the code.  It's not complicated.  You'll be able to see what all the
filters are doing and how to make awesome filters for yourself. :-)
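
If you need the same watermark fix for more than one purchaser name, the
one-liner can be turned into a small factory. This is only a sketch: the
helper name ``make_watermark_filter`` and its ``base_filter`` argument are
hypothetical and not part of PyKindle, and it assumes the watermark always
shows up in the title as ``" (Purchaser Name)"``, as in the example above:

```python
def make_watermark_filter(purchaser, base_filter=None):
    """Build a title filter that strips a ' (purchaser)' watermark.

    base_filter is an optional callable applied to the raw title first,
    for example the library's own filter_title.
    """
    watermark = " (%s)" % purchaser

    def title_filter(raw_title):
        title = base_filter(raw_title) if base_filter else raw_title
        # Same approach as the one-liner: drop the watermark text.
        return title.replace(watermark, "")

    return title_filter


strip_travis = make_watermark_filter("Travis Swicegood")
cleaned = strip_travis("Example Title (Travis Swicegood)")
```

To reproduce the one-liner above, pass
``make_watermark_filter("Travis Swicegood", filter_title)`` as the
``title_filter`` argument to ``parse``.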

.. _my book: http://pragprog.com/titles/tsgit

.. _Readernaut: http://readernaut.com