{"id":13862667,"url":"https://github.com/leoncvlt/blinkist-scraper","last_synced_at":"2025-06-28T14:36:43.705Z","repository":{"id":51004317,"uuid":"251381117","full_name":"leoncvlt/blinkist-scraper","owner":"leoncvlt","description":"📚 Python tool to download book summaries and audio from Blinkist.com, and generate some pretty output","archived":false,"fork":false,"pushed_at":"2021-05-08T16:58:04.000Z","size":13853,"stargazers_count":199,"open_issues_count":16,"forks_count":36,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-05-08T21:09:18.265Z","etag":null,"topics":["blinkist","python","scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leoncvlt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-30T17:37:37.000Z","updated_at":"2025-04-22T16:50:17.000Z","dependencies_parsed_at":"2022-09-25T00:32:01.991Z","dependency_job_id":null,"html_url":"https://github.com/leoncvlt/blinkist-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/leoncvlt/blinkist-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leoncvlt%2Fblinkist-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leoncvlt%2Fblinkist-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leoncvlt%2Fblinkist-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leoncvlt%2Fblinkist-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leoncvlt","dow
nload_url":"https://codeload.github.com/leoncvlt/blinkist-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leoncvlt%2Fblinkist-scraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262446718,"owners_count":23312601,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blinkist","python","scraping"],"created_at":"2024-08-05T06:01:49.692Z","updated_at":"2025-06-28T14:36:43.679Z","avatar_url":"https://github.com/leoncvlt.png","language":"Python","funding_links":["https://www.buymeacoffee.com/leoncvlt"],"categories":["Python"],"sub_categories":[],"readme":"# blinkist-scraper\n\nA Python script to download book summaries and audio from [Blinkist](https://www.blinkist.com/) and generate some pretty output files.\n\n## Installation / Requirements\n\nMake sure you're in your virtual environment of choice, then run\n- `poetry install --no-dev` if you have [Poetry](https://python-poetry.org/) installed\n- `pip install -r requirements.txt` otherwise\n\nThis script uses [ChromeDriver](https://chromedriver.chromium.org) to automate the Google Chrome browser, so Google Chrome must be installed for the script to work.\n\nThe script will automatically try to download and use the appropriate chromedriver distribution for your OS and Chrome version. 
If this doesn't work, download the right version for you from https://chromedriver.chromium.org/downloads and use the `--chromedriver` argument to specify its path at runtime.\n\n## Usage\n\n```text\nusage: blinkistscraper [-h] [--language {en,de}] [--match-language]\n                       [--cooldown COOLDOWN] [--headless] [--audio]\n                       [--concat-audio] [--keep-noncat] [--no-scrape]\n                       [--book BOOK] [--daily-book] [--books BOOKS]\n                       [--book-category BOOK_CATEGORY]\n                       [--categories CATEGORIES [CATEGORIES ...]]\n                       [--ignore-categories IGNORE_CATEGORIES [IGNORE_CATEGORIES ...]]\n                       [--create-html] [--create-epub] [--create-pdf]\n                       [--save-cover] [--embed-cover-art] \n                       [--chromedriver CHROMEDRIVER] [--no-ublock] [--no-sandbox] [-v]\n                       email password\n\npositional arguments:\n  email                 The email to log into your premium Blinkist account\n  password              The password to log into your premium Blinkist account\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --language {en,de}    The language to scrape books in - either 'en' for\n                        english or 'de' for german\n  --match-language      Skip scraping books if not in the requested language\n                        (not all book are avaible in german)\n  --cooldown COOLDOWN   Seconds to wait between scraping books, and\n                        downloading audio files. Can't be smaller than 1\n  --headless            Start the automated web browser in headless mode.\n                        Works only if you already logged in once\n  --audio               Download the audio blinks for each book.\n  --concat-audio        Concatenate the audio blinks into a single file and\n                        tag it. 
Requires ffmpeg\n  --keep-noncat         Keep the individual blink audio files, instead of\n                        deleting them (works with '--concat-audio' only)\n  --no-scrape           Don't scrape the website, only process existing json\n                        files in the dump folder. Do not provide email or\n                        password with this option.\n  --book BOOK           Scrapes this book only, takes the Blinkist URL for the\n                        book (e.g. https://www.blinkist.com/en/books/... or\n                        https://www.blinkist.com/en/nc/reader/...)\n  --daily-book          Scrapes the free daily book only.\n  --books BOOKS         Scrapes the list of books, takes a txt file with the\n                        list of Blinkist URL's for the books (e.g.\n                        https://www.blinkist.com/en/books/... or\n                        https://www.blinkist.com/en/nc/reader/...)\n  --book-category BOOK_CATEGORY\n                        When scraping a single book, categorize it under this\n                        category (works with '--book' and '--daily-book' only)\n  --categories CATEGORIES [CATEGORIES ...]\n                        Only the categories whose label contains at least one\n                        string here will be scraped. Case-insensitive; use\n                        spaces to separate categories. (e.g. '--categories\n                        entrep market' will only scrape books under\n                        'Entrepreneurship' and 'Marketing \u0026 Sales')\n  --ignore-categories IGNORE_CATEGORIES [IGNORE_CATEGORIES ...]\n                        If a category label contains anything in\n                        ignored_categories, books under that category will not\n                        be scraped. Case-insensitive; use spaces to separate\n                        categories. (e.g. 
'--ignored-categories entrep market'\n                        will skip scraping of 'Entrepreneurship' and\n                        'Marketing \u0026 Sales')\n  --create-html         Generate a formatted html document for the book\n  --create-epub         Generate a formatted epub document for the book\n  --create-pdf          Generate a formatted pdf document for the book.\n                        Requires wkhtmltopdf\n  --save-cover          Save a copy of the Blink cover artwork in the folder\n  --embed-cover-art     Embed the Blink cover artwork into the concatenated\n                        audio file (works with '--concat-audio' only)\n  --chromedriver CHROMEDRIVER\n                        Path to a specific chromedriver executable instead of\n                        the built-in one\n  --no-ublock           Disable the uBlock Chrome extension. This will\n                        completely skip the installation (and setup) of\n                        ublock. If you want to use ublock content blocking, then\n                        run the script again without this flag.\n  --no-sandbox          When running as root (e.g. in Docker), Chrome requires\n                        the '--no-sandbox' argument     \n  -v, --verbose         Increases logging verbosity\n```\n\n## Basic usage\n`python blinkistscraper email password` where email and password are the login details to your premium Blinkist account.\n\nThe script uses Selenium with a Chrome driver to scrape the site automatically using the provided credentials. Sometimes during scraping, a captcha block-page will appear. When this happens, the script will try to pause and wait for the user to solve it. After some time (i.e. 
one minute), the script will time out.\n\nThe output files are stored in the `books` folder, arranged in subfolders by category and by the book's title and author.\n\n## Customizing HTML output\nThe script builds a nice-looking html version of the book by using the 'book.html' and 'chapter.html' files in the 'templates' folder as a base. Every parameter between curly braces in those files (e.g. `{title}`) is replaced by the appropriate value from the book metadata (dumped in the `dump` folder upon scraping), following a 1-to-1 naming convention with the json parameters (e.g. `{title}` will be replaced by the `title` parameter, `{who_should_read}` by the `who_should_read` one, and so on).\n\nThe special field `{__chapters__}` is replaced with all the book's chapters. Chapters are created by parsing each `chapter` object in the book metadata and using the `chapter.html` template file in the same fashion, replacing tokens with the parameters inside the `chapter` object.\n\n## Generating .pdf\nAdd the `--create-pdf` argument to the script to generate a .pdf file from the .html one. This requires the [wkhtmltopdf](https://wkhtmltopdf.org/) tool to be installed and present in the PATH.\n\n## Downloading audio\nThe script also downloads the audio blinks when the `--audio` argument is added. It waits for a request to Blinkist's `audio` endpoint in their `library` API for the first chapter's audio blink, which is sent as soon as the user navigates to a book's reader page, then re-uses that valid request's headers to build additional requests for the remaining chapters' audio files. The files are downloaded as `.m4a`.\n\n## Concatenating audio files\nAdd the `--concat-audio` argument to the script to concatenate the individual audio blinks into a single file and tag it with the appropriate book title and author. Doing this deletes the individual blinks and replaces them with a single audio file per book. 
To keep the individual blink audio files as well, use the `--keep-noncat` argument together with the `--concat-audio` argument (i.e. `--concat-audio --keep-noncat`). Concatenation requires the [ffmpeg](https://www.ffmpeg.org/) tool to be installed and present in the PATH.\n\n## Processing book dumps with no scraping\nDuring scraping, the script saves each book's metadata in json files inside the `dump` folder. Those can be used by the script to re-generate the .html, .epub and .pdf output files without having to scrape the website again. To do so, pass the `--no-scrape` argument to the script without providing an email or a password.\n\n## Scraping with a free account\nIf you don't have a Blinkist premium account, you can still scrape the free daily book. To do so automatically, pass the `--daily-book` argument - this behaves like scraping a single book.\n\n## Quirks \u0026 known Bugs\n- Some people have had trouble with long generated book filenames (\u003e 260 characters in Windows). 
Although this should be handled gracefully by the script, if you keep seeing \"FileNotFoundError\" when trying to create the .html / .m4a files, try and turn on long filenames support on your system: https://www.itprotoday.com/windows-10/enable-long-file-name-support-windows-10, and make sure you have a recent distribution of ffmpeg if using it (old versions had some bugs in dealing with long filenames)\n\n## Support [![Buy me a coffee](https://img.shields.io/badge/-buy%20me%20a%20coffee-lightgrey?style=flat\u0026logo=buy-me-a-coffee\u0026color=FF813F\u0026logoColor=white \"Buy me a coffee\")](https://www.buymeacoffee.com/leoncvlt)\nIf this tool has proven useful to you, consider [buying me a coffee](https://www.buymeacoffee.com/leoncvlt) to support development of this and [many other projects](https://github.com/leoncvlt?tab=repositories).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleoncvlt%2Fblinkist-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleoncvlt%2Fblinkist-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleoncvlt%2Fblinkist-scraper/lists"}