Extract Markdown + Images from PDF
https://github.com/bsorrentino/pdf-tools

[![npm](https://img.shields.io/npm/v/@bsorrentino/pdf-tools.svg)](https://www.npmjs.com/package/@bsorrentino/pdf-tools) 
 
 

 
![example workflow](https://github.com/bsorrentino/pdf-tools/actions/workflows/npm-publish.yml/badge.svg)

# pdf-tools

Tools to extract/transform data from PDF

> Inspired by the [pdf-to-markdown](https://github.com/jzillmann/pdf-to-markdown) project

## Installation

```
npm install @bsorrentino/pdf-tools -g
```

## Requirements

* Node.js >= 16
* **pdf-tools** uses [`canvas`], a [`Cairo`]-backed Canvas implementation for Node.js, so check its [requirements][reqirements] before installing

## pdftools Commands

**common options**
```
-o, --outdir [folder] output folder (default: "out")
```

### pdfximages

Extract images (as PNG) from a PDF and save them to the given folder.

**Usage:**
```
pdftools pdfximages|pxi [options]
```

### pdf2images

Create an image (as PNG) for each PDF page.

**Usage:**
```
pdftools pdf2images|p2i
```
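
As a rough illustration of the two image commands, the sketch below calls each one with the common output-folder option. It assumes the PDF path is passed as a positional argument, which the usage lines do not show, and `sample.pdf` and the `run_or_show` helper are hypothetical names. The helper runs the tool only when both `pdftools` and the input file are present, and otherwise prints the command it would have run.

```shell
#!/bin/sh
# Hypothetical input file; replace with a real PDF.
PDF="sample.pdf"

# Run the given command if pdftools and the input file exist,
# otherwise print it as a dry run.
run_or_show() {
  if command -v pdftools >/dev/null 2>&1 && [ -f "$PDF" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Extract embedded images as PNG into ./images.
# Assumption: the PDF path is a positional argument.
run_or_show pdftools pdfximages --outdir images "$PDF"

# Render one PNG per page into ./pages, using the short alias and -o.
run_or_show pdftools p2i -o pages "$PDF"
```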

### pdf2md

Convert a PDF to Markdown format.

**Usage:**
```
pdftools pdf2md|p2md [options]
```

**Options:**
```
-ps, --pageseparator [separator] add page separator (default: "---")
--imageurl [url prefix] image url prefix
--stats print stats information
--debug print debug information
```
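
The options above can be combined in a single call. The following is a minimal sketch, not a definitive invocation: it assumes the PDF path is a positional argument placed after the options, and `sample.pdf`, the `images/` prefix, and the `convert_to_markdown` function are hypothetical. When `pdftools` or the input file is missing, it prints the command instead of running it.

```shell
#!/bin/sh
# Convert one PDF to Markdown using the documented pdf2md options.
convert_to_markdown() {
  pdf="$1"

  # Assumption: the PDF path follows the options as a positional argument.
  set -- pdftools pdf2md \
    --outdir out \
    --pageseparator "***" \
    --imageurl "images/" \
    --stats \
    "$pdf"

  if command -v pdftools >/dev/null 2>&1 && [ -f "$pdf" ]; then
    "$@"                   # run the conversion
  else
    echo "would run: $*"   # dry run when the tool or file is missing
  fi
}

# Hypothetical input file; replace with a real PDF.
convert_to_markdown "sample.pdf"
```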
----

## Conversion to Markdown

### Supported features

* Detect headers
* Detect and extract images
* Extract plain text
* Extract fonts and allow custom mapping through a generated `.font.json` file
> Supported font styles: **bold**, _italic_, `monospace`, **_bold+italic_**
* Detect code blocks (i.e. ` ``` `)
* Detect external links

### TO DO

* Detect table of contents (TOC)

[`canvas`]: https://www.npmjs.com/package/canvas
[`Cairo`]: http://cairographics.org/
[reqirements]: https://github.com/Automattic/node-canvas#compiling