https://github.com/l2ysho/afpp
A fast, efficient, and minimal PDF parser for Node.js. Zero bloat. One dependency. Production-ready.
pdf pdfjs pdfparser pdftoimage pdftoimg pdftotext
- Host: GitHub
- URL: https://github.com/l2ysho/afpp
- Owner: l2ysho
- License: mit
- Created: 2024-09-18T11:38:52.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-01-24T11:09:41.000Z (2 months ago)
- Last Synced: 2026-01-24T22:39:08.902Z (2 months ago)
- Topics: pdf, pdfjs, pdfparser, pdftoimage, pdftoimg, pdftotext
- Language: TypeScript
- Homepage:
- Size: 2.82 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
# afpp

> **afpp** — A modern, dependency-light PDF parser for Node.js.
>
> Built for performance, reliability, and developer sanity.
---
## Overview
`afpp` (Another PDF Parser, Properly) is a Node.js library for extracting text and images from PDF files without heavyweight native dependencies, event-loop blocking, or fragile runtime assumptions.

The project was created to address recurring problems encountered with existing PDF tooling in the Node.js ecosystem:
- Excessive bundle sizes and transitive dependencies
- Native build steps (canvas, ImageMagick, Ghostscript)
- Browser-specific assumptions (`window`, DOM, canvas)
- Poor TypeScript support
- Unreliable handling of encrypted PDFs
- Performance and memory inefficiencies

`afpp` focuses on **predictable behavior**, **explicit APIs**, and **production-ready defaults**.

---
## Key Features
- Zero native build dependencies
- Fully asynchronous, non-blocking architecture
- First-class TypeScript support
- Supports local files, buffers, and remote URLs
- Handles encrypted PDFs
- Configurable concurrency and rendering scale
- Minimal and auditable dependency graph
---
## Requirements
- **Node.js** >= 22.14.0
---
## Installation
Install using your preferred package manager:
```bash
npm install afpp
# or
yarn add afpp
# or
pnpm add afpp
```
---
## Quick Start
All parsing functions accept the same input types:
- `string` (file path)
- `Buffer`
- `URL`
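
Because a plain `string` is treated as a file path, a remote address that arrives as text (for example from a CLI argument) has to be wrapped in a `URL` object first. The following is a minimal sketch of that conversion; `toPdfInput` is a hypothetical helper and not part of `afpp`, and the `http(s)` prefix check is an assumption about which remote schemes matter.

```ts
// Hypothetical helper (not part of afpp): turn a raw string argument into
// an input value that matches the documented input types.
function toPdfInput(arg: string): string | URL {
  // Assumption: only http:// and https:// strings are meant as remote PDFs.
  if (/^https?:\/\//i.test(arg)) {
    return new URL(arg);
  }
  // Everything else is passed through unchanged and treated as a file path.
  return arg;
}

// Usage (commented out so the sketch runs without afpp installed):
// const pages = await pdf2string(toPdfInput(process.argv[2]));
```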
### Extract Text from a PDF
```ts
import { readFile } from 'fs/promises';
import path from 'path';

import { pdf2string } from 'afpp';

(async () => {
  // Read the PDF into a Buffer; a file path or URL is accepted as well
  const filePath = path.join('..', 'test', 'example.pdf');
  const buffer = await readFile(filePath);

  // Resolves to an array with one string per page
  const pages = await pdf2string(buffer);
  console.log(pages); // ['Page 1 text', 'Page 2 text', ...]
})();
```
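
Since the result is a plain array of page strings, ordinary array code is enough for post-processing. As a rough illustration, the sketch below finds the pages that mention a term. `findPages` is a hypothetical helper, not an `afpp` export, and it assumes that array index `i` corresponds to page `i + 1`, as the sample output above suggests.

```ts
// Hypothetical helper (not part of afpp): return the 1-based page numbers
// whose text contains `term`, ignoring case.
function findPages(pages: readonly string[], term: string): number[] {
  const needle = term.toLowerCase();
  const hits: number[] = [];

  pages.forEach((text, index) => {
    if (text.toLowerCase().includes(needle)) {
      // Assumption: pages are returned in document order, starting at page 1.
      hits.push(index + 1);
    }
  });

  return hits;
}

// Usage with the `pages` array from the example above:
// const invoicePages = findPages(pages, 'invoice');
```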
---
### Render PDF Pages as Images
```ts
import { pdf2image } from 'afpp';

(async () => {
  // A URL object makes afpp fetch the remote PDF
  const url = new URL('https://pdfobject.com/pdf/sample.pdf');

  // Resolves to one encoded image Buffer per page
  const images = await pdf2image(url);
  console.log(images); // [Buffer, Buffer, ...]
})();
```
---
### Streaming API (Large PDFs)
For large PDFs, use streaming functions to process pages incrementally without loading all results into memory:
```ts
import { writeFile } from 'fs/promises';

import { streamPdf2image, streamPdf2string } from 'afpp';

// Stream images - process each page as it's rendered
for await (const { pageNumber, pageCount, data } of streamPdf2image(
  './large.pdf',
)) {
  await writeFile(`page-${pageNumber}.png`, data);
  console.log(`Processed ${pageNumber}/${pageCount}`);
}

// Stream text - process each page as it's extracted
for await (const { pageNumber, data } of streamPdf2string('./large.pdf')) {
  console.log(`Page ${pageNumber}: ${data.substring(0, 100)}...`);
}
```
**Benefits:**
- Lower peak memory usage
- Faster time-to-first-result
- Built-in progress tracking via `pageNumber` and `pageCount`
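
The faster time-to-first-result matters most when the consumer can stop early. The sketch below searches a page stream and returns as soon as a match is found. `findFirstPage` and the `PageText` interface are hypothetical names that mirror only the `pageNumber` and `data` fields shown above. Whether later pages are skipped entirely depends on the stream being lazy, which is how async generators behave but is an assumption about `afpp` here.

```ts
// Hypothetical shape, mirroring the fields destructured in the example above.
interface PageText {
  pageNumber: number;
  data: string;
}

// Hypothetical helper (not part of afpp): resolve to the first page number
// whose text contains `term`, or undefined if no page matches.
async function findFirstPage(
  pages: AsyncIterable<PageText>,
  term: string,
): Promise<number | undefined> {
  for await (const { pageNumber, data } of pages) {
    if (data.includes(term)) {
      // Returning from inside `for await` closes the iterator, so a lazy
      // producer is not asked for any further pages.
      return pageNumber;
    }
  }
  return undefined;
}

// Usage (commented out so the sketch runs without afpp installed):
// const hit = await findFirstPage(streamPdf2string('./large.pdf'), 'Appendix');
```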
---
### Low-Level Parsing API
For advanced use cases, `parsePdf` exposes page-level control and transformation.
```ts
import { parsePdf } from 'afpp';

(async () => {
  // Download the PDF and wrap the bytes in a Buffer
  const response = await fetch('https://pdfobject.com/pdf/sample.pdf');
  const buffer = Buffer.from(await response.arrayBuffer());

  // The third argument is a per-page callback; here it returns the content unchanged
  const result = await parsePdf(buffer, {}, (pageContent) => pageContent);
  console.log(result);
})();
```
---
## Configuration
All public APIs accept a shared options object.
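```ts
// `buffer` holds the PDF bytes, as in the examples above
const result = await parsePdf(buffer, {
  concurrency: 5, // process up to 5 pages in parallel
  imageEncoding: 'jpeg',
  password: 'STRONG_PASS', // only needed for encrypted PDFs
  scale: 4,
});
```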
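
To make the effect of `concurrency` concrete, here is a generic sketch of a concurrency-limited map over pages. It is not `afpp`'s implementation, which is not shown in this README. `mapWithConcurrency` is a hypothetical name and the worker-pool approach is one common way to cap parallel work in Node.js.

```ts
// Generic illustration (not afpp's code): run `worker` over `items` with at
// most `limit` calls in flight, keeping results in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start at most `limit` runners (and at least one, so the call always settles).
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

With `limit` set to `1` pages are handled strictly one after another, which matches the documented default; higher values trade memory for throughput.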
### AfppParseOptions
| Option | Type | Default | Description |
| --------------- | ------------------------------------- | ------- | --------------------------------------------- |
| `concurrency` | `number` | `1` | Number of pages processed in parallel |
| `imageEncoding` | `'png' \| 'jpeg' \| 'webp' \| 'avif'` | `'png'` | Output format for rendered images |
| `password` | `string` | — | Password for encrypted PDFs |
| `scale` | `number` | `1.0` | Rendering scale (1.0 = 72 DPI, 2.0 = 144 DPI) |
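
The `scale` row implies a simple linear relationship: PDF page sizes are expressed in points (72 per inch), so the effective DPI is `scale * 72`. The helpers below are a small sketch of that arithmetic and are hypothetical, not `afpp` exports. The exact pixel rounding used by the renderer is an assumption, so treat the computed sizes as estimates.

```ts
// PDF user space uses 72 points per inch, which matches the table above.
const POINTS_PER_INCH = 72;

// Effective DPI for a given scale, e.g. scale 2.0 gives 144 DPI.
function scaleToDpi(scale: number): number {
  return scale * POINTS_PER_INCH;
}

// Scale needed to reach a target DPI, e.g. 300 DPI needs roughly 4.17.
function scaleForDpi(dpi: number): number {
  return dpi / POINTS_PER_INCH;
}

// Estimated pixel size of a rendered page given its size in points.
// Assumption: dimensions are rounded to the nearest pixel.
function renderedSize(
  widthPt: number,
  heightPt: number,
  scale: number,
): { width: number; height: number } {
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// A US Letter page is 612 x 792 points.
const letterAtScale2 = renderedSize(612, 792, 2);
console.log(scaleToDpi(2), letterAtScale2);
```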
---
## Design Principles
- **Node-first**: No browser globals or DOM assumptions
- **Explicit over implicit**: No magic configuration
- **Fail fast**: Clear errors instead of silent corruption
- **Production-oriented**: Optimized for long-running processes
---
## License
MIT © Richard Solár