https://github.com/dulajkavinda/pdf-lecture

📄Convert PDFs into vanilla text.
https://github.com/dulajkavinda/pdf-lecture

nodejs npm npm-package open-source pdf-converter pdf-lecture pdf2json pdfs

Last synced: 5 days ago
JSON representation

📄Convert PDFs into vanilla text.

Host: GitHub
URL: https://github.com/dulajkavinda/pdf-lecture
Owner: dulajkavinda
Created: 2020-05-11T06:51:02.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-05-11T06:51:37.000Z (over 5 years ago)
Last Synced: 2025-09-13T06:52:44.680Z (29 days ago)
Topics: nodejs, npm, npm-package, open-source, pdf-converter, pdf-lecture, pdf2json, pdfs
Language: JavaScript
Homepage: https://www.npmjs.com/package/pdf-lecture
Size: 2.92 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # pdf-lecture

**Javascript cross-platform module to extract texts from PDFs.**

[![version](https://img.shields.io/npm/v/pdf-lecture.svg)](https://www.npmjs.org/package/pdf-lecture)

[![downloads](https://img.shields.io/npm/dt/pdf-lecture.svg)](https://www.npmjs.org/package/pdf-lecture)

## Similar Packages

* [pdf2json](https://www.npmjs.com/package/pdf2json) buggy, no support anymore, memory leak, throws non-catchable fatal errors

* [j-pdfjson](https://www.npmjs.com/package/j-pdfjson) fork of pdf2json

* [pdf-parser](https://github.com/dunso/pdf-parse) buggy, no tests

* [pdfreader](https://www.npmjs.com/package/pdfreader) using pdf2json

* [pdf-extract](https://www.npmjs.com/package/pdf-extract) not cross-platform using xpdf

## Installation

`npm install pdf-lecture`

 

## Basic Usage - Local Files

```js

var path = require("path");

var fs = require("fs");

var filePath = path.join(__dirname, "..your path");

var PDF = require("pdf-lecture");

PDF(filePath).then((data) => {

   console.log(data.numpages)

   console.log(data.text)

   console.log(data.pageTextArray)

});

```

## Exception Handling

```js

const fs = require('fs');

var filePath = path.join(__dirname, "..your path");

var PDF = require("pdf-lecture");

PDF(filePath).then(function(data) {

	// use data

})

.catch(function(error){

	// handle exceptions

})

```

```js

// default render callback

function render_page(pageData) {

  let render_options = {

    normalizeWhitespace: false,

    disableCombineTextItems: false,

  };

  return pageData.getTextContent(render_options).then(function (textContent) {

    let lastY,

      text = "";

    for (let item of textContent.items) {

      if (lastY == item.transform[5] || !lastY) {

        text += item.str;

      } else {

        text += "\n" + item.str;

      }

      lastY = item.transform[5];

    }

    return text;

  });

}

let options = {

    pagerender: render_page

}

let dataBuffer = fs.readFileSync('path to PDF file...');

pdf(dataBuffer,options).then(function(data) {

	//use new format

});

```

## Options

```js

const DEFAULT_OPTIONS = {

	// internal page parser callback

	// you can set this option, if you need another format except raw text

	pagerender: render_page,

	

	// max page number to parse

	max: 0,

	

	//check https://mozilla.github.io/pdf.js/getting_started/

	version: 'v1.10.100'

}

```

### *pagerender* (callback)

If you need another format except raw text.  

### *max* (number)

Max number of page to parse. If the value is less than or equal to 0, parser renders all pages.  

### *version* (string, pdf.js version)

check [pdf.js](https://mozilla.github.io/pdf.js/getting_started/)

* `'default'`

* `'v1.9.426'`

* `'v1.10.100'`

* `'v1.10.88'`

* `'v2.0.550'`

>*default* version is *v1.10.100*   

>[mozilla.github.io/pdf.js](https://mozilla.github.io/pdf.js/getting_started/#download)

### Submitting an Issue

If you find a bug or a mistake, you can help by submitting an issue to [Github Repository](https://github.com/dulajkavinda/pdf-lecture/issues)

## License

[MIT licensed]

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/dulajkavinda/pdf-lecture

Awesome Lists containing this project

README