https://github.com/adrienjoly/npm-pdfreader-example

Example of use of pdfreader: parse a PDF résumé

- Host: GitHub
- URL: https://github.com/adrienjoly/npm-pdfreader-example
- Owner: adrienjoly
- Created: 2017-03-19T14:43:26.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2022-05-01T09:49:03.000Z (over 3 years ago)
- Last Synced: 2025-08-30T15:32:04.974Z (5 months ago)
- Topics: example, pdf-parsing
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/pdfreader
- Size: 895 KB
- Stars: 16
- Watchers: 2
- Forks: 11
- Open Issues: 1
- Metadata Files:
  - Readme: README.md


# Examples of use of [pdfreader](https://github.com/adrienjoly/npm-pdfreader)

## How to install and run

```sh
git clone https://github.com/adrienjoly/npm-pdfreader-example.git
cd npm-pdfreader-example
npm install
node parseItems.js
```
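Both scripts below rely on the callback of `parseFileItems()`, which pdfreader calls once per parsed item (for example a `page` marker, or a text item with `text`, `x` and `y`) and once more with no item at the end of the file. If you prefer `async`/`await`, that protocol can be wrapped in a Promise. The sketch below is a hypothetical helper, `collectItems()`, which is not part of pdfreader; it assumes the callback is not called again after the end-of-file call.

```javascript
// Hypothetical helper (not part of pdfreader): resolves with every item that
// reader.parseFileItems() reports, once the callback receives no item (end of
// file), or rejects with the first error it receives.
function collectItems(reader, filename) {
  return new Promise((resolve, reject) => {
    const items = [];
    reader.parseFileItems(filename, (err, item) => {
      if (err) reject(err); // parsing error
      else if (!item) resolve(items); // end of file
      else items.push(item); // page marker, text item, etc.
    });
  });
}
```

For example, `collectItems(new pdfreader.PdfReader(), 'CV_ErhanYasar.pdf')` should resolve with all the items of the résumé used in the first example, which you can then filter with `items.filter((item) => item.text)`.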

## 1. Parsing lines of text

![example cv resume parse convert pdf to text](parseRows.png)

Here is the code required to convert this PDF file into text:

```js
var pdfreader = require('pdfreader');

var rows = {}; // indexed by y-position

function printRows() {
  Object.keys(rows) // => array of y-positions (string keys holding floats)
    .sort((y1, y2) => parseFloat(y1) - parseFloat(y2)) // sort float positions
    .forEach((y) => console.log((rows[y] || []).join('')));
}

new pdfreader.PdfReader().parseFileItems('CV_ErhanYasar.pdf', function(err, item){
  if (err) {
    console.error(err);
  } else if (!item || item.page) {
    // end of file, or start of a new page: print the rows of the previous page
    printRows();
    if (item) console.log('PAGE:', item.page); // item is undefined at end of file
    rows = {}; // clear rows for next page
  } else if (item.text) {
    // accumulate text items into rows object, per line
    (rows[item.y] = rows[item.y] || []).push(item.text);
  }
});
```
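The script above joins the text items of each line in the order they arrive. If that order does not match the left-to-right layout of a given PDF, the items of each row can also be sorted by `x`. The function below is a hypothetical, self-contained variant of `printRows()` that returns the lines instead of printing them; it takes plain objects shaped like pdfreader text items (`{ text, x, y }`) and, like the original, only groups items whose `y` values are exactly equal.

```javascript
// Hypothetical variant of printRows(): groups text items by y-position,
// orders rows top to bottom and, within each row, items left to right.
function itemsToLines(items) {
  const rows = {}; // y-position (string key) => items on that line
  for (const item of items) {
    if (!item.text) continue; // skip page markers and other non-text items
    (rows[item.y] = rows[item.y] || []).push(item);
  }
  return Object.keys(rows)
    .sort((y1, y2) => parseFloat(y1) - parseFloat(y2)) // top to bottom
    .map((y) =>
      rows[y]
        .slice()
        .sort((a, b) => a.x - b.x) // left to right
        .map((item) => item.text)
        .join('')
    );
}
```

Sorting a copy (`slice()`) keeps the accumulated rows untouched, so the same items can still be rendered in arrival order if needed.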

## 2. Parsing a table

![example cv resume parse convert pdf table to text](parseTable.png)

Here is the code required to convert this PDF file into a textual table:

```js
var pdfreader = require('pdfreader');

var filename = process.argv[2]; // path of the PDF to parse (assumption: passed on the command line)

const nbCols = 2;
const cellPadding = 40; // each cell is padded to fit 40 characters

// column index of an item: 0 if it starts left of x = 20, 1 otherwise
const columnQuantitizer = (item) => (parseFloat(item.x) >= 20 ? 1 : 0);

// returns exactly nb cells, using an empty cell where a column has no items
const padColumns = (array, nb) =>
  Array.from({ length: nb }, (val, i) => array[i] || []);

const mergeCells = (cells) => (cells || [])
  .map((cell) => cell.text).join('') // merge cells
  .slice(0, cellPadding).padEnd(cellPadding, ' '); // truncate, then pad

const renderMatrix = (matrix) => (matrix || [])
  .map((row) => padColumns(row, nbCols)
    .map(mergeCells)
    .join(' | ')
  ).join('\n');

var table = new pdfreader.TableParser();

new pdfreader.PdfReader().parseFileItems(filename, function(err, item){
  if (err) {
    console.error(err);
  } else if (!item || item.page) {
    // end of file, or start of a new page: render the table of the previous page
    console.log(renderMatrix(table.getMatrix()));
    if (item) console.log('PAGE:', item.page); // item is undefined at end of file
    table = new pdfreader.TableParser(); // new/clear table for next page
  } else if (item.text) {
    // add the text item to the table, in the column given by columnQuantitizer
    table.processItem(item, columnQuantitizer(item));
  }
});
```
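The `x >= 20` threshold in `columnQuantitizer` only distinguishes two columns. Here is a sketch of a generalization, assuming you know the x-position where each additional column starts in your PDF. `makeColumnQuantizer()` and `renderTable()` are hypothetical names; the rendering follows the same truncate-then-pad approach as `mergeCells()`, with the column count and cell width passed as parameters instead of constants.

```javascript
// Hypothetical helper: returns a function mapping an item to a column index,
// i.e. the number of column start positions at or left of the item's x.
const makeColumnQuantizer = (columnStarts) => (item) =>
  columnStarts.filter((start) => parseFloat(item.x) >= start).length;

// Hypothetical helper: renders a matrix (rows of cells, each cell an array of
// { text } items) as fixed-width text columns separated by ' | '.
function renderTable(matrix, nbCols, cellWidth) {
  return (matrix || [])
    .map((row) =>
      Array.from({ length: nbCols }, (_, i) => row[i] || []) // one entry per column
        .map((cell) =>
          cell
            .map((item) => item.text)
            .join('') // merge the items of the cell
            .slice(0, cellWidth) // truncate long cells
            .padEnd(cellWidth, ' ') // pad short ones
        )
        .join(' | ')
    )
    .join('\n');
}
```

For instance, `makeColumnQuantizer([20])` returns 0 for items left of x = 20 and 1 otherwise, a numeric column index that mirrors the `x >= 20` threshold of `columnQuantitizer`.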
