https://github.com/vishwagauravin/pdf-parser-client-side

A lightweight, easy-to-use package for extracting text from PDF files on the client side, with no server dependency.

client-side pdf pdf-parser pdf-reader pdfjs


Host: GitHub
URL: https://github.com/vishwagauravin/pdf-parser-client-side
Owner: VishwaGauravIn
License: mit
Created: 2023-10-15T16:39:06.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-06-16T16:23:12.000Z (almost 2 years ago)
Last Synced: 2025-03-23T11:05:02.131Z (about 1 year ago)
Topics: client-side, pdf, pdf-parser, pdf-reader, pdfjs
Language: TypeScript
Homepage: https://www.npmjs.com/package/pdf-parser-client-side
Size: 26.4 KB
Stars: 10
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

## PDF Parser Client Side

A lightweight, easy-to-use package for extracting text from PDF files on the client side, with no server dependency.

## How to Install

Install the package with npm or yarn:

```sh
npm i pdf-parser-client-side
```

or

```sh
yarn add pdf-parser-client-side
```

Then import the package:

```js
import extractTextFromPDF from "pdf-parser-client-side";
```

#### `variant` Parameter

The `variant` parameter selects how the extracted text is cleaned up. Each value applies a regular expression that removes every character outside the retained set:

| `variant` Value | Description | Regular Expression | Retained Characters |
| --- | --- | --- | --- |
| `clean` | Removes all non-ASCII characters and any spaces that follow them. | `/[^\x00-\x7F]+\ *(?:[^\x00-\x7F]\| )*/g` | ASCII characters only |
| `alphanumeric` | Retains only alphanumeric characters (letters and numbers). | `/[^a-zA-Z0-9]+/g` | A-Z, a-z, 0-9 |
| `alphanumericwithspace` | Retains alphanumeric characters and spaces. | `/[^a-zA-Z0-9 ]+/g` | A-Z, a-z, 0-9, space |
| `alphanumericwithspaceandpunctuation` | Retains alphanumeric characters, spaces, and basic punctuation marks (`.` `,` `!` `?`). | `/[^a-zA-Z0-9 .,!?]+/g` | A-Z, a-z, 0-9, space, `.` `,` `!` `?` |
| `alphanumericwithspaceandpunctuationandnewline` | Retains alphanumeric characters, spaces, basic punctuation marks (`.` `,` `!` `?`), and newlines. | `/[^a-zA-Z0-9 .,!?]+/g` | A-Z, a-z, 0-9, space, `.` `,` `!` `?` |
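To get a feel for what each pattern strips, here is a minimal sketch that applies the documented regular expressions to a sample string. `VariantName`, `PATTERNS`, and `applyVariant` are hypothetical names used only for illustration. This is not the package's source, and it assumes each variant is a plain `String.prototype.replace` with an empty replacement. The newline variant is left out because the pattern listed for it is identical to the previous row's.

```typescript
// Hypothetical helper mirroring the documented patterns; not the package's actual code.
type VariantName =
  | "clean"
  | "alphanumeric"
  | "alphanumericwithspace"
  | "alphanumericwithspaceandpunctuation";

const PATTERNS: Record<VariantName, RegExp> = {
  // The README writes the space as `\ `; a plain space matches the same thing.
  clean: /[^\x00-\x7F]+ *(?:[^\x00-\x7F]| )*/g,
  alphanumeric: /[^a-zA-Z0-9]+/g,
  alphanumericwithspace: /[^a-zA-Z0-9 ]+/g,
  alphanumericwithspaceandpunctuation: /[^a-zA-Z0-9 .,!?]+/g,
};

// Assumption: every match is simply deleted from the text.
function applyVariant(text: string, variant: VariantName): string {
  return text.replace(PATTERNS[variant], "");
}

// "Héllo, wörld! #42" written with escapes to avoid file-encoding issues.
const sample = "H\u00e9llo, w\u00f6rld! #42";

console.log(applyVariant(sample, "clean")); // "Hllo, wrld! #42"
console.log(applyVariant(sample, "alphanumeric")); // "Hllowrld42"
console.log(applyVariant(sample, "alphanumericwithspace")); // "Hllo wrld 42"
console.log(applyVariant(sample, "alphanumericwithspaceandpunctuation")); // "Hllo, wrld! 42"
```

Note that `clean` keeps ASCII symbols such as `#`, while the alphanumeric variants drop them.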

#### Example Usage

JavaScript

```jsx
import React from "react";
import extractTextFromPDF from "pdf-parser-client-side";

export default function Test() {
  // Extract text from the selected PDF using the given variant
  const handleFileChange = async (e, variant) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  return (
    <div>
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => handleFileChange(e, "clean")}
      />
    </div>
  );
}
```

TypeScript

```tsx
import React from "react";
import extractTextFromPDF, { Variant } from "pdf-parser-client-side";

export default function Test() {
  // Extract text from the selected PDF using the given variant
  const handleFileChange = async (
    e: React.ChangeEvent<HTMLInputElement>,
    variant: Variant
  ) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  return (
    <div>
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => handleFileChange(e, "clean")}
      />
    </div>
  );
}
```
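The error handling in the examples above can also be pulled out of the component so it works with any UI. The sketch below is one way to do that, under stated assumptions: `readPdfText` is a hypothetical helper, and the extractor is passed in as a parameter (assumed to resolve to a string) so the snippet runs without a browser or the package installed.

```typescript
// Hypothetical wrapper; `extract` stands in for the package's default export.
// Assumption: the extractor takes (file, variant) and resolves to a string.
async function readPdfText<F>(
  file: F | undefined,
  variant: string,
  extract: (file: F, variant: string) => Promise<string>
): Promise<string | null> {
  // Nothing selected: report "no text" instead of throwing.
  if (file === undefined) return null;
  try {
    return await extract(file, variant);
  } catch (error) {
    // Same logging as the README examples, but the caller gets null back.
    console.error("Error extracting text from PDF:", error);
    return null;
  }
}
```

In a real page, the call would look like `readPdfText(input.files?.[0], "clean", extractTextFromPDF)`, with the result checked for `null` before use.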

## Contributing

Feel free to contribute!

1. Fork the repository

2. Make changes

3. Submit a pull request

### [</> with 💛 by Vishwa Gaurav](https://itsvg.in)
