https://github.com/vishwagauravin/pdf-parser-client-side
A lightweight, easy-to-use package for parsing text from PDF files on the client side, without any server dependency.
- Host: GitHub
- URL: https://github.com/vishwagauravin/pdf-parser-client-side
- Owner: VishwaGauravIn
- License: mit
- Created: 2023-10-15T16:39:06.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-16T16:23:12.000Z (over 1 year ago)
- Last Synced: 2025-03-23T11:05:02.131Z (10 months ago)
- Topics: client-side, pdf, pdf-parser, pdf-reader, pdfjs
- Language: TypeScript
- Homepage: https://www.npmjs.com/package/pdf-parser-client-side
- Size: 26.4 KB
- Stars: 10
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
## PDF Parser Client Side
A lightweight, easy-to-use package for parsing text from PDF files on the client side, without any server dependency.
## How to Install
Install the package with npm or yarn:
```bash
npm i pdf-parser-client-side
```
or
```bash
yarn add pdf-parser-client-side
```
Then import the default export in your code:
```js
import extractTextFromPDF from "pdf-parser-client-side";
```
#### `variant` Parameter
The `variant` parameter, passed as the second argument to `extractTextFromPDF`, selects how the extracted text is cleaned before it is returned. Each variant applies a regular-expression replacement that removes a particular set of characters and keeps the rest:
| `variant` Value | Description | Regular Expression (matches the characters removed) | Retained Characters |
| --- | --- | --- | --- |
| `clean` | Removes all non-ASCII characters and any spaces that follow them. | `/[^\x00-\x7F]+\ *(?:[^\x00-\x7F]\| )*/g` | ASCII characters only |
| `alphanumeric` | Retains only alphanumeric characters (letters and digits). | `/[^a-zA-Z0-9]+/g` | A-Z, a-z, 0-9 |
| `alphanumericwithspace` | Retains alphanumeric characters and spaces. | `/[^a-zA-Z0-9 ]+/g` | A-Z, a-z, 0-9, space |
| `alphanumericwithspaceandpunctuation` | Retains alphanumeric characters, spaces, and basic punctuation marks (.,!?). | `/[^a-zA-Z0-9 .,!?]+/g` | A-Z, a-z, 0-9, space, .,!? |
| `alphanumericwithspaceandpunctuationandnewline` | Retains alphanumeric characters, spaces, basic punctuation marks (.,!?), and newlines. | `/[^a-zA-Z0-9 .,!?]+/g` | A-Z, a-z, 0-9, space, .,!? |
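To make the table concrete, here is a minimal TypeScript sketch of how this kind of variant-based cleanup can work. It is not the package's source code: `cleanText` and `VARIANT_PATTERNS` are hypothetical names, and the patterns are taken from the table above. For `alphanumericwithspaceandpunctuationandnewline`, the sketch assumes `\n` belongs in the kept set, as the variant's name and description indicate.

```typescript
// Hypothetical sketch, NOT the package's implementation: it only mirrors the
// documented variant patterns to show what each one does to a string.
type Variant =
  | "clean"
  | "alphanumeric"
  | "alphanumericwithspace"
  | "alphanumericwithspaceandpunctuation"
  | "alphanumericwithspaceandpunctuationandnewline";

// Each pattern matches the characters to REMOVE; everything else is kept.
const VARIANT_PATTERNS: Record<Variant, RegExp> = {
  // A run of non-ASCII characters, then any spaces and further non-ASCII
  // characters after it (a plain space here is equivalent to "\ ").
  clean: /[^\x00-\x7F]+ *(?:[^\x00-\x7F]| )*/g,
  alphanumeric: /[^a-zA-Z0-9]+/g,
  alphanumericwithspace: /[^a-zA-Z0-9 ]+/g,
  alphanumericwithspaceandpunctuation: /[^a-zA-Z0-9 .,!?]+/g,
  // Assumption: "\n" is kept for this variant, as its name suggests.
  alphanumericwithspaceandpunctuationandnewline: /[^a-zA-Z0-9 .,!?\n]+/g,
};

function cleanText(raw: string, variant: Variant): string {
  // With a /g regex, replace() removes every match, not just the first.
  return raw.replace(VARIANT_PATTERNS[variant], "");
}
```

With this sketch, `cleanText("Café ☕ time", "clean")` returns `"Caftime"`: the `clean` pattern also deletes the spaces after a non-ASCII run, so neighboring words can end up joined. Sharing the `/g` patterns across calls is safe here because `String.prototype.replace` resets `lastIndex` on a global regex before matching.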
#### Example Usage
JavaScript
```jsx
import React from "react";
import extractTextFromPDF from "pdf-parser-client-side";

export default function Test() {
  // Read the selected PDF and log its text, cleaned according to `variant`.
  const handleFileChange = async (e, variant) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  // A file picker limited to PDFs; every change triggers extraction.
  return (
    <input
      type="file"
      accept="application/pdf"
      onChange={(e) => handleFileChange(e, "clean")}
    />
  );
}
```
TypeScript
```tsx
import React from "react";
import extractTextFromPDF, { Variant } from "pdf-parser-client-side";

export default function Test() {
  // The generic argument types `e.target` as an <input>, so `files` is available.
  const handleFileChange = async (
    e: React.ChangeEvent<HTMLInputElement>,
    variant: Variant
  ) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  // A file picker limited to PDFs; every change triggers extraction.
  return (
    <input
      type="file"
      accept="application/pdf"
      onChange={(e) => handleFileChange(e, "clean")}
    />
  );
}
```
## Contributing
Feel free to contribute!
1. Fork the repository
2. Make changes
3. Submit a pull request
### [> with 💛 by Vishwa Gaurav](https://itsvg.in)