https://github.com/jishnusv23/pdf-to-json-converter
fsmodule mammoth pdf-converter pdfextractor reactjs typescript utf-8
- Host: GitHub
- URL: https://github.com/jishnusv23/pdf-to-json-converter
- Owner: jishnusv23
- Created: 2024-11-06T08:42:26.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-11-08T06:45:03.000Z (7 months ago)
- Last Synced: 2025-01-13T21:44:36.541Z (5 months ago)
- Topics: fsmodule, mammoth, pdf-converter, pdfextractor, reactjs, typescript, utf-8
- Language: TypeScript
- Homepage: https://pdf-to-json-converter.vercel.app
- Size: 2.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PDF-to-JSON-Converter - Machine Task
## Different ways to convert files into JSON format, using various npm modules and Node's `fs` module.
## PDF to JSON Conversion using ("pdf.js-extract")
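As a rough illustration of this section, here is a minimal sketch of reshaping an extraction result into plain JSON. The interfaces are an assumed subset of the object `pdf.js-extract` resolves with, re-declared locally so the block is self-contained. `pdfResultToJson` and `sample.pdf` are hypothetical names, not taken from the repository.

```typescript
// Assumed subset of the pdf.js-extract result; re-declared here for illustration.
interface PdfTextItem {
  str: string;
  width: number;
  height: number;
  fontName: string;
}

interface PdfPage {
  pageInfo: { num: number; width: number; height: number };
  content: PdfTextItem[];
}

interface PdfExtractResult {
  meta?: { info?: Record<string, unknown> };
  pages: PdfPage[];
}

// Hypothetical helper: keep the metadata, and for each page keep its width,
// its joined text, and the per-item details.
function pdfResultToJson(result: PdfExtractResult) {
  return {
    info: result.meta?.info ?? {},
    pageCount: result.pages.length,
    pages: result.pages.map((page) => ({
      page: page.pageInfo.num,
      width: page.pageInfo.width,
      // Skip whitespace-only items so the joined text has single spaces.
      text: page.content
        .map((item) => item.str)
        .filter((s) => s.trim() !== "")
        .join(" "),
      items: page.content.map(({ str, width, height, fontName }) => ({
        str,
        width,
        height,
        fontName,
      })),
    })),
  };
}

// With the package installed, the input would come from something like:
//   import { PDFExtract } from "pdf.js-extract";
//   const result = await new PDFExtract().extract("sample.pdf", {});
//   console.log(JSON.stringify(pdfResultToJson(result), null, 2));
```

Keeping the reshaping step separate from the extraction call means it can be exercised with a hand-written object, without a PDF file.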
- This is one of the easiest ways to extract PDF content, providing additional features like:
  - Extracting detailed metadata.
  - Retrieving page details such as width, string (`str`), and font size.

## Markdown File to JSON Conversion
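A minimal sketch of the approach this section describes: read the file with `fs` as UTF-8 text, then use a regular expression to group lines under their headings. The output shape and the names `MarkdownSection`, `markdownToJson`, and `convertMarkdownFile` are assumptions for illustration. The repository's actual regexes are not shown here.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical output shape: one entry per heading, with the lines beneath it.
interface MarkdownSection {
  level: number;
  title: string;
  content: string[];
}

// Group non-empty lines under the most recent heading.
// Lines before the first heading are dropped, and "#" lines inside fenced
// code blocks are not treated specially in this sketch.
function markdownToJson(markdown: string): MarkdownSection[] {
  const headingPattern = /^(#{1,6})\s+(.+?)\s*$/;
  const sections: MarkdownSection[] = [];
  let current: MarkdownSection | undefined;

  for (const line of markdown.split(/\r?\n/)) {
    const match = headingPattern.exec(line);
    if (match) {
      const [, hashes = "", title = ""] = match;
      current = { level: hashes.length, title, content: [] };
      sections.push(current);
    } else if (current && line.trim() !== "") {
      current.content.push(line.trim());
    }
  }
  return sections;
}

// Read a Markdown file as UTF-8 text and convert it.
function convertMarkdownFile(path: string): MarkdownSection[] {
  return markdownToJson(readFileSync(path, "utf-8"));
}
```

Calling `JSON.stringify(convertMarkdownFile("README.md"), null, 2)` would then give the JSON text, where `README.md` stands in for any Markdown file path.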
- Different npm packages are available for this task, but I used the `fs` module with UTF-8 encoding for simplicity.
- Additional **regex methods** were used to structure the JSON format effectively.

## MS Word to JSON Conversion using ("mammoth")
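A minimal sketch of the post-processing this section refers to: iterating over raw text extracted from a Word file and turning it into JSON. It relies on `mammoth.extractRawText` following each paragraph with two newlines. The output shape and the names `WordDocumentJson`, `rawTextToJson`, and `sample.docx` are hypothetical, and the repository's own refinement steps may differ.

```typescript
// Hypothetical output shape for the converted document.
interface WordDocumentJson {
  paragraphs: string[];
  wordCount: number;
}

// Split raw text into paragraphs, normalise whitespace, and drop empty blocks.
function rawTextToJson(rawText: string): WordDocumentJson {
  const paragraphs: string[] = [];
  for (const block of rawText.split(/\n{2,}/)) {
    const cleaned = block.replace(/\s+/g, " ").trim();
    if (cleaned !== "") {
      paragraphs.push(cleaned);
    }
  }
  // Each cleaned paragraph has single spaces, so splitting on " " counts words.
  const wordCount = paragraphs.reduce(
    (total, paragraph) => total + paragraph.split(" ").length,
    0,
  );
  return { paragraphs, wordCount };
}

// With the package installed, the raw text would come from something like:
//   const mammoth = require("mammoth");
//   const { value } = await mammoth.extractRawText({ path: "sample.docx" });
//   console.log(JSON.stringify(rawTextToJson(value), null, 2));
```

Raw text carries no formatting, so headings and lists in the Word file come out as ordinary paragraphs in this sketch.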
- The `mammoth` package makes it easy to extract raw text from MS Word files.
- Additional iteration methods were implemented to refine the extracted content.

## Hosting the Project
- **Frontend:** Vercel
- **Backend:** Render

## Notes
- These methods are some of the ways to convert files into JSON.
- Many other packages and AI APIs are available for more structured and advanced conversions.
- Exploring these options can lead to better results.