Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/opengovsg/pdf2md
A PDF to Markdown converter
- Host: GitHub
- URL: https://github.com/opengovsg/pdf2md
- Owner: opengovsg
- License: mit
- Fork: true (jzillmann/pdf-to-markdown)
- Created: 2019-06-04T03:05:28.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-11-27T06:12:17.000Z (2 months ago)
- Last Synced: 2024-12-02T02:12:06.401Z (about 2 months ago)
- Topics: markdown, pdf-converter
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/@opendocsg/pdf2md
- Size: 1.88 MB
- Stars: 216
- Watchers: 4
- Forks: 41
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
# pdf2md
A JavaScript npm library that parses PDF files and converts them into Markdown.
## Major Changes
See [Releases](https://github.com/opendocsg/pdf2md/releases)
## Usage
### Library
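```js
const fs = require('fs')
const path = require('path')
const pdf2md = require('@opendocsg/pdf2md')

// Example paths; replace with your own input PDF and output file
const filePath = 'input.pdf'
const outputFile = 'output.md'

// Progress callbacks object, as in the original example
// (assumed optional; left empty here)
const callbacks = {}

const pdfBuffer = fs.readFileSync(filePath)
pdf2md(pdfBuffer, callbacks)
  .then(text => {
    // `text` is the converted Markdown as a string
    console.log(`Writing to ${outputFile}...`)
    fs.writeFileSync(path.resolve(outputFile), text)
    console.log('Done.')
  })
  .catch(err => {
    console.error(err)
  })
```

### CLI tool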
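The CLI below converts whole folders at once. If you would rather run a batch conversion from your own Node.js script, here is a minimal sketch built only on the `pdf2md(buffer)` call from the library example, assuming it resolves to the Markdown text. It is not the CLI's implementation: `findPdfs`, `convertFolder`, the injectable `convert` parameter, and mirroring the input folder layout under the output folder are all hypothetical choices made for illustration. The progress `callbacks` argument is omitted on the assumption that it is optional.

```js
const fs = require('fs')
const path = require('path')

// Hypothetical helper: collect PDF paths under `dir`,
// descending into subfolders only when `recursive` is true
function findPdfs (dir, recursive) {
  const found = []
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      if (recursive) found.push(...findPdfs(fullPath, recursive))
    } else if (entry.name.toLowerCase().endsWith('.pdf')) {
      found.push(fullPath)
    }
  }
  return found
}

// Hypothetical batch converter. `convert` defaults to the library call
// (callbacks omitted, assumed optional) and is required lazily, so a stub
// can be passed in to try the sketch without the package or real PDFs
async function convertFolder (inputDir, outputDir, recursive,
  convert = buffer => require('@opendocsg/pdf2md')(buffer)) {
  const written = []
  for (const pdfPath of findPdfs(inputDir, recursive)) {
    // Mirror the input layout under outputDir and swap the extension (assumption)
    const relative = path.relative(inputDir, pdfPath)
    const outputFile = path.join(outputDir, relative.replace(/\.pdf$/i, '.md'))
    fs.mkdirSync(path.dirname(outputFile), { recursive: true })
    // Await each file before starting the next: one conversion at a time
    const text = await convert(fs.readFileSync(pdfPath))
    fs.writeFileSync(outputFile, text)
    written.push(outputFile)
  }
  return written
}
```

For example, `convertFolder('./pdfs', './markdown', true)` returns a promise of the `.md` paths it wrote. Awaiting each file before starting the next keeps only one conversion in flight, which may lower peak memory use compared with starting all conversions at once, though it does not by itself rule out the heap error described below.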
```
$ cd [project_folder]
$ npx @opendocsg/pdf2md --inputFolderPath=[your input folder path] --outputFolderPath=[your output folder path] --recursive
```

If you are converting a large number of files recursively, you might encounter the error "Allocation failed - JavaScript heap out of memory". In that case, run the CLI script directly with Node and a larger heap limit (here 4096 MB). Note that `--max-old-space-size` is a Node option, so it must come before the script path:

```
$ node --max-old-space-size=4096 lib/pdf2md-cli.js --inputFolderPath=[your input folder path] --outputFolderPath=[your output folder path] --recursive
```

Options:
1. `--inputFolderPath`: input folder path (should already exist)
2. `--outputFolderPath`: output folder path (should already exist)
3. `--recursive`: also convert PDFs in folders nested inside the input folder. Include the flag if you want recursive conversion, and omit it if you don't.

## Credits
- [pdf-to-markdown](https://github.com/jzillmann/pdf-to-markdown) - original project by Johannes Zillmann
- [pdf.js](https://mozilla.github.io/pdf.js/) - Mozilla's PDF parsing and rendering platform, used as the raw parser