https://github.com/sidmishraw/docpruner
DocPruner is a utility for pruning bad PDFs for the CS 267 project and PDF processor.
docpruner
- Host: GitHub
- URL: https://github.com/sidmishraw/docpruner
- Owner: sidmishraw
- Created: 2017-05-17T20:58:55.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-05-17T20:59:28.000Z (about 9 years ago)
- Last Synced: 2025-01-15T06:48:41.756Z (over 1 year ago)
- Topics: docpruner
- Language: Java
- Size: 32.2 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
### DocPruner
Prunes bad PDFs (probably scanned images of IEEE documents from IEEE Xplore) by moving them
out of the `input_pdfs` folder, and moves the `pdf_jsons` and `pdf_grouped_jsons` folders
out of the `cs267_project` folder so that the PDF-to-JSON generation process can be
started from scratch.
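The two file-moving steps described above can be sketched with `java.nio.file`. This is a minimal illustration, not DocPruner's actual source: the README does not say how a PDF is judged "bad" or where the moved files and folders end up, so the `isBad` predicate, the `bad_pdfs` quarantine folder, the archive directory, and the class and method names are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the pruning step; not DocPruner's real implementation. */
public class PruneSketch {

    /** Moves each *.pdf file in inputDir that isBad flags into quarantineDir; returns the moved file names. */
    static List<String> pruneBadPdfs(Path inputDir, Path quarantineDir, Predicate<Path> isBad) throws IOException {
        Files.createDirectories(quarantineDir);
        // Collect candidates first so the directory is not modified while it is being iterated.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(inputDir, "*.pdf")) {
            for (Path pdf : pdfs) {
                if (Files.isRegularFile(pdf) && isBad.test(pdf)) {
                    candidates.add(pdf);
                }
            }
        }
        List<String> moved = new ArrayList<>();
        for (Path pdf : candidates) {
            Files.move(pdf, quarantineDir.resolve(pdf.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            moved.add(pdf.getFileName().toString());
        }
        return moved;
    }

    /** Moves the generated-JSON folders out of projectDir into archiveDir, skipping any that do not exist. */
    static void resetGeneratedJsons(Path projectDir, Path archiveDir) throws IOException {
        Files.createDirectories(archiveDir);
        for (String name : new String[] {"pdf_jsons", "pdf_grouped_jsons"}) {
            Path folder = projectDir.resolve(name);
            if (Files.isDirectory(folder)) {
                Files.move(folder, archiveDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo in throwaway temp directories so no real project files are touched.
        Path project = Files.createTempDirectory("cs267_project");
        Path input = Files.createDirectories(project.resolve("input_pdfs"));
        Files.createFile(input.resolve("good.pdf"));
        Files.createFile(input.resolve("scanned.pdf"));
        Files.createDirectories(project.resolve("pdf_jsons"));

        // Placeholder criterion based only on the file name; a real check would inspect the PDF contents.
        Predicate<Path> isBad = p -> p.getFileName().toString().startsWith("scanned");
        Path archive = Files.createTempDirectory("pruned");
        System.out.println(pruneBadPdfs(input, archive.resolve("bad_pdfs"), isBad));
        resetGeneratedJsons(project, archive);
    }
}
```

Taking the "bad PDF" test as a parameter keeps the file-moving logic separate from whatever detection heuristic is used. A real detector for scanned, image-only PDFs would most likely need a PDF library to check for extractable text.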
The executable JAR artifact is located [here](./out/artifacts/DocPruner_jar/DocPruner.jar).
#### Usage:
```
java -jar path_to_DocPruner.jar
```
In case of concerns, contact: sidharth.mishra@sjsu.edu