https://github.com/commitd/krill
Improved HTML output for Tika extraction
- Host: GitHub
- URL: https://github.com/commitd/krill
- Owner: commitd
- License: apache-2.0
- Created: 2017-03-29T09:41:57.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2022-09-01T22:18:56.000Z (almost 4 years ago)
- Last Synced: 2025-07-04T06:07:27.851Z (12 months ago)
- Topics: baleen, docx, pdf, pdfbox, tika
- Language: Java
- Size: 1.92 MB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Krill
Generates HTML representations of documents (PDF, CSV, XLS, etc.) along with metadata.
Uses Apache Tika (https://tika.apache.org/) and PDFBox (https://pdfbox.apache.org/).
[Build Status](https://travis-ci.org/commitd/krill)
[Coverage Status](https://coveralls.io/github/commitd/krill)
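
To give a rough idea of what "an HTML representation of a document along with metadata" can mean, here is a minimal, self-contained sketch for the CSV case. It is an illustration only: it is not Krill's API, it does not use Tika or PDFBox, and the class name `CsvToHtmlSketch` and its methods are hypothetical. It assumes simple CSV input with no quoted fields, and it places the metadata in `<meta>` tags in the document head.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not Krill's API, and no Tika or PDFBox involved.
final class CsvToHtmlSketch {

    // Escape the characters that are significant in HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render metadata as <meta> tags and each CSV line as a table row.
    // Assumes simple CSV: comma-separated, no quoted fields or embedded newlines.
    static String toHtml(Map<String, String> metadata, String csv) {
        StringBuilder html = new StringBuilder("<html><head>");
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            html.append("<meta name=\"").append(escape(e.getKey()))
                .append("\" content=\"").append(escape(e.getValue())).append("\"/>");
        }
        html.append("</head><body><table>");
        for (String line : csv.split("\\R")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            html.append("<tr>");
            for (String cell : line.split(",", -1)) {
                html.append("<td>").append(escape(cell.trim())).append("</td>");
            }
            html.append("</tr>");
        }
        html.append("</table></body></html>");
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "text/csv");
        System.out.println(toHtml(metadata, "name,score\nA & B,3\n"));
    }
}
```

A real extraction pipeline has to handle quoting, encodings, and binary formats such as PDF and XLS, which is the work this project delegates to Tika and PDFBox.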