https://github.com/commitd/krill

Improved HTML output for Tika extraction

baleen docx pdf pdfbox tika

# Krill

Generates HTML representations of documents (PDF, CSV, XLS, etc.) along with their metadata.

Uses Apache Tika (https://tika.apache.org/) and PDFBox (https://pdfbox.apache.org/).

[![Travis branch](https://img.shields.io/travis/commitd/krill.svg?style=flat-square)](https://travis-ci.org/commitd/krill)
[![Coveralls](https://img.shields.io/coveralls/commitd/krill.svg?style=flat-square)](https://coveralls.io/github/commitd/krill)