https://github.com/norconex/importer

Importer
==========

Norconex Importer is a Java library and command-line application that parses
files of any supported format (HTML, PDF, Word, etc.) and extracts their
content as plain text. It also lets you manipulate the extracted text before
importing or using it in your own service or application.

Website: https://opensource.norconex.com/importer/
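
The two-stage flow described above (extract plain text first, then manipulate
it) can be illustrated with a minimal, standard-library-only sketch. It does
not use the Importer API: the class `ExtractThenTransformSketch` and the
methods `extractText` and `transform` are hypothetical names chosen for this
illustration, and the naive tag stripping is only a stand-in for real format
parsing. It assumes Java 9 or later for `List.of`.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; this is NOT the Norconex Importer API.
public class ExtractThenTransformSketch {

    // Stage 1: naive stand-in for a real parser. Replaces every tag with a
    // space, collapses runs of whitespace, and trims the result.
    public static String extractText(String html) {
        return html
                .replaceAll("<[^>]+>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Stage 2: applies each manipulation to the extracted text, in order.
    public static String transform(String text,
                                   List<UnaryOperator<String>> steps) {
        String result = text;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
                "<html><body><h1>Hello</h1><p>Plain   text</p></body></html>";
        String text = extractText(html);

        List<UnaryOperator<String>> steps = List.of(
                s -> s.toUpperCase(),
                s -> s.replace("PLAIN", "EXTRACTED"));

        // Prints: HELLO EXTRACTED TEXT
        System.out.println(transform(text, steps));
    }
}
```

Keeping extraction and manipulation as separate steps means the same
manipulations apply regardless of the source format. For actual class names
and configuration options, refer to the documentation on the website above.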