https://github.com/norconex/importer

Importer
==========

Norconex Importer is a Java library and command-line application meant to
"parse" and "extract" content out of a computer file as plain text, whatever
its format (HTML, PDF, Word, etc.). It also lets you apply your own
manipulations to the extracted text before importing or using it in your own
service or application.
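The following is a minimal, library-free sketch of the "extract, then manipulate" flow described above. It does not use Norconex Importer's actual API. `extractPlainText` is a hypothetical helper that uses a naive regex tag stripper as a stand-in for a real format parser (it handles only simple HTML). `transform` is a hypothetical chain of `UnaryOperator<String>` steps standing in for post-extraction text handlers.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class Main {

    // Hypothetical stand-in for a real parser: replaces HTML tags with spaces,
    // then collapses runs of whitespace. Not suitable for real-world documents.
    static String extractPlainText(String html) {
        String noTags = html.replaceAll("(?s)<[^>]*>", " ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    // Hypothetical post-extraction stage: applies each manipulation in order,
    // feeding the output of one step into the next.
    static String transform(String text, List<UnaryOperator<String>> steps) {
        String result = text;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Quarterly Report</h1>"
                + "<p>Sales rose in Q3.</p></body></html>";

        // Step 1: "parse" and "extract" plain text from the markup.
        String text = extractPlainText(html);

        // Step 2: manipulate the extracted text before handing it to another service.
        List<UnaryOperator<String>> steps = List.of(
                s -> s.toLowerCase(Locale.ROOT),
                s -> s.replace("q3", "the third quarter"));

        System.out.println(transform(text, steps));
    }
}
```

Keeping extraction and manipulation as separate stages means the manipulation steps never need to know the original file format; they only ever see plain text.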

Website: https://opensource.norconex.com/importer/