Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/norconex/importer
Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it allows you to perform any manipulation on the extracted text before using it in your own service or application.
- Host: GitHub
- URL: https://github.com/norconex/importer
- Owner: Norconex
- License: apache-2.0
- Created: 2013-09-17T15:24:48.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2023-08-24T20:27:51.000Z (over 1 year ago)
- Last Synced: 2024-03-25T22:12:52.835Z (9 months ago)
- Topics: extract, html, java, java-library, manipulation, norconex-importer, parse, pdf
- Language: Java
- Homepage: http://www.norconex.com/collectors/importer/
- Size: 6.37 MB
- Stars: 32
- Watchers: 17
- Forks: 22
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.xml
- License: LICENSE.txt
README
Importer
==========

Norconex Importer is a Java library and command-line application meant to
"parse" and "extract" content out of a computer file as plain text, whatever
its format (HTML, PDF, Word, etc.). In addition, it allows you to perform any
manipulation on the extracted text before importing/using it in your own
service or application.

Website: https://opensource.norconex.com/importer/
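The "extract, then manipulate" idea described above can be sketched in plain Java. This is a hypothetical illustration only and not the Norconex Importer API: the class `ExtractThenTransform` and its methods `extractPlainText` and `importText` are invented for this example. The regex tag stripper is a naive stand-in for a real parser and only handles simple HTML, not formats such as PDF or Word.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of an "extract, then manipulate" pipeline.
 * NOT the Norconex Importer API: all names here are illustrative only.
 */
public class ExtractThenTransform {

    /** Naive stand-in for a real parser: replaces HTML tags with spaces, keeping the text. */
    static String extractPlainText(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    /** Extracts plain text, then applies each manipulation in order. */
    static String importText(String raw, List<UnaryOperator<String>> manipulations) {
        String text = extractPlainText(raw);
        for (UnaryOperator<String> step : manipulations) {
            text = step.apply(text); // each step sees the previous step's output
        }
        return text;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> steps = List.of(
            s -> s.replaceAll("\\s+", " ").trim(), // collapse whitespace left by removed tags
            String::toUpperCase                     // an example text manipulation
        );
        String out = importText("<html><body><h1>Hello</h1><p>world</p></body></html>", steps);
        System.out.println(out);
    }
}
```

Running `main` prints `HELLO WORLD`. Modeling the manipulations as an ordered list of `UnaryOperator<String>` keeps each step independent while letting later steps build on earlier ones.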