https://github.com/qwefgh90/jsearch
This library extracts text from documents (Office, PDF, HWP).
- Host: GitHub
- URL: https://github.com/qwefgh90/jsearch
- Owner: qwefgh90
- License: other
- Created: 2014-11-09T12:15:33.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2017-11-23T08:08:35.000Z (almost 7 years ago)
- Last Synced: 2024-04-14T22:44:57.696Z (7 months ago)
- Topics: extract, extract-strings, hwp, java, jsearch, office
- Language: Ruby
- Homepage:
- Size: 67.4 MB
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# JSearch
[![Build Status](https://travis-ci.org/qwefgh90/jsearch.svg?branch=master)](https://travis-ci.org/qwefgh90/jsearch)
## Overview
JSearch is open source software that extracts text and finds keywords in HWP and Office documents.

## Download (Maven Central Repository)
```
<dependency>
  <groupId>io.github.qwefgh90</groupId>
  <artifactId>jsearch</artifactId>
  <version>0.3.0</version>
</dependency>
```
## Requirement
1. It should work with various document types, e.g. HWP, PDF, and Office.
2. It should support extracting text and quickly finding keywords in documents.
3. It is distributed as a JAR library.
4. All functions are synchronous.
5. The result of an extraction contains the full text.
6. The result of a find operation contains the word count.

### HWP
This software has been developed with reference to
the HWP file format open specification by Hancom, Inc.
http://www.hancom.co.kr/userofficedata.userofficedataList.do?menuFlag=3

The code that handles the *.hwp* format is forked from the *[java-hwp](https://github.com/ddoleye/java-hwp)* project.
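
To make the requirements above more concrete (synchronous calls, and a find result that carries a count), here is a minimal sketch in plain Java. It is not the JSearch API: `KeywordCountSketch`, `FindResult`, and `find` are hypothetical names, and the sketch assumes the document text has already been extracted into a `String`. It also assumes that "word count" means the number of case-insensitive, non-overlapping occurrences of the keyword.

```java
import java.util.Locale;

/** Hypothetical illustration only; none of these names come from JSearch. */
public class KeywordCountSketch {

    /** Result of a find operation: the keyword and how often it occurred. */
    public static final class FindResult {
        public final String keyword;
        public final int count;

        FindResult(String keyword, int count) {
            this.keyword = keyword;
            this.count = count;
        }
    }

    /**
     * Synchronously counts non-overlapping, case-insensitive occurrences of
     * {@code keyword} in text that is assumed to be already extracted.
     */
    public static FindResult find(String extractedText, String keyword) {
        if (keyword == null || keyword.isEmpty()) {
            throw new IllegalArgumentException("keyword must not be empty");
        }
        String haystack = extractedText.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);

        int count = 0;
        int from = 0;
        // indexOf returns -1 once no further match exists.
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // skip past this match (non-overlapping)
        }
        return new FindResult(keyword, count);
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a document.
        String text = "HWP and Office documents; hwp is a document format.";
        FindResult result = find(text, "hwp");
        System.out.println(result.keyword + " -> " + result.count);
    }
}
```

Because `find` returns its result directly, the caller blocks until the count is available, which is what a synchronous function implies. For the library's real entry points, consult the JSearch source.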