https://github.com/dgrothe-phd/wordcheckerjava
Java UI that gathers words from raw text files, sorted in alphabetical order. Search for multiple terms at once. Quickly find misspelt words in large texts, or get context around keywords.
html java-8 spellcheck text tokenizer word
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/dgrothe-phd/wordcheckerjava
- Owner: DGrothe-PhD
- License: mit
- Created: 2021-02-03T20:53:14.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-02-17T12:40:10.000Z (3 months ago)
- Last Synced: 2025-02-17T13:37:45.918Z (3 months ago)
- Topics: html, java-8, spellcheck, text, tokenizer, word
- Language: Java
- Homepage:
- Size: 1.91 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 11
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# WordCheckerJava
A Java UI which counts word occurrences in a text file. Results are sorted in alphabetical order and saved in a simple HTML file. With this, you can:
* Search for multiple terms in one go.
* Quickly find misspelt words in large texts.
* Peek at the context around some keywords.

## How it works
Numbers and bracketed numbers (often reference signs) are listed separately.

The user can uncheck words, symbols, numbers or custom search terms, thereby shortening the resulting HTML file. For example, unchecking symbols can be useful if a source text contains mathematical expressions, and unchecking numbers may help if a text contains line numbers.

Regular expressions provide simple filtering for data types such as ISBNs, dates or times, URLs and e-mail addresses; these tokens are listed separately as well. As regular expressions can be fiddly, I focused on a simple and stable solution, so this filtering may not always yield perfect results.
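
The grouping described above can be illustrated with a minimal sketch. It is not the project's actual code: the class name `TokenSketch`, the category labels and all regular expressions are assumptions chosen for illustration, and the patterns are deliberately loose, so they will misclassify some tokens.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch: sorts tokens into categories and counts them.
class TokenSketch {
    // Illustrative patterns only (assumptions, not the project's own).
    static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");
    static final Pattern URL = Pattern.compile("(https?://|www\\.)\\S+");
    static final Pattern BRACKETED_NUMBER = Pattern.compile("[(\\[]\\d+[)\\]]");
    static final Pattern NUMBER = Pattern.compile("\\d+([.,]\\d+)*");
    static final Pattern WORD = Pattern.compile("\\p{L}+([-']\\p{L}+)*");

    // Alphabetical order that ignores case first, but still keeps
    // "See" and "see" as separate entries so spelling variants stay visible.
    static final Comparator<String> ALPHABETICAL =
            String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder());

    // Returns the category of a single token; the first matching pattern wins.
    static String classify(String token) {
        if (EMAIL.matcher(token).matches()) return "email";
        if (URL.matcher(token).matches()) return "url";
        if (BRACKETED_NUMBER.matcher(token).matches()) return "bracketed number";
        if (NUMBER.matcher(token).matches()) return "number";
        if (WORD.matcher(token).matches()) return "word";
        return "symbol";
    }

    // Maps each category to its tokens and their occurrence counts, sorted alphabetically.
    static Map<String, Map<String, Integer>> collect(String text) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (String raw : text.split("\\s+")) {
            // Drop trailing sentence punctuation such as "." or ",".
            String token = raw.replaceAll("[.,;:!?]+$", "");
            if (token.isEmpty()) continue;
            result.computeIfAbsent(classify(token), k -> new TreeMap<>(ALPHABETICAL))
                  .merge(token, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "See [12] and see www.example.org or mail bob@example.com, see 42.";
        collect(sample).forEach((category, counts) ->
                System.out.println(category + ": " + counts));
    }
}
```

Unchecking a category in the UI would then correspond to leaving out one of these groups when the result file is written.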
## Use cases
* Filter ISBNs, e-mail addresses or URLs that occur in a long e-mail or messenger thread.
* Have all occurrences of a word or a name in a long thesis been spelt in the same manner?
* Does a text contain a specific search term?
If so, in which context, that is, within which sentence or paragraph, does the search term appear?
The size of the shown text around that search term can be specified by the user.

## Usage
`Open File`: browse to a text file. Set the topic (this will be the HTML title) and the target filename. The ending `.html` is added automatically if missing. The target file is placed in the same folder as the source text file (a folder choice button may be implemented later).

Then click on `Start`. Words from the text file are gathered and sorted. Finally, click the `Show Results` button to open the result file in a browser.
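
The target file naming described above could look roughly like the following sketch. The class and method names (`TargetFileSketch`, `resolveTarget`) and the example paths are hypothetical, and the case-insensitive check for `.html` is an assumption; the project may handle this differently.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical sketch of how the target path could be derived.
class TargetFileSketch {
    // Appends ".html" if missing and places the file next to the source file.
    static Path resolveTarget(Path sourceFile, String targetName) {
        String name = targetName.trim();
        // Assumption: the ending is compared case-insensitively.
        if (!name.toLowerCase(Locale.ROOT).endsWith(".html")) {
            name += ".html";
        }
        // resolveSibling resolves the name against the source file's parent folder.
        return sourceFile.resolveSibling(name);
    }

    public static void main(String[] args) {
        Path source = Paths.get("texts", "thesis.txt");
        System.out.println(resolveTarget(source, "results"));
        System.out.println(resolveTarget(source, "results.html"));
    }
}
```

Both calls in `main` yield the same path, since the ending is only added when it is missing.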
## Browser and editor support
The `results.html` file can be opened with any recent browser, e.g. Firefox or Microsoft Edge.
As clickable `` tags are used, Internet Explorer is not fully supported. The HTML source code itself is kept readable so that compatibility issues are minimized.

This way, an entry can also be reviewed quickly with command-line tools, such as
(Linux only) `less results.html | grep ` or (Windows command line) `type results.html | findstr "1"`. Any suitable editor works as well.

## Requirements
A JDK is required to compile; the program runs on JRE 8 or later.
The jar file can then be used on different platforms, including a Raspberry Pi (it has been tested on a Raspberry Pi 3).