Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hunspell/hunspell
The most popular spellchecking library.
- Host: GitHub
- URL: https://github.com/hunspell/hunspell
- Owner: hunspell
- License: lgpl-2.1
- Created: 2015-06-11T11:57:22.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2024-11-30T19:33:06.000Z (about 1 month ago)
- Last Synced: 2024-12-31T19:05:41.028Z (12 days ago)
- Topics: natural-language-processing, spell-check, spell-checker, spell-checking-engine, spellcheck, spellchecker, stemming
- Language: C++
- Homepage: http://hunspell.github.io/
- Size: 8.06 MB
- Stars: 2,180
- Watchers: 56
- Forks: 247
- Open Issues: 262
Metadata Files:
- Readme: README
- Changelog: ChangeLog
- License: COPYING
- Authors: AUTHORS
Awesome Lists containing this project
- awesome-starred-test - hunspell/hunspell - The most popular spellchecking library. (C++)
- low-resource-languages - hunspell - Spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character encoding. (Software / Utilities)
README
# About Hunspell
Hunspell is a free spell checker and morphological analyzer library
and command-line tool, licensed under the LGPL/GPL/MPL tri-license.

Hunspell is used by the LibreOffice office suite, free browsers, like
Mozilla Firefox and Google Chrome, and other tools and OSes, like
Linux distributions and macOS. It is also a command-line tool for
Linux, Unix-like and other OSes.

It is designed for quick and high quality spell checking and
correcting for languages with word-level writing systems,
including languages with rich morphology, complex word compounding
and character encoding.

Hunspell interfaces: Ispell-like terminal interface using Curses
library, Ispell pipe interface, C++/C APIs and shared library, also
with existing language bindings for other programming languages.

Hunspell's code base comes from OpenOffice.org's MySpell library,
developed by Kevin Hendricks (originally a C++ reimplementation of
spell checking and affixation of Geoff Kuenning's International
Ispell from scratch, later extended with e.g. n-gram suggestions),
see http://lingucomponent.openoffice.org/MySpell-3.zip, and
its README, CONTRIBUTORS and license.readme (here: license.myspell) files.

Main features of the Hunspell library, developed by László Németh:

- Unicode support
- Highly customizable suggestions: word-part replacement tables and
stem-level phonetic and other alternative transcriptions to recognize
and fix all typical misspellings, don't suggest offensive words etc.
- Complex morphology: dictionary and affix homonyms; twofold affix
stripping to handle inflectional and derivational morpheme groups for
agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian,
Turkish; 64 thousand affix classes with arbitrary number of affixes;
conditional affixes, circumfixes, fogemorphemes, zero morphemes,
virtual dictionary stems, forbidden words to avoid overgeneration etc.
- Handling complex compounds (for example, for Finno-Ugric, German and
Indo-Aryan languages): recognizing compounds made of arbitrary
number of words, handle affixation within compounds etc.
- Custom dictionaries with affixation
- Stemming
- Morphological analysis (in custom item and arrangement style)
- Morphological generation
- SPELLML XML API over plain spell() API function for easier integration
of stemming, morphological generation and custom dictionaries with affixation
- Language specific algorithms, like special casing of Azeri or Turkish
dotted i and German sharp s, and special compound rules of Hungarian.

Main features of Hunspell command line tool, developed by László Németh:

- Reimplementation of quick interactive interface of Geoff Kuenning's Ispell
- Parsing formats: text, OpenDocument, TeX/LaTeX, HTML/SGML/XML, nroff/troff
- Custom dictionaries with optional affixation, specified by a model word
- Multiple dictionary usage (for example hunspell -d en_US,de_DE,de_medical)
- Various filtering options (bad or good words/lines)
- Morphological analysis (option -m)
- Stemming (option -s)

See man hunspell, man 3 hunspell, man 5 hunspell for complete manual.

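The Ispell pipe interface mentioned above is what other programs typically talk to. The following is a minimal sketch of consuming it, assuming the tool is started in pipe mode with `hunspell -a` and that its output follows the line format documented for Ispell's `-a` mode: `*` for a correct word, `+ root` for a word found through affix removal, `-` for a compound, `& word count offset: suggestions` for a miss with suggestions, and `# word offset` for a miss without any. The names `PipeResult` and `parse_pipe_line` are hypothetical and not part of Hunspell.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for one output line of the pipe interface.
struct PipeResult {
    char kind = '?';                       // '*', '+', '-', '&', '#', or '?' if unrecognized
    std::string word;                      // root for '+', checked word for '&' and '#'
    std::vector<std::string> suggestions;  // filled for '&' lines only
};

// Parses a single output line (without the trailing newline).
PipeResult parse_pipe_line(const std::string& line) {
    PipeResult r;
    if (line.empty()) return r;

    const char c = line[0];
    if (c != '*' && c != '+' && c != '-' && c != '&' && c != '#') return r;
    r.kind = c;
    if (c == '*' || c == '-') return r;  // these lines carry no further fields

    // The first whitespace-separated field after the marker is the word.
    std::istringstream fields(line.substr(1));
    fields >> r.word;
    if (c != '&') return r;

    // "& word count offset: s1, s2, ..." -> suggestions follow the colon.
    const std::string::size_type colon = line.find(": ");
    if (colon == std::string::npos) return r;

    std::string::size_type start = colon + 2;
    while (start < line.size()) {
        std::string::size_type comma = line.find(", ", start);
        if (comma == std::string::npos) comma = line.size();
        r.suggestions.push_back(line.substr(start, comma - start));
        start = comma + 2;
    }
    return r;
}
```

For instance, `parse_pipe_line("& helo 3 0: hello, help, halo")` yields kind `&`, word `helo` and three suggestions. Splitting on a comma followed by a space keeps multi-word suggestions intact, and any line that does not start with one of the five markers is reported as `?`.
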
Translations: Hunspell has been translated into several languages already. If your language is missing or incomplete, please use [Weblate](https://hosted.weblate.org/engage/hunspell/) to help translate Hunspell.
# Dependencies
Build only dependencies:

    g++ make autoconf automake autopoint libtool

Runtime dependencies:

|               | Mandatory        | Optional         |
|---------------|------------------|------------------|
|libhunspell    |                  |                  |
|hunspell tool  | libiconv gettext | ncurses readline |

# Compiling on GNU/Linux and Unixes
We first need to install the dependencies. On Linux, `gettext` and
`libiconv` are part of the standard C library. On other Unixes we
need to install them manually.

For Ubuntu:

    sudo apt install autoconf automake autopoint libtool

Then run the following commands:

    autoreconf -vfi
    ./configure
    make
    sudo make install
    sudo ldconfig

For dictionary development, use the `--with-warnings` option of
configure.

For the interactive user interface of the Hunspell executable, use the
`--with-ui` option.

Optional developer packages:

- ncurses (needed for `--with-ui`), e.g. libncursesw5 for UTF-8
- readline (for fancy input line editing, configure parameter:
  `--with-readline`)

In Ubuntu, the packages are:

    libncurses5-dev libreadline-dev

# Compiling on OSX and macOS
On macOS, always use `clang` as the compiler and not `g++`, because
Homebrew dependencies are built with it.

    brew install autoconf automake libtool gettext
    brew link gettext --force

Then run:

    autoreconf -vfi
    ./configure
    make

# Compiling on Windows
## Compiling with Mingw64 and MSYS2
Download Msys2, update everything and install the following
packages:

    pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool

Open Mingw-w64 Win64 prompt and compile the same way as on Linux, see
above.

## Compiling in Cygwin environment

Download and install Cygwin environment for Windows with the following
extra packages:

- make
- automake
- autoconf
- libtool
- gcc-g++ development package
- ncurses, readline (for user interface)
- iconv (character conversion)

Then compile the same way as on Linux. Cygwin builds depend on
Cygwin1.dll.

# Debugging
It is recommended to install a debug build of the standard library:

    libstdc++6-6-dbg

For debugging we need to create a debug build and then we need to start
`gdb`.

    ./configure CXXFLAGS='-g -O0 -Wall -Wextra'
    make
    ./libtool --mode=execute gdb src/tools/hunspell

You can also pass the `CXXFLAGS` directly to `make` without calling
`./configure`, but we don't recommend this way during long development
sessions.

If you like to develop and debug with an IDE, see documentation at
https://github.com/hunspell/hunspell/wiki/IDE-Setup

# Testing
Testing Hunspell (see tests in tests/ subdirectory):

    make check

or with Valgrind debugger:

    make check
    VALGRIND=[Valgrind_tool] make check

For example:

    make check
    VALGRIND=memcheck make check

# Documentation
Features and dictionary format:

    man 5 hunspell
    man hunspell
    hunspell -h

http://hunspell.github.io/

# Usage
After compiling and installing (see INSTALL) you can run the Hunspell
spell checker (compiled with user interface) with a Hunspell or MySpell
dictionary:

    hunspell -d en_US text.txt

or without interface:

    hunspell
    hunspell -d en_GB -l

Linking with Hunspell static library (libraries are listed after the
source file so that the linker can resolve its references):

    g++ example.cxx -lhunspell-1.7
    # or better, use pkg-config
    g++ example.cxx $(pkg-config --cflags --libs hunspell)

# Installing Hunspell (vcpkg)
Alternatively, you can build and install hunspell using the [vcpkg](https://github.com/Microsoft/vcpkg/) dependency manager:

    git clone https://github.com/Microsoft/vcpkg.git
    cd vcpkg
    ./bootstrap-vcpkg.sh
    ./vcpkg integrate install
    ./vcpkg install hunspell

The hunspell port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.

## Dictionaries
Hunspell (MySpell) dictionaries:
- https://wiki.documentfoundation.org/Language_support_of_LibreOffice
- http://cgit.freedesktop.org/libreoffice/dictionaries
- http://extensions.libreoffice.org
- https://extensions.openoffice.org
- https://wiki.openoffice.org/wiki/Dictionaries

Aspell dictionaries (conversion: man 5 hunspell):

- ftp://ftp.gnu.org/gnu/aspell/dict
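
Each Hunspell dictionary is a pair of files: a word list (`.dic`) and an affix file (`.aff`) with the rules that derive further word forms, in the format described in man 5 hunspell. As a rough illustration of why this keeps word lists short, the sketch below undoes one suffix rule and looks the result up in a stem set. It is a toy under simplifying assumptions, not Hunspell's algorithm: `SuffixRule` and `toy_spell` are hypothetical names, and the conditions, flags, prefixes and twofold stripping of real affix files are left out.

```cpp
#include <set>
#include <string>
#include <vector>

// One suffix rule, loosely modelled on a suffix entry of an affix file:
// remove `strip` from the end of the stem, then append `add`.
// (Real rules also carry a condition and a flag; both are omitted here.)
struct SuffixRule {
    std::string strip;  // characters removed from the stem ("" = nothing)
    std::string add;    // characters appended to form the derived word
};

// Returns true if `s` ends with `tail`.
static bool ends_with(const std::string& s, const std::string& tail) {
    return s.size() >= tail.size() &&
           s.compare(s.size() - tail.size(), tail.size(), tail) == 0;
}

// Accepts `word` if it is a stem itself, or if undoing a single suffix
// rule yields a stem from the word list.
bool toy_spell(const std::string& word,
               const std::set<std::string>& stems,
               const std::vector<SuffixRule>& rules) {
    if (stems.count(word)) return true;
    for (const SuffixRule& r : rules) {
        if (!ends_with(word, r.add)) continue;
        // Undo the rule: drop the appended part, restore the stripped part.
        const std::string stem =
            word.substr(0, word.size() - r.add.size()) + r.strip;
        if (stems.count(stem)) return true;
    }
    return false;
}
```

With the stems `work` and `try` and the rules `{"", "s"}`, `{"", "ed"}` and `{"y", "ies"}`, this accepts `works`, `worked` and `tries` and rejects `wurks`. It also accepts `trys`, which is the kind of overgeneration that rule conditions in real affix files are there to prevent.
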
László Németh, nemeth at numbertext org