Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nuspell/nuspell
🖋️ Fast and safe spellchecking C++ library
- Host: GitHub
- URL: https://github.com/nuspell/nuspell
- Owner: nuspell
- License: lgpl-3.0
- Created: 2017-11-22T14:00:02.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-08-09T08:35:06.000Z (3 months ago)
- Last Synced: 2024-08-09T21:45:50.974Z (3 months ago)
- Topics: natural-language-processing, spellcheck, spellchecker, spellchecking, spelling-checker, spelling-corrector
- Language: C++
- Homepage: https://nuspell.github.io
- Size: 7.75 MB
- Stars: 221
- Watchers: 12
- Forks: 24
- Open Issues: 24
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: COPYING
- Authors: AUTHORS
# About Nuspell
Nuspell is a fast and safe spelling checker software program. It is designed
for languages with rich morphology and complex word compounding.
Nuspell is written in modern C++ and it supports Hunspell dictionaries.

Main features of Nuspell spelling checker:

- Provides software library and command-line tool.
- Suggests high-quality spelling corrections.
- Backward compatibility with Hunspell dictionary file format.
- Up to 3.5 times faster than Hunspell.
- Full Unicode support backed by ICU.
- Twofold affix stripping (for agglutinative languages, like Azeri,
Basque, Estonian, Finnish, Hungarian, Turkish, etc.).
- Supports complex compounds (for example, Hungarian, German and Dutch).
- Supports advanced features, for example: special casing rules
(Turkish dotted i or German sharp s), conditional affixes, circumfixes,
fogemorphemes, forbidden words, pseudoroots and homonyms.
- Free and open source software. Licensed under GNU LGPL v3 or later.

# Building Nuspell

## Dependencies
Build-only dependencies:
- C++17 compiler with support for `std::filesystem`, e.g. GCC >= v9
- CMake >= v3.12
- Catch2 >= v3.1.1 (It is only needed when building the tests. If it is not
available as a system package, then CMake will download it using
`FetchContent`.)
- Getopt (It is needed only on Windows + MSVC and only when the CLI tool or
the tests are built. It is available in vcpkg. Other platforms provide
it out of the box.)
- Pandoc (optional, needed for building the man-page)

Run-time (and build-time) dependencies:

- ICU4C

Recommended tools for developers: qtcreator, ninja, clang-format, gdb,
vim, doxygen.

## Building on GNU/Linux and Unixes

We first need to download the dependencies. Some may already be
preinstalled.

For Ubuntu and Debian:

```bash
sudo apt install g++ cmake libicu-dev catch2 pandoc
```

Then run the following commands inside the Nuspell directory:

```bash
mkdir build
cd build
cmake ..
make
sudo make install
```

For a faster build, run `make -j`, or use Ninja instead of Make.

If you are making a Linux distribution package (deb, rpm), you need
some additional configuration on the CMake invocation. For example:

```bash
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr
```

## Building on OSX and macOS

1. Install Apple's Command-line tools.
2. Install Homebrew package manager.
3. Install dependencies with the following commands.

```bash
brew install cmake icu4c catch2 pandoc
export ICU_ROOT=$(brew --prefix icu4c)
```

Then run the standard cmake and make commands. See above. The `ICU_ROOT`
variable is needed because icu4c is a keg-only package in Homebrew and
CMake cannot find it by default. Alternatively, you can use
`-DICU_ROOT=...` on the cmake command line.

If you want to build with GCC instead of Clang, you need to pull GCC
with Homebrew and rebuild all the dependencies with it. See the Homebrew
manuals.

## Building on Windows

### Compiling with Visual C++
1. Install Visual Studio 2017 or newer. Alternatively, you can use
Visual Studio Build Tools.
2. Install Git for Windows and CMake.
3. Install Vcpkg in some folder, e.g. in `c:\vcpkg`.
4. Install Pandoc. You can manually install or use `choco install pandoc`.
5. Run the commands below. Vcpkg will work in manifest mode and it will
automatically install the dependencies.

```bat
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=c:\vcpkg\scripts\buildsystems\vcpkg.cmake -A x64
cmake --build .
```

### Compiling with Mingw64 and MSYS2

Download MSYS2, update everything and install the following packages:
```bash
pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-icu \
mingw-w64-x86_64-cmake mingw-w64-x86_64-catch
```

Then from inside the Nuspell folder run:

```bash
mkdir build
cd build
cmake .. -G "Unix Makefiles" -DBUILD_DOCS=OFF
make
make install
```

### Building in Cygwin environment

Download the above-mentioned dependencies with the Cygwin package manager.
Then compile the same way as on Linux. Cygwin builds depend on
Cygwin1.dll.

## Building on FreeBSD

Install the following required packages:

```bash
pkg install cmake icu catch2 pandoc
```

Then run the standard cmake and make as on Linux. See above.

# Using the software
## Using the command-line tool
The main executable is located in `src/nuspell`.
After compiling and installing you can run the Nuspell spell checker
with a Nuspell, Hunspell or Myspell dictionary:

    nuspell -d en_US text.txt

For more details see the [man-page](docs/nuspell.1.md).

## Using the Library
Sample program:
```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>
#include <nuspell/dictionary.hxx>
#include <nuspell/finder.hxx>

using namespace std;

int main()
{
    // Collect the default dictionary directories and look for en_US.
    auto dirs = vector<filesystem::path>();
    nuspell::append_default_dir_paths(dirs);
    auto dict_path = nuspell::search_dirs_for_one_dict(dirs, "en_US");
    if (empty(dict_path))
        return 1; // Return error because we can not find the requested
                  // dictionary.

    auto dict = nuspell::Dictionary();
    try {
        dict.load_aff_dic(dict_path);
    }
    catch (const nuspell::Dictionary_Loading_Error& e) {
        cerr << e.what() << '\n';
        return 1;
    }
    auto word = string();
    auto sugs = vector<string>();
    // Check each whitespace-separated word read from standard input.
    while (cin >> word) {
        if (dict.spell(word)) {
            cout << "Word \"" << word << "\" is ok.\n";
            continue;
        }

        cout << "Word \"" << word << "\" is incorrect.\n";
        dict.suggest(word, sugs);
        if (sugs.empty())
            continue;
        cout << "  Suggestions are: ";
        for (auto& sug : sugs)
            cout << sug << ' ';
        cout << '\n';
    }
}
```

On the command line you can link like this:

```bash
g++ example.cxx -std=c++17 -lnuspell -licuuc -licudata
# or better, use pkg-config
g++ example.cxx -std=c++17 $(pkg-config --cflags --libs nuspell)
```

Within CMake you can use `find_package()` to link. For example:

```cmake
find_package(Nuspell)
add_executable(myprogram main.cpp)
target_link_libraries(myprogram Nuspell::nuspell)
```

# Dictionaries

Myspell, Hunspell and Nuspell dictionaries:
# Advanced topics
## Debugging Nuspell

First, always install the debugger:

```bash
sudo apt install gdb
```

For debugging we need to create a debug build and then we need to start
`gdb`.

```bash
mkdir debug
cd debug
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j
gdb src/nuspell/nuspell
```

We recommend debugging to be done
[with an IDE](https://github.com/nuspell/nuspell/wiki/IDE-Setup).

## Testing

To run the tests, run the following command after building:

    ctest

# See also
Full documentation in the [wiki](https://github.com/nuspell/nuspell/wiki).
API Documentation for developers can be generated from the source files
by running:

    doxygen

The result can be viewed by opening `doxygen/html/index.html` in a web
browser.