https://github.com/postgrespro/hunspell_dicts
Hunspell dictionaries for PostgreSQL
- Host: GitHub
- URL: https://github.com/postgrespro/hunspell_dicts
- Owner: postgrespro
- License: other
- Created: 2016-05-05T13:16:03.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2019-11-25T12:38:43.000Z (over 5 years ago)
- Last Synced: 2025-02-22T17:12:34.351Z (2 months ago)
- Language: TSQL
- Size: 6.35 MB
- Stars: 63
- Watchers: 30
- Forks: 17
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Hunspell dictionaries
## Introduction
The repository contains hunspell dictionaries for several languages listed below.
## Installation
Here is an example of installing the English dictionary:
```
$ git clone https://github.com/postgrespro/hunspell_dicts
$ cd hunspell_dicts/hunspell_en_us
$ make USE_PGXS=1 install
$ make USE_PGXS=1 installcheck
$ psql DB -c "CREATE EXTENSION hunspell_en_us;"
```

> **Important:** Don't forget to set the `PG_CONFIG` variable if you want to test the dictionaries on a custom build of PostgreSQL. Read more [here](https://wiki.postgresql.org/wiki/Building_and_Installing_PostgreSQL_Extension_Modules).

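
As a minimal sketch of the `PG_CONFIG` note: PGXS makefiles accept `PG_CONFIG` on the `make` command line, so a custom build can be selected without editing the Makefile. The path `/opt/pgsql/bin/pg_config` and the `run` helper are assumptions made for illustration; the helper only prints each command unless `DRY_RUN=0` is set.

```shell
# Hypothetical path to the pg_config of a custom PostgreSQL build; adjust it.
PG_CONFIG=${PG_CONFIG:-/opt/pgsql/bin/pg_config}

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Intended to be run from inside hunspell_dicts/hunspell_en_us.
run make USE_PGXS=1 PG_CONFIG="$PG_CONFIG" install
run make USE_PGXS=1 PG_CONFIG="$PG_CONFIG" installcheck
```

Running the script with `DRY_RUN=0` executes the two `make` commands against the selected build instead of printing them.
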
After installation you can use the English dictionary and configuration:
```sql
=> SELECT ts_lexize('english_hunspell', 'stories');
ts_lexize
-----------
{story}
(1 row)

=> SELECT to_tsvector('english_hunspell', 'stories');
to_tsvector
-------------
'story':1
(1 row)
```

## Custom installation
If you want to create a dictionary that isn't listed below, download the necessary [OpenOffice](https://extensions.openoffice.org/en/search?f%5B0%5D=field_project_tags%3A157) or [LibreOffice](https://extensions.libreoffice.org/extensions?getCategories=Dictionary&getCompatibility=any) extension.
Then you need to do the following (from the [documentation](https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY)):
* OpenOffice/LibreOffice extension files have the `.oxt` extension. Extract the `.aff` and `.dic` files and change their extensions to `.affix` and `.dict`. Some dictionary files also need to be converted to the UTF-8 encoding, for example for a Norwegian dictionary:
```
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic
```
* copy the files to the `$SHAREDIR/tsearch_data` directory (`$SHAREDIR` is the directory reported by `pg_config --sharedir`)
* load files into PostgreSQL with the following command:
```sql
CREATE TEXT SEARCH DICTIONARY norwegian_hunspell (
TEMPLATE = ispell,
DictFile = nn_no,
AffFile = nn_no,
StopWords = norwegian
);
```

## New dictionaries and configurations
The modules introduce the following dictionaries and configurations:
| Module | Dictionary/Configuration |
| ------ | ------------------------ |
| [hunspell_cs_cz](hunspell_cs_cz) | czech_hunspell |
| [hunspell_de_de](hunspell_de_de) | german_hunspell |
| [hunspell_en_us](hunspell_en_us) | english_hunspell |
| [hunspell_fr](hunspell_fr) | french_hunspell |
| [hunspell_ne_np](hunspell_ne_np) | nepali_hunspell |
| [hunspell_nl_nl](hunspell_nl_nl) | dutch_hunspell |
| [hunspell_nn_no](hunspell_nn_no) | norwegian_hunspell |
| [hunspell_pt_pt](hunspell_pt_pt) | portuguese_hunspell |
| [hunspell_ru_ru](hunspell_ru_ru) | russian_hunspell |
| [hunspell_ru_ru_aot](hunspell_ru_ru_aot) | russian_aot_hunspell |
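
For a language that is missing from the table above, the custom installation steps can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names `tsearch_name` and `prepare_dict` are hypothetical, the `.oxt` file is treated as a zip archive, `unzip`, `iconv` and `pg_config` are assumed to be on `PATH`, and the user is assumed to have write access to the `tsearch_data` directory.

```shell
#!/bin/sh
# Sketch only: tsearch_name and prepare_dict are hypothetical helper names.

# Map a Hunspell file name to the name PostgreSQL expects:
# nn_NO.aff -> nn_no.affix, nn_NO.dic -> nn_no.dict
tsearch_name() {
    base=$(basename "$1")
    stem=$(printf '%s' "${base%.*}" | tr '[:upper:]' '[:lower:]')
    case "${base##*.}" in
        aff) printf '%s.affix\n' "$stem" ;;
        dic) printf '%s.dict\n' "$stem" ;;
        *)   return 1 ;;
    esac
}

# Usage: prepare_dict ARCHIVE.oxt LANG_CODE SOURCE_ENCODING
prepare_dict() {
    oxt=$1
    lang_code=$2
    enc=$3
    workdir=$(mktemp -d) || return 1
    # -j drops the directory structure stored inside the archive.
    unzip -j -o "$oxt" "*${lang_code}.aff" "*${lang_code}.dic" -d "$workdir" || return 1
    dest="$(pg_config --sharedir)/tsearch_data"
    for f in "$workdir/$lang_code.aff" "$workdir/$lang_code.dic"; do
        # Convert to UTF-8 and store the file under its PostgreSQL name.
        iconv -f "$enc" -t UTF-8 "$f" > "$dest/$(tsearch_name "$f")" || return 1
    done
}
```

A call such as `prepare_dict dict.oxt nn_NO ISO_8859-1` (the archive name is made up) would leave `nn_no.affix` and `nn_no.dict` in `tsearch_data`, after which the `CREATE TEXT SEARCH DICTIONARY` statement from the custom installation section can load them. Pass `UTF-8` as the source encoding when the files need no conversion.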