https://github.com/florents-tselai/pg_fts_greek
Postgres FTS Improvements for Greek
full-text-search nlp postgresql postgresql-extension
- Host: GitHub
- URL: https://github.com/florents-tselai/pg_fts_greek
- Owner: Florents-Tselai
- Created: 2023-06-03T17:26:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-03T17:27:12.000Z (over 2 years ago)
- Last Synced: 2025-02-10T23:35:53.988Z (9 months ago)
- Topics: full-text-search, nlp, postgresql, postgresql-extension
- Language: Makefile
- Homepage:
- Size: 1000 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Postgres FTS Improvements for Greek
Packaged as an extension for now.
It adds a text search configuration, `greek_ext`, used as `to_tsvector('greek_ext', 'το κείμενο')`
instead of the vanilla `to_tsvector('greek', 'το κείμενο')`.
The following sources are used to create a `greek.stop` file, which is then installed under `prefix/share/tsearch_data`:
- [NLTK Stopwords](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip)
- [pmav99/greek_stopwords](https://github.com/pmav99/greek_stopwords/tree/master)
- [dourosdimitris/greek_stopwords](https://github.com/dourosdimitris/greek_stopwords)
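Merging several stop-word lists into one file can be sketched roughly as follows. This is not the repository's actual Makefile recipe: the function name `merge_stopwords` and the input file names are hypothetical, and the sketch assumes each source has already been reduced to a plain one-word-per-line text file.

```shell
# Hypothetical helper (not from the repository): merge several
# one-word-per-line lists into a single deduplicated stop file.
# Usage: merge_stopwords OUTPUT INPUT...
merge_stopwords() {
    out="$1"
    shift
    # Strip carriage returns and trailing blanks, skip empty lines,
    # then sort bytewise and drop duplicates.
    awk '{ sub(/\r$/, ""); sub(/[ \t]+$/, ""); if ($0 != "") print }' "$@" |
        LC_ALL=C sort -u > "$out"
}

# Example call with assumed file names:
# merge_stopwords greek.stop nltk.txt pmav99.txt dourosdimitris.txt
```

PostgreSQL reads stop-word files as one word per line and ignores blank lines and trailing spaces, so the cleanup here mostly keeps the merged file tidy and free of duplicates.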
## Comparison
| t | to_tsvector('greek', t) | to_tsvector('greek_ext', t) |
|----------------------------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------|
| 'το τετράγωνο της υποτείνουσας ενός ορθογωνίου τριγώνου' | 'εν':5 'ορθογων':6 'τ':3 'τετραγων':2 'το':1 'τριγων':7 'υποτεινουσ':4 | 'εν':5 'ορθογων':6 'τετραγων':2 'τριγων':7 'υποτεινουσ':4 |
| 'ο γιώργος είναι πονηρός' | 'γιωργ':2 'εινα':3 'ο':1 'πονηρ':4 | 'γιωργ':2 'πονηρ':4 |
| 'ο ήλιος ο πράσινος o ήλιος που ανατέλλει' | 'o':5 'ανατελλ':8 'ηλι':2,6 'ο':1,3 'π':7 'πρασιν':4 | 'ανατελλ':8 'ηλι':2,6 'πρασιν':4 |
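
Rows like the ones above could be reproduced with a small wrapper around `psql`. This is only a sketch: the function name `compare_configs` is hypothetical, and it assumes the extension is already installed in the target database and that connection settings come from the usual `PG*` environment variables.

```shell
# Hypothetical helper: print the tsvector produced by both configurations
# for the same input text. Requires psql and an installed pg_fts_greek.
compare_configs() {
    # -X: skip ~/.psqlrc, -A: unaligned output, -t: tuples only.
    # The text is passed as psql variable "t" and interpolated as a
    # safely quoted literal via :'t' (works for stdin input, not for -c).
    psql -X -At -v t="$1" <<'SQL'
SELECT to_tsvector('greek', :'t'), to_tsvector('greek_ext', :'t');
SQL
}

# Example call:
# compare_configs 'ο γιώργος είναι πονηρός'
```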
## Installation
```shell
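# Register the submodules declared in .gitmodules
git submodule init
# Build the merged stop-word file from the sources listed above
make greek.stop
# Install the extension files and greek.stop into the PostgreSQL share directory
make install
# Enable the extension in the current database
psql -c "CREATE EXTENSION pg_fts_greek"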
```
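
After installation, a quick smoke test might look like the sketch below. The function name `greek_ext_matches` is hypothetical; it relies only on the `greek_ext` configuration described above and the standard `plainto_tsquery` function, and assumes `psql` can connect to the database where the extension was created.

```shell
# Hypothetical helper: check whether a document matches a plain-text
# query under the greek_ext configuration (psql prints t or f).
# Usage: greek_ext_matches DOCUMENT QUERY
greek_ext_matches() {
    psql -X -At -v doc="$1" -v q="$2" <<'SQL'
SELECT to_tsvector('greek_ext', :'doc') @@ plainto_tsquery('greek_ext', :'q');
SQL
}

# Example call:
# greek_ext_matches 'ο ήλιος ο πράσινος' 'ήλιος'
```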