https://github.com/simon987/pg_asciifold
asciifold C-Language function based on Lucene's ASCIIFoldingFilter
- Host: GitHub
- URL: https://github.com/simon987/pg_asciifold
- Owner: simon987
- License: gpl-3.0
- Created: 2020-06-07T16:40:28.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-06-07T17:18:20.000Z (almost 6 years ago)
- Last Synced: 2025-06-29T11:42:33.149Z (11 months ago)
- Topics: postgresql
- Language: C
- Size: 34.2 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# PostgreSQL ASCII folding
Reasonably fast ASCII folding functions for PostgreSQL, based on [Lucene's ASCIIFoldingFilter](https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html). Tested on the MusicBrainz dataset, folding is 40% faster than a simple `UPPER()`.
*Example:*
```
postgres=# SELECT asciifold('Hello, ⒩ᴐⱤú⒴⁈~!');
asciifold
----------------------
Hello, (n)ORu(y)?!~!
(1 row)
postgres=# SELECT asciifold_lower('Hello, ⒩ᴐⱤú⒴⁈~!');
asciifold_lower
----------------------
hello, (n)oru(y)?!~!
(1 row)
```
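To illustrate the idea behind the example above, here is a minimal sketch of table-driven ASCII folding in C. It is not the code from `asciifolding.c`: the names `fold_entry`, `FOLD_TABLE`, `lookup_fold`, `decode_utf8` and `fold_sketch` are hypothetical, the table only covers the six non-ASCII characters used in the example, and the decoder assumes well-formed UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping entry: one Unicode code point and its ASCII replacement. */
typedef struct {
    uint32_t codepoint;
    const char *ascii;
} fold_entry;

/* Tiny illustrative table; a real folding table covers far more characters. */
static const fold_entry FOLD_TABLE[] = {
    {0x00FA, "u"},   /* LATIN SMALL LETTER U WITH ACUTE */
    {0x1D10, "O"},   /* LATIN LETTER SMALL CAPITAL OPEN O */
    {0x2048, "?!"},  /* QUESTION EXCLAMATION MARK */
    {0x24A9, "(n)"}, /* PARENTHESIZED LATIN SMALL LETTER N */
    {0x24B4, "(y)"}, /* PARENTHESIZED LATIN SMALL LETTER Y */
    {0x2C64, "R"},   /* LATIN CAPITAL LETTER R WITH TAIL */
};

/* Linear search is enough for a sketch; returns NULL when no mapping exists. */
static const char *lookup_fold(uint32_t cp) {
    size_t count = sizeof(FOLD_TABLE) / sizeof(FOLD_TABLE[0]);
    for (size_t i = 0; i < count; i++) {
        if (FOLD_TABLE[i].codepoint == cp) {
            return FOLD_TABLE[i].ascii;
        }
    }
    return NULL;
}

/* Decode one code point and return the number of bytes consumed.
 * Assumes well-formed UTF-8; malformed input is not detected here. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp) {
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
        return 3;
    }
    *cp = ((s[0] & 0x07u) << 18) | ((s[1] & 0x3Fu) << 12)
        | ((s[2] & 0x3Fu) << 6) | (s[3] & 0x3Fu);
    return 4;
}

/* Fold src into dst (capacity dst_size, always NUL-terminated when dst_size > 0).
 * Characters without a mapping are copied unchanged. Returns the bytes written. */
size_t fold_sketch(const char *src, char *dst, size_t dst_size) {
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;
    if (dst_size == 0) {
        return 0;
    }
    while (*s != '\0') {
        uint32_t cp;
        size_t n = decode_utf8(s, &cp);
        const char *rep = (cp >= 0x80) ? lookup_fold(cp) : NULL;
        size_t rep_len;
        if (rep != NULL) {
            rep_len = strlen(rep);
        } else {
            rep = (const char *)s;  /* no mapping: keep the original bytes */
            rep_len = n;
        }
        if (out + rep_len >= dst_size) {
            break;                  /* stop rather than overflow dst */
        }
        memcpy(dst + out, rep, rep_len);
        out += rep_len;
        s += n;
    }
    dst[out] = '\0';
    return out;
}
```

Calling `fold_sketch` on the UTF-8 bytes of the example string with a 64-byte buffer yields `Hello, (n)ORu(y)?!~!`. Note that one input character can expand to several ASCII characters, so the output can be longer than the input.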
The UTF-8 input string is not sanitized: invalid UTF-8 may lead to undefined behavior.
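If the input may come from a source that does not guarantee well-formed UTF-8, it can be checked before folding. The following is a minimal sketch of such a check in C; `is_valid_utf8` is a hypothetical helper and is not part of this extension.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the first len bytes of s are
 * well-formed UTF-8 (no truncated or overlong sequences, no surrogates,
 * nothing above U+10FFFF). */
bool is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;   /* number of continuation bytes expected */
        uint32_t cp;    /* code point being assembled */
        uint32_t min;   /* smallest code point allowed for this length */

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1Fu; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0Fu; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07u; min = 0x10000;
        } else {
            return false;           /* stray continuation or invalid lead byte */
        }

        if (extra > len - i - 1) {
            return false;           /* sequence is truncated */
        }
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return false;       /* not a continuation byte */
            }
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        }
        if (cp < min || cp > 0x10FFFF) {
            return false;           /* overlong encoding or out of range */
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;           /* UTF-16 surrogate */
        }
        i += 1 + extra;
    }
    return true;
}
```

The check is a single pass over the bytes, so running it before folding adds a cost proportional to the input length.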
### Compiling from source (CMake)
```bash
apt install postgresql-server-dev-11
cmake .
make
```
See [asciifolding.c](asciifolding.c) and [build.sh](build.sh) for more information.