https://github.com/otobrglez/speedy-limeta

Experimenting with GIN index (intarray) in PostgreSQL

Host: GitHub
URL: https://github.com/otobrglez/speedy-limeta
Owner: otobrglez
Created: 2017-11-24T18:28:16.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-11-24T18:28:39.000Z (over 8 years ago)
Last Synced: 2025-02-24T12:17:20.965Z (over 1 year ago)
Topics: postgresql, sql
Language: HTML
Homepage:
Size: 3.23 MB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Speedy Limeta

## Question

> I once ran into a situation where an SQL database held a list of all valid words (about 100k).
>
> Say you have 8 letters and want to know which dictionary words you can build from those 8 letters or fewer.
>
> Example: input ETRAMOIL, solutions MILA, TAM, IME, LIMETA, ...
>
> // [Uroš Kotolenko](https://twitter.com/ukotolenko) at [Slovenski Developerji](https://www.facebook.com/groups/272812582821861/permalink/1097738063662638/), November 2017
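A brute-force reference pins down what a correct answer looks like and can be used to spot-check the SQL results. This Python sketch assumes each input letter may be used at most once (one reading of "8 letters or fewer"); the function name `buildable_words` and the sample word list are illustrative, not part of the repository.

```python
from collections import Counter
from typing import Iterable


def buildable_words(letters: str, dictionary: Iterable[str]) -> list[str]:
    """Return dictionary words that can be spelled from `letters`,
    using each input letter at most as many times as it appears."""
    available = Counter(letters.upper())
    result = []
    for word in dictionary:
        needed = Counter(word.upper())
        # Counter subtraction keeps only positive counts, so an empty
        # result means the input covers every letter of the word.
        if not needed - available:
            result.append(word)
    # Longest words first, ties broken alphabetically.
    return sorted(result, key=lambda w: (-len(w), w))


if __name__ == "__main__":
    # Tiny hypothetical word list; the real dictionary has ~100k entries.
    words = ["MILA", "TAM", "IME", "LIMETA", "MAMA", "KRUH"]
    print(buildable_words("ETRAMOIL", words))
    # ['LIMETA', 'MILA', 'IME', 'TAM']
```

This scans the whole dictionary on every query, which is the per-query cost an index is meant to cut down.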

## Setup

Initialize a local PostgreSQL cluster, create the `limeta` database, and load the word list:

```bash
# Create the PostgreSQL data directory (cluster)
initdb -D db/pg-data -E utf8 -U postgres

# Run PostgreSQL in the foreground (run the remaining commands in a second terminal)
postgres -D db/pg-data

# Create the database
createdb -E utf8 -U postgres limeta

# Create tables
psql -U postgres -a -f ./create_tables.sql limeta

# Load words (COPY FROM a file is read by the server, so the path must be absolute)
psql -U postgres -a -c "COPY words FROM '$(pwd)/data/words_si-SL.csv' CSV DELIMITER ','" limeta
```

## Database structure

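```sql
DROP TABLE IF EXISTS words;

-- intarray provides the gin__int_ops operator class and icount()
CREATE EXTENSION IF NOT EXISTS intarray;

CREATE TABLE words (
  word           VARCHAR(60) NOT NULL,
  letter_numbers INT[]  -- code point of each (upper-cased) letter of the word
);

CREATE INDEX idx_letter_number ON words USING GIN (letter_numbers gin__int_ops);
```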
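A GIN index keeps, for each key (here, a letter code), a list of the rows that contain it. The toy Python analogue below is a simplification for intuition, not PostgreSQL's GIN implementation, and `LetterIndex` is a hypothetical name. For a "which words fit inside these letters" query it reads only the posting lists of the input letters, and a word qualifies once every one of its distinct letters has been hit.

```python
from collections import defaultdict


class LetterIndex:
    """Toy inverted index: letter code -> ids of words containing it."""

    def __init__(self, words: list[str]) -> None:
        self.words = words
        self.postings: dict[int, list[int]] = defaultdict(list)
        self.distinct: list[int] = []  # number of distinct letters per word
        for word_id, word in enumerate(words):
            letter_codes = {ord(ch) for ch in word.upper()}
            self.distinct.append(len(letter_codes))
            for code in letter_codes:
                self.postings[code].append(word_id)

    def contained_in(self, letters: str) -> list[str]:
        """Words whose distinct letters all occur in `letters` (set semantics)."""
        hits: dict[int, int] = defaultdict(int)
        for code in {ord(ch) for ch in letters.upper()}:
            # Only the posting lists of the input letters are read.
            for word_id in self.postings.get(code, []):
                hits[word_id] += 1
        # A word with a letter outside the input never reaches its full count.
        return [
            self.words[word_id]
            for word_id, count in sorted(hits.items())
            if count == self.distinct[word_id]
        ]


if __name__ == "__main__":
    index = LetterIndex(["MILA", "TAM", "IME", "LIMETA", "KRUH"])
    print(index.contained_in("ETRAMOIL"))
    # ['MILA', 'TAM', 'IME', 'LIMETA']
```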

## Lookup

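```sql
WITH wan AS (
  -- Input letters as an int[] of code points: 'ETRAMOIL' -> {69,84,82,65,77,79,73,76}
  SELECT ARRAY(
    SELECT ascii(unnest(regexp_split_to_array(upper('ETRAMOIL'), '')))
  )::int[] AS lookup_numbers
)
SELECT words.word
FROM wan, words
-- Keep words whose letters all appear in the input (array containment)
WHERE wan.lookup_numbers @> words.letter_numbers
-- Words that use more distinct input letters come first
ORDER BY icount(wan.lookup_numbers & words.letter_numbers) DESC;
```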
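The query's semantics can be mirrored in plain Python to see exactly what it returns. PostgreSQL array containment ignores duplicate elements, so `@>` compares sets of letter codes: a word with a repeated letter, such as MAMA, passes for ETRAMOIL even though the input has only one M and one A. The sketch assumes the word list fits in memory and that Python's `str.upper()` agrees with PostgreSQL's `upper()` for the letters involved; `codes` and `lookup` are hypothetical names.

```python
def codes(text: str) -> list[int]:
    # Mirrors ascii(unnest(regexp_split_to_array(upper(text), ''))):
    # one code point per character, duplicates kept.
    return [ord(ch) for ch in text.upper()]


def lookup(letters: str, words: list[str]) -> list[str]:
    """Emulate the lookup query for one input string."""
    wanted = set(codes(letters))
    # WHERE lookup_numbers @> letter_numbers: containment ignores duplicates,
    # so this is a subset test on sets of codes.
    matches = [w for w in words if set(codes(w)) <= wanted]
    # ORDER BY icount(lookup_numbers & letter_numbers) DESC: the intersection
    # here is the word's distinct letters. Python's sort is stable, so ties
    # keep their input order (the SQL leaves tie order unspecified).
    return sorted(matches, key=lambda w: len(wanted & set(codes(w))), reverse=True)


if __name__ == "__main__":
    sample = ["MILA", "TAM", "IME", "LIMETA", "MAMA", "KRUH"]
    print(lookup("ETRAMOIL", sample))
    # ['LIMETA', 'MILA', 'TAM', 'IME', 'MAMA']
```

If each input letter may be used only once, the matches need an extra per-letter count check on top of the containment test.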

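## Explain

With `-qAt` (quiet, unaligned, tuples only) psql prints just the raw result, so `analyse.json` holds a JSON plan if `lookup.sql` runs the lookup under `EXPLAIN (ANALYZE, FORMAT JSON)`; that file is not shown here.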

```bash
psql -U postgres limeta -qAt -f lookup.sql > analyse.json
```

![Explain](plan.png "Explain visualized")
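To check from a terminal whether `idx_letter_number` appears in the plan without a visualizer, a small Python summarizer can walk the plan tree. It is a hypothetical helper and assumes `analyse.json` contains `EXPLAIN (ANALYZE, FORMAT JSON)` output: a one-element JSON array whose object has a `Plan` key, with child nodes nested under `Plans`.

```python
import json
from typing import Any, Iterator


def walk(node: dict[str, Any], depth: int = 0) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (depth, plan node) pairs, parents before children."""
    yield depth, node
    for child in node.get("Plans", []):
        yield from walk(child, depth + 1)


def summarize(explain_json: str) -> list[str]:
    # EXPLAIN (FORMAT JSON) returns a one-element list: [{"Plan": {...}, ...}]
    root = json.loads(explain_json)[0]
    lines = []
    for depth, node in walk(root["Plan"]):
        label = node["Node Type"]
        if "Index Name" in node:
            label += f" using {node['Index Name']}"
        if "Actual Total Time" in node:  # present only with ANALYZE
            label += f" ({node['Actual Total Time']} ms)"
        lines.append("  " * depth + label)
    if "Execution Time" in root:
        lines.append(f"Execution Time: {root['Execution Time']} ms")
    return lines


if __name__ == "__main__":
    with open("analyse.json", encoding="utf-8") as f:
        print("\n".join(summarize(f.read())))
```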

## Data preparation / cleanup

```bash
./cleanup_words.py > ./data/words-si-SL.csv
```
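`cleanup_words.py` itself is not shown. What it has to produce is two CSV columns matching the `words` table: the word, and a PostgreSQL array literal of its upper-cased letter codes that `COPY ... CSV` can load into `INT[]`. The sketch below is a minimal, hypothetical version that assumes one word per line on stdin; the deduplication, `isalpha()` filter and 60-character limit (from `VARCHAR(60)`) are illustrative choices, not the author's.

```python
import csv
import sys
from typing import Iterable, Iterator


def letter_numbers(word: str) -> str:
    # PostgreSQL array literal of per-character code points, e.g. "{77,73,76,65}".
    # Assumes the same upper-casing as upper() in the lookup query.
    return "{" + ",".join(str(ord(ch)) for ch in word.upper()) + "}"


def clean(lines: Iterable[str], max_len: int = 60) -> Iterator[str]:
    """Yield unique, non-empty, purely alphabetic words that fit VARCHAR(60)."""
    seen = set()
    for line in lines:
        word = line.strip()
        if not word or not word.isalpha() or len(word) > max_len:
            continue
        key = word.upper()  # treat "mila" and "MILA" as the same word
        if key in seen:
            continue
        seen.add(key)
        yield word


if __name__ == "__main__":
    # csv.writer quotes the array literal because it contains commas,
    # producing rows like: mila,"{77,73,76,65}"
    writer = csv.writer(sys.stdout, lineterminator="\n")
    for word in clean(sys.stdin):
        writer.writerow([word, letter_numbers(word)])
```

Under these assumptions it would run as `./cleanup_words.py < raw_words.txt > ./data/words-si-SL.csv`, where `raw_words.txt` is a hypothetical input file.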

## Credits

- [Slovar besed slovenskega jezika - Inštitut za slovenski jezik Frana Ramovša ZRC SAZU](http://bos.zrc-sazu.si/sbsj.html), November 2017

## Author

- [Oto Brglez](https://github.com/otobrglez)
