{"id":13589975,"url":"https://github.com/postgrespro/rum","last_synced_at":"2025-05-15T03:05:08.286Z","repository":{"id":8846989,"uuid":"58131929","full_name":"postgrespro/rum","owner":"postgrespro","description":"RUM access method - inverted index with additional information in posting lists","archived":false,"fork":false,"pushed_at":"2025-04-18T15:48:51.000Z","size":2837,"stargazers_count":772,"open_issues_count":22,"forks_count":59,"subscribers_count":47,"default_branch":"master","last_synced_at":"2025-05-15T03:04:52.775Z","etag":null,"topics":["access-method","fulltext-search","index","postgresql"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/postgrespro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-05-05T13:15:09.000Z","updated_at":"2025-04-22T09:15:53.000Z","dependencies_parsed_at":"2024-06-19T01:48:53.629Z","dependency_job_id":"57d16226-24f0-4748-bdc7-63220065bb5c","html_url":"https://github.com/postgrespro/rum","commit_stats":{"total_commits":484,"total_committers":27,"mean_commits":"17.925925925925927","dds":0.7107438016528926,"last_synced_commit":"34619f96302f496e10e0cb6c9b4c28a846cf9a42"},"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Frum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Frum/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fr
um/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Frum/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/postgrespro","download_url":"https://codeload.github.com/postgrespro/rum/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254264765,"owners_count":22041793,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["access-method","fulltext-search","index","postgresql"],"created_at":"2024-08-01T16:00:37.131Z","updated_at":"2025-05-15T03:05:08.260Z","avatar_url":"https://github.com/postgrespro.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"[![Build Status](https://api.travis-ci.com/postgrespro/rum.svg?branch=master)](https://travis-ci.com/postgrespro/rum)\n[![PGXN version](https://badge.fury.io/pg/rum.svg)](https://badge.fury.io/pg/rum)\n[![GitHub license](https://img.shields.io/badge/license-PostgreSQL-blue.svg)](https://raw.githubusercontent.com/postgrespro/rum/master/LICENSE)\n\n[![Postgres Professional](img/PGpro-logo.png)](https://postgrespro.com/)\n\n# RUM - RUM access method\n\n## Introduction\n\nThe **rum** module provides an access method to work with a `RUM` index. It is based\non the `GIN` access method's code.\n\nA `GIN` index allows performing fast full-text search using `tsvector` and\n`tsquery` types. But full-text search with a GIN index has several problems:\n\n- Slow ranking. It needs positional information about lexemes to do ranking. 
A `GIN`\nindex doesn't store positions of lexemes. So after index scanning, we need an\nadditional heap scan to retrieve lexeme positions.\n- Slow phrase search with a `GIN` index. This problem is related to the previous\none: phrase search also needs positional information.\n- Slow ordering by timestamp. A `GIN` index can't store related information\nin the index alongside lexemes, so an additional heap scan is necessary.\n\n`RUM` solves these problems by storing additional information in a posting tree,\nfor example positional information of lexemes or timestamps. You can get an\nidea of `RUM` from the following diagram:\n\n![How RUM stores additional information](img/gin_rum.png)\n\nA drawback of `RUM` is that it has slower build and insert times than `GIN`.\nThis is because we need to store additional information besides keys and because\n`RUM` uses generic Write-Ahead Log (WAL) records.\n\n## License\n\nThis module is available under a [license](LICENSE) similar to that of\n[PostgreSQL](http://www.postgresql.org/about/licence/).\n\n## Installation\n\nBefore building and installing **rum**, make sure the following is installed:\n\n* PostgreSQL 9.6 or later.\n\nA typical installation procedure may look like this:\n\n### Using GitHub repository\n\n    $ git clone https://github.com/postgrespro/rum\n    $ cd rum\n    $ make USE_PGXS=1\n    $ make USE_PGXS=1 install\n    $ make USE_PGXS=1 installcheck\n    $ psql DB -c \"CREATE EXTENSION rum;\"\n\n### Using PGXN\n\n    $ USE_PGXS=1 pgxn install rum\n\n\u003e **Important:** Don't forget to set the `PG_CONFIG` variable if you want to test `RUM` on a custom build of PostgreSQL. 
Read more [here](https://wiki.postgresql.org/wiki/Building_and_Installing_PostgreSQL_Extension_Modules).\n\n## Tests\n\n    $ make check\n\nThis command runs:\n- regression tests;\n- isolation tests;\n- TAP tests.\n\n    One of the TAP tests downloads a 1GB archive and then unpacks it\n    into a file of almost 3GB. It is disabled by default.\n\n    To run this test, you need to set an environment variable:\n\n        $ export PG_TEST_EXTRA=big_values\n\n    To turn it off again:\n\n        $ export -n PG_TEST_EXTRA\n\n## Common operators and functions\n\nThe **rum** module provides the following operators.\n\n|       Operator       | Returns |                 Description\n| -------------------- | ------- | ----------------------------------------------\n| tsvector \u0026lt;=\u0026gt; tsquery | float4  | Returns distance between tsvector and tsquery.\n| timestamp \u0026lt;=\u0026gt; timestamp | float8 | Returns distance between two timestamps.\n| timestamp \u0026lt;=\u0026#124; timestamp | float8 | Returns distance only for left timestamps.\n| timestamp \u0026#124;=\u0026gt; timestamp | float8 | Returns distance only for right timestamps.\n\nThe last three operators also work for the types timestamptz, int2, int4, int8, float4, float8,\nmoney and oid.\n\n## Operator classes\n\n**rum** provides the following operator classes.\n\n### rum_tsvector_ops\n\nFor type: `tsvector`\n\nThis operator class stores `tsvector` lexemes with positional information. It supports\nordering by the `\u003c=\u003e` operator and prefix search. 
See the example below.\n\nAssume we have the following table:\n\n```sql\nCREATE TABLE test_rum(t text, a tsvector);\n\nCREATE TRIGGER tsvectorupdate\nBEFORE UPDATE OR INSERT ON test_rum\nFOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');\n\nINSERT INTO test_rum(t) VALUES ('The situation is most beautiful');\nINSERT INTO test_rum(t) VALUES ('It is a beautiful');\nINSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');\n```\n\nTo create a **rum** index, we first need to create the extension:\n\n```sql\nCREATE EXTENSION rum;\n```\n\nThen we can create a new index:\n\n```sql\nCREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);\n```\n\nNow we can execute the following queries:\n\n```sql\nSELECT t, a \u003c=\u003e to_tsquery('english', 'beautiful | place') AS rank\n    FROM test_rum\n    WHERE a @@ to_tsquery('english', 'beautiful | place')\n    ORDER BY a \u003c=\u003e to_tsquery('english', 'beautiful | place');\n                t                |  rank\n---------------------------------+---------\n It looks like a beautiful place | 8.22467\n The situation is most beautiful | 16.4493\n It is a beautiful               | 16.4493\n(3 rows)\n\nSELECT t, a \u003c=\u003e to_tsquery('english', 'place | situation') AS rank\n    FROM test_rum\n    WHERE a @@ to_tsquery('english', 'place | situation')\n    ORDER BY a \u003c=\u003e to_tsquery('english', 'place | situation');\n                t                |  rank\n---------------------------------+---------\n The situation is most beautiful | 16.4493\n It looks like a beautiful place | 16.4493\n(2 rows)\n```\n\n### rum_tsvector_hash_ops\n\nFor type: `tsvector`\n\nThis operator class stores a hash of `tsvector` lexemes with positional information.\nIt supports ordering by the `\u003c=\u003e` operator. 
It **doesn't** support prefix search.\n\n### rum_TYPE_ops\n\nFor types: int2, int4, int8, float4, float8, money, oid, time, timetz, date,\ninterval, macaddr, inet, cidr, text, varchar, char, bytea, bit, varbit,\nnumeric, timestamp, timestamptz\n\nSupported operations: `\u003c`, `\u003c=`, `=`, `\u003e=`, `\u003e` for all types and\n`\u003c=\u003e`, `\u003c=|` and `|=\u003e` for int2, int4, int8, float4, float8, money, oid,\ntimestamp and timestamptz types.\n\nThis operator class supports ordering by the `\u003c=\u003e`, `\u003c=|` and `|=\u003e` operators. It can be used with the\n`rum_tsvector_addon_ops`, `rum_tsvector_hash_addon_ops` and `rum_anyarray_addon_ops` operator classes.\n\n### rum_tsvector_addon_ops\n\nFor type: `tsvector`\n\nThis operator class stores `tsvector` lexemes together with a field of any type\nsupported by the module. See the example below.\n\nAssume we have the following table:\n\n```sql\nCREATE TABLE tsts (id int, t tsvector, d timestamp);\n\n\\copy tsts from 'rum/data/tsts.data'\n\nCREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d)\n    WITH (attach = 'd', to = 't');\n```\n\nNow we can execute the following queries:\n```sql\nEXPLAIN (costs off)\n    SELECT id, d, d \u003c=\u003e '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr\u0026qh' ORDER BY d \u003c=\u003e '2016-05-16 14:21:25' LIMIT 5;\n                                    QUERY PLAN\n-----------------------------------------------------------------------------------\n Limit\n   -\u003e  Index Scan using tsts_idx on tsts\n         Index Cond: (t @@ '''wr'' \u0026 ''qh'''::tsquery)\n         Order By: (d \u003c=\u003e 'Mon May 16 14:21:25 2016'::timestamp without time zone)\n(4 rows)\n\nSELECT id, d, d \u003c=\u003e '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr\u0026qh' ORDER BY d \u003c=\u003e '2016-05-16 14:21:25' LIMIT 5;\n id  |                d                |   ?column?\n-----+---------------------------------+---------------\n 355 | Mon May 16 14:21:22.326724 2016 |      2.673276\n 354 | Mon May 
16 13:21:22.326724 2016 |   3602.673276\n 371 | Tue May 17 06:21:22.326724 2016 |  57597.326724\n 406 | Wed May 18 17:21:22.326724 2016 | 183597.326724\n 415 | Thu May 19 02:21:22.326724 2016 | 215997.326724\n(5 rows)\n```\n\n\u003e **Warning:** Currently, RUM behaves incorrectly when an index is created using ordering over pass-by-reference additional information. This is because posting trees have a fixed-length right bound and fixed-length non-leaf posting items. Creating such indexes is not allowed.\n\n### rum_tsvector_hash_addon_ops\n\nFor type: `tsvector`\n\nThis operator class stores a hash of `tsvector` lexemes together with a field of any type\nsupported by the module.\n\nIt **doesn't** support prefix search.\n\n### rum_tsquery_ops\n\nFor type: `tsquery`\n\nThis operator class stores branches of the query tree in additional information. For example, assume we have the table:\n```sql\nCREATE TABLE query (q tsquery, tag text);\n\nINSERT INTO query VALUES ('supernova \u0026 star', 'sn'),\n    ('black', 'color'),\n    ('big \u0026 bang \u0026 black \u0026 hole', 'bang'),\n    ('spiral \u0026 galaxy', 'shape'),\n    ('black \u0026 hole', 'color');\n\nCREATE INDEX query_idx ON query USING rum(q);\n```\n\nNow we can execute the following fast query:\n```sql\nSELECT * FROM query\n    WHERE to_tsvector('black holes never exists before we think about them') @@ q;\n        q         |  tag\n------------------+-------\n 'black'          | color\n 'black' \u0026 'hole' | color\n(2 rows)\n```\n\n### rum_anyarray_ops\n\nFor type: `anyarray`\n\nThis operator class stores `anyarray` elements with the length of the array.\nIt supports the `\u0026\u0026`, `@\u003e`, `\u003c@`, `=`, `%` operators. 
It also supports ordering by the `\u003c=\u003e` operator.\nFor example, assume we have the table:\n\n```sql\nCREATE TABLE test_array (i int2[]);\n\nINSERT INTO test_array VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}');\n\nCREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);\n```\n\nNow we can execute the query using an index scan:\n\n```sql\nSET enable_seqscan TO off;\n\nEXPLAIN (COSTS OFF) SELECT * FROM test_array WHERE i \u0026\u0026 '{1}' ORDER BY i \u003c=\u003e '{1}' ASC;\n                QUERY PLAN\n------------------------------------------\n Index Scan using idx_array on test_array\n   Index Cond: (i \u0026\u0026 '{1}'::smallint[])\n   Order By: (i \u003c=\u003e '{1}'::smallint[])\n(3 rows)\n\nSELECT * FROM test_array WHERE i \u0026\u0026 '{1}' ORDER BY i \u003c=\u003e '{1}' ASC;\n     i\n-----------\n {1}\n {1,2}\n {1,2,3}\n {1,2,3,4}\n(4 rows)\n```\n\n### rum_anyarray_addon_ops\n\nFor type: `anyarray`\n\nThis operator class stores `anyarray` elements together with a field of any type\nsupported by the module.\n\n## Todo\n\n- Allow multiple kinds of additional information (lexeme positions + timestamp).\n- Improve ranking function to support TF/IDF.\n- Improve insert time.\n- Improve GENERIC WAL to support shift (PostgreSQL core changes).\n\n## Authors\n\nAlexander Korotkov \u003ca.korotkov@postgrespro.ru\u003e Postgres Professional Ltd., Russia\n\nOleg Bartunov \u003co.bartunov@postgrespro.ru\u003e Postgres Professional Ltd., Russia\n\nTeodor Sigaev \u003cteodor@postgrespro.ru\u003e Postgres Professional Ltd., Russia\n\nArthur Zakirov \u003ca.zakirov@postgrespro.ru\u003e Postgres Professional Ltd., Russia\n\nPavel Borisov \u003cp.borisov@postgrespro.com\u003e Postgres Professional Ltd., Russia\n\nMaxim Orlov \u003cm.orlov@postgrespro.ru\u003e Postgres Professional Ltd., 
Russia\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Frum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpostgrespro%2Frum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Frum/lists"}