https://github.com/petere/pgpcre

PCRE functions for PostgreSQL
https://github.com/petere/pgpcre

pcre perl postgresql postgresql-extension regular-expression

Last synced: 3 months ago
JSON representation

PCRE functions for PostgreSQL

Host: GitHub
URL: https://github.com/petere/pgpcre
Owner: petere
License: other
Created: 2013-02-09T21:20:22.000Z (over 12 years ago)
Default Branch: master
Last Pushed: 2025-03-26T10:56:48.000Z (3 months ago)
Last Synced: 2025-03-26T11:41:53.475Z (3 months ago)
Topics: pcre, perl, postgresql, postgresql-extension, regular-expression
Language: C
Size: 24.4 KB
Stars: 24
Watchers: 4
Forks: 4
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # pgpcre

[![Build Status](https://secure.travis-ci.org/petere/pgpcre.png)](http://travis-ci.org/petere/pgpcre)

This is a module for PostgreSQL that exposes Perl-compatible regular expressions (PCRE) functionality as functions and operators.  It is based on the popular [PCRE library](http://www.pcre.org/).

## Installation

You need to have libpcre installed.  pkg-config will be used to find it.

To build and install this module:

    make

    make install

or selecting a specific PostgreSQL installation:

    make PG_CONFIG=/some/where/bin/pg_config

    make PG_CONFIG=/some/where/bin/pg_config install

And finally inside the database:

    CREATE EXTENSION pgpcre;

## Using

A regular expression is a separate data type, named `pcre`.  (This is different from how the built-in regular expressions in PostgreSQL work, which are simply values of type `text`.)

The supported regular expressions are documented on the [pcrepattern(3)](http://linux.die.net/man/3/pcrepattern) man page.

### Basic matching

Boolean operators are available for checking whether a pattern matches a string.  These operators return true or false, respectively.  They only return null when one of the operands is null.

Examples:

    SELECT 'foo' ~ pcre 'fo+';

    SELECT 'bar' !~ pcre 'fo+';

You can also write it the other way around:

    SELECT pcre 'fo+' ~ 'foo';

    SELECT pcre 'fo+' !~ 'bar';

This can be handy for writing things like

    SELECT pcre 'fo+' ~ ANY(ARRAY['foo', 'bar']);

For Perl nostalgia, you can also use this operator:

    SELECT 'foo' =~ pcre 'fo+';

And if this operator is unique (which it should be, unless you have

something else installed that uses it), you can also write:

    SELECT 'foo' =~ 'fo+';

(The `~` operator, by contrast, is not unique, of course, because it is used by the built-in regular expressions.)

To get case-insensitive matching, set the appropriate option in the pattern, for example:

    SELECT 'FOO' ~ pcre '(?i)fo+';

### Extracting the matched string

To extract the substring that was matched by the pattern, use the

function `pcre_match`.  It returns either a value of type text, or

null if the pattern did not match.  Examples:

    SELECT pcre_match('fo+', 'foobar');  --> 'foo'

    SELECT pcre_match('fo+', 'barbar');  --> NULL

There is no support for extracting multiple matches of a pattern in a

string, because PCRE does not (easily) support that.

### Extracting captured substrings

Captured substrings (parenthesized subexpressions) are extracted using

the function `pcre_captured_substrings`.  It returns either an array

of text, or null if the pattern did not match.  Examples:

    SELECT pcre_captured_substrings('(fo+)(b..)', 'foobar');  --> ARRAY['foo','bar']

    SELECT pcre_captured_substrings('(fo+)(b..)', 'abcdef');  --> NULL

Note that elements of the array can be null if a substring was not used, for example:

    SELECT pcre_captured_substrings('(a|(z))(bc)', 'abc');  --> ARRAY['a',NULL,'bc']

### Storing regular expressions

You can store regular expression values of type `pcre` in tables, like

any other data.  Note, however, that the binary representation of the

`pcre` values contains the compiled regular expression, which is tied

to the version of the PCRE library.  If you upgrade the PCRE library

and use a compiled value created by a different version, things might

not work or even crash (according to the PCRE documentation; I don't

know how likely that is).  pgpcre will warn if you attempt to use a

value that was compiled by a different version of the library.  If

that happens, it is advisable to recompile and rewrite all stored

`pcre` values by doing something like

    UPDATE ... SET pcre_col = pcre_col::text::pcre

(To be clear, storing regular expressions in tables is not a typical

use.  Normally, you store text in tables and match it against regular

expressions provided by your application.)

## Discussion

Some possible advantages over the regular expression support built into PostgreSQL:

- richer pattern language, more familiar to Perl and Python programmers

- complete Unicode support

- saner operators and functions

Some disadvantages:

- no repeated matching (`'g'` flag)

- no index optimization

You can workaround the lack of index optimization by manually augmenting queries like

    column =~ '^foo'

with

    AND column ~>=~ 'foo' AND column ~<~ 'fop'

and creating the appropriate `text_pattern_ops` index as you would for the built-in pattern matching.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/petere/pgpcre

Awesome Lists containing this project

README