Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/simonw/datasette-rure
Datasette plugin that adds a custom SQL function for executing matches using the Rust regular expression engine
https://github.com/simonw/datasette-rure
datasette datasette-io datasette-plugin regular-expressions sqlite
Last synced: 3 months ago
JSON representation
Datasette plugin that adds a custom SQL function for executing matches using the Rust regular expression engine
- Host: GitHub
- URL: https://github.com/simonw/datasette-rure
- Owner: simonw
- License: apache-2.0
- Created: 2019-09-10T18:09:33.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-09-11T22:59:38.000Z (over 5 years ago)
- Last Synced: 2024-10-06T20:28:55.843Z (3 months ago)
- Topics: datasette, datasette-io, datasette-plugin, regular-expressions, sqlite
- Language: Python
- Size: 18.6 KB
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# datasette-rure
[![PyPI](https://img.shields.io/pypi/v/datasette-rure.svg)](https://pypi.org/project/datasette-rure/)
[![CircleCI](https://circleci.com/gh/simonw/datasette-rure.svg?style=svg)](https://circleci.com/gh/simonw/datasette-rure)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette-rure/blob/master/LICENSE)Datasette plugin that adds a custom SQL function for executing matches using the Rust regular expression engine
Install this plugin in the same environment as Datasette to enable the `regexp()` SQL function.
$ pip install datasette-rure
The plugin is built on top of the [rure-python](https://github.com/davidblewett/rure-python) library by David Blewett.
## regexp() to test regular expressions
You can test if a value matches a regular expression like this:
select regexp('hi.*there', 'hi there')
-- returns 1
select regexp('not.*there', 'hi there')
-- returns 0You can also use SQLite's custom syntax to run matches:
select 'hi there' REGEXP 'hi.*there'
-- returns 1This means you can select rows based on regular expression matches - for example, to select every article where the title begins with an E or an F:
select * from articles where title REGEXP '^[EF]'
Try this out: [REGEXP interactive demo](https://datasette-rure-demo.datasette.io/24ways?sql=select+*+from+articles+where+title+REGEXP+%27%5E%5BEF%5D%27)
## regexp_match() to extract groups
You can extract captured subsets of a pattern using `regexp_match()`.
select regexp_match('.*( and .*)', title) as n from articles where n is not null
-- Returns the ' and X' component of any matching titles, e.g.
-- and Recognition
-- and Transitions Their Place
-- etcThis will return the first parenthesis match when called with two arguments. You can call it with three arguments to indicate which match you would like to extract:
select regexp_match('.*(and)(.*)', title, 2) as n from articles where n is not null
The function will return `null` for invalid inputs e.g. a pattern without capture groups.
Try this out: [regexp_match() interactive demo](https://datasette-rure-demo.datasette.io/24ways?sql=select+%27WHY+%27+%7C%7C+regexp_match%28%27Why+%28.*%29%27%2C+title%29+as+t+from+articles+where+t+is+not+null)
## regexp_matches() to extract multiple matches at once
The `regexp_matches()` function can be used to extract multiple patterns from a single string. The result is returned as a JSON array, which can then be further processed using SQLite's [JSON functions](https://www.sqlite.org/json1.html).
The first argument is a regular expression with named capture groups. The second argument is the string to be matched.
select regexp_matches(
'hello (?P\w+) the (?P\w+)',
'hello bob the dog, hello maggie the cat, hello tarquin the otter'
)This will return a list of JSON objects, each one representing the named captures from the original regular expression:
[
{"name": "bob", "species": "dog"},
{"name": "maggie", "species": "cat"},
{"name": "tarquin", "species": "otter"}
]Try this out: [regexp_matches() interactive demo](https://datasette-rure-demo.datasette.io/24ways?sql=select+regexp_matches%28%0D%0A++++%27hello+%28%3FP%3Cname%3E%5Cw%2B%29+the+%28%3FP%3Cspecies%3E%5Cw%2B%29%27%2C%0D%0A++++%27hello+bob+the+dog%2C+hello+maggie+the+cat%2C+hello+tarquin+the+otter%27%0D%0A%29)