Gaps and islands: Merging contiguous ranges

Host: GitHub
URL: https://github.com/peter-evans/gaps-and-islands
Owner: peter-evans
Created: 2022-02-25T14:19:32.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2022-03-02T13:33:02.000Z (over 2 years ago)
Last Synced: 2024-10-04T11:41:24.363Z (about 1 month ago)
Topics: contiguous, gaps-and-islands, postgresql, row-merge
Homepage:
Size: 20.5 KB
Stars: 1
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# [Gaps and islands: Merging contiguous ranges](https://peterevans.dev/posts/gaps-and-islands-merging-contiguous-ranges/)

I recently needed a solution to merge rows of contiguous ranges in a PostgreSQL table.

The approach I took was based on solutions to the [gaps and islands](https://www.red-gate.com/simple-talk/databases/sql-server/t-sql-programming-sql-server/gaps-islands-sql-server-data/) problem.

Note that you can avoid needing a solution like this if you are able to upgrade to PostgreSQL 14 and take advantage of [multirange](https://www.postgresql.org/docs/14/rangetypes.html) types. If not, read on!
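For comparison, here is a minimal sketch of how the multirange route might look on PostgreSQL 14 or later. It uses inline sample values rather than a real table, and the CTE names `sample` and `merged` are made up for illustration. `range_agg` collects the ranges into a multirange, which combines adjacent ranges, and `unnest` expands the result back into rows.

```sql
-- Requires PostgreSQL 14 or later (range_agg and multirange unnest).
WITH sample(from_id, to_id) AS (
  VALUES (1, 10), (11, 15), (16, 20), (25, 30)
),
merged AS (
  -- Aggregating into a multirange combines adjacent and overlapping ranges.
  SELECT unnest(range_agg(int8range(from_id, to_id, '[]'))) AS r
  FROM sample
)
SELECT
  lower(r) AS from_id,
  upper(r) - 1 AS to_id  -- int8range is stored as [lower, upper), so subtract 1
FROM merged
ORDER BY from_id;
```

With these sample values the query should return two rows, `1..20` and `25..30`.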

Requirements for my particular use case:

- Find gaps and islands between rows containing a numerical range, expressed as two columns, `from_id` and `to_id`.

- Merge the islands (rows of contiguous ranges) into a single row.

- Perform the merge and update the table in a single SQL transaction to avoid race conditions with concurrent processes.

## Solution

This is the table we'll use for the following examples.

`set_id` identifies a set of ranges, and the merge operation targets a specific set.

The `EXCLUDE` constraint is added to prevent overlapping ranges from being inserted into the table.

```sql
CREATE TABLE ranges (
  set_id integer NOT NULL,
  from_id bigint NOT NULL,
  to_id bigint NOT NULL,
  EXCLUDE USING GIST (
    set_id WITH =,
    int8range(from_id, to_id, '[]') WITH &&
  )
);
```
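One practical detail when trying this out: using `=` on an `integer` column inside a GiST index requires the `btree_gist` extension, so it has to be created before the `CREATE TABLE` statement above. The sketch below also loads sample rows chosen to match the example results shown in the following sections. The final insert is an illustrative assumption, included only to show the constraint rejecting an overlap.

```sql
-- Run this before CREATE TABLE: provides the GiST operator class for integer "=".
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Sample rows matching the example results in this post.
INSERT INTO ranges (set_id, from_id, to_id) VALUES
  (1, 1, 10), (1, 11, 15), (1, 16, 20),
  (1, 25, 30), (1, 31, 40),
  (1, 45, 50),
  (1, 55, 60), (1, 61, 80);

-- Expected to fail: [18, 27] overlaps existing ranges in set 1,
-- so the EXCLUDE constraint raises an error.
INSERT INTO ranges (set_id, from_id, to_id) VALUES (1, 18, 27);
```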

### Identify islands

Identifying islands is done in two steps.

The first step adds the column `island_start`, marking the start of an island.

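```sql
SELECT
  *,
  CASE
    -- Contiguous with the previous row: not the start of an island.
    WHEN from_id - LAG(ranges.to_id)
        OVER (ORDER BY ranges.from_id ASC) = 1 THEN 0
    -- A gap, or no previous row (LAG returns NULL): start of an island.
    ELSE 1
  END AS island_start
FROM ranges
WHERE set_id = 1;
```

The query uses the `LAG` [window function](https://www.postgresql.org/docs/current/functions-window.html) to look at the previous row's `to_id` and determine whether the current row is the start of an island. The first row has no previous row, so `LAG` returns `NULL`. The comparison with `1` then evaluates to `NULL` rather than true, and the row falls through to `ELSE`, which marks it as an island start. Note that a simple `CASE ... WHEN NULL` branch would not handle this case, because it compares with `=` and a comparison with `NULL` is never true.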
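The `NULL` behaviour of `CASE` is easy to get wrong, so here is a small standalone sketch (not part of the original solution) comparing the two forms. A simple `CASE` tests each `WHEN` value with `=`, so a `WHEN NULL` branch never matches and the `ELSE` branch is what catches the first row. A searched `CASE` with `IS NULL` does match.

```sql
SELECT
  -- Simple CASE: evaluates NULL = NULL, which is not true, so ELSE is used.
  CASE NULL::integer
    WHEN NULL THEN 'matched'
    ELSE 'fell through'
  END AS simple_case,
  -- Searched CASE: IS NULL is the correct test for a missing value.
  CASE
    WHEN NULL::integer IS NULL THEN 'matched'
    ELSE 'fell through'
  END AS searched_case;
```

This returns `fell through` for `simple_case` and `matched` for `searched_case`.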

Here is an example result, showing that the starts of four islands have been marked.

| set_id | from_id | to_id | island_start |
| ------ | ------- | ----- | ------------ |
|      1 |       1 |    10 |            1 |
|      1 |      11 |    15 |            0 |
|      1 |      16 |    20 |            0 |
|      1 |      25 |    30 |            1 |
|      1 |      31 |    40 |            0 |
|      1 |      45 |    50 |            1 |
|      1 |      55 |    60 |            1 |
|      1 |      61 |    80 |            0 |

The next step is to give each island a unique ID, so that we can identify which island each row belongs to.

```sql
WITH range_islands AS (
  SELECT
    *,
    CASE
      -- Contiguous with the previous row: not the start of an island.
      WHEN from_id - LAG(ranges.to_id)
          OVER (ORDER BY ranges.from_id ASC) = 1 THEN 0
      -- A gap, or no previous row (LAG returns NULL): start of an island.
      ELSE 1
    END AS island_start
  FROM ranges
  WHERE set_id = 1
)
SELECT
  *,
  SUM(range_islands.island_start)
    OVER (ORDER BY range_islands.from_id ASC) AS island_id
FROM range_islands;
```

The query uses `SUM` as a windowed function over the `island_start` column in the result of our previous query.

This creates a rolling sum, where each island start row increases the sum by one, giving us a unique ID.

Here is an example result, showing four islands with their unique IDs.

| set_id | from_id | to_id | island_start | island_id |
| ------ | ------- | ----- | ------------ | --------- |
|      1 |       1 |    10 |            1 |         1 |
|      1 |      11 |    15 |            0 |         1 |
|      1 |      16 |    20 |            0 |         1 |
|      1 |      25 |    30 |            1 |         2 |
|      1 |      31 |    40 |            0 |         2 |
|      1 |      45 |    50 |            1 |         3 |
|      1 |      55 |    60 |            1 |         4 |
|      1 |      61 |    80 |            0 |         4 |
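The running sum can also be seen in isolation. The sketch below uses inline values instead of the `ranges` table, and the alias `flagged` is made up for illustration. With an `ORDER BY` in the window and no explicit frame, `SUM` covers all rows from the start of the partition up to the current row, which is what turns the 0/1 flags into island IDs.

```sql
SELECT
  from_id,
  island_start,
  -- Default frame with ORDER BY: from the first row up to the current row.
  SUM(island_start) OVER (ORDER BY from_id ASC) AS island_id
FROM (
  VALUES (1, 1), (11, 0), (16, 0), (25, 1), (31, 0)
) AS flagged(from_id, island_start);
```

The `island_id` values come out as 1, 1, 1, 2, 2. The default frame also includes rows that tie on the `ORDER BY` value, but `from_id` cannot repeat within a set because the `EXCLUDE` constraint rejects overlapping ranges.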

### Merge islands

Once each row has an ID identifying which island it belongs to, the next step is straightforward.

We group by `island_id` and find the `MIN` and `MAX` of the contiguous ranges.

```sql
WITH range_islands AS (
  SELECT
    *,
    CASE
      -- Contiguous with the previous row: not the start of an island.
      WHEN from_id - LAG(ranges.to_id)
          OVER (ORDER BY ranges.from_id ASC) = 1 THEN 0
      -- A gap, or no previous row (LAG returns NULL): start of an island.
      ELSE 1
    END AS island_start
  FROM ranges
  WHERE set_id = 1
),
range_island_ids AS (
  SELECT
    *,
    SUM(range_islands.island_start)
      OVER (ORDER BY range_islands.from_id ASC) AS island_id
  FROM range_islands
)
SELECT
  set_id,
  MIN(from_id) AS from_id,
  MAX(to_id) AS to_id
FROM range_island_ids
GROUP BY set_id, island_id;
```

Here is the result, showing the four merged islands.

| set_id | from_id | to_id |
| ------ | ------- | ----- |
|      1 |       1 |    20 |
|      1 |      25 |    40 |
|      1 |      45 |    50 |
|      1 |      55 |    80 |
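The queries above target a single set with `WHERE set_id = 1`. If the merged islands for every set were needed at once, one possible variation (a sketch, not something the original use case required) is to drop the filter and add `PARTITION BY set_id` to both windows, so that `LAG` and the running sum restart for each set.

```sql
WITH range_islands AS (
  SELECT
    *,
    CASE
      -- Compare only against the previous row within the same set.
      WHEN from_id - LAG(to_id)
          OVER (PARTITION BY set_id ORDER BY from_id ASC) = 1 THEN 0
      ELSE 1
    END AS island_start
  FROM ranges
),
range_island_ids AS (
  SELECT
    *,
    -- Island IDs restart at 1 for each set.
    SUM(island_start)
      OVER (PARTITION BY set_id ORDER BY from_id ASC) AS island_id
  FROM range_islands
)
SELECT
  set_id,
  MIN(from_id) AS from_id,
  MAX(to_id) AS to_id
FROM range_island_ids
GROUP BY set_id, island_id
ORDER BY set_id, from_id;
```

Grouping by both `set_id` and `island_id` matters here, because island IDs are only unique within a set.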

### Update islands

Updating the table with the merged rows takes place in two steps.

Firstly, any rows that were identified as not being the start of an island can be deleted.

```sql
-- Fragment: range_islands is the CTE defined in the queries above.
DELETE FROM ranges
USING range_islands
WHERE
  ranges.set_id = range_islands.set_id AND
  ranges.from_id = range_islands.from_id AND
  range_islands.island_start = 0
```

Secondly, the remaining rows representing the islands are updated with the `to_id` of the merged ranges.

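```sql
-- Fragment: merged_ranges holds the merged islands,
-- i.e. the result of the query in the "Merge islands" section.
UPDATE ranges SET
  to_id = merged_ranges.to_id
FROM merged_ranges
WHERE
  ranges.set_id = merged_ranges.set_id AND
  ranges.from_id = merged_ranges.from_id
```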

That completes all the steps necessary to execute a merge of contiguous ranges in a single PostgreSQL transaction.
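As a rough illustration of how the pieces could fit together, here is one possible arrangement, which may differ from the complete example in the repository. It assumes the island IDs are first saved into a temporary table (named `range_island_ids` here purely for illustration) so that both statements work from the same snapshot of the data. The delete runs before the update, because extending an island's `to_id` while its follow-on rows still exist would violate the `EXCLUDE` constraint. Whether additional locking is needed depends on the concurrent workload and is not covered by this sketch.

```sql
BEGIN;

-- Hypothetical temp table: island flags and IDs for set 1, dropped at commit.
CREATE TEMP TABLE range_island_ids ON COMMIT DROP AS
WITH range_islands AS (
  SELECT
    *,
    CASE
      WHEN from_id - LAG(to_id) OVER (ORDER BY from_id ASC) = 1 THEN 0
      ELSE 1
    END AS island_start
  FROM ranges
  WHERE set_id = 1
)
SELECT
  *,
  SUM(island_start) OVER (ORDER BY from_id ASC) AS island_id
FROM range_islands;

-- Step 1: delete the rows that are not the start of an island.
DELETE FROM ranges
USING range_island_ids
WHERE
  ranges.set_id = range_island_ids.set_id AND
  ranges.from_id = range_island_ids.from_id AND
  range_island_ids.island_start = 0;

-- Step 2: extend each island's start row to the end of the merged range.
UPDATE ranges SET
  to_id = merged_ranges.to_id
FROM (
  SELECT
    set_id,
    MIN(from_id) AS from_id,
    MAX(to_id) AS to_id
  FROM range_island_ids
  GROUP BY set_id, island_id
) AS merged_ranges
WHERE
  ranges.set_id = merged_ranges.set_id AND
  ranges.from_id = merged_ranges.from_id;

COMMIT;
```

With the sample data from the tables above, set 1 should end up with the four merged rows `1..20`, `25..40`, `45..50` and `55..80`.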

See [gaps-and-islands.sql](gaps-and-islands.sql) for a complete example.

You can also check out the example in [dbfiddle](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=cd6bae615d8caa90eff0fd275e292cb5).