https://github.com/steve-chavez/pg_bzip
Bzip compression and decompression for Postgres
- Host: GitHub
- URL: https://github.com/steve-chavez/pg_bzip
- Owner: steve-chavez
- License: mit
- Created: 2023-12-11T05:03:06.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2023-12-17T22:50:05.000Z (about 1 year ago)
- Last Synced: 2024-10-14T09:28:25.872Z (3 months ago)
- Topics: bzip, bzip2, c, compression, decompression, postgres, postgresql, postgresql-extension
- Language: C
- Homepage:
- Size: 1.79 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# pg_bzip
## Motivation
If you get data compressed as bzip2, whether through [HTTP](https://github.com/pramsey/pgsql-http) or from a file, it's convenient to decompress it in SQL.
`pg_bzip` does that: it provides functions to compress and decompress data using bzip2.

## Functions
- `bzcat(data bytea) returns bytea`
This function mimics the [bzcat](https://linux.die.net/man/1/bzcat) command, which decompresses data using bzip2.
For this example, we'll use the native [pg_read_binary_file](https://pgpedia.info/p/pg_read_binary_file.html) to read from a file.
```sql
select convert_from(bzcat(pg_read_binary_file('/path/to/all_movies.csv.bz2')), 'utf8') as contents;

contents
--------------------------------------------------------------------------------------------------------------------------------------------
"id","name","parent_id","date" +
"2","Ariel","8384","1988-10-21" +
"3","Varjoja paratiisissa","8384","1986-10-17" +
"4","État de siège",\N,"1972-12-30" +
"5","Four Rooms",\N,"1995-12-22" +
"6","Judgment Night",\N,"1993-10-15" +
"8","Megacities - Life in Loops",\N,"2006-01-01" +
"9","Sonntag, im August",\N,"2004-09-22" +
"11","Star Wars: Episode IV – A New Hope","10","1977-05-25" +
"12","Finding Nemo","112246","2003-05-30" +
...
....
.....
```

- `bzip2(data bytea, compression_level int default 9) returns bytea`
This function is a simplified version of the [bzip2](https://linux.die.net/man/1/bzip2) command. It compresses data using bzip2.
For this example we'll use `fio_writefile` from [pgsql-fio](https://github.com/csimsek/pgsql-fio), which offers a convenient way to write a file from SQL.
```sql
select fio_writefile('/path/to/my_text.bz2', bzip2(repeat('my text to be compressed', 1000)::bytea)) as writesize;

writesize
-----------
109
```

## Installation

bzip2 (libbz2) is required. On Debian/Ubuntu you can install it with:
```bash
sudo apt install libbz2-dev
```

Then, from a checkout of this repo, run:
```bash
make && make install
```

Now, in SQL, you can run:
```sql
CREATE EXTENSION bzip;
```

`pg_bzip` is tested to work on PostgreSQL >= 12.
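
A quick way to check the installation is a round trip through both functions. The sketch below assumes the extension has already been created in the current database. It relies only on the `bzip2` and `bzcat` signatures listed under Functions, plus the built-in `convert_to`, `convert_from`, and `octet_length`. The sample strings and column aliases are illustrative, and passing `1` as the level assumes `compression_level` follows the 1 to 9 range of the bzip2 command.

```sql
-- Round trip: compress a text value, decompress it, and compare with the input.
select convert_from(bzcat(bzip2(convert_to('hello pg_bzip', 'utf8'))), 'utf8') = 'hello pg_bzip' as roundtrip_ok;

-- Report the compressed size at an explicit level 1 and at the default level (9).
with sample as (
  select convert_to(repeat('my text to be compressed', 1000), 'utf8') as payload
)
select octet_length(payload)           as raw_bytes,
       octet_length(bzip2(payload, 1)) as level_1_bytes,
       octet_length(bzip2(payload))    as level_9_bytes
from sample;
```

The first query should return true when compression and decompression agree. The second only reports sizes, so no particular ratio is implied.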
## Development
[Nix](https://nixos.org/download.html) is used to get an isolated and reproducible environment with multiple postgres versions.
```bash
# enter the Nix environment
$ nix-shell

# to run the tests
$ with-pg-16 make installcheck

# to interact with the isolated pg
$ with-pg-16 psql

# you can choose the pg version
$ with-pg-15 psql
```