https://github.com/maxpert/phanpy

Phanpy is dead simple JSON streaming HTTP interface over Postgres to execute your queries.
https://github.com/maxpert/phanpy

Last synced: 8 months ago
JSON representation

Phanpy is dead simple JSON streaming HTTP interface over Postgres to execute your queries.

Host: GitHub
URL: https://github.com/maxpert/phanpy
Owner: maxpert
License: mit
Created: 2021-09-09T14:41:13.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2023-09-29T13:18:20.000Z (over 2 years ago)
Last Synced: 2025-09-18T23:08:56.138Z (8 months ago)
Language: Go
Size: 96.7 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # Phanpy

Phanpy is dead simple JSON streaming HTTP interface over Postgres to execute your queries.

> Disclaimer: This is an experimental software. Use in production at your own risk.

## Motivation

Unlike [PostgREST](https://postgrest.org/en/v8.0/index.html) Phanpy only embraces the philosophy of 

providing an HTTP interface to Postgres and embraces SQL queries and parameters as first class citizens. 

## Zero Abstraction

Phanpy takes SQL for what it is; a well understood, well document, and well practiced language 

with precise controls that no other DSL has matched so far. That's why projects like Spark, Cassandra etc. adopted

various dialects of SQL. All queries are prepared and executed as is!

## Tutorial

Phanpy requires only two environment variables:

 - `ENV` enviroment of running `prod` means production where it uses production loggers, everything else is development

 - `DB_URL` a full URL according to [pgx](https://github.com/jackc/pgx) 

    specifications e.g. `postgresql://:@localhost:5432/?statement_cache_capacity=128&statement_cache_mode=prepare&pool_max_conns=16`

After launching `phanpy` you can use cURL to run your queries:

```bash

curl -v --data '{"query": "SELECT * FROM generate_series(1, 1000000)", "params": [], "timeout": 30}' http://localhost:8080/

```

## Performance

Here is the example query payload used to generate upto 400 rows at random, with random ordering and some text payload.

```json

{

  "query": "SELECT md5(random()::text) AS some_data, * FROM generate_series(1, (random() * $1)::int) ORDER BY RANDOM()",

  "params": [400],

  "timeout": 30

}

```

Benchmarking with [vegeta](https://github.com/tsenart/vegeta) here are rough benchmarks with everything running on 

(2015 Macbook pro, 2.2 GHz 8 core with 16 GB RAM) same machine (all vegeta, postgres, phanpy):

```bash

➜  phanpy git:(master) cat vegeta.txt | vegeta attack -rate=1000 -duration=30s | vegeta report

Requests      [total, rate, throughput]         30000, 1000.03, 999.78

Duration      [total, attack, wait]             30.007s, 29.999s, 7.478ms

Latencies     [min, mean, 50, 90, 95, 99, max]  400.425µs, 9.288ms, 6.457ms, 17.67ms, 28.689ms, 59.23ms, 166.477ms

Bytes In      [total, mean]                     422840350, 14094.68

Bytes Out     [total, mean]                     4710000, 157.00

Success       [ratio]                           100.00%

Status Codes  [code:count]                      200:30000

Error Set:

```

During all of this maximum memory was reported under 50MB RSS; avg runtime stats looked something like this:

```json

{

  "acquire_count": 101490,

  "acquire_duration": 508260937427,

  "acquired_conns": 16,

  "empty_acquire_count": 12972,

  "idle_conns": 0,

  "max_conns": 16,

  "stats": {

    "time": 1631159177493892000,

    "go_version": "go1.17",

    "go_os": "darwin",

    "go_arch": "amd64",

    "cpu_num": 8,

    "goroutine_num": 213,

    "gomaxprocs": 8,

    "cgo_call_num": 133,

    "memory_alloc": 13717472,

    "memory_total_alloc": 21575081440,

    "memory_sys": 44714768,

    "memory_lookups": 0,

    "memory_mallocs": 374057114,

    "memory_frees": 373986984,

    "memory_stack": 2424832,

    "heap_alloc": 13717472,

    "heap_sys": 35323904,

    "heap_idle": 16744448,

    "heap_inuse": 18579456,

    "heap_released": 6774784,

    "heap_objects": 70130,

    "gc_next": 21642400,

    "gc_last": 1631159177475508000,

    "gc_num": 2348,

    "gc_per_second": 20.843679212802737,

    "gc_pause_per_second": 42.506663,

    "gc_pause": [

      0.19,

      0.232078,

      0.310384,

      0.175009,

      0.151908,

      0.178259,

      0.181853,

      "more ..."

    ]

  },

  "total_conns": 16

}

```

## Named queries

Often in production it will require you to have clear visibility on various queries you are running. Unlike everything

under single path (e.g. `/graphql`) this will not only let you integrate existing tools to measure latency, it will

also let you do predefine well written queries and optimize them with tooling.

Right now named queries can be defined in `queries.yml` file under current working directory (or file path in 

`NAMED_QUERIES` environment variable). Here is an example:

```yaml

- name: get_series

  query: SELECT md5(random()::text) AS hash, * FROM generate_series(1, $1) ORDER BY RANDOM()

  timeout: 10

```

Now you can run this named query using:

```bash

curl -v --data '[40000]' http://localhost:8080/query/get_series

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/maxpert/phanpy

Awesome Lists containing this project

README