{"id":37644795,"url":"https://github.com/ybrs/riffq","last_synced_at":"2026-01-16T11:25:08.310Z","repository":{"id":292659639,"uuid":"966235742","full_name":"ybrs/riffq","owner":"ybrs","description":"Connect to your data in python with postgresql protocol (pandas, polars, duckdb etc)","archived":false,"fork":false,"pushed_at":"2025-10-17T10:24:09.000Z","size":1705,"stargazers_count":53,"open_issues_count":6,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-17T19:06:02.226Z","etag":null,"topics":["duckdb","pgwire","pgwire-protocol","postgresql"],"latest_commit_sha":null,"homepage":"https://riffq.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ybrs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"agents.md","dco":null,"cla":null}},"created_at":"2025-04-14T15:58:04.000Z","updated_at":"2025-10-16T16:21:48.000Z","dependencies_parsed_at":null,"dependency_job_id":"13c175a7-14fa-43da-b7bc-283e1024321f","html_url":"https://github.com/ybrs/riffq","commit_stats":null,"previous_names":["ybrs/riffq"],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/ybrs/riffq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ybrs%2Friffq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ybrs%2Friffq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ybrs%2Friffq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/ybrs%2Friffq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ybrs","download_url":"https://codeload.github.com/ybrs/riffq/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ybrs%2Friffq/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["duckdb","pgwire","pgwire-protocol","postgresql"],"created_at":"2026-01-16T11:25:08.248Z","updated_at":"2026-01-16T11:25:08.300Z","avatar_url":"https://github.com/ybrs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# riffq\n\n**riffq** is a toolkit in python (built in Rust) for building PostgreSQL wire-compatible databases.  \nIt allows you to serve data from Python over the PostgreSQL protocol — turning your Python-based logic or in-memory data into a queryable, network-exposed system. 
We also have a catalog emulation system in Rust, built on DataFusion.\n\nDocumentation: https://ybrs.github.io/riffq/\n\n---\n\n## What It Does\n\n- Implements the PostgreSQL wire protocol in Rust for performance and concurrency\n- Sends raw SQL queries (Simple or Extended protocol) to Python for interpretation\n- Implements a PostgreSQL catalog compatibility layer; see [`pg_catalog_rs`](https://github.com/ybrs/pg_catalog)\n\nSince you are in Python, you can:\n- Connect to remote data sources (e.g., analytics DB, CRM) and expose them as a unified PostgreSQL database\n- Serve Pandas DataFrames over the network as virtual SQL tables\n- Delegate SQL execution to DuckDB, Polars, or any other Python engine\n- Build a programmable federated query engine or custom data service\n\n---\n\n## Example Use Cases\n\n- Serve a Pandas DataFrame as a PostgreSQL table to BI tools\n- Build a custom federated engine from multiple APIs or databases\n- Implement your own data lake query frontend\n- Expose dynamic ML feature stores for training or real-time inference\n- Provide fine-grained, code-controlled access to internal metrics or logs\n\n---\n\n## Example\n\n```python\nimport logging\nimport duckdb\nimport pyarrow as pa\nimport riffq\nlogging.basicConfig(level=logging.DEBUG)\n\nclass Connection(riffq.BaseConnection):\n    def handle_auth(self, user, password, host, database=None, callback=callable):\n        # simple username/password check\n        callback(user == \"user\" and password == \"secret\")\n\n    def handle_connect(self, ip, port, callback=callable):\n        # allow every incoming connection\n        callback(True)\n\n    def handle_disconnect(self, ip, port, callback=callable):\n        # invoked when client disconnects\n        callback(True)\n\n    def _handle_query(self, sql, callback, **kwargs):\n        cur = duckdb_con.cursor()\n        try:\n            if sql.strip().lower() == \"select err\":\n                # custom error returned to 
client\n                callback((\"ERROR\", \"42846\", \"bad type\"), is_error=True)\n                return\n            reader = cur.execute(sql).fetch_record_batch()\n            self.send_reader(reader, callback)\n        except Exception as exc:\n            logging.exception(\"error on executing query\")\n            batch = self.arrow_batch(\n                [pa.array([\"ERROR\"]), pa.array([str(exc)])],\n                [\"error\", \"message\"],\n            )\n            self.send_reader(batch, callback)\n\n    def handle_query(self, sql, callback=callable, **kwargs):\n        self.executor.submit(self._handle_query, sql, callback, **kwargs)\n\ndef main():\n    global duckdb_con\n    duckdb_con = duckdb.connect()\n    duckdb_con.execute(\n        \"\"\"\n        CREATE VIEW klines AS \n        SELECT * \n        FROM 'data/klines.parquet'\n        \"\"\"\n    )\n    server = riffq.RiffqServer(\"127.0.0.1:5433\", connection_cls=Connection)\n    server.set_tls(\"certs/server.crt\", \"certs/server.key\")\n    server.start(tls=True)\n\nif __name__ == \"__main__\":\n    main()\n```\n\nThe Rust side calls this Python handler when a SQL query comes in via the PostgreSQL protocol.\n\n---\n\n## Architecture\n\n- **Rust layer** handles:\n  - PostgreSQL protocol (via [`pgwire`](https://crates.io/crates/pgwire))\n  - Connection management\n  - Query routing\n  - Metadata compatibility (`pg_catalog` emulation)\n- **Python layer** handles:\n  - SQL execution (via any engine: DuckDB, Polars, etc.)\n  - Data transformation\n  - Custom logic and dynamic schema definitions\n\n### Zero Copy\n\nWe try to achieve zero-copy data transfer by using the Arrow PyCapsule interface. Data from DuckDB arrives in Python as a PyCapsule pointer, is handed to a Python worker thread, and is passed on to the Rust callback still as a PyCapsule pointer. The Rust side then streams the result to the client over the PostgreSQL protocol using pgwire.  
\n\nhttps://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n\n---\n\n## Getting Started\n\n### Install the package\n\n```bash\npip install riffq --pre\n```\n\n### Extend BaseConnection class\n\nYou can extend the base class for \n- handle_query\n- handle_auth - To check for username/password\n- handle_connect - To check ip/port restriction\n- handle_disconnect - WIP\n\n```python\nclass Connection(riffq.BaseConnection):\n    def handle_auth(self, user, password, host, database=None, callback=callable):\n        # simple username/password check\n        callback(user == \"user\" and password == \"secret\")\n\n    def handle_connect(self, ip, port, callback=callable):\n        # allow every incoming connection\n        callback(True)\n\n    def handle_disconnect(self, ip, port, callback=callable):\n        # invoked when client disconnects\n        callback(True)\n\n    def _handle_query(self, sql, callback, **kwargs):\n        cur = duckdb_con.cursor()\n        try:\n            if sql.strip().lower() == \"select err\":\n                # custom error returned to client\n                callback((\"ERROR\", \"42846\", \"bad type\"), is_error=True)\n                return\n            reader = cur.execute(sql).fetch_record_batch()\n            self.send_reader(reader, callback)\n        except Exception as exc:\n            logging.exception(\"error on executing query\")\n            batch = self.arrow_batch(\n                [pa.array([\"ERROR\"]), pa.array([str(exc)])],\n                [\"error\", \"message\"],\n            )\n            self.send_reader(batch, callback)\n\n    def handle_query(self, sql, callback=callable, **kwargs):\n        self.executor.submit(self._handle_query, sql, callback, **kwargs)\n\n```\n\n### Start the server\n\n```python\n    global duckdb_con\n    duckdb_con = duckdb.connect()\n    duckdb_con.execute(\n        \"\"\"\n        CREATE VIEW klines AS \n        SELECT * \n        FROM 'data/klines.parquet'\n        
\"\"\"\n    )\n    server = riffq.RiffqServer(\"127.0.0.1:5433\", connection_cls=Connection)\n    server.set_tls(\"certs/server.crt\", \"certs/server.key\")\n    server.start()\n```\n\nYou can check server implementations on test_concurrency/ and example/ directory\n\n\nThen connect using any PostgreSQL client:\n\n```bash\npsql -h localhost -p 5433\n```\n\n\n### Enabling TLS\n\nGenerate a temporary certificate and key:\n\n```bash\nopenssl req -newkey rsa:2048 -nodes -keyout server.key -x509 -days 1 -out server.crt -subj \"/CN=localhost\"\n```\n\n### Enabling Catalog Emulation\n\nPostgresql clients sends queries to pg_catalog schema to find out databases, schemas, tables, columns.\n\nWe have this [`pg_catalog_rs`](https://github.com/ybrs/pg_catalog) for this purpose.\n\nYou can register your own database and your tables. \n\nFor example\n\n```python\n    server = riffq.RiffqServer(f\"127.0.0.1:{port}\", connection_cls=Connection)\n    server.set_tls(\"certs/server.crt\", \"certs/server.key\")\n\n    server._server.register_database(\"duckdb\")\n\n    tbls = duckdb_con.execute(\n        \"SELECT table_schema, table_name FROM information_schema.tables \"\n        \"WHERE table_schema NOT IN ('pg_catalog','information_schema')\"\n    ).fetchall()\n\n    for schema_name, table_name in tbls:\n        server._server.register_schema(\"duckdb\", schema_name)\n        cols_info = duckdb_con.execute(\n            \"SELECT column_name, data_type, is_nullable FROM information_schema.columns \"\n            \"WHERE table_schema=? 
AND table_name=?\",\n            (schema_name, table_name),\n        ).fetchall()\n        columns = []\n        for col_name, data_type, is_nullable in cols_info:\n            columns.append(\n                {\n                    col_name: {\n                        \"type\": map_type(data_type),\n                        \"nullable\": is_nullable.upper() == \"YES\",\n                    }\n                }\n            )\n        server._server.register_table(\"duckdb\", schema_name, table_name, columns)\n\n    server.start(catalog_emulation=True)\n```\n\n\n---\n\n## Status\n\n- ✅ Wire protocol support (simple + extended)\n- ✅ Query dispatching to Python\n- ✅ Thread-based non-blocking query execution (long queries don't block)\n- ✅ DuckDB, Pandas, Polars compatibility\n- ✅ Limited SQL parsing on Rust side (forwarded to Python)\n- ✅ Optional TLS encryption\n- ✅ Integration with optional catalog emulation layer\n- 🟡 More examples\n- 🟡 Better logging, monitoring, observability\n\n---\n\n## Installation\n\nWe currently have a pre-release on PyPI. You can install it with the `--pre` flag.\n\n```bash\npip install riffq --pre\n```\n\n## Running Locally\n\nInstall the development requirements and run the test suite:\n\n```bash\ngit clone git@github.com:ybrs/riffq.git\ncd riffq\npython -m venv venv\nsource venv/bin/activate\npip install -r requirements.txt\n\nmaturin build --profile=fast -i python3\npip install target/wheels/*.whl\n# or: maturin develop\nmake all-tests\n```\n\nThe tests require the Rust extension to build successfully; any build failure will cause the suite to fail.\n\n---\n\n## License\n\nMIT or Apache 2.0, your choice.\n\n---\n\n## Contributing\n\nContributions are welcome! 
Especially for:\n- The emulation layer. I am currently testing with:\n  - IntelliJ DataGrip\n  - DBeaver\n  - psql CLI\n  - [VS Code PostgreSQL Extension](https://marketplace.visualstudio.com/items?itemName=ms-ossdata.vscode-pgsql)\n\n  Testing with other clients, especially BI tools, is very welcome.\n\n- Better Python DX\n- Example apps (data lake, feature store, etc.)\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fybrs%2Friffq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fybrs%2Friffq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fybrs%2Friffq/lists"}