An open API service indexing awesome lists of open source software.

https://github.com/apecloud/myduckserver

Unified MySQL, Postgres & FlightSQL Server, Powered by DuckDB.
https://github.com/apecloud/myduckserver

analytics arrow business-analytics business-intelligence columnar-storage data-engineering data-science database duckdb htap mariadb mysql olap pandas parquet polars postgres replication sql zero-etl

Last synced: about 1 month ago
JSON representation

Unified MySQL, Postgres & FlightSQL Server, Powered by DuckDB.

Awesome Lists containing this project

README

        


duck under dolphin
MyDuck Server

**MyDuck Server** unlocks serious power for your MySQL & Postgres analytics. Imagine the simplicity of (MySQL|Postgres)’s familiar interface fused with the raw analytical speed of [DuckDB](https://duckdb.org/). Now you can supercharge your analytical queries with DuckDB’s lightning-fast OLAP engine, all while using the tools and dialect you know.


duck under dolphin

## πŸ“‘ Table of Contents

- [Why MyDuck](#-why-myduck-)
- [Key Features](#-key-features)
- [Performance](#-performance)
- [Getting Started](#-getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Replicating Data](#replicating-data)
- [Connecting to Cloud MySQL & Postgres](#connecting-to-cloud-mysql--postgres)
- [HTAP Setup](#htap-setup)
- [Customizing the Docker Container](#customizing-the-docker-container)
- [Query Parquet Files](#query-parquet-files)
- [Already Using DuckDB?](#already-using-duckdb)
- [Backup and Restore with Object Storage](#backup-and-restore-with-object-storage)
- [LLM Integration](#llm-integration)
- [Access from Python](#access-from-python)
- [Roadmap](#-roadmap)
- [Contributing](#-contributing)
- [Acknowledgements](#-acknowledgements)
- [License](#-license)

## ❓ Why MyDuck ❓

While MySQL and Postgres are the most popular open-source databases for OLTP, their performance in analytics often falls short. DuckDB, on the other hand, is built for fast, embedded analytical processing. MyDuck Server lets you enjoy DuckDB's high-speed analytics without leaving the (MySQL|Postgres) ecosystem.

With MyDuck Server, you can:

- **Set up an isolated, fast, and real-time replica** dedicated to ad-hoc analytics, batch jobs, and LLM-generated queries, without exhausting or corrupting your primary database πŸ”₯
- **Accelerate existing MySQL & Postgres analytics** to new heights through DuckDB's high-speed engine with minimal changes πŸš€
- **Enable richer & faster connectivity** between modern data manipulation & analysis tools and your MySQL & Postgres data πŸ› οΈ
- **Go beyond MySQL & Postgres syntax** with DuckDB's advanced SQL features to expand your analytics potential πŸ¦†
- **Run DuckDB in server mode** to share a DuckDB instance with your team or among your applications 🌩️
- **Build HTAP systems** by combining (MySQL|Postgres) for transactions with MyDuck for analytics πŸ”„
- and much more! See below for a full list of feature highlights.

MyDuck Server isn't here to replace MySQL & Postgres β€” it's here to help MySQL & Postgres users do more with their data. This open-source project provides a convenient way to integrate high-speed analytics into your workflow while embracing the flexibility and efficiency of DuckDB.

## ✨ Key Features

- **Blazing Fast OLAP with DuckDB**: MyDuck stores data in DuckDB, an OLAP-optimized database known for lightning-fast analytical queries. DuckDB enables MyDuck to execute queries up to 1000x faster than traditional MySQL & Postgres setups, making complex analytics practical that were previously unfeasible.

- **MySQL-Compatible Interface**: MyDuck implements the MySQL wire protocol and understands MySQL syntax, allowing you to connect with any MySQL client and run MySQL-style SQL. MyDuck automatically translates your queries and executes them in DuckDB.

- **Postgres-Compatible Interface**: MyDuck implements the Postgres wire protocol, enabling you to send DuckDB SQL directly using any Postgres client. Since DuckDB's SQL dialect [closely resembles PostgreSQL](https://duckdb.org/docs/sql/dialect/postgresql_compatibility.html), you can speed up existing Postgres queries with minimal changes.

- **Raw DuckDB Power**: MyDuck provides full access to DuckDB's analytical capabilities through raw DuckDB SQL, including [friendly SQL syntax](https://duckdb.org/docs/sql/dialect/friendly_sql.html), [advanced aggregates](https://duckdb.org/docs/sql/functions/aggregates), [remote data source access](https://duckdb.org/docs/data/data_sources), [nested data types](https://duckdb.org/docs/sql/data_types/overview#nested--composite-types), and more.

- **Zero-ETL**: Simply start replication and begin querying! MyDuck can function as a MySQL replica or Postgres standby, replicating data from your primary server in real-time. It works like standard MySQL & Postgres replication - using MySQL's `START REPLICA` or Postgres' `CREATE SUBSCRIPTION` commands, eliminating the need for complex ETL pipelines.

- **Consistent and Efficient Replication**: Thanks to DuckDB's [solid ACID support](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid.html), we've carefully managed transaction boundaries in the replication stream to ensure a **consistent data view** β€” you'll never see dirty data mid-transaction. Plus, MyDuck's **transaction batching** collects updates from multiple transactions and applies them to DuckDB in batches, significantly reducing write overhead (since DuckDB isn’t designed for high-frequency OLTP writes).

- **HTAP Architecture Support**: MyDuck works well with database proxy tools to enable hybrid transactional/analytical processing setups. You can route DML operations to (MySQL|Postgres) and analytical queries to MyDuck, creating a powerful HTAP architecture that combines the best of both worlds.

- **Bulk Upload & Download**: MyDuck supports fast bulk data loading from the client side with the standard MySQL `LOAD DATA LOCAL INFILE` command or the PostgreSQL `COPY FROM STDIN` command. You can also extract data from MyDuck using the PostgreSQL `COPY TO STDOUT` command.

- **End-to-End Columnar IO**: In addition to the traditional row-oriented data transfer in MySQL & Postgres protocol, MyDuck can also send query results and receive data uploads in columnar format, which can be significantly faster for high-volume data. This is implemented on top of the standard Postgres `COPY` protocol with extended columnar format support, e.g., `COPY ... TO STDOUT (FORMAT parquet | arrow)`, allowing you to use the standard Postgres client library to interact with MyDuck in an optimized way.

- **Standalone Mode**: MyDuck can run in standalone mode without replication. In this mode, it is a drop-in replacement for (MySQL|Postgres), but with a DuckDB heart. You can `CREATE TABLE`, transactionally `INSERT`, `UPDATE`, and `DELETE` data, and run blazingly fast `SELECT` queries.

- **DuckDB in Server Mode**: If you aren't interested in MySQL & Postgres but just want to share a DuckDB instance with your team or among your applications, MyDuck is also a great solution. You can deploy MyDuck to a server, connect to it with the Postgres client library in your favorite programming language, and start running DuckDB SQL queries directly.

- **Seamless Integration with Dump & Copy Utilities**: MyDuck plays well with modern MySQL & Postgres data migration tools, especially the [MySQL Shell](https://dev.mysql.com/doc/mysql-shell/en/) and [pg_dump](https://www.postgresql.org/docs/current/app-pgdump.html). For MySQL, you can load data into MyDuck in parallel from a MySQL Shell dump, or leverage the Shell’s `copy-instance` utility to copy a consistent snapshot of your running MySQL server to MyDuck. For Postgres, MyDuck can load data from a `pg_dump` archive.

## πŸ“Š Performance

Typical OLAP queries can run **up to 1000x faster** with MyDuck Server compared to MySQL & Postgres alone, especially on large datasets. Under the hood, it's just DuckDB doing what it does best: processing analytical queries at lightning speed. You are welcome to run your own benchmarks and prepare to be amazed! Alternatively, you can refer to well-known benchmarks like the [ClickBench](https://benchmark.clickhouse.com/) and [H2O.ai db-benchmark](https://duckdblabs.github.io/db-benchmark/) to see how DuckDB performs against other databases and data science tools. Also remember that DuckDB has robust support for transactions, JOINs, and [larger-than-memory query processing](https://duckdb.org/2024/07/09/memory-management.html), which are unavailable in many competing systems and tools.

## πŸƒβ€β™‚οΈ Getting Started

### Prerequisites

- **Docker** (recommended) for setting up MyDuck Server quickly.
- MySQL or PostgreSQL CLI clients for connecting and testing your setup.

### Installation

Get a standalone MyDuck Server up and running in minutes using Docker:

```bash
docker run -p 13306:3306 -p 15432:5432 apecloud/myduckserver:latest
```

This setup exposes:

- **Port 13306** for MySQL wire protocol connections.
- **Port 15432** for PostgreSQL wire protocol connections, allowing direct DuckDB SQL.

### Usage

#### Connecting via MySQL client

Connect using any MySQL client to run MySQL-style SQL queries:

```bash
mysql -h127.0.0.1 -P13306 -uroot
```

> [!NOTE]
> MySQL CLI clients version 9.0 and above are not yet supported on macOS. Consider `brew install [email protected]`.

#### Connecting via PostgreSQL client

For full analytical power, connect to the Postgres port and run DuckDB SQL queries directly:

```bash
psql -h 127.0.0.1 -p 15432 -U postgres
```

### Replicating Data

We have integrated a setup tool in the Docker image that helps replicate data from your primary (MySQL|Postgres) server to MyDuck Server. The tool is available via the `SETUP_MODE` environment variable. In `REPLICA` mode, the container will start MyDuck Server, dump a snapshot of your primary (MySQL|Postgres) server, and start replicating data in real-time.

> [!NOTE]
> Supported primary database versions: MySQL>=8.0 and PostgreSQL>=13. In addition to the default settings,
logical replication must be enabled for PostgreSQL by setting `wal_level=logical`.
> For MySQL, GTID-based replication (`gtid_mode=ON` and `enforce_gtid_consistency=ON`) is recommended but not required.

```bash
docker run -d --name myduck \
-p 13306:3306 \
-p 15432:5432 \
--env=SETUP_MODE=REPLICA \
--env=SOURCE_DSN="://:@:/"
apecloud/myduckserver:latest
```
`SOURCE_DSN` specifies the connection string to the primary database server, which can be either MySQL or PostgreSQL.

- **MySQL Primary:** Use the `mysql` URI scheme, e.g.,
`--env=SOURCE_DSN=mysql://root:[email protected]:3306`

- **PostgreSQL Primary:** Use the `postgres` URI scheme, e.g.,
`--env=SOURCE_DSN=postgres://postgres:[email protected]:5432/db01`

> [!NOTE]
> To replicate from a server running on the host machine, use `host.docker.internal` as the hostname instead of `localhost` or `127.0.0.1`. On Linux, you must also add `--add-host=host.docker.internal:host-gateway` to the `docker run` command.

### Connecting to Cloud MySQL & Postgres

MyDuck Server supports setting up replicas from common cloud-based MySQL & Postgres offerings. For more information, please refer to the [replica setup guide](docs/tutorial/replica-setup-rds.md).

### HTAP Setup

With MyDuck's powerful analytics capabilities, you can create an hybrid transactional/analytical processing system where high-frequency data writes are directed to a standard MySQL or Postgres instance, while analytical queries are handled by a MyDuck Server instance. Follow our HTAP setup instructions to easily set up an HTAP demonstration:
* Provisioning a MySQL HTAP cluster based on [ProxySQL](docs/tutorial/mysql-htap-proxysql-setup.md) or [MariaDB MaxScale](docs/tutorial/mysql-htap-maxscale-setup.md).
* Provisioning a PostgreSQL HTAP cluster based on [PGPool-II](docs/tutorial/pg-htap-pgpool-setup.md)

### Customizing the Docker Container

To rename the default database, pass the `DEFAULT_DB` environment variable to the Docker container:

```bash
docker run -d -p 13306:3306 -p 15432:5432 \
--env=DEFAULT_DB=mydbname \
apecloud/myduckserver:latest
```

To set the superuser password, pass the `SUPERUSER_PASSWORD` environment variable to the Docker container:

```bash
docker run -d -p 13306:3306 -p 15432:5432 \
--env=SUPERUSER_PASSWORD=mysecretpassword \
apecloud/myduckserver:latest
```

To initialize MyDuck Server with custom SQL statements, mount your `.sql` file to either `/docker-entrypoint-initdb.d/mysql/` or `/docker-entrypoint-initdb.d/postgres/` inside the Docker container, depending on the SQL dialect you're using.

For example:
```bash
# Execute `init.sql` via MySQL protocol
docker run -d -p 13306:3306 --name=myduck \
-v ./init.sql:/docker-entrypoint-initdb.d/mysql/init.sql \
apecloud/myduckserver:latest

# Execute `init.sql` via PostgreSQL protocol
docker run -d -p 15432:5432 --name=myduck \
-v ./init.sql:/docker-entrypoint-initdb.d/postgres/init.sql \
apecloud/myduckserver:latest
```

### Query Parquet Files

Looking to load Parquet files into MyDuck Server and start querying? Follow our [Parquet file loading guide](docs/tutorial/load-parquet-files.md) for easy setup.

### Already Using DuckDB?

Already have a DuckDB file? You can seamlessly bootstrap MyDuck Server with it. See our [DuckDB file bootstrapping guide](docs/tutorial/bootstrap.md) for more details.

### Managing Multiple Databases

Easily manage multiple databases in MyDuck Server, same as Postgres. For step-by-step instructions and detailed guidance, check out our [Database Management Guide](docs/tutorial/manage-multiple-databases.md).

### Backup and Restore with Object Storage

To back up and restore your databases inside MyDuck Server using object storage, refer to our [backup and restore guide](docs/tutorial/backup-restore.md) for detailed instructions.

### LLM Integration

MyDuck Server can be integrated with LLM applications via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction). Follow the [MCP integration guide](docs/tutorial/mcp.md) to set up MyDuck Server as an external data source for LLMs.

### Access from Python

MyDuck Server can be seamlessly accessed from the Python data science ecosystem. Follow the [Python integration guide](docs/tutorial/pg-python-data-tools.md) to connect to MyDuck Server from Python and export data to PyArrow, pandas, and Polars. Additionally, check out the [Ibis integration guide](docs/tutorial/connect-with-ibis-setup.md) for using the [Ibis](https://ibis-project.org/) dataframe API to query MyDuck Server directly.

## 🎯 Roadmap

We have big plans for MyDuck Server! Here are some of the features we’re working on:

- [x] Arrow Flight SQL.
- [x] Multiple DB.
- [ ] Authentication.
- [ ] ...and more! We’re always looking for ways to make MyDuck Server better. If you have a feature request, please let us know by [opening an issue](https://github.com/apecloud/myduckserver/issues/new).

## 🏑 Join the Community

Let's connect on [Discord](https://discord.gg/9MC5cgw5YK) to discuss requirements, address issues, and share user experiences.

## πŸ’‘ Contributing

MyDuck Server is open-source, and we’d love your help to keep it growing! Check out our [CONTRIBUTING.md](CONTRIBUTING.md) for ways to get involved. From bug reports to feature requests, all contributions are welcome!

## πŸ’— Acknowledgements

MyDuck Server is built on top of a collection of amazing open-source projects, notably:
- [DuckDB](https://duckdb.org/) - The fast in-process analytical database that powers MyDuck Server.
- [go-mysql-server](https://github.com/dolthub/go-mysql-server) - The outstanding MySQL server implementation in Go maintained by [DoltHub](https://www.dolthub.com/team) that MyDuck Server is bulit on. We also draw significant inspiration from [Dolt](https://github.com/dolthub/dolt) and [Doltgres](https://github.com/dolthub/doltgres).
- [Vitess](https://vitess.io/) - Provides the MySQL replication stream used in MyDuck Server.
- [go-duckdb](https://github.com/marcboeker/go-duckdb): An excellent Go driver for DuckDB that works seamlessly.
- [SQLGlot](https://github.com/tobymao/sqlglot) - The ultimate SQL transpiler.

We are grateful to the developers and contributors of these projects for their hard work and dedication to open-source software.

## πŸ“ License

MyDuck Server is released under the [Apache License 2.0](LICENSE).