https://github.com/toluaina/pgsync
Postgres to Elasticsearch/OpenSearch sync
https://github.com/toluaina/pgsync
change-data-capture elasticsearch elasticsearch-sync etl kibana opensearch postgresql python sql
Last synced: 19 days ago
JSON representation
Postgres to Elasticsearch/OpenSearch sync
- Host: GitHub
- URL: https://github.com/toluaina/pgsync
- Owner: toluaina
- License: mit
- Created: 2019-08-17T11:23:28.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2025-03-12T16:07:21.000Z (11 months ago)
- Last Synced: 2025-05-11T00:42:40.066Z (9 months ago)
- Topics: change-data-capture, elasticsearch, elasticsearch-sync, etl, kibana, opensearch, postgresql, python, sql
- Language: Python
- Homepage: https://pgsync.com
- Size: 1.53 MB
- Stars: 1,277
- Watchers: 20
- Forks: 196
- Open Issues: 197
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.rst
- Contributing: CONTRIBUTING.rst
- Funding: .github/FUNDING.yml
- License: LICENSE
- Authors: AUTHORS.rst
Awesome Lists containing this project
- awesome-github-repos - toluaina/pgsync - Postgres to Elasticsearch/OpenSearch sync (Python)
- awesome-hacking-lists - toluaina/pgsync - Postgres to Elasticsearch/OpenSearch sync (Python)
README
# PGSync
### Real-time PostgreSQL to Elasticsearch/OpenSearch sync
**Keep your relational database as the source of truth while powering lightning-fast search**
[](https://pypi.org/project/pgsync)
[](https://pypi.org/project/pgsync)
[](https://github.com/toluaina/pgsync/actions)
[](https://codecov.io/gh/toluaina/pgsync)
[](https://pypi.org/project/pgsync)
[](LICENSE)
[](https://hub.docker.com/r/toluaina1/pgsync)
[](https://github.com/psf/black)
[Website](https://pgsync.com) ยท [Documentation](https://pgsync.com) ยท [Examples](examples/) ยท [Report Bug](https://github.com/toluaina/pgsync/issues)
---
## What is PGSync?
PGSync is a change data capture tool that syncs data from **PostgreSQL**, **MySQL**, or **MariaDB** to **Elasticsearch** or **OpenSearch** in real-time. Define your document structure in JSON, and PGSync handles the rest โ no custom code required.
```mermaid
flowchart LR
subgraph Source["๐๏ธ Source Database"]
DB[(PostgreSQL
MySQL
MariaDB)]
end
subgraph CDC["โก Change Data Capture"]
P[PGSync]
end
subgraph Search["๐ Search Engine"]
ES[(Elasticsearch
OpenSearch)]
end
DB -->|WAL / Binlog| P
P -->|Bulk Index| ES
```
### Key Features
| Feature | Description |
|---------|-------------|
| **Real-time sync** | Changes propagate instantly via logical replication |
| **Zero code** | Define mappings in JSON โ no ETL pipelines to build |
| **Nested documents** | Automatically denormalize complex relationships |
| **Fault tolerant** | Resumes from checkpoints after crashes |
| **Transactionally consistent** | Documents appear in commit order |
| **Minimal overhead** | Lightweight CDC with negligible database impact |
---
## Quick Start
### Using Docker (Fastest)
```bash
docker run --rm -it \
-e PG_URL=postgres://user:pass@host/db \
-e ELASTICSEARCH_URL=http://localhost:9200 \
-e REDIS_HOST=localhost \
-v "$(pwd)/schema.json:/app/schema.json" \
toluaina1/pgsync:latest -c schema.json -d -b
```
### Using pip
```bash
pip install pgsync
```
```bash
# Bootstrap (one-time setup)
bootstrap --config schema.json
# Run sync
pgsync --config schema.json -d
```
### Using Docker Compose
```bash
git clone https://github.com/toluaina/pgsync
cd pgsync
docker-compose up
```
---
## How It Works
**1. Define your schema** โ Map tables to document structure:
```json
{
"table": "book",
"columns": ["isbn", "title", "description"],
"children": [{
"table": "author",
"columns": ["name"]
}]
}
```
**2. PGSync generates optimized queries** โ Complex JOINs handled automatically:
```sql
SELECT JSON_BUILD_OBJECT(
'isbn', book.isbn,
'title', book.title,
'authors', (SELECT JSON_AGG(author.name) FROM author ...)
) FROM book
```
**3. Get denormalized documents** โ Ready for search:
```json
{
"isbn": "9785811243570",
"title": "Charlie and the Chocolate Factory",
"authors": ["Roald Dahl"]
}
```
Changes to any related table automatically update the document in Elasticsearch/OpenSearch.
---
## Requirements
| Component | Version |
|-----------|---------|
|  | 3.9+ |
|  | 9.6+ (or MySQL 5.7.22+ / MariaDB 10.5+) |
|  | 6.3.1+ (or OpenSearch 1.3.7+) |
|  | 3.1+ (or Valkey 7.2+) โ optional in WAL mode |
---
## Database Setup
PostgreSQL
Enable logical decoding in `postgresql.conf`:
```ini
wal_level = logical
max_replication_slots = 1
```
Optionally limit WAL size:
```ini
max_slot_wal_keep_size = 100GB
```
MySQL / MariaDB
Enable binary logging in `my.cnf`:
```ini
server-id = 1
log_bin = mysql-bin
binlog_row_image = FULL
binlog_expire_logs_seconds = 604800
```
Create replication user:
```sql
CREATE USER 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;
```
---
## Example
Consider a book library with related authors:
| **Book** | | | |
|----------|--|--|--|
| isbn *(PK)* | title | description |
| 9785811243570 | Charlie and the Chocolate Factory | Willy Wonka's famous... |
| 9781471331435 | 1984 | George Orwell's chilling... |
| **Author** | |
|------------|--|
| id *(PK)* | name |
| 1 | Roald Dahl |
| 4 | George Orwell |
PGSync transforms this into search-ready documents:
```json
[
{
"isbn": "9785811243570",
"title": "Charlie and the Chocolate Factory",
"authors": ["Roald Dahl"]
},
{
"isbn": "9781471331435",
"title": "1984",
"authors": ["George Orwell"]
}
]
```
**Any change** โ updating an author's name, adding a new book, deleting a relationship โ is automatically synced.
---
## Why PGSync?
| Challenge | PGSync Solution |
|-----------|-----------------|
| Dual writes are error-prone | Captures changes from WAL โ single source of truth |
| Complex JOIN queries | Auto-generates optimized SQL from your schema |
| Nested document updates | Detects changes in any related table |
| Data consistency | Transactionally consistent, ordered delivery |
| Crash recovery | Checkpoint-based resumption |
---
## Environment Variables
Full list at [pgsync.com/env-vars](https://pgsync.com/env-vars)
| Variable | Description |
|----------|-------------|
| `PG_URL` | PostgreSQL connection string |
| `ELASTICSEARCH_URL` | Elasticsearch/OpenSearch URL |
| `REDIS_HOST` | Redis/Valkey host |
| `REDIS_CHECKPOINT` | Use Redis for checkpoints (recommended for production) |
---
## One-Click Deploy
[](https://cloud.digitalocean.com/apps/new?repo=https://github.com/toluaina/pgsync/tree/main)
---
## Sponsors
---
## Contributing
Contributions welcome! See [CONTRIBUTING.rst](CONTRIBUTING.rst) for guidelines.
## License
[MIT](LICENSE) โ use it freely in your projects.