https://github.com/cameronnewman/redis-dumper

Simple tool to dump data from Redis into a DuckDB queryable format
https://github.com/cameronnewman/redis-dumper
csv duckdb export golang parquet redis
Last synced: 7 months ago
JSON representation
Simple tool to dump data from Redis into a DuckDB queryable format
Host: GitHub
URL: https://github.com/cameronnewman/redis-dumper
Owner: cameronnewman
License: mit
Created: 2025-07-29T02:23:52.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-07-29T04:33:43.000Z (8 months ago)
Last Synced: 2025-09-01T06:09:27.231Z (7 months ago)
Topics: csv, duckdb, export, golang, parquet, redis
Language: Go
Homepage:
Size: 30.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project

README

          # Redis Dumper

A high-performance tool for exporting Redis data to CSV or Parquet format with Hive-style partitioning for optimal DuckDB querying.

## Features

- Export Redis data to CSV or Parquet format

- Memory-efficient streaming for large datasets

- Hive-style partitioning for efficient querying

- Support for all Redis data types (strings, hashes, sets, sorted sets, lists)

- Configurable batch sizes and file rotation

- TLS/SSL support

- DuckDB-optimized output format

## Installation

```bash

go install github.com/cameronnewman/redis-dumper/cmd/dumper@latest

```

Or build from source:

```bash

git clone https://github.com/cameronnewman/redis-dumper.git

cd redis-dumper

go build -o dumper ./cmd/dumper

```

## Usage

The tool uses subcommands and environment variables for configuration.

### Commands

- `keys-only` - Export only key metadata (recommended for large datasets)

- `pattern` - Export full data for keys matching a pattern

- `full` - Export all data (use with caution on large datasets)

### Basic Usage

Export only key metadata:

```bash

dumper keys-only

```

Export keys matching a pattern:

```bash

dumper pattern "user:*"

```

Export all data:

```bash

dumper full

```

### Using Environment Variables

Configure via environment variables:

```bash

export REDIS_URL=redis://localhost:6379/0

export OUTPUT_DIR=./export

export OUTPUT_FORMAT=csv

export BATCH_SIZE=5000

dumper keys-only

```

Or inline:

```bash

REDIS_URL=redis://localhost:6379 OUTPUT_DIR=./export dumper pattern "session:*"

```

### TLS/SSL Support

For Redis with TLS:

```bash

REDIS_URL=rediss://user:pass@redis.example.com:6380/0 dumper keys-only

```

Or manually enable TLS:

```bash

export REDIS_URL=redis://redis.example.com:6380

export ENABLE_TLS=true

export SKIP_TLS_VERIFY=true

dumper keys-only

```

## Configuration

All configuration is done through environment variables:

| Variable | Description | Default |

|----------|-------------|---------|

| `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` |

| `OUTPUT_DIR` | Output directory path | `/tmp/dumper` |

| `OUTPUT_FORMAT` | Output format: csv or parquet | `parquet` |

| `BATCH_SIZE` | Number of keys to process in each batch | `1000` |

| `MAX_RECORDS_PER_FILE` | Maximum records per file before rotation | `100000` |

| `ENABLE_TLS` | Enable TLS connection | `false` |

| `SKIP_TLS_VERIFY` | Skip TLS certificate verification | `true` |

### Redis URL Schemes

- `redis://` - Plain connection

- `rediss://` - TLS connection (automatically enables TLS)

## Output Format

Data is exported with Hive-style partitioning:

```

output/

├── year=2024/

│   └── month=01/

│       └── day=15/

│           └── hour=14/

│               ├── redis_data_part_0001.csv

│               └── redis_data_part_0002.csv

└── export_metadata.json

```

### Schema

All Redis data is exported with a unified schema:

| Column | Type | Description |

|--------|------|-------------|

| key | string | Redis key |

| type | string | Redis data type |

| value | string | Serialized value |

| ttl_seconds | int64 | TTL in seconds (-1 if no TTL) |

| exported_at | string | Export timestamp |

| partition_id | int | Partition identifier |

### Parquet Schema Details

The Parquet files use the following schema definition:

```

message redis_data {

  optional binary key (STRING);

  optional binary type (STRING);

  optional binary value (STRING);

  optional int64 ttl_seconds;

  optional binary exported_at (STRING);

  optional int32 partition_id;

}

```

### Data Type Representations

Different Redis data types are stored in the unified schema as follows:

#### Strings

- **key**: Original Redis key (e.g., `"user:123"`)

- **type**: `"string"`

- **value**: The actual string value

#### Hashes

- **key**: `"{original_key}:field:{field_name}"` (e.g., `"user:123:field:email"`)

- **type**: `"hash_field"`

- **value**: The field's value

#### Sets

- **key**: `"{original_key}:member:{member_value}"` (e.g., `"tags:member:golang"`)

- **type**: `"set_member"`

- **value**: The member value

#### Sorted Sets (ZSets)

- **key**: `"{original_key}:member:{member_value}"` (e.g., `"leaderboard:member:player1"`)

- **type**: `"zset_member"`

- **value**: `"score={score},rank={rank}"` (e.g., `"score=95.5,rank=0"`)

#### Lists

- **key**: `"{original_key}:index:{index}"` (e.g., `"queue:index:0"`)

- **type**: `"list_item"`

- **value**: The item value

## Querying with DuckDB

### Basic Queries

Query all exported data:

```sql

-- For Parquet files

SELECT * FROM read_parquet('output/**/*.parquet');

-- For CSV files

SELECT * FROM read_csv('output/**/*.csv');

```

Count by data type:

```sql

SELECT type, COUNT(*) as count 

FROM read_parquet('output/**/*.parquet')

GROUP BY type

ORDER BY count DESC;

```

### Querying String Keys

Find all string values:

```sql

SELECT key, value, ttl_seconds 

FROM read_parquet('output/**/*.parquet')

WHERE type = 'string'

LIMIT 10;

```

### Querying Hash Fields

Get all fields for a specific hash:

```sql

-- Extract the original key and field name

SELECT 

    SPLIT_PART(key, ':field:', 1) as hash_key,

    SPLIT_PART(key, ':field:', 2) as field_name,

    value

FROM read_parquet('output/**/*.parquet')

WHERE type = 'hash_field'

  AND key LIKE 'user:123:field:%'

ORDER BY field_name;

```

Reconstruct hash objects:

```sql

-- Group hash fields into JSON objects

SELECT 

    SPLIT_PART(key, ':field:', 1) as hash_key,

    MAP_FROM_ENTRIES(

        ARRAY_AGG(

            ROW(

                SPLIT_PART(key, ':field:', 2),

                value

            )

        )

    ) as fields

FROM read_parquet('output/**/*.parquet')

WHERE type = 'hash_field'

GROUP BY SPLIT_PART(key, ':field:', 1)

LIMIT 5;

```

### Querying Sets

Get all members of a specific set:

```sql

SELECT 

    SPLIT_PART(key, ':member:', 1) as set_key,

    value as member

FROM read_parquet('output/**/*.parquet')

WHERE type = 'set_member'

  AND key LIKE 'tags:member:%'

ORDER BY member;

```

Count members per set:

```sql

SELECT 

    SPLIT_PART(key, ':member:', 1) as set_key,

    COUNT(*) as member_count

FROM read_parquet('output/**/*.parquet')

WHERE type = 'set_member'

GROUP BY SPLIT_PART(key, ':member:', 1)

ORDER BY member_count DESC;

```

Find sets containing a specific member:

```sql

SELECT DISTINCT SPLIT_PART(key, ':member:', 1) as set_key

FROM read_parquet('output/**/*.parquet')

WHERE type = 'set_member'

  AND value = 'golang';

```

### Querying Sorted Sets

Get leaderboard with scores:

```sql

SELECT 

    SPLIT_PART(key, ':member:', 1) as zset_key,

    SPLIT_PART(key, ':member:', 2) as member,

    CAST(SPLIT_PART(SPLIT_PART(value, 'score=', 2), ',', 1) AS DOUBLE) as score,

    CAST(SPLIT_PART(value, 'rank=', 2) AS INTEGER) as rank

FROM read_parquet('output/**/*.parquet')

WHERE type = 'zset_member'

  AND key LIKE 'leaderboard:%'

ORDER BY score DESC;

```

### Querying Lists

Get list items in order:

```sql

SELECT 

    SPLIT_PART(key, ':index:', 1) as list_key,

    CAST(SPLIT_PART(key, ':index:', 2) AS INTEGER) as index,

    value

FROM read_parquet('output/**/*.parquet')

WHERE type = 'list_item'

  AND key LIKE 'queue:%'

ORDER BY list_key, index;

```

### Advanced Queries

Find keys expiring soon:

```sql

SELECT key, type, ttl_seconds, 

    ttl_seconds / 3600.0 as hours_remaining

FROM read_parquet('output/**/*.parquet')

WHERE ttl_seconds > 0 

  AND ttl_seconds < 3600  -- Expiring within 1 hour

ORDER BY ttl_seconds;

```

Analyze data distribution by partition:

```sql

SELECT 

    partition_id,

    COUNT(*) as record_count,

    COUNT(DISTINCT SPLIT_PART(key, ':', 1)) as unique_key_prefixes

FROM read_parquet('output/**/*.parquet')

GROUP BY partition_id

ORDER BY partition_id;

```

Export query results:

```sql

-- Export filtered data to a new Parquet file

COPY (

    SELECT * FROM read_parquet('output/**/*.parquet')

    WHERE type = 'hash_field' AND key LIKE 'user:%'

) TO 'user_hashes.parquet' (FORMAT 'parquet');

```

## Setting Up DuckDB

### Installation

Install DuckDB CLI:

**macOS:**

```bash

brew install duckdb

```

**Linux:**

```bash

wget https://github.com/duckdb/duckdb/releases/download/v0.10.0/duckdb_cli-linux-amd64.zip

unzip duckdb_cli-linux-amd64.zip

chmod +x duckdb

sudo mv duckdb /usr/local/bin/

```

**Windows:**

Download from [DuckDB releases](https://github.com/duckdb/duckdb/releases)

### Creating Views for Easy Querying

Start DuckDB and create persistent views:

```bash

# Start DuckDB with a persistent database

duckdb redis_export.db

```

In the DuckDB shell:

```sql

-- Create a view for all Redis data

CREATE VIEW redis_data AS 

SELECT * FROM read_parquet('./output/**/*.parquet');

-- Create views for each data type

CREATE VIEW redis_strings AS 

SELECT * FROM redis_data WHERE type = 'string';

CREATE VIEW redis_hashes AS 

SELECT 

    SPLIT_PART(key, ':field:', 1) as hash_key,

    SPLIT_PART(key, ':field:', 2) as field_name,

    value,

    ttl_seconds,

    exported_at

FROM redis_data 

WHERE type = 'hash_field';

CREATE VIEW redis_sets AS 

SELECT 

    SPLIT_PART(key, ':member:', 1) as set_key,

    value as member,

    ttl_seconds,

    exported_at

FROM redis_data 

WHERE type = 'set_member';

CREATE VIEW redis_zsets AS 

SELECT 

    SPLIT_PART(key, ':member:', 1) as zset_key,

    SPLIT_PART(key, ':member:', 2) as member,

    CAST(SPLIT_PART(SPLIT_PART(value, 'score=', 2), ',', 1) AS DOUBLE) as score,

    CAST(SPLIT_PART(value, 'rank=', 2) AS INTEGER) as rank,

    ttl_seconds,

    exported_at

FROM redis_data 

WHERE type = 'zset_member';

CREATE VIEW redis_lists AS 

SELECT 

    SPLIT_PART(key, ':index:', 1) as list_key,

    CAST(SPLIT_PART(key, ':index:', 2) AS INTEGER) as index,

    value,

    ttl_seconds,

    exported_at

FROM redis_data 

WHERE type = 'list_item';

-- Show available views

SHOW TABLES;

```

### Using the Views

Now you can query Redis data more easily:

```sql

-- Count records by type

SELECT COUNT(*) FROM redis_strings;

SELECT COUNT(*) FROM redis_hashes;

SELECT COUNT(*) FROM redis_sets;

-- Query specific hash

SELECT * FROM redis_hashes WHERE hash_key = 'user:123';

-- Find all sets containing a member

SELECT set_key FROM redis_sets WHERE member = 'golang';

-- Get top 10 from leaderboard

SELECT * FROM redis_zsets 

WHERE zset_key = 'leaderboard' 

ORDER BY score DESC 

LIMIT 10;

```

### Performance Tips

1. **Use Parquet format** - It's columnar and compressed, making queries much faster than CSV

2. **Partition your queries** - Use the partition_id or date filters when possible

3. **Create indexes** for frequently queried columns:

   ```sql

   CREATE INDEX idx_key_prefix ON redis_data (SPLIT_PART(key, ':', 1));

   ```

4. **Use EXPLAIN** to understand query plans:

   ```sql

   EXPLAIN SELECT * FROM redis_sets WHERE member = 'test';

   ```

## Running Locally

### Quick Start with Docker Compose

The easiest way to test Redis Dumper is using the included docker-compose setup with test data:

```bash

# Start Redis with test data

docker-compose up -d

# Wait for data to load

sleep 5

# Run the dumper on test data

go run ./cmd/dumper keys-only

# Or export full data for specific patterns

go run ./cmd/dumper pattern "user:*"

go run ./cmd/dumper pattern "product:*"

# Check the output

ls -la ./output/

```

The test data includes:

- String values (configs, sessions, cached pages)

- Hashes (user profiles, products, orders)

- Sets (tags, categories, user skills)

- Sorted sets (leaderboards, trending items, view counts)

- Lists (queues, logs, user history)

- Keys with TTLs

- Complex JSON structures

### Manual Docker Setup

Alternatively, you can manually start Redis:

```bash

# Start a Redis instance

docker run -d --name redis-local -p 6379:6379 redis:latest

# Add some test data

docker exec -it redis-local redis-cli SET mykey "Hello World"

docker exec -it redis-local redis-cli HSET user:123 name "Alice" email "alice@example.com"

docker exec -it redis-local redis-cli SADD fruits "apple" "banana" "orange"

# Run the dumper

go run ./cmd/dumper keys-only

```

### Full Export Example

```bash

export REDIS_URL=redis://localhost:6379

export OUTPUT_DIR=./local-export

export OUTPUT_FORMAT=parquet

export BATCH_SIZE=5000

# Export full data for all keys

go run ./cmd/dumper full

# Or export data matching a pattern

go run ./cmd/dumper pattern "user:*"

```

### Testing with DuckDB

After exporting the test data, you can explore it with DuckDB:

```bash

# View all data types in the export

duckdb -c "SELECT type, COUNT(*) as count FROM read_parquet('./output/**/*.parquet') GROUP BY type ORDER BY count DESC;"

# Check the leaderboard

duckdb -c "

  SELECT 

    SPLIT_PART(key, ':member:', 2) as player,

    CAST(SPLIT_PART(SPLIT_PART(value, 'score=', 2), ',', 1) AS DOUBLE) as score

  FROM read_parquet('./output/**/*.parquet')

  WHERE key LIKE 'leaderboard:global:%'

  ORDER BY score DESC;

"

# View user hashes

duckdb -c "

  SELECT 

    SPLIT_PART(key, ':field:', 1) as user,

    SPLIT_PART(key, ':field:', 2) as field,

    value

  FROM read_parquet('./output/**/*.parquet')

  WHERE type = 'hash_field' AND key LIKE 'user:%'

  ORDER BY user, field;

"

# Find all programming-related tags

duckdb -c "

  SELECT DISTINCT 

    SPLIT_PART(key, ':member:', 1) as set_name,

    value as tag

  FROM read_parquet('./output/**/*.parquet')

  WHERE type = 'set_member' 

    AND value IN ('python', 'golang', 'javascript', 'rust', 'java', 'typescript')

  ORDER BY set_name, tag;

"

```

## Development

### Requirements

- Go 1.21 or higher

- Docker (for running tests with make commands)

- Redis (for local testing)

- Make

### Building

```bash

make build

```

### Testing

```bash

make go-test

```

### Linting

```bash

make go-lint

```

### Formatting

```bash

make go-fmt

```

## Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on:

- Setting up your development environment

- Running tests and linting

- Submitting pull requests

- Reporting issues

## License

MIT License - see LICENSE file for details.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/cameronnewman/redis-dumper

Awesome Lists containing this project

README