{"id":28501718,"url":"https://github.com/dutymate/mongo2dynamo","last_synced_at":"2026-04-30T12:35:20.231Z","repository":{"id":297924277,"uuid":"996866723","full_name":"dutymate/mongo2dynamo","owner":"dutymate","description":"A command-line tool for migrating data from MongoDB to DynamoDB","archived":false,"fork":false,"pushed_at":"2026-01-26T13:33:48.000Z","size":9163,"stargazers_count":1,"open_issues_count":5,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-30T12:35:03.970Z","etag":null,"topics":["aws","cli","dynamodb","etl","go","golang","migration","migration-tool","migrator","mongo","mongodb","tool"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dutymate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-05T15:26:45.000Z","updated_at":"2026-01-19T17:20:41.000Z","dependencies_parsed_at":"2025-07-07T06:19:20.905Z","dependency_job_id":"dea5bd33-bbcb-4e72-8636-ddb1c66c1d09","html_url":"https://github.com/dutymate/mongo2dynamo","commit_stats":null,"previous_names":["dutymate/mongo2dynamo"],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/dutymate/mongo2dynamo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dutymate%2Fmongo2dynamo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dutymate%2Fmongo2dynamo/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/dutymate%2Fmongo2dynamo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dutymate%2Fmongo2dynamo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dutymate","download_url":"https://codeload.github.com/dutymate/mongo2dynamo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dutymate%2Fmongo2dynamo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32465009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"online","status_checked_at":"2026-04-30T02:00:05.929Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","cli","dynamodb","etl","go","golang","migration","migration-tool","migrator","mongo","mongodb","tool"],"created_at":"2025-06-08T16:07:36.955Z","updated_at":"2026-04-30T12:35:20.209Z","avatar_url":"https://github.com/dutymate.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mongo2dynamo\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/logo.png\" alt=\"mongo2dynamo Logo\" width=\"200\"/\u003e\n\u003c/p\u003e\n\n**mongo2dynamo** is a high-performance, command-line tool for migrating data from MongoDB to 
DynamoDB.\n\n[![Build](https://github.com/dutymate/mongo2dynamo/actions/workflows/build.yaml/badge.svg)](https://github.com/dutymate/mongo2dynamo/actions/workflows/build.yaml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n\n- [Features](#features)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Configuration](#configuration)\n- [Commands](#commands)\n- [How It Works](#how-it-works)\n- [License](#license)\n\n## Features\n\nmongo2dynamo is designed for efficient and reliable data migration, incorporating several key features for performance and stability.\n\n-   **High-Performance Transformation**: Utilizes a **dynamic worker pool** that scales based on CPU cores (from 2 to 2x `runtime.NumCPU()`) with real-time workload monitoring. Workers auto-scale every 500ms based on pending jobs, maximizing parallel processing efficiency.\n-   **Optimized Memory Management**: Implements strategic memory allocation - extractor uses `ChunkPool` for efficient slice reuse during document streaming, while transformer uses direct allocation with pre-calculated capacity for optimal performance based on benchmarking.\n-   **Advanced Backpressure Control**: Features an **optimized backpressure mechanism** that automatically manages data flow between pipeline stages, preventing memory overflow and ensuring stable performance under high load conditions.\n-   **Robust Loading Mechanism**: Implements a reliable data loading strategy for DynamoDB using the `BatchWriteItem` API with a **concurrent worker pool**. Features **Exponential Backoff with Jitter** algorithm to automatically handle DynamoDB throttling exceptions, ensuring smooth migration process.\n-   **Memory-Efficient Extraction**: Employs a streaming approach to extract data from MongoDB in configurable chunks (default: 2000 documents), minimizing memory footprint even with large datasets. 
Supports MongoDB query filters and projections for selective migration.\n-   **Intelligent Field Processing**: Removes framework metadata (`__v`, `_class`) while preserving all other fields including `_id`. Pre-calculates output document capacity to minimize memory allocations during transformation.\n-   **Fine-Grained Error Handling**: Defines domain-specific custom error types for each stage of the ETL process (Extract, Transform, Load). This enables precise error identification and facilitates targeted recovery logic.\n-   **Comprehensive CLI**: Built with `Cobra`, providing a user-friendly command-line interface with `plan` (dry-run) and `apply` commands, flexible configuration options (flags, env vars, config file), and an `--auto-approve` flag for non-interactive execution.\n-   **Automatic Table Management**: Automatically creates DynamoDB tables if they don't exist, with user confirmation prompts (unless auto-approved). **Supports custom primary keys (Partition and Sort Keys).** Waits for table activation before proceeding with migration.\n-   **Real-Time Progress Tracking**: Provides visual progress indicators with real-time status updates, processing rate, and estimated completion time. Progress display can be disabled with `--no-progress` flag for non-interactive environments.\n-   **Prometheus Metrics**: Built-in monitoring with Prometheus-compatible metrics for real-time performance tracking, including document processing rates, error counts, migration duration, and worker pool utilization. 
The metrics server can be enabled with the `--metrics-enabled` flag.\n-   **Shell Completion**: Interactive command-line completion support for bash, zsh, fish, and PowerShell, providing intelligent suggestions for commands, flags, and options to enhance CLI usability.\n\n## Installation\n\n### Homebrew\n\n```bash\nbrew tap dutymate/tap\nbrew install mongo2dynamo\n```\n\n### Download Binary\n\nDownload the latest release from the [releases page](https://github.com/dutymate/mongo2dynamo/releases).\n\n### Build from Source\n\n```bash\ngit clone https://github.com/dutymate/mongo2dynamo.git\ncd mongo2dynamo\nmake build\n```\n\n## Quick Start\n\n```bash\n# Preview migration\nmongo2dynamo plan --mongo-db mydb --mongo-collection users\n\n# Execute migration with a custom primary key (Partition + Sort Key)\nmongo2dynamo apply --mongo-db mydb --mongo-collection events \\\n  --dynamo-table user-events \\\n  --dynamo-partition-key event_id \\\n  --dynamo-partition-key-type S \\\n  --dynamo-sort-key timestamp \\\n  --dynamo-sort-key-type N\n\n# With filter and auto-approve\nmongo2dynamo apply --mongo-db mydb --mongo-collection users \\\n  --mongo-filter '{\"status\": \"active\"}' \\\n  --auto-approve\n\n# With projection to select specific fields (default excludes __v and _class)\nmongo2dynamo apply --mongo-db mydb --mongo-collection users \\\n  --mongo-projection '{\"name\": 1, \"email\": 1}' \\\n  --auto-approve\n\n# Disable progress display for non-interactive environments\nmongo2dynamo apply --mongo-db mydb --mongo-collection users \\\n  --no-progress\n\n# Enable Prometheus metrics for monitoring\nmongo2dynamo apply --mongo-db mydb --mongo-collection users \\\n  --metrics-enabled \\\n  --metrics-addr :2112\n\n# Enable shell completion for the current session\nsource \u003c(mongo2dynamo completion zsh)  # For zsh\n# source \u003c(mongo2dynamo completion bash)  # For bash\n```\n\n## Configuration\n\nConfiguration can be provided via command-line flags, environment variables, or a YAML 
configuration file. The order of precedence is:\n1. Command-Line Flags\n2. Environment Variables\n3. Configuration File\n4. Default Values\n\n### Command-Line Flags\n\n**MongoDB Flags**\n\n| Flag | Description | Default |\n| --- | --- | --- |\n| `--mongo-host` | MongoDB host. | `localhost` |\n| `--mongo-port` | MongoDB port. | `27017` |\n| `--mongo-user` | MongoDB username. | ` ` |\n| `--mongo-password` | MongoDB password. | ` ` |\n| `--mongo-db` | **(Required)** MongoDB database name. | ` ` |\n| `--mongo-collection` | **(Required)** MongoDB collection name. | ` ` |\n| `--mongo-filter` | MongoDB query filter as a JSON string. | ` ` |\n| `--mongo-projection` | MongoDB projection as a JSON string to select specific fields. | `{\"__v\":0,\"_class\":0}` |\n\n**DynamoDB Flags**\n\n| Flag | Description | Default |\n| --- | --- | --- |\n| `--dynamo-endpoint` | DynamoDB endpoint. | `http://localhost:8000` |\n| `--dynamo-table` | DynamoDB table name. | MongoDB collection name |\n| `--dynamo-partition-key` | The attribute name for the partition key. | `_id` |\n| `--dynamo-partition-key-type` | The attribute type for the partition key (S, N, B). | `S` |\n| `--dynamo-sort-key` | The attribute name for the sort key. (Optional) | ` ` |\n| `--dynamo-sort-key-type` | The attribute type for the sort key (S, N, B). | `S` |\n| `--aws-region` | AWS region. | `us-east-1` |\n| `--max-retries` | Maximum retries for failed DynamoDB batch writes. | `5` |\n\n**Control Flags**\n\n| Flag | Description | Default |\n| --- | --- | --- |\n| `--auto-approve` | Skip all confirmation prompts (applies only to the apply command). | `false` |\n| `--no-progress` | Disable progress display during migration. | `false` |\n\n**Monitoring Flags**\n\n| Flag | Description | Default |\n| --- | --- | --- |\n| `--metrics-enabled` | Enable Prometheus metrics server for monitoring. | `false` |\n| `--metrics-addr` | Address for the metrics server to listen on. 
| `:2112` |\n\n### Environment Variables\n\n```bash\nexport MONGO2DYNAMO_MONGO_HOST=localhost\nexport MONGO2DYNAMO_MONGO_PORT=27017\nexport MONGO2DYNAMO_MONGO_USER=your_username\nexport MONGO2DYNAMO_MONGO_PASSWORD=your_password\nexport MONGO2DYNAMO_MONGO_DB=your_database\nexport MONGO2DYNAMO_MONGO_COLLECTION=your_collection\nexport MONGO2DYNAMO_MONGO_FILTER='{\"status\": \"active\"}'\nexport MONGO2DYNAMO_MONGO_PROJECTION='{\"__v\":0,\"_class\":0}'\nexport MONGO2DYNAMO_DYNAMO_ENDPOINT=http://localhost:8000\nexport MONGO2DYNAMO_DYNAMO_TABLE=your_table\nexport MONGO2DYNAMO_DYNAMO_PARTITION_KEY=_id\nexport MONGO2DYNAMO_DYNAMO_PARTITION_KEY_TYPE=S\nexport MONGO2DYNAMO_DYNAMO_SORT_KEY=timestamp\nexport MONGO2DYNAMO_DYNAMO_SORT_KEY_TYPE=N\nexport MONGO2DYNAMO_AWS_REGION=us-east-1\nexport MONGO2DYNAMO_MAX_RETRIES=5\nexport MONGO2DYNAMO_AUTO_APPROVE=false\nexport MONGO2DYNAMO_NO_PROGRESS=false\nexport MONGO2DYNAMO_METRICS_ENABLED=false\nexport MONGO2DYNAMO_METRICS_ADDR=:2112\n```\n\n### Config File\n\nCreate `~/.mongo2dynamo/config.yaml`:\n\n```yaml\nmongo_host: localhost\nmongo_port: 27017\nmongo_user: your_username\nmongo_password: your_password\nmongo_db: your_database\nmongo_collection: your_collection\nmongo_filter: '{\"status\": \"active\"}'\nmongo_projection: '{\"__v\":0,\"_class\":0}'\ndynamo_endpoint: http://localhost:8000\ndynamo_table: your_table\ndynamo_partition_key: _id\ndynamo_partition_key_type: S\ndynamo_sort_key: timestamp\ndynamo_sort_key_type: N\naws_region: us-east-1\nmax_retries: 5\nauto_approve: false\nno_progress: false\nmetrics_enabled: false\nmetrics_addr: \":2112\"\n```\n\n## Commands\n\n### `plan` - Preview Migration\n\nPerforms a dry-run to preview the migration by executing the full ETL pipeline without loading to DynamoDB.\n\n**Features:**\n- Connects to MongoDB and validates configuration.\n- Extracts documents from MongoDB (with filters and projections if specified).\n- Transforms documents to DynamoDB format using dynamic worker pools with 
backpressure control.\n- Counts the total number of documents that would be migrated.\n- No data is loaded to DynamoDB (dry-run mode).\n- Provides Prometheus metrics when enabled (document counts, processing rates, error tracking, worker pool utilization).\n\n**Example Output:**\n```text\nStarting migration plan analysis...\n▶ 904,000/2,000,000 items (45.2%) | 120,000 items/sec | 9s left\nFound 2,000,000 documents to migrate.\n```\n\n### `apply` - Execute Migration\n\nExecutes the complete ETL pipeline to migrate data from MongoDB to DynamoDB.\n\n**Features:**\n- Full ETL pipeline execution (Extract → Transform → Load).\n- Configuration validation and user confirmation prompts.\n- Automatic DynamoDB table creation (with confirmation).\n- Batch processing with optimized chunk sizes (1000 documents per MongoDB batch, 2000 documents per extraction chunk, 25 documents per DynamoDB batch, concurrent loader workers).\n- Dynamic worker pool scaling with intelligent backpressure control for optimal performance.\n- Retry logic for failed operations (configurable via `--max-retries`).\n- Real-time Prometheus metrics for monitoring migration progress, performance, error rates, and worker pool efficiency.\n\n**Example Output:**\n```text\nCreating DynamoDB table 'users'...\nWaiting for table 'users' to become active...\nTable 'users' is now active and ready for use.\nStarting data migration from MongoDB to DynamoDB...\n▶ 904,000/2,000,000 items (45.2%) | 20,000 items/sec | 54s left\nSuccessfully migrated 2,000,000 documents.\n```\n\n### `version` - Show Version\n\nDisplays version information including Git commit and build date.\n\n### `completion` - Generate Shell Completion\n\nGenerates shell completion scripts for interactive command-line usage.\n\n**Supported Shells:**\n- **Bash**: `mongo2dynamo completion bash`\n- **Zsh**: `mongo2dynamo completion zsh`\n- **Fish**: `mongo2dynamo completion fish`\n- **PowerShell**: `mongo2dynamo completion powershell`\n\n**Usage 
Examples:**\n\n**Bash:**\n```bash\n# Load completion for current session\nsource \u003c(mongo2dynamo completion bash)\n\n# Load completion permanently (add to ~/.bashrc)\nmongo2dynamo completion bash \u003e ~/.mongo2dynamo/completion.bash\necho \"source ~/.mongo2dynamo/completion.bash\" \u003e\u003e ~/.bashrc\n```\n\n**Zsh:**\n```bash\n# Load completion for current session\nsource \u003c(mongo2dynamo completion zsh)\n\n# Load completion permanently (add to ~/.zshrc)\nmongo2dynamo completion zsh \u003e ~/.mongo2dynamo/completion.zsh\necho \"source ~/.mongo2dynamo/completion.zsh\" \u003e\u003e ~/.zshrc\n```\n\n**Fish:**\n```bash\n# Load completion for current session\nmongo2dynamo completion fish | source\n\n# Load completion permanently\nmongo2dynamo completion fish \u003e ~/.config/fish/completions/mongo2dynamo.fish\n```\n\n**PowerShell:**\n```powershell\n# Load completion for current session\nmongo2dynamo completion powershell | Out-String | Invoke-Expression\n\n# Load completion permanently ($PROFILE is a file path, so write the script next to it)\n$completionFile = Join-Path (Split-Path $PROFILE) 'mongo2dynamo-completion.ps1'\nmongo2dynamo completion powershell \u003e $completionFile\nAdd-Content $PROFILE \"\u0026 '$completionFile'\"\n```\n\n## How It Works\n\nmongo2dynamo follows a standard Extract, Transform, Load (ETL) architecture with parallel processing capabilities. 
Each stage is designed to perform its task efficiently and reliably.\n\n### Monitoring and Metrics\n\nWhen metrics are enabled (`--metrics-enabled`), mongo2dynamo provides comprehensive Prometheus-compatible metrics for real-time monitoring:\n\n- **Document Processing Metrics**: Total documents, processed documents, and processing rates\n- **Error Tracking**: Transformation errors, loading errors, and error rates by type\n- **Performance Metrics**: Migration duration, throughput, and worker pool utilization\n- **Migration Status**: Success/failure status and completion tracking\n- **Worker Pool Metrics**: Active workers, queue depth, and backpressure status\n- **Pipeline Health**: Channel buffer usage and data flow monitoring\n\nThe metrics server runs on the specified address (default: `:2112`) and can be scraped by Prometheus or other monitoring systems for comprehensive observability during migration operations.\n\n### Pipeline Architecture\n- **Parallel Processing**: The ETL stages run concurrently using Go channels with a buffer size of 10, allowing extraction, transformation, and loading to happen simultaneously for maximum throughput.\n- **Strategic Memory Optimization**: Components use independent memory strategies optimized for their specific workloads - extractor leverages `ChunkPool` for slice reuse, while transformer uses direct allocation for maximum speed.\n- **Advanced Backpressure Control**: Implements intelligent backpressure mechanisms that automatically manage data flow between pipeline stages, preventing memory overflow and ensuring stable performance under high load conditions.\n\n```mermaid\n%%{init: { 'theme': 'neutral' } }%%\nflowchart LR\n    subgraph Input Source\n        MongoDB[(fa:fa-database MongoDB)]\n    end\n\n    subgraph mongo2dynamo\n        direction LR\n        subgraph \"Extract\"\n            Extractor(\"fa:fa-cloud-download Extractor\u003cbr/\u003e\u003cbr/\u003eStreams documents\u003cbr/\u003efrom the source 
collection.\")\n        end\n        \n        subgraph \"Transform\"\n            Transformer(\"fa:fa-cogs Transformer\u003cbr/\u003e\u003cbr/\u003eUses a dynamic worker pool to process documents in parallel.\")\n        end\n\n        subgraph \"Load\"\n            Loader(\"fa:fa-upload Loader\u003cbr/\u003e\u003cbr/\u003eWrites data using BatchWriteItem API.\u003cbr/\u003eHandles throttling with\u003cbr/\u003eExponential Backoff + Jitter.\")\n        end\n    end\n\n    subgraph Output Target\n        DynamoDB[(fa:fa-database DynamoDB)]\n    end\n\n    MongoDB -- Documents --\u003e Extractor\n    Extractor -- Raw Documents --\u003e Transformer\n    Transformer -- Transformed Items --\u003e Loader\n    Loader -- Batched Items --\u003e DynamoDB\n\n    style Extractor fill:#e6f3ff,stroke:#333\n    style Transformer fill:#fff2e6,stroke:#333\n    style Loader fill:#e6ffed,stroke:#333\n```\n\n### 1. Extraction\n- Connects to MongoDB using optimized connection settings with configurable batch sizes (default: 1000 documents per batch).\n- Uses a streaming approach with `ChunkPool` memory reuse to handle large datasets efficiently.\n- Processes documents in configurable chunks (default: 2000 documents) to maintain low memory footprint.\n- Applies user-defined filters (`--mongo-filter`) with JSON-to-BSON conversion for selective data migration.\n- Applies default projection to exclude framework metadata (`__v`, `_class`) unless overridden by `--mongo-projection`.\n- Implements robust error handling for connection, decode, and cursor operations.\n\n### 2. 
Transformation\n- Utilizes a **dynamic worker pool** starting with CPU core count, scaling up to 2x CPU cores based on workload.\n- **Intelligent scaling**: Workers auto-adjust every 500ms with optimized thresholds (scale up at 80% load, scale down at 30% load).\n- **Bidirectional scaling**: Automatically scales down when workload decreases to optimize resource usage.\n- **Advanced backpressure control**: Implements optimized backpressure mechanisms that automatically manage data flow, preventing memory overflow and ensuring stable performance.\n- **Memory optimization**: Pre-calculates field counts to allocate maps with optimal capacity, reducing garbage collection overhead.\n- **Field processing**: Preserves all fields including `_id` with intelligent type handling (ObjectID → hex, bson.M → JSON). Framework metadata (`__v`, `_class`) is excluded by default via MongoDB projection.\n- Implements panic recovery and comprehensive error reporting for worker failures.\n\n### 3. Loading\n- Uses a **concurrent worker pool** to maximize DynamoDB throughput with parallel batch processing.\n- Groups documents into optimal batches of 25 items per `BatchWriteItem` request (DynamoDB limit).\n- **Advanced retry logic**: Implements exponential backoff with jitter (100ms to 30s) for unprocessed items, with configurable max retries (default: 5).\n- **Automatic table management**: Creates tables with hash key schema if they don't exist, waits for table activation.\n- Handles context cancellation gracefully across all worker goroutines.\n\n## License\n\nLicensed under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdutymate%2Fmongo2dynamo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdutymate%2Fmongo2dynamo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdutymate%2Fmongo2dynamo/lists"}