https://github.com/bdgscotland/omd_migrate
Migration tool for OpenMetadata - migrate data catalogs, lineage, and metadata between OpenMetadata instances
- Host: GitHub
- URL: https://github.com/bdgscotland/omd_migrate
- Owner: bdgscotland
- License: mit
- Created: 2025-06-28T13:59:44.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-28T16:39:04.000Z (12 months ago)
- Last Synced: 2025-06-28T16:47:07.805Z (12 months ago)
- Topics: data-catalog-migration, data-governance, metadata-migration, migration-tool, omd, openmetadata, openmetadata-migration
- Language: Python
- Homepage:
- Size: 71.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# OpenMetadata Migration Tool
[Python](https://www.python.org/) | [License: MIT](LICENSE) | [openmetadata-ingestion](https://pypi.org/project/openmetadata-ingestion/) | [Contributing](CONTRIBUTING.md) | [Code style: black](https://github.com/psf/black) | [CI](.github/workflows/ci.yml) | [Security: bandit](https://github.com/PyCQA/bandit) | [Tests](test_migration.py)
A flexible, customizable Python tool for exporting and importing OpenMetadata entities. It supports full backups, selective exports, and cross-instance migrations, and writes exports in a human-readable NDJSON format.
## Features
- **OpenMetadata SDK Integration**: Uses the official OpenMetadata Python SDK for robust API interaction
- **Full Export/Import**: Backup and restore complete OpenMetadata instances
- **Selective Export**: Export specific entity types with `--entities` flag
- **Round-Trip Tested**: Verified export → import → validation workflow with real data
- **Relationship-Aware**: Maintains links between domains, data products, and assets
- **Flexible Configuration**: YAML config with environment variable overrides
- **Rich Console Output**: Progress indicators and informative logging in the terminal
- **NDJSON Format**: Human-readable, editable export format
- **Version Flexible**: Configurable OpenMetadata SDK version support (defaults to 1.8.0+)
## Quick Start
### 1. Installation
**Option A: Automated Setup (Recommended)**
```bash
git clone https://github.com/bdgscotland/omd_migrate
cd omd_migrate
./setup.sh
```
**Option B: Manual Installation**
```bash
git clone https://github.com/bdgscotland/omd_migrate
cd omd_migrate
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
**Note**: `setup.sh` creates a virtual environment named `omd_venv` (not `venv` as in Option B) and installs all dependencies automatically.
### 2. Configuration
The tool reads its settings from `config.yaml`; values set in `.env` (or in the process environment) override them:
**Option A: Use .env file (recommended for credentials)**
```bash
cp .env.example .env
# Edit .env with your OpenMetadata server details
```
**Option B: Edit config.yaml directly**
```bash
# Edit config.yaml with your server URL and JWT token
```
### 3. Export Data
```bash
# Export all entities (based on config.yaml settings)
python export.py
# Selective export of specific entity types
python export.py --entities data_products --entities domains
# Clear previous exports before starting
python export.py --clear
# Export to custom directory
python export.py --output-dir /path/to/backup
# Combine options for targeted exports
python export.py --clear --entities data_products --entities domains --output-dir /backup/domains-only
```
### 4. Import Data
```bash
# Import all entities
python import.py
# Import from custom directory
python import.py --input-dir /path/to/backup
# Import specific entity type only
python import.py --entity-type domains
# Dry run (see what would be imported)
python import.py --dry-run
```
## Configuration
### Environment Variables (.env)
```bash
# Server Configuration
OPENMETADATA_SERVER_URL=http://your-openmetadata-server:8585/api
OPENMETADATA_JWT_TOKEN=your_jwt_token_here
# Export Configuration
EXPORT_OUTPUT_DIR=./exports
EXPORT_BATCH_SIZE=100
EXPORT_INCLUDE_DELETED=false
# Import Configuration
IMPORT_INPUT_DIR=./exports
IMPORT_UPDATE_EXISTING=true
IMPORT_SKIP_ON_ERROR=true
# Logging
LOG_LEVEL=INFO
```
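The override behaviour (YAML config with environment variable overrides) could work roughly as sketched below. This is a minimal sketch, not the tool's code: the `ENV_TO_CONFIG` mapping and the `apply_env_overrides` helper are hypothetical, and the nested key paths are assumed from the `config.yaml` reference further down. In practice, `python-dotenv`'s `load_dotenv()` would first copy `.env` into the process environment.

```python
import copy
import os

# Hypothetical mapping from the variables above to nested config.yaml keys;
# the tool's real mapping may differ.
ENV_TO_CONFIG = {
    "OPENMETADATA_SERVER_URL": ("openmetadata", "server_url"),
    "OPENMETADATA_JWT_TOKEN": ("openmetadata", "auth", "jwt_token"),
    "EXPORT_OUTPUT_DIR": ("export", "output_dir"),
    "EXPORT_BATCH_SIZE": ("export", "batch_size"),
    "EXPORT_INCLUDE_DELETED": ("export", "include_deleted"),
    "IMPORT_INPUT_DIR": ("import", "input_dir"),
    "IMPORT_UPDATE_EXISTING": ("import", "update_existing"),
    "IMPORT_SKIP_ON_ERROR": ("import", "skip_on_error"),
    "LOG_LEVEL": ("logging", "level"),
}


def coerce(raw: str):
    """Turn 'true'/'false' into bools and integer strings into ints; keep other text."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_env_overrides(config: dict, environ=None) -> dict:
    """Return a copy of config with any set environment variables layered on top."""
    environ = os.environ if environ is None else environ
    merged = copy.deepcopy(config)  # never mutate the loaded YAML
    for var, path in ENV_TO_CONFIG.items():
        if var not in environ:
            continue
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create missing sections on demand
        node[path[-1]] = coerce(environ[var])
    return merged
```

For example, `apply_env_overrides({"export": {"batch_size": 50}}, {"EXPORT_BATCH_SIZE": "100"})` yields a batch size of `100` while leaving the original dictionary untouched.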
### Selective Export
Configure selective exports in `config.yaml`:
```yaml
export:
  selective:
    # Export specific domains by name
    domains: ["Finance", "Marketing"]
    # Only export data products linked to specified domains
    linked_data_products_only: true
    # Only export assets (tables, topics, etc.) linked to domains/data products
    linked_assets_only: true
```
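One plausible reading of the `linked_data_products_only` and `linked_assets_only` switches is a set-intersection filter on each entity's domain and data-product references. The sketch below assumes OpenMetadata-style JSON, where a data product carries a `domain` reference (newer servers use a `domains` list, so both are checked) and an asset lists its `dataProducts`. The helper names are hypothetical, not the tool's API.

```python
from typing import Iterable


def domain_names(entity: dict) -> set:
    """Collect domain names from a single 'domain' reference and/or a 'domains' list."""
    refs = []
    if isinstance(entity.get("domain"), dict):
        refs.append(entity["domain"])
    refs.extend(r for r in entity.get("domains") or [] if isinstance(r, dict))
    return {r.get("name") for r in refs if r.get("name")}


def select_linked(entities: Iterable[dict], wanted_domains, wanted_products=frozenset()) -> list:
    """Keep entities linked to a selected domain or to a selected data product."""
    wanted_domains = set(wanted_domains)
    wanted_products = set(wanted_products)
    kept = []
    for entity in entities:
        products = {r.get("name") for r in entity.get("dataProducts") or []}
        if domain_names(entity) & wanted_domains or products & wanted_products:
            kept.append(entity)
    return kept
```

Data products would be filtered first. Their names then feed the asset filter, so a table linked only to a selected data product is still kept.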
### Entity Types
Supported entities for export (use with `--entities` flag):
**Core Entities:**
- `domains` - Business domains and subdomains
- `data_products` - Data products with domain relationships
- `teams` - Teams and users
- `users` - Individual users
- `policies` - Access policies
**Knowledge Management:**
- `glossaries` - Business glossaries
- `glossary_terms` - Glossary terms
**Data Assets:**
- `databases` - Database services and databases
- `database_schemas` - Database schemas
- `tables` - Data tables with lineage
**Additional Entity Types** (available via config.yaml):
- `topics` - Kafka topics and streams
- `dashboards` - BI dashboards
- `charts` - Dashboard charts
- `pipelines` - Data pipelines
- `ml_models` - Machine learning models
- `containers` - Data containers
- `stored_procedures` - Database procedures
- `dashboard_data_models` - Dashboard data models
- `search_indexes` - Search indexes
Example usage:
```bash
# Export core entities only
python export.py --entities domains --entities data_products --entities teams
# Export data assets
python export.py --entities databases --entities tables
```
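Repeating `--entities` accumulates the values into a list. The project lists Click among its dependencies, where a repeatable option is typically declared with `multiple=True`. The stdlib `argparse` sketch below shows the same behaviour with `action="append"`. It is illustrative only: `parse_export_args` is hypothetical, and its choices mirror the core, knowledge-management and data-asset types listed above rather than anything read from the tool.

```python
import argparse

# Hypothetical: mirrors the entity types listed above, not the tool's own registry.
SUPPORTED_ENTITIES = [
    "domains", "data_products", "teams", "users", "policies",
    "glossaries", "glossary_terms", "databases", "database_schemas", "tables",
]


def parse_export_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="export.py")
    # action="append" collects every occurrence of the flag into one list
    parser.add_argument("--entities", action="append", choices=SUPPORTED_ENTITIES,
                        help="entity type to export; repeat for several types")
    parser.add_argument("--clear", action="store_true",
                        help="delete previous exports before starting")
    parser.add_argument("--output-dir", default="./exports")
    return parser.parse_args(argv)
```

When the flag is absent, `args.entities` is `None`, which is the natural signal to fall back to the `entities` section of `config.yaml`.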
## Examples
### Full Backup and Restore
```bash
# 1. Export everything from source instance
python export.py --config source-config.yaml --output-dir backup-2024-01-15
# 2. Import to target instance
python import.py --config target-config.yaml --input-dir backup-2024-01-15
```
### Selective Entity Export
Use command-line flags for targeted exports:
```bash
# Export only domains and data products
python export.py --clear --entities domains --entities data_products
# Export specific entities to custom location
python export.py --entities users --entities teams --output-dir /backup/identity
# Clear and export tables only
python export.py --clear --entities tables
```
### Domain-Specific Migration
Configure selective export in `config.yaml`:
```yaml
export:
  selective:
    domains: ["Data Science", "Analytics"]
    linked_data_products_only: true
    linked_assets_only: true
```
Then export and import:
```bash
python export.py # Exports only Data Science and Analytics domains + linked entities
python import.py --config target-config.yaml
```
### Cross-Instance Migration
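```bash
# Export from production (each instance needs its own server URL and JWT token)
OPENMETADATA_SERVER_URL=https://prod.your-company.com/api \
OPENMETADATA_JWT_TOKEN=your_prod_token python export.py
# Import to staging
OPENMETADATA_SERVER_URL=https://staging.your-company.com/api \
OPENMETADATA_JWT_TOKEN=your_staging_token python import.py
```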
## Output Format
Exports are saved as NDJSON files (one JSON object per line):
```
exports/
├── domains.ndjson # Business domains
├── data_products.ndjson # Data products
├── teams.ndjson # Teams and users
├── tables.ndjson # Data tables
├── topics.ndjson # Kafka topics
└── export_summary.json # Export metadata
```
Each NDJSON file can be:
- Viewed and edited with any text editor
- Processed with command-line tools (jq, grep, etc.)
- Imported partially or completely
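Because each line is a standalone JSON object, a partial import can be prepared by filtering one file into a new directory. This is a minimal sketch; it assumes each exported entity has a `name` field, and `read_ndjson`/`write_ndjson` are illustrative helpers, not part of the tool.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def read_ndjson(path) -> Iterator[dict]:
    """Yield one entity per non-blank line of an NDJSON export file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def write_ndjson(path, entities: Iterable[dict]) -> int:
    """Write entities one JSON object per line and return how many were written."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for entity in entities:
            fh.write(json.dumps(entity, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_ndjson("exports-subset/domains.ndjson", (d for d in read_ndjson("exports/domains.ndjson") if d.get("name") == "Finance"))` produces a one-domain file that `python import.py --input-dir exports-subset` could then pick up.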
## Testing
### Unit Tests
Run the test suite:
```bash
pytest test_migration.py -v
```
### Round-Trip Validation
Test the complete export/import workflow:
```bash
# 1. Export current data products
python export.py --clear --entities data_products
# 2. Verify export succeeded
cat exports/export_summary.json
# 3. Preview the import; --dry-run makes no changes
# Note: without --dry-run, import creates entities on the target, so use it carefully in production
python import.py --input-dir exports --entity-type data_products --dry-run
# 4. After a real import, validate in the OpenMetadata UI
# Check that imported entities keep all relationships and metadata
```
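Step 4 can be partly automated. After a real (non-dry-run) import, export the same entity type again from the target, for example with `python export.py --config target-config.yaml --entities data_products --output-dir exports-target`, and compare the two files by key. This sketch assumes that each exported object has a `fullyQualifiedName` (falling back to `name`) and that files follow the `<entity_type>.ndjson` naming shown under Output Format. `compare_exports` is a hypothetical helper, and it only checks presence, not relationships.

```python
import json
from pathlib import Path


def entity_keys(path, key: str = "fullyQualifiedName") -> set:
    """Collect one identifying key per entity, falling back to 'name'."""
    keys = set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                entity = json.loads(line)
                ident = entity.get(key) or entity.get("name")
                if ident:
                    keys.add(ident)
    return keys


def compare_exports(before_dir, after_dir, entity_type: str) -> dict:
    """Report entities lost or gained between two export directories."""
    before = entity_keys(Path(before_dir) / f"{entity_type}.ndjson")
    after = entity_keys(Path(after_dir) / f"{entity_type}.ndjson")
    return {"missing": sorted(before - after), "unexpected": sorted(after - before)}
```

An empty `missing` list means every source entity arrived. `unexpected` entries are normal if the target already held data.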
## Troubleshooting
### Authentication Issues
- Verify your JWT token is valid and not expired
- Check server URL is correct and accessible
- Ensure you have proper permissions for the entities you're trying to export/import
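To check the first point without contacting the server, you can read the token's `exp` claim locally. A JWT consists of three base64url segments separated by dots, and the middle segment is a JSON payload. When present, `exp` is a Unix timestamp (RFC 7519). This sketch does **not** verify the signature, so it only helps diagnose expiry, not whether the server will accept the token.

```python
import base64
import json
import time


def jwt_expiry(token: str):
    """Return the 'exp' claim (Unix seconds) from a JWT payload, or None if absent.

    The signature is NOT verified; this is for troubleshooting only.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now=None) -> bool:
    """True if the token carries an 'exp' claim that lies in the past."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim present
    return (time.time() if now is None else now) >= exp
```

For example, `is_expired(os.environ["OPENMETADATA_JWT_TOKEN"])` (after `import os`) checks the token the tool would use.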
### Export Issues
- Check OpenMetadata server connectivity
- Verify entity types are supported in your OpenMetadata version
- Review export logs for specific entity errors
### Import Issues
- Ensure NDJSON files are properly formatted
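- Check `import_order` in `config.yaml` for dependency issues: parent entities such as teams and domains must be imported before the entities that reference them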
- Use `--dry-run` to preview imports before execution
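For the first point, a line-by-line check pinpoints the problem before you run `import.py`. Each line must hold exactly one JSON object, so a pretty-printed (multi-line) object or a stray trailing comma left after hand-editing shows up as a bad line. `validate_ndjson` is a hypothetical helper, not part of the tool.

```python
import json


def validate_ndjson(path) -> list:
    """Return (line_number, message) for every non-blank line that is not a JSON object."""
    problems = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # blank lines are tolerated
            try:
                value = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, exc.msg))
                continue
            if not isinstance(value, dict):
                problems.append((lineno, "expected a JSON object"))
    return problems
```

An empty result means the file is well-formed NDJSON. It says nothing about whether the entities themselves are valid for the target server.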
### Performance
- Adjust `batch_size` in the configuration for large datasets
- Use selective export for large instances
- Monitor memory usage with the `memory_limit_mb` setting
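`batch_size` bounds how many entities are requested per call. OpenMetadata's REST list endpoints page their results with a `limit` parameter and an `after` cursor returned under `paging`, so a generator like the sketch below keeps only one page in memory at a time. `fetch_page` is a hypothetical stand-in for whatever SDK or HTTP call the tool actually makes.

```python
from typing import Callable, Iterator, Optional


def iter_in_batches(fetch_page: Callable[[Optional[str], int], dict],
                    batch_size: int = 100) -> Iterator[dict]:
    """Yield entities page by page, following the 'after' cursor until it runs out.

    fetch_page(after, limit) is assumed to return
    {"data": [...], "paging": {"after": cursor_or_None}}.
    """
    after = None
    while True:
        page = fetch_page(after, batch_size)
        yield from page.get("data", [])
        after = (page.get("paging") or {}).get("after")
        if not after:
            break
```

Writing each yielded entity straight to the NDJSON file, rather than collecting a list first, is what keeps memory flat regardless of instance size.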
## Development Commands (Makefile)
The project includes a Makefile with useful development commands:
```bash
# Setup and cleanup
make setup # Run setup.sh to create virtual environment
make clean # Clean up virtual environment and exports
make clean-exports # Clean only export files
# Testing
make test # Run all tests with pytest
make test-verbose # Run tests with verbose output
# Export shortcuts
make export # Export all entities
make export-clean # Clean exports then export all
make export-core # Export core entities (domains, data_products, teams)
# Import shortcuts
make import # Import all entities
make import-dry # Dry run import (preview only)
# Development
make lint # Run code linting (if configured)
make format # Format code (if configured)
make help # Show all available commands
```
**Usage Examples:**
```bash
# Quick setup and test
make setup
make export-core
# Clean slate export
make clean-exports
make export
# Safe import testing
make import-dry
```
## Configuration Reference
### Complete config.yaml Structure
```yaml
openmetadata:
  server_url: "http://your-openmetadata-server:8585/api"
  auth:
    jwt_token: "your_jwt_token_here"
export:
  output_dir: "./exports"
  selective:
    domains: []
    linked_data_products_only: false
    linked_assets_only: false
  entities:
    domains: true
    data_products: true
    teams: true
    # ... all other entity types
  include_deleted: false
  batch_size: 100
import:
  input_dir: "./exports"
  update_existing: true
  skip_on_error: true
  create_missing_dependencies: true
  import_order:
    - teams
    - users
    - domains
    - data_products
    # ... ordered list for dependency handling
logging:
  level: "INFO"
  console_output: true
advanced:
  request_timeout: 30
  max_retries: 3
  max_workers: 5
```
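`import_order` exists because referenced entities must exist before the entities that point to them; for example, a data product references its domain. The sketch below shows one way files in the input directory could be ordered from this list. Placing types not named in `import_order` last, in alphabetical order, is an assumption rather than documented behaviour, and `ordered_import_files` is a hypothetical helper.

```python
from pathlib import Path


def ordered_import_files(input_dir, import_order: list) -> list:
    """Return <entity_type>.ndjson paths: listed types first (in order), then the rest alphabetically."""
    files = {p.stem: p for p in Path(input_dir).glob("*.ndjson")}
    ordered = [files[name] for name in import_order if name in files]
    remaining = sorted(name for name in files if name not in import_order)
    return ordered + [files[name] for name in remaining]
```

Types listed in `import_order` but absent from the export (such as `users` in a domains-only backup) are simply skipped, and `export_summary.json` is ignored because it is not an `.ndjson` file.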
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
### Third-Party Licenses
This project uses the following open-source packages:
- **OpenMetadata SDK**: Apache 2.0 License
- **Rich**: MIT License
- **PyYAML**: MIT License
- **Click**: BSD License
- **python-dotenv**: BSD License
## Contributing
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request
## Support
For issues and questions:
- Check the troubleshooting section above
- Review OpenMetadata documentation
- Open an issue in this repository