{"id":48189897,"url":"https://github.com/bdgscotland/omd_migrate","last_synced_at":"2026-04-04T17:54:36.764Z","repository":{"id":301753784,"uuid":"1010171483","full_name":"bdgscotland/omd_migrate","owner":"bdgscotland","description":"Migration tool for OpenMetadata - migrate data catalogs, lineage, and metadata between OpenMetadata instances","archived":false,"fork":false,"pushed_at":"2025-06-28T16:39:04.000Z","size":73,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-28T16:47:07.805Z","etag":null,"topics":["data-catalog-migration","data-governance","metadata-migration","migration-tool","omd","openmetadata","openmetadata-migration"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bdgscotland.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-28T13:59:44.000Z","updated_at":"2025-06-28T16:36:15.000Z","dependencies_parsed_at":"2025-06-28T16:47:14.539Z","dependency_job_id":"bdace1d0-8dea-4ab1-b7ee-89e83fc0dfea","html_url":"https://github.com/bdgscotland/omd_migrate","commit_stats":null,"previous_names":["bdgscotland/omd_migrate"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bdgscotland/omd_migrate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdgscotland%2Fomd_migrate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdgscotland%2Fomd_migrate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdgscotland%2Fomd_migrate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdgscotland%2Fomd_migrate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bdgscotland","download_url":"https://codeload.github.com/bdgscotland/omd_migrate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdgscotland%2Fomd_migrate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31407655,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-catalog-migration","data-governance","metadata-migration","migration-tool","omd","openmetadata","openmetadata-migration"],"created_at":"2026-04-04T17:54:36.181Z","updated_at":"2026-04-04T17:54:36.749Z","avatar_url":"https://github.com/bdgscotland.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenMetadata Migration Tool\n\n[![Python](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)\n[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)\n[![OpenMetadata SDK](https://img.shields.io/badge/OpenMetadata%20SDK-1.8.0+-orange.svg)](https://pypi.org/project/openmetadata-ingestion/)\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)\n\n[![Code Quality](https://img.shields.io/badge/code%20quality-black%20%7C%20flake8%20%7C%20mypy-blue.svg)](https://github.com/psf/black)\n[![CI/CD](https://img.shields.io/badge/CI%2FCD-GitHub%20Actions-green.svg)](.github/workflows/ci.yml)\n[![Security](https://img.shields.io/badge/security-bandit%20%7C%20trivy-red.svg)](https://github.com/PyCQA/bandit)\n[![Testing](https://img.shields.io/badge/testing-pytest%20%7C%20coverage-blue.svg)](test_migration.py)\n\nA flexible, customizable Python tool for exporting and importing OpenMetadata entities. Supports full backups, selective exports, and cross-instance migrations with clear NDJSON output format.\n\n## Features\n\n- **OpenMetadata SDK Integration**: Uses official OpenMetadata Python SDK for robust API interaction\n- **Full Export/Import**: Backup and restore complete OpenMetadata instances  \n- **Selective Export**: Export specific entity types with `--entities` flag\n- **Round-Trip Tested**: Verified export → import → validation workflow with real data\n- **Relationship-Aware**: Maintains links between domains, data products, and assets\n- **Flexible Configuration**: YAML config with environment variable overrides\n- **Rich Console Output**: Beautiful progress indicators and informative logging\n- **NDJSON Format**: Human-readable, editable export format\n- **Version Flexible**: Configurable OpenMetadata SDK version support (defaults to 1.8.0+)\n\n## Quick Start\n\n### 1. Installation\n\n**Option A: Automated Setup (Recommended)**\n```bash\ngit clone \u003crepository\u003e\ncd omd_migrate\n./setup.sh\n```\n\n**Option B: Manual Installation**\n```bash\ngit clone \u003crepository\u003e\ncd omd_migrate\npython3 -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\npip install -r requirements.txt\n```\n\n**Note**: The setup.sh script creates a virtual environment (`omd_venv`) and installs all dependencies automatically.\n\n### 2. Configuration\n\nThe tool uses both `config.yaml` and `.env` files for configuration:\n\n**Option A: Use .env file (recommended for credentials)**\n```bash\ncp .env.example .env\n# Edit .env with your OpenMetadata server details\n```\n\n**Option B: Edit config.yaml directly**\n```bash\n# Edit config.yaml with your server URL and JWT token\n```\n\n### 3. Export Data\n\n```bash\n# Export all entities (based on config.yaml settings)\npython export.py\n\n# Selective export of specific entity types\npython export.py --entities data_products --entities domains\n\n# Clear previous exports before starting\npython export.py --clear\n\n# Export to custom directory\npython export.py --output-dir /path/to/backup\n\n# Combine options for targeted exports\npython export.py --clear --entities data_products --entities domains --output-dir /backup/domains-only\n```\n\n### 4. Import Data\n\n```bash\n# Import all entities\npython import.py\n\n# Import from custom directory\npython import.py --input-dir /path/to/backup\n\n# Import specific entity type only\npython import.py --entity-type domains\n\n# Dry run (see what would be imported)\npython import.py --dry-run\n```\n\n## Configuration\n\n### Environment Variables (.env)\n\n```bash\n# Server Configuration\nOPENMETADATA_SERVER_URL=http://your-openmetadata-server:8585/api\nOPENMETADATA_JWT_TOKEN=your_jwt_token_here\n\n# Export Configuration\nEXPORT_OUTPUT_DIR=./exports\nEXPORT_BATCH_SIZE=100\nEXPORT_INCLUDE_DELETED=false\n\n# Import Configuration\nIMPORT_INPUT_DIR=./exports\nIMPORT_UPDATE_EXISTING=true\nIMPORT_SKIP_ON_ERROR=true\n\n# Logging\nLOG_LEVEL=INFO\n```\n\n### Selective Export\n\nConfigure selective exports in `config.yaml`:\n\n```yaml\nexport:\n  selective:\n    # Export specific domains by name\n    domains: [\"Finance\", \"Marketing\"]\n    \n    # Only export data products linked to specified domains\n    linked_data_products_only: true\n    \n    # Only export assets (tables, topics, etc.) linked to domains/data products\n    linked_assets_only: true\n```\n\n### Entity Types\n\nSupported entities for export (use with `--entities` flag):\n\n**Core Entities:**\n- `domains` - Business domains and subdomains\n- `data_products` - Data products with domain relationships  \n- `teams` - Teams and users\n- `users` - Individual users\n- `policies` - Access policies\n\n**Knowledge Management:**\n- `glossaries` - Business glossaries\n- `glossary_terms` - Glossary terms\n\n**Data Assets:**\n- `databases` - Database services and databases\n- `database_schemas` - Database schemas  \n- `tables` - Data tables with lineage\n\n**Additional Entity Types** (available via config.yaml):\n- `topics` - Kafka topics and streams\n- `dashboards` - BI dashboards\n- `charts` - Dashboard charts\n- `pipelines` - Data pipelines\n- `ml_models` - Machine learning models\n- `containers` - Data containers\n- `stored_procedures` - Database procedures\n- `dashboard_data_models` - Dashboard data models\n- `search_indexes` - Search indexes\n\nExample usage:\n```bash\n# Export core entities only\npython export.py --entities domains --entities data_products --entities teams\n\n# Export data assets\npython export.py --entities databases --entities tables\n```\n\n## Examples\n\n### Full Backup and Restore\n\n```bash\n# 1. Export everything from source instance\npython export.py --config source-config.yaml --output-dir backup-2024-01-15\n\n# 2. Import to target instance\npython import.py --config target-config.yaml --input-dir backup-2024-01-15\n```\n\n### Selective Entity Export\n\nUse command-line flags for targeted exports:\n\n```bash\n# Export only domains and data products\npython export.py --clear --entities domains --entities data_products\n\n# Export specific entities to custom location  \npython export.py --entities users --entities teams --output-dir /backup/identity\n\n# Clear and export tables only\npython export.py --clear --entities tables\n```\n\n### Domain-Specific Migration\n\nConfigure selective export in `config.yaml`:\n```yaml\nexport:\n  selective:\n    domains: [\"Data Science\", \"Analytics\"]\n    linked_data_products_only: true\n    linked_assets_only: true\n```\n\nThen export and import:\n```bash\npython export.py  # Exports only Data Science and Analytics domains + linked entities\npython import.py --config target-config.yaml\n```\n\n### Cross-Instance Migration\n\n```bash\n# Export from production\nOPENMETADATA_SERVER_URL=https://prod.your-company.com python export.py\n\n# Import to staging  \nOPENMETADATA_SERVER_URL=https://staging.your-company.com python import.py\n```\n\n## Output Format\n\nExports are saved as NDJSON files (one JSON object per line):\n\n```\nexports/\n├── domains.ndjson              # Business domains\n├── data_products.ndjson        # Data products\n├── teams.ndjson                # Teams and users\n├── tables.ndjson               # Data tables\n├── topics.ndjson               # Kafka topics\n└── export_summary.json         # Export metadata\n```\n\nEach NDJSON file can be:\n- Viewed and edited with any text editor\n- Processed with command-line tools (jq, grep, etc.)\n- Imported partially or completely\n\n## Testing\n\n### Unit Tests\n\nRun the test suite:\n\n```bash\npytest test_migration.py -v\n```\n\n### Round-Trip Validation\n\nTest the complete export/import workflow:\n\n```bash\n# 1. Export current data products\npython export.py --clear --entities data_products\n\n# 2. Verify export succeeded\ncat exports/export_summary.json\n\n# 3. Test import functionality (creates new entities)\n# Note: Import creates new entities, so use carefully in production\npython import.py --input-dir exports --entity-type data_products --dry-run\n\n# 4. Validate in OpenMetadata UI\n# Check that exported entities maintain all relationships and metadata\n```\n\n## Troubleshooting\n\n### Authentication Issues\n- Verify your JWT token is valid and not expired\n- Check server URL is correct and accessible\n- Ensure you have proper permissions for the entities you're trying to export/import\n\n### Export Issues\n- Check OpenMetadata server connectivity\n- Verify entity types are supported in your OpenMetadata version\n- Review export logs for specific entity errors\n\n### Import Issues\n- Ensure NDJSON files are properly formatted\n- Check import order for dependency issues\n- Use `--dry-run` to preview imports before execution\n\n### Performance\n- Adjust `batch_size` in configuration for large datasets\n- Use selective export for large instances\n- Monitor memory usage with `memory_limit_mb` setting\n\n## Development Commands (Makefile)\n\nThe project includes a Makefile with useful development commands:\n\n```bash\n# Setup and cleanup\nmake setup          # Run setup.sh to create virtual environment\nmake clean          # Clean up virtual environment and exports\nmake clean-exports  # Clean only export files\n\n# Testing\nmake test           # Run all tests with pytest\nmake test-verbose   # Run tests with verbose output\n\n# Export shortcuts\nmake export         # Export all entities\nmake export-clean   # Clean exports then export all\nmake export-core    # Export core entities (domains, data_products, teams)\n\n# Import shortcuts  \nmake import         # Import all entities\nmake import-dry     # Dry run import (preview only)\n\n# Development\nmake lint           # Run code linting (if configured)\nmake format         # Format code (if configured)\nmake help           # Show all available commands\n```\n\n**Usage Examples:**\n```bash\n# Quick setup and test\nmake setup\nmake export-core\n\n# Clean slate export\nmake clean-exports\nmake export\n\n# Safe import testing\nmake import-dry\n```\n\n## Configuration Reference\n\n### Complete config.yaml Structure\n\n```yaml\nopenmetadata:\n  server_url: \"http://your-openmetadata-server:8585/api\"\n  auth:\n    jwt_token: \"your_jwt_token_here\"\n\nexport:\n  output_dir: \"./exports\"\n  selective:\n    domains: []\n    linked_data_products_only: false\n    linked_assets_only: false\n  entities:\n    domains: true\n    data_products: true\n    teams: true\n    # ... all other entity types\n  include_deleted: false\n  batch_size: 100\n\nimport:\n  input_dir: \"./exports\"\n  update_existing: true\n  skip_on_error: true\n  create_missing_dependencies: true\n  import_order:\n    - teams\n    - users\n    - domains\n    - data_products\n    # ... ordered list for dependency handling\n\nlogging:\n  level: \"INFO\"\n  console_output: true\n\nadvanced:\n  request_timeout: 30\n  max_retries: 3\n  max_workers: 5\n```\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n### Third-Party Licenses\n\nThis project uses the following open-source packages:\n- **OpenMetadata SDK**: Apache 2.0 License\n- **Rich**: MIT License  \n- **PyYAML**: MIT License\n- **Click**: BSD License\n- **python-dotenv**: BSD License\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Add tests for new functionality\n4. Submit a pull request\n\n## Support\n\nFor issues and questions:\n- Check the troubleshooting section above\n- Review OpenMetadata documentation\n- Open an issue in this repository","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbdgscotland%2Fomd_migrate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbdgscotland%2Fomd_migrate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbdgscotland%2Fomd_migrate/lists"}