{"id":51003459,"url":"https://github.com/erikhoward/atlas","last_synced_at":"2026-06-20T17:32:10.485Z","repository":{"id":322517051,"uuid":"1089732367","full_name":"erikhoward/atlas","owner":"erikhoward","description":"ETL tool for exporting OpenEHR compositions to multiple datastore backends","archived":false,"fork":false,"pushed_at":"2025-11-24T00:24:11.000Z","size":1724,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-24T02:51:34.631Z","etag":null,"topics":["azure","cosmosdb","etl","healthcare","openehr","postgresql","rust","rust-lang"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erikhoward.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-04T18:29:18.000Z","updated_at":"2025-11-24T00:24:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"9e2ad948-d4c6-4fc5-b925-23b38e96876e","html_url":"https://github.com/erikhoward/atlas","commit_stats":null,"previous_names":["erikhoward/atlas"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/erikhoward/atlas","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikhoward%2Fatlas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikhoward%2Fatlas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/erikhoward%2Fatlas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikhoward%2Fatlas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/erikhoward","download_url":"https://codeload.github.com/erikhoward/atlas/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erikhoward%2Fatlas/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34580039,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-20T02:00:06.407Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","cosmosdb","etl","healthcare","openehr","postgresql","rust","rust-lang"],"created_at":"2026-06-20T17:32:09.803Z","updated_at":"2026-06-20T17:32:10.475Z","avatar_url":"https://github.com/erikhoward.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"atlas-cover-image.png\" alt=\"Atlas Logo\" /\u003e\r\n\r\n# Atlas\r\n\r\n[![Build Status](https://img.shields.io/github/actions/workflow/status/erikhoward/atlas/ci.yml?branch=main)](https://github.com/erikhoward/atlas/actions)\r\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\r\n[![Rust 
Version](https://img.shields.io/badge/rust-1.70%2B-orange.svg)](https://www.rust-lang.org/)\r\n[![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg)](docs/)\r\n\r\n**Atlas** is a high-performance, open-source ETL tool built in Rust that bridges openEHR clinical data repositories with modern analytics platforms. It enables healthcare organizations to seamlessly export openEHR compositions to Azure Cosmos DB or PostgreSQL for advanced analytics, machine learning, and research.\r\n\r\n## 🎯 Overview\r\n\r\nAtlas solves the challenge of making openEHR clinical data accessible for modern analytics workflows. By exporting compositions from openEHR servers (EHRBase, Better Platform) to your choice of database backend (Azure Cosmos DB or PostgreSQL), Atlas enables:\r\n\r\n- **Clinical Research**: Query patient data using familiar SQL instead of AQL\r\n- **Machine Learning**: Build ML models on flattened, analytics-ready data\r\n- **Operational Analytics**: Power dashboards and reports with Azure-native tools\r\n- **Regulatory Reporting**: Maintain audit trails with data verification\r\n- **Data Integration**: Connect openEHR data to Azure Synapse, Databricks, and Power BI\r\n\r\n## ✨ Key Features\r\n\r\n### Core Capabilities\r\n\r\n- **🚀 High Performance**: Built with Rust for async/concurrent processing\r\n  - Batch processing with configurable sizes (100-5000 compositions)\r\n  - Parallel EHR processing (1-100 concurrent EHRs)\r\n  - Throughput: 1000-2000 compositions/minute\r\n\r\n- **🔄 Incremental Sync**: Smart state management with watermarks\r\n  - Track last export per {template_id, ehr_id} combination\r\n  - Export only new/changed data since last run\r\n  - Automatic checkpoint and resume from failures\r\n\r\n- **🎨 Flexible Transformation**: Multiple composition formats\r\n  - **Preserve Mode**: Maintain exact FLAT JSON structure from openEHR\r\n  - **Flatten Mode**: Convert nested paths to flat field names for ML/analytics\r\n\r\n- **⚙️ Easy 
Configuration**: TOML-based with environment variable support\r\n  - Simple, human-readable configuration files\r\n  - Secure credential management with env vars\r\n  - Comprehensive validation and error messages\r\n\r\n- **🛡️ Reliable \u0026 Resilient**: Production-ready error handling\r\n  - Automatic retry with exponential backoff\r\n  - Partial batch failure handling\r\n  - Duplicate detection and skipping\r\n  - **Graceful shutdown** with SIGTERM/SIGINT handling\r\n  - Automatic checkpoint on interruption for safe resume\r\n\r\n- **📊 Database Flexibility**: Multiple backend options\r\n  - **Azure Cosmos DB**: Core (SQL) API with automatic partitioning\r\n  - **PostgreSQL**: 14+ with JSONB support for flexible querying\r\n  - Azure Log Analytics integration (Logs Ingestion API)\r\n  - Kubernetes/AKS deployment support\r\n\r\n- **🔒 Privacy \u0026 Compliance**: Built-in anonymization\r\n  - **Automated PII Detection**: Regex-based detection of 24+ PII categories\r\n  - **HIPAA Safe Harbor**: 18 identifiers per 45 CFR §164.514(b)(2)\r\n  - **GDPR Compliance**: HIPAA identifiers + 6 GDPR quasi-identifiers\r\n  - **Flexible Strategies**: Redaction or tokenization\r\n  - **Dry-Run Mode**: Preview PII detection without modifying data\r\n  - **Audit Logging**: SHA-256 hashed values, comprehensive tracking\r\n  - **Low Performance Impact**: \u003c100ms overhead per composition, \u003c15% throughput impact\r\n\r\n### Technical Highlights\r\n\r\n- **Vendor Abstraction**: Trait-based design supports multiple openEHR vendors (EHRBase, Better Platform)\r\n- **Type Safety**: Strongly-typed domain models with Rust's type system\r\n- **Observability**: Structured logging with tracing, Azure integration\r\n- **Security**: TLS 1.2+, credential management, least-privilege access\r\n- **Compliance**: HIPAA-ready, GDPR-ready, audit logging, data verification\r\n\r\n## 🚀 Quick Start\r\n\r\n### Prerequisites\r\n\r\n- **Rust 1.70+** (for building from source)\r\n- **openEHR Server** (choose one):\r\n  
- **EHRBase**: Version 0.30+ with REST API v1.1.x\r\n  - **Better Platform**: Sandbox or production environment with OIDC authentication\r\n- **Database Backend** (choose one):\r\n  - **Azure Cosmos DB**: Core (SQL) API account with database created\r\n  - **PostgreSQL**: Version 14+ with database created\r\n- **Network Access**: Outbound HTTPS to openEHR server and database\r\n\r\n### Installation\r\n\r\n#### Option 1: Pre-built Binary (Recommended)\r\n\r\n```bash\r\n# Download latest release\r\nwget https://github.com/erikhoward/atlas/releases/download/v2.4.0/atlas-linux-x86_64.tar.gz\r\n\r\n# Extract and install\r\ntar -xzf atlas-linux-x86_64.tar.gz\r\nsudo mv atlas /usr/local/bin/\r\nsudo chmod +x /usr/local/bin/atlas\r\n\r\n# Verify installation\r\natlas --version\r\n```\r\n\r\n#### Option 2: Build from Source\r\n\r\n```bash\r\n# Install Rust (if not already installed)\r\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\r\n\r\n# Clone the repository\r\ngit clone https://github.com/erikhoward/atlas.git\r\ncd atlas\r\n\r\n# Build release binary\r\ncargo build --release\r\n\r\n# Install binary\r\nsudo cp target/release/atlas /usr/local/bin/\r\n\r\n# Verify installation\r\natlas --version\r\n```\r\n\r\n#### Option 3: Docker (Recommended for Production)\r\n\r\n```bash\r\n# Pull the latest Docker image\r\ndocker pull erikhoward/atlas:latest\r\n\r\n# Run Atlas with configuration file\r\ndocker run --rm \\\r\n  -v $(pwd)/atlas.toml:/app/config/atlas.toml \\\r\n  -e ATLAS_OPENEHR_USERNAME=your_username \\\r\n  -e ATLAS_OPENEHR_PASSWORD=your_password \\\r\n  -e ATLAS_COSMOSDB_KEY=your_cosmos_key \\\r\n  erikhoward/atlas:latest \\\r\n  export --config /app/config/atlas.toml\r\n\r\n# Or use docker-compose (see docker-compose.yml example)\r\ndocker-compose up\r\n```\r\n\r\n**Docker Benefits:**\r\n\r\n- ✅ No Rust installation required\r\n- ✅ Consistent environment across deployments\r\n- ✅ Easy integration with Kubernetes/AKS\r\n- ✅ Multi-platform support 
(amd64, arm64)\r\n\r\nSee [Docker Setup Guide](docs/docker-setup.md) for detailed instructions.\r\n\r\n### Configuration\r\n\r\n```bash\r\n# Generate sample configuration with examples\r\natlas init --with-examples --output atlas.toml\r\n\r\n# Edit configuration for your environment\r\nvi atlas.toml\r\n\r\n# Option 1: Use .env file (recommended for development)\r\n# Create a .env file in the project root with your credentials\r\ncat \u003e .env \u003c\u003c EOF\r\nATLAS_OPENEHR_USERNAME=your-openehr-username\r\nATLAS_OPENEHR_PASSWORD=your-openehr-password\r\nATLAS_PG_PASSWORD=your-postgres-password\r\nEOF\r\n\r\n# The .env file is automatically loaded when Atlas starts\r\n\r\n# Option 2: Set environment variables manually\r\nexport ATLAS_OPENEHR_USERNAME=\"your-openehr-username\"\r\nexport ATLAS_OPENEHR_PASSWORD=\"your-openehr-password\"\r\n\r\n# For CosmosDB\r\nexport ATLAS_COSMOSDB_KEY=\"your-cosmos-db-key\"\r\n\r\n# For PostgreSQL\r\nexport ATLAS_PG_PASSWORD=\"your-postgres-password\"\r\n\r\n# Validate configuration\r\natlas validate-config -c atlas.toml\r\n```\r\n\r\n**Minimal Configuration Example (CosmosDB)**:\r\n\r\n```toml\r\n[openehr]\r\nbase_url = \"https://your-ehrbase-server.com/ehrbase\"\r\nusername = \"${ATLAS_OPENEHR_USERNAME}\"\r\npassword = \"${ATLAS_OPENEHR_PASSWORD}\"\r\ntls_verify = true\r\n\r\n[openehr.query]\r\ntemplate_ids = [\"IDCR - Vital Signs.v1\"]\r\n\r\n[export]\r\nmode = \"incremental\"\r\nexport_composition_format = \"preserve\"\r\ndatabase_target = \"cosmosdb\"\r\n\r\n[cosmosdb]\r\nendpoint = \"https://your-account.documents.azure.com:443/\"\r\nkey = \"${ATLAS_COSMOSDB_KEY}\"\r\ndatabase_name = \"openehr_data\"\r\n```\r\n\r\n**Minimal Configuration Example (PostgreSQL)**:\r\n\r\n```toml\r\n[openehr]\r\nbase_url = \"https://your-ehrbase-server.com/ehrbase\"\r\nusername = \"${ATLAS_OPENEHR_USERNAME}\"\r\npassword = \"${ATLAS_OPENEHR_PASSWORD}\"\r\ntls_verify = true\r\n\r\n[openehr.query]\r\ntemplate_ids = [\"IDCR - Vital 
Signs.v1\"]\r\n\r\n[export]\r\nmode = \"incremental\"\r\nexport_composition_format = \"preserve\"\r\ndatabase_target = \"postgresql\"\r\n\r\n[postgresql]\r\nconnection_string = \"postgresql://atlas_user:${ATLAS_PG_PASSWORD}@localhost:5432/openehr_data?sslmode=require\"\r\nmax_connections = 20\r\n```\r\n\r\nSee `examples/atlas.example.toml` for CosmosDB configuration and `examples/atlas.postgresql.example.toml` for PostgreSQL configuration.\r\n\r\n#### 12-Factor App Configuration\r\n\r\nAtlas supports comprehensive environment variable overrides for all configuration options, enabling containerized deployments and 12-factor app compliance:\r\n\r\n```bash\r\n# Override any configuration value using ATLAS_\u003cSECTION\u003e_\u003cKEY\u003e pattern\r\nexport ATLAS_DATABASE_TARGET=postgresql\r\nexport ATLAS_APPLICATION_LOG_LEVEL=debug\r\nexport ATLAS_OPENEHR_BASE_URL=https://prod-ehrbase.com\r\nexport ATLAS_OPENEHR_USERNAME=atlas_prod\r\nexport ATLAS_OPENEHR_PASSWORD=secret\r\nexport ATLAS_OPENEHR_QUERY_BATCH_SIZE=2000\r\nexport ATLAS_EXPORT_MODE=incremental\r\nexport ATLAS_POSTGRESQL_CONNECTION_STRING=\"postgresql://user:pass@postgres:5432/db\"\r\nexport ATLAS_POSTGRESQL_MAX_CONNECTIONS=20\r\n\r\n# Arrays support JSON or comma-separated format\r\nexport ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='[\"IDCR - Vital Signs.v1\",\"IDCR - Lab Report.v1\"]'\r\nexport ATLAS_OPENEHR_QUERY_EHR_IDS=\"ehr-123,ehr-456,ehr-789\"\r\n\r\n# Run with minimal TOML file (or even no TOML file with all env vars set)\r\natlas export -c minimal.toml\r\n```\r\n\r\n**Docker Example**:\r\n\r\n```bash\r\ndocker run -d \\\r\n  -e ATLAS_DATABASE_TARGET=postgresql \\\r\n  -e ATLAS_OPENEHR_BASE_URL=https://ehrbase.example.com \\\r\n  -e ATLAS_OPENEHR_USERNAME=atlas \\\r\n  -e ATLAS_OPENEHR_PASSWORD=\"${OPENEHR_PASSWORD}\" \\\r\n  -e ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='[\"IDCR - Vital Signs.v1\"]' \\\r\n  -e ATLAS_POSTGRESQL_CONNECTION_STRING=\"${PG_CONNECTION_STRING}\" \\\r\n  -e ATLAS_EXPORT_MODE=incremental 
\\\r\n  atlas:latest\r\n```\r\n\r\n**Kubernetes Example**:\r\n\r\n```yaml\r\napiVersion: v1\r\nkind: ConfigMap\r\nmetadata:\r\n  name: atlas-config\r\ndata:\r\n  ATLAS_DATABASE_TARGET: \"postgresql\"\r\n  ATLAS_OPENEHR_BASE_URL: \"https://ehrbase.example.com\"\r\n  ATLAS_OPENEHR_QUERY_TEMPLATE_IDS: '[\"IDCR - Vital Signs.v1\"]'\r\n  ATLAS_EXPORT_MODE: \"incremental\"\r\n---\r\napiVersion: v1\r\nkind: Secret\r\nmetadata:\r\n  name: atlas-secrets\r\ntype: Opaque\r\nstringData:\r\n  ATLAS_OPENEHR_PASSWORD: \"secret\"\r\n  ATLAS_POSTGRESQL_CONNECTION_STRING: \"postgresql://user:pass@postgres:5432/db\"\r\n---\r\napiVersion: apps/v1\r\nkind: Deployment\r\nmetadata:\r\n  name: atlas\r\nspec:\r\n  template:\r\n    spec:\r\n      containers:\r\n      - name: atlas\r\n        image: atlas:latest\r\n        envFrom:\r\n        - configMapRef:\r\n            name: atlas-config\r\n        - secretRef:\r\n            name: atlas-secrets\r\n```\r\n\r\nSee [Configuration Guide](docs/configuration.md#environment-variable-overrides) for complete list of supported environment variables.\r\n\r\n### Basic Usage\r\n\r\n```bash\r\n# Run export\r\natlas export -c atlas.toml\r\n\r\n# Dry run to preview (no data written)\r\natlas export -c atlas.toml --dry-run\r\n\r\n# Check export status and watermarks\r\natlas status -c atlas.toml\r\n\r\n# Override configuration options\r\natlas export -c atlas.toml --mode full --template-id \"Your Template.v1\"\r\n```\r\n\r\n### Graceful Shutdown\r\n\r\nAtlas supports graceful shutdown for long-running exports, ensuring data integrity and allowing safe resumption:\r\n\r\n```bash\r\n# Start an export\r\natlas export -c atlas.toml\r\n\r\n# Press Ctrl+C or send SIGTERM to gracefully stop\r\n# Atlas will:\r\n# 1. Complete the current batch being processed\r\n# 2. Save watermark state to database\r\n# 3. Display progress summary\r\n# 4. 
Exit with code 130 (SIGINT) or 143 (SIGTERM)\r\n\r\n# Resume from where it left off\r\natlas export -c atlas.toml\r\n```\r\n\r\n**Key Features:**\r\n\r\n- ✅ **Safe Interruption**: Current batch completes before shutdown (no partial data)\r\n- ✅ **Automatic Checkpoint**: Watermarks saved with `Interrupted` status\r\n- ✅ **Resume Support**: Re-run the same command to continue from checkpoint\r\n- ✅ **Configurable Timeout**: Default 30s grace period (configurable via `export.shutdown_timeout_secs`)\r\n- ✅ **Container-Ready**: Works with Docker stop, Kubernetes pod termination, systemd\r\n\r\n**Configuration:**\r\n\r\n```toml\r\n[export]\r\n# Graceful shutdown timeout in seconds (default: 30)\r\n# Should align with container orchestration grace periods\r\nshutdown_timeout_secs = 30\r\n```\r\n\r\n**Exit Codes:**\r\n\r\n- `0` - Export completed successfully\r\n- `1` - Partial success (some exports failed)\r\n- `130` - Interrupted by SIGINT (Ctrl+C)\r\n- `143` - Interrupted by SIGTERM (graceful termination signal)\r\n- Other codes indicate configuration, authentication, or connection errors\r\n\r\n### Example Use Cases\r\n\r\nSee the [`examples/`](examples/) directory for complete configurations:\r\n\r\n- **[Clinical Research](examples/research-export.toml)**: Full export with data verification\r\n- **[Daily Sync](examples/incremental-sync.toml)**: Incremental sync for production\r\n- **[ML Features](examples/ml-features.toml)**: Flattened data for machine learning\r\n\r\n## 🔒 Anonymization\r\n\r\nAtlas includes built-in anonymization capabilities to protect PHI/PII when exporting openEHR compositions, helping organizations comply with HIPAA and GDPR regulations.\r\n\r\n### Quick Start\r\n\r\nAdd anonymization configuration to your `atlas.toml`:\r\n\r\n```toml\r\n[anonymization]\r\nenabled = true\r\nmode = \"hipaa_safe_harbor\"  # or \"gdpr\"\r\nstrategy = \"token\"          # or \"redact\"\r\ndry_run = false\r\n\r\n[anonymization.audit]\r\nenabled = true\r\nlog_path = 
\"./audit/anonymization.log\"\r\njson_format = true\r\n```\r\n\r\nRun export with anonymization:\r\n\r\n```bash\r\n# Enable anonymization\r\natlas export --anonymize\r\n\r\n# Override compliance mode\r\natlas export --anonymize --anonymize-mode gdpr\r\n\r\n# Dry-run to preview PII detection\r\natlas export --anonymize --anonymize-dry-run\r\n```\r\n\r\n### Features\r\n\r\n- **Automated PII Detection**: Regex-based detection of 24+ PII categories\r\n- **HIPAA Safe Harbor**: 18 identifiers per 45 CFR §164.514(b)(2)\r\n- **GDPR Compliance**: HIPAA identifiers + 6 GDPR quasi-identifiers\r\n- **Flexible Strategies**:\r\n  - **Token**: Replace with unique random tokens (e.g., `TOKEN_NAME_a1b2c3d4`)\r\n  - **Redact**: Replace with category markers (e.g., `[REDACTED_NAME]`)\r\n- **Dry-Run Mode**: Preview PII detection without modifying data\r\n- **Audit Logging**: SHA-256 hashed values, comprehensive tracking\r\n- **Performance**: \u003c100ms overhead per composition, \u003c15% throughput impact\r\n\r\n### Compliance Modes\r\n\r\n**HIPAA Safe Harbor** (`hipaa_safe_harbor`):\r\n\r\n- Detects 18 identifiers specified in 45 CFR §164.514(b)(2)\r\n- Suitable for US healthcare organizations\r\n\r\n**GDPR** (`gdpr`):\r\n\r\n- Detects all HIPAA identifiers + 6 GDPR quasi-identifiers\r\n- Suitable for European organizations or multi-region deployments\r\n\r\n### Documentation\r\n\r\nFor complete anonymization documentation, see:\r\n\r\n- **[Anonymization User Guide](docs/anonymization-user-guide.md)** - Comprehensive usage guide\r\n\r\n## 🐳 Docker Deployment\r\n\r\nAtlas provides official Docker images for easy deployment and integration with container orchestration platforms.\r\n\r\n### Quick Start with Docker\r\n\r\n```bash\r\n# Pull the latest image\r\ndocker pull erikhoward/atlas:latest\r\n\r\n# Run with configuration file and environment variables\r\ndocker run --rm \\\r\n  -v $(pwd)/atlas.toml:/app/config/atlas.toml \\\r\n  -v $(pwd)/logs:/app/logs \\\r\n  -e 
ATLAS_OPENEHR_USERNAME=${OPENEHR_USER} \\\r\n  -e ATLAS_OPENEHR_PASSWORD=${OPENEHR_PASS} \\\r\n  -e ATLAS_COSMOSDB_KEY=${COSMOS_KEY} \\\r\n  erikhoward/atlas:latest \\\r\n  export --config /app/config/atlas.toml\r\n```\r\n\r\n### Using Docker Compose\r\n\r\nCreate a `docker-compose.yml` file:\r\n\r\n```yaml\r\nversion: '3.8'\r\n\r\nservices:\r\n  atlas:\r\n    image: erikhoward/atlas:latest\r\n    volumes:\r\n      - ./atlas.toml:/app/config/atlas.toml\r\n      - ./logs:/app/logs\r\n    environment:\r\n      - ATLAS_OPENEHR_USERNAME=${OPENEHR_USER}\r\n      - ATLAS_OPENEHR_PASSWORD=${OPENEHR_PASS}\r\n      - ATLAS_COSMOSDB_KEY=${COSMOS_KEY}\r\n      - RUST_LOG=info\r\n    command: export --config /app/config/atlas.toml\r\n```\r\n\r\nRun with:\r\n\r\n```bash\r\ndocker-compose up\r\n```\r\n\r\n### Available Tags\r\n\r\n- `latest` - Latest stable release from main branch\r\n- `2.4.0`, `2.3`, `2` - Semantic version tags\r\n- `main-\u003csha\u003e` - Specific commit from main branch\r\n\r\n### Multi-Platform Support\r\n\r\nImages are built for multiple architectures:\r\n\r\n- `linux/amd64` - Standard x86_64 servers\r\n- `linux/arm64` - ARM64 (Apple Silicon, AWS Graviton, etc.)\r\n\r\n### Building Custom Images\r\n\r\n```bash\r\n# Build locally\r\ndocker build -t atlas:custom .\r\n\r\n# Build for specific platform\r\ndocker build --platform linux/amd64 -t atlas:custom .\r\n```\r\n\r\nFor detailed Docker setup, configuration, and troubleshooting, see the **[Docker Setup Guide](docs/docker-setup.md)**.\r\n\r\n## 📖 Documentation\r\n\r\n### User Documentation\r\n\r\n- **[User Guide](docs/user-guide.md)** - Complete usage instructions, troubleshooting, and best practices\r\n- **[Configuration Guide](docs/configuration.md)** - Detailed configuration reference with all options\r\n- **[Example Configurations](examples/)** - Ready-to-use configs for common scenarios\r\n\r\n### Technical Documentation\r\n\r\n- **[Architecture Documentation](docs/architecture.md)** - System design, 
components, and data flow\r\n- **[Developer Guide](docs/developer-guide.md)** - Development setup and contribution guidelines\r\n\r\n### Deployment Guides\r\n\r\n- **[Standalone Deployment](docs/deployment/standalone.md)** - Binary deployment on Linux/macOS/Windows\r\n- **[Docker Deployment](docs/deployment/docker.md)** - Containerized deployment\r\n- **[Kubernetes Deployment](docs/deployment/kubernetes.md)** - AKS and Kubernetes deployment\r\n\r\n## 🏗️ Architecture\r\n\r\nAtlas follows a layered architecture with clear separation of concerns:\r\n\r\n```text\r\n┌─────────────────────────────────────────────────────────────────────────┐\r\n│                            Atlas CLI                                    │\r\n│                         (Rust Binary)                                   │\r\n└──────────────┬──────────────────────────────────────┬───────────────────┘\r\n               │                                      │\r\n               │ REST API v1.1                        │ Database Adapters\r\n               │                                      │\r\n               ▼                                      ▼\r\n┌──────────────────────────┐   ┌──────────────────────────────────────────┐\r\n│   openEHR Server         │   │         Database Backends                │\r\n│   (EHRBase 0.30+)        │   │                                          │\r\n│                          │   │  ┌────────────────────────────────────┐  │\r\n│  ┌────────────────────┐  │   │  │  Azure Cosmos DB (NoSQL)           │  │\r\n│  │  Compositions      │  │   │  │  - Control Container (watermarks)  │  │\r\n│  │  (FLAT JSON)       │  │   │  │  - Data Containers (per template)  │  │\r\n│  └────────────────────┘  │   │  │  - Partitioned by /ehr_id          │  │\r\n│                          │   │  └────────────────────────────────────┘  │\r\n└──────────────────────────┘   │                                          │\r\n                               │  
┌────────────────────────────────────┐  │\r\n                               │  │  PostgreSQL 14+ (Relational)       │  │\r\n                               │  │  - atlas_watermarks table          │  │\r\n                               │  │  - compositions_* tables           │  │\r\n                               │  │  - JSONB columns for flexibility   │  │\r\n                               │  └────────────────────────────────────┘  │\r\n                               └──────────────────────────────────────────┘\r\n```\r\n\r\n**Key Components**:\r\n\r\n- **CLI Layer**: Command-line interface with clap\r\n- **Core Layer**: Business logic (export, transform, state, verification)\r\n- **Adapter Layer**: External integrations (openEHR, Cosmos DB, PostgreSQL)\r\n- **Domain Layer**: Core types and models\r\n\r\nSee [Architecture Documentation](docs/architecture.md) for details.\r\n\r\n## 🎯 Use Cases\r\n\r\n### Clinical Research\r\n\r\nExport patient cohorts for research studies while preserving exact data structures for regulatory compliance.\r\n\r\n### Machine Learning\r\n\r\nFlatten compositions into analytics-ready format for training predictive models on clinical data.\r\n\r\n### Operational Analytics\r\n\r\nPower real-time dashboards and reports by syncing openEHR data to Cosmos DB daily.\r\n\r\n### Data Integration\r\n\r\nConnect openEHR data to Azure Synapse Analytics, Databricks, or Power BI for advanced analytics.\r\n\r\n### Regulatory Reporting\r\n\r\nMaintain comprehensive audit trails and logging for compliance requirements.\r\n\r\n## 🔧 Configuration Options\r\n\r\nAtlas supports extensive configuration options:\r\n\r\n| Category | Options | Description |\r\n|----------|---------|-------------|\r\n| **Export Mode** | `full`, `incremental` | Full export or incremental sync |\r\n| **Format** | `preserve`, `flatten` | Maintain structure or flatten for analytics |\r\n| **Batch Size** | 100-5000 | Compositions per batch |\r\n| **Parallelism** | 1-100 EHRs | 
Concurrent EHR processing |\r\n| **Logging** | Local, Azure Log Analytics | Structured logging options |\r\n\r\nSee [Configuration Guide](docs/configuration.md) for complete reference.\r\n\r\n## 📊 Performance\r\n\r\n**Typical Performance** (depends on composition size and network):\r\n\r\n- **Throughput**: 1000-2000 compositions/minute\r\n- **Memory**: 2-4 GB RAM (configurable with batch size)\r\n- **Cosmos DB**: ~10 RU per composition write\r\n\r\n**Example Scenarios**:\r\n\r\n- **Daily Sync**: 1,000 compositions in ~1-2 minutes\r\n- **Research Export**: 50,000 compositions in ~50-100 minutes\r\n- **ML Dataset**: 500,000 compositions in ~4-8 hours\r\n\r\n## 🔒 Security\r\n\r\nAtlas implements comprehensive security measures to protect sensitive healthcare data and credentials:\r\n\r\n### Credential Protection\r\n\r\n- **Memory Security**: All credentials (passwords, keys, secrets) are automatically zeroized in memory when no longer needed\r\n- **No Credential Logging**: Credentials are never written to log files or exposed in debug output\r\n- **Redacted Debug Output**: Debug representations show `Secret([REDACTED])` instead of actual values\r\n- **Environment Variables**: Secure credential management using environment variables, never hardcoded\r\n- **Explicit Access Control**: Code must explicitly call `expose_secret()` to access credentials, enabling easy security audits\r\n\r\n**Protected Credentials:**\r\n\r\n- openEHR passwords\r\n- Cosmos DB keys\r\n- PostgreSQL connection strings (including embedded passwords)\r\n- Azure client secrets\r\n\r\n### Network \u0026 Access Security\r\n\r\n- **TLS 1.2+**: All connections encrypted in transit\r\n- **Certificate Verification**: TLS certificate validation enabled by default\r\n- **Least Privilege**: Read-only openEHR access recommended\r\n- **Azure RBAC**: Integrate with Azure role-based access control\r\n\r\n### Compliance \u0026 Audit\r\n\r\n- **Audit Logging**: All operations logged with timestamps\r\n- **PHI/PII 
Protection**: Sanitized logging, compliance-ready\r\n- **HIPAA-Ready**: Designed for healthcare compliance requirements\r\n- **Data Verification**: Optional SHA-256 checksums for data integrity\r\n\r\nFor detailed security best practices, see the [Configuration Guide](docs/configuration.md#security-best-practices).\r\n\r\n## 🤝 Contributing\r\n\r\nWe welcome contributions! Here's how to get started:\r\n\r\n1. **Fork the repository**\r\n2. **Create a feature branch**: `git checkout -b feature/my-feature`\r\n3. **Make your changes** following the [Developer Guide](docs/developer-guide.md)\r\n4. **Run tests**: `cargo test`\r\n5. **Run linter**: `cargo clippy --all-targets -- -D warnings`\r\n6. **Format code**: `cargo fmt`\r\n7. **Commit changes**: `git commit -m \"feat: add new feature\"`\r\n8. **Push to branch**: `git push origin feature/my-feature`\r\n9. **Open a Pull Request**\r\n\r\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.\r\n\r\n### Development Setup\r\n\r\n```bash\r\n# Clone repository\r\ngit clone https://github.com/erikhoward/atlas.git\r\ncd atlas\r\n\r\n# Install Rust\r\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\r\n\r\n# Install development tools\r\nrustup component add clippy rustfmt\r\n\r\n# Build and test\r\ncargo build\r\ncargo test\r\ncargo clippy --all-targets -- -D warnings\r\ncargo fmt\r\n```\r\n\r\n## 📝 License\r\n\r\nThis project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.\r\n\r\n## 🆘 Support\r\n\r\n### Documentation\r\n\r\n- [User Guide](docs/user-guide.md) - Usage instructions and troubleshooting\r\n- [PostgreSQL Setup Guide](docs/postgresql-setup.md) - PostgreSQL backend configuration\r\n- [Docker Setup Guide](docs/docker-setup.md) - Docker deployment instructions\r\n- [FAQ](docs/user-guide.md#faq) - Frequently asked questions\r\n\r\n### Community\r\n\r\n- **GitHub Issues**: [Report bugs or request features](https://github.com/erikhoward/atlas/issues)\r\n- 
**Discussions**: [Ask questions and share ideas](https://github.com/erikhoward/atlas/discussions)\r\n\r\n### Commercial Support\r\n\r\nFor enterprise support, training, or custom development, contact: erikhoward@pm.me\r\n\r\n## 🙏 Acknowledgments\r\n\r\nAtlas is built with these excellent open-source projects:\r\n\r\n- [Rust](https://www.rust-lang.org/) - Systems programming language\r\n- [Tokio](https://tokio.rs/) - Async runtime\r\n- [Clap](https://clap.rs/) - Command-line argument parsing\r\n- [Serde](https://serde.rs/) - Serialization framework\r\n- [Tracing](https://tracing.rs/) - Structured logging\r\n- [Azure SDK for Rust](https://github.com/Azure/azure-sdk-for-rust) - Azure integration\r\n- [tokio-postgres](https://github.com/sfackler/rust-postgres) - PostgreSQL async driver\r\n- [deadpool-postgres](https://github.com/bikeshedder/deadpool) - PostgreSQL connection pooling\r\n\r\n## 🗺️ Roadmap\r\n\r\n### Current Version (v2.3)\r\n\r\n- ✅ EHRBase vendor support\r\n- ✅ Better Platform vendor support with OIDC authentication\r\n- ✅ Azure Cosmos DB integration\r\n- ✅ PostgreSQL integration\r\n- ✅ Incremental sync with watermarks\r\n- ✅ Preserve and flatten modes\r\n- ✅ CLI interface\r\n- ✅ Docker and Kubernetes deployment\r\n- ✅ HIPAA \u0026 GDPR anonymization\r\n\r\n### Future Enhancements\r\n\r\n- 🔄 Prometheus metrics export\r\n- 🔄 FHIR transformation\r\n- 🔄 Bi-directional synchronization\r\n- 🔄 Support for other cloud providers (AWS, GCP)\r\n\r\n## 📚 Related Projects\r\n\r\n- [EHRBase](https://ehrbase.org/) - Open-source openEHR server\r\n- [Better Platform](https://www.better.care/) - Enterprise openEHR platform\r\n- [Azure Cosmos DB](https://azure.microsoft.com/en-us/services/cosmos-db/) - Globally distributed database\r\n- [openEHR](https://www.openehr.org/) - Open standard for health data\r\n\r\n---\r\n\r\n**Made with ❤️ by Erik Howard \u0026 the Atlas Contributors**\r\n\r\nIf you find Atlas useful, please consider giving it a ⭐ on 
GitHub!\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikhoward%2Fatlas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferikhoward%2Fatlas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikhoward%2Fatlas/lists"}