ETL tool for exporting OpenEHR compositions to multiple datastore backends
https://github.com/erikhoward/atlas

# Atlas

[![Build Status](https://img.shields.io/github/actions/workflow/status/erikhoward/atlas/ci.yml?branch=main)](https://github.com/erikhoward/atlas/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Rust Version](https://img.shields.io/badge/rust-1.70%2B-orange.svg)](https://www.rust-lang.org/)
[![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg)](docs/)

**Atlas** is a high-performance, open-source ETL tool built in Rust that bridges openEHR clinical data repositories with modern analytics platforms. It enables healthcare organizations to seamlessly export openEHR compositions to Azure Cosmos DB or PostgreSQL for advanced analytics, machine learning, and research.

## 🎯 Overview

Atlas solves the challenge of making openEHR clinical data accessible for modern analytics workflows. By exporting compositions from openEHR servers (EHRBase, Better Platform) to your choice of database backend (Azure Cosmos DB or PostgreSQL), Atlas enables:

- **Clinical Research**: Query patient data using familiar SQL instead of AQL
- **Machine Learning**: Build ML models on flattened, analytics-ready data
- **Operational Analytics**: Power dashboards and reports with Azure-native tools
- **Regulatory Reporting**: Maintain audit trails with data verification
- **Data Integration**: Connect openEHR data to Azure Synapse, Databricks, and Power BI

## ✨ Key Features

### Core Capabilities

- **🚀 High Performance**: Built with Rust for async/concurrent processing
  - Batch processing with configurable sizes (100-5000 compositions)
  - Parallel EHR processing (1-100 concurrent EHRs)
  - Throughput: 1000-2000 compositions/minute

- **🔄 Incremental Sync**: Smart state management with watermarks
  - Track last export per {template_id, ehr_id} combination
  - Export only new/changed data since last run
  - Automatic checkpoint and resume from failures

- **🎨 Flexible Transformation**: Multiple composition formats
  - **Preserve Mode**: Maintain exact FLAT JSON structure from openEHR
  - **Flatten Mode**: Convert nested paths to flat field names for ML/analytics

- **⚙️ Easy Configuration**: TOML-based with environment variable support
  - Simple, human-readable configuration files
  - Secure credential management with env vars
  - Comprehensive validation and error messages

- **🛡️ Reliable & Resilient**: Production-ready error handling
  - Automatic retry with exponential backoff
  - Partial batch failure handling
  - Duplicate detection and skipping
  - **Graceful shutdown** with SIGTERM/SIGINT handling
  - Automatic checkpoint on interruption for safe resume

- **📊 Database Flexibility**: Multiple backend options
  - **Azure Cosmos DB**: Core (SQL) API with automatic partitioning
  - **PostgreSQL**: 14+ with JSONB support for flexible querying
  - Azure Log Analytics integration (Logs Ingestion API)
  - Kubernetes/AKS deployment support

- **🔒 Privacy & Compliance**: Built-in anonymization
  - **Automated PII Detection**: Regex-based detection of 24+ PII categories
  - **HIPAA Safe Harbor**: 18 identifiers per 45 CFR §164.514(b)(2)
  - **GDPR Compliance**: HIPAA identifiers + 6 GDPR quasi-identifiers
  - **Flexible Strategies**: Redaction or tokenization
  - **Dry-Run Mode**: Preview PII detection without modifying data
  - **Audit Logging**: SHA-256 hashed values, comprehensive tracking
  - **Low Performance Overhead**: <100ms overhead per composition, <15% throughput impact
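
The "automatic retry with exponential backoff" behavior listed above can be pictured with a short sketch. This is not Atlas's actual implementation: the function names, the doubling factor, and the injected `sleep` callback are assumptions chosen to keep the example self-contained.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// The doubling factor is an assumption; Atlas may use a different schedule.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Runs `op` until it succeeds or `max_retries` retries have been used.
/// `sleep` is injected so callers decide how waiting happens
/// (a real async tool would await a timer instead).
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base and a 5 s cap, the waits grow as 100 ms, 200 ms, 400 ms, and so on until they reach the cap.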

### Technical Highlights

- **Vendor Abstraction**: Trait-based design supports multiple openEHR vendors (EHRBase, Better Platform)
- **Type Safety**: Strongly-typed domain models with Rust's type system
- **Observability**: Structured logging with tracing, Azure integration
- **Security**: TLS 1.2+, credential management, least-privilege access
- **Compliance**: HIPAA-ready, GDPR-ready, audit logging, data verification
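
As an illustration of the trait-based vendor abstraction mentioned above, the sketch below shows how two vendors could sit behind one interface. The trait, its methods, and the lookup keys are hypothetical and not taken from the Atlas codebase; the authentication styles follow this README (username/password in the EHRBase examples, OIDC for Better Platform).

```rust
/// How a vendor expects clients to authenticate.
#[derive(Debug, PartialEq)]
enum AuthScheme {
    Basic,
    Oidc,
}

/// Hypothetical vendor-facing interface; the real trait in Atlas may look different.
trait OpenEhrVendor {
    fn name(&self) -> &'static str;
    fn auth_scheme(&self) -> AuthScheme;
}

struct EhrBase;
struct BetterPlatform;

impl OpenEhrVendor for EhrBase {
    fn name(&self) -> &'static str {
        "EHRBase"
    }
    // Assumption: username/password, as in the configuration examples.
    fn auth_scheme(&self) -> AuthScheme {
        AuthScheme::Basic
    }
}

impl OpenEhrVendor for BetterPlatform {
    fn name(&self) -> &'static str {
        "Better Platform"
    }
    fn auth_scheme(&self) -> AuthScheme {
        AuthScheme::Oidc
    }
}

/// Picks a vendor implementation from a configuration string (keys are illustrative).
fn vendor_from_name(name: &str) -> Option<Box<dyn OpenEhrVendor>> {
    match name.to_ascii_lowercase().as_str() {
        "ehrbase" => Some(Box::new(EhrBase)),
        "better" => Some(Box::new(BetterPlatform)),
        _ => None,
    }
}

/// Export code can work against the trait without knowing the concrete vendor.
fn describe(vendor: &dyn OpenEhrVendor) -> String {
    format!("{} ({:?} auth)", vendor.name(), vendor.auth_scheme())
}
```

Adding another vendor then means adding one more `impl` block rather than touching the export logic.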

## 🚀 Quick Start

### Prerequisites

- **Rust 1.70+** (for building from source)
- **openEHR Server** (choose one):
  - **EHRBase**: Version 0.30+ with REST API v1.1.x
  - **Better Platform**: Sandbox or production environment with OIDC authentication
- **Database Backend** (choose one):
  - **Azure Cosmos DB**: Core (SQL) API account with database created
  - **PostgreSQL**: Version 14+ with database created
- **Network Access**: Outbound HTTPS to openEHR server and database

### Installation

#### Option 1: Pre-built Binary (Recommended)

```bash
# Download latest release
wget https://github.com/erikhoward/atlas/releases/download/v2.4.0/atlas-linux-x86_64.tar.gz

# Extract and install
tar -xzf atlas-linux-x86_64.tar.gz
sudo mv atlas /usr/local/bin/
sudo chmod +x /usr/local/bin/atlas

# Verify installation
atlas --version
```

#### Option 2: Build from Source

```bash
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone the repository
git clone https://github.com/erikhoward/atlas.git
cd atlas

# Build release binary
cargo build --release

# Install binary
sudo cp target/release/atlas /usr/local/bin/

# Verify installation
atlas --version
```

#### Option 3: Docker (Recommended for Production)

```bash
# Pull the latest Docker image
docker pull erikhoward/atlas:latest

# Run Atlas with configuration file
docker run --rm \
  -v "$(pwd)/atlas.toml:/app/config/atlas.toml" \
  -e ATLAS_OPENEHR_USERNAME=your_username \
  -e ATLAS_OPENEHR_PASSWORD=your_password \
  -e ATLAS_COSMOSDB_KEY=your_cosmos_key \
  erikhoward/atlas:latest \
  export --config /app/config/atlas.toml

# Or use docker-compose (see docker-compose.yml example)
docker-compose up
```

**Docker Benefits:**

- ✅ No Rust installation required
- ✅ Consistent environment across deployments
- ✅ Easy integration with Kubernetes/AKS
- ✅ Multi-platform support (amd64, arm64)

See [Docker Setup Guide](docs/docker-setup.md) for detailed instructions.

### Configuration

```bash
# Generate sample configuration with examples
atlas init --with-examples --output atlas.toml

# Edit configuration for your environment
vi atlas.toml

# Option 1: Use .env file (recommended for development)
# Create a .env file in the project root with your credentials
cat > .env << EOF
ATLAS_OPENEHR_USERNAME=your-openehr-username
ATLAS_OPENEHR_PASSWORD=your-openehr-password
ATLAS_PG_PASSWORD=your-postgres-password
EOF

# The .env file is automatically loaded when Atlas starts

# Option 2: Set environment variables manually
export ATLAS_OPENEHR_USERNAME="your-openehr-username"
export ATLAS_OPENEHR_PASSWORD="your-openehr-password"

# For CosmosDB
export ATLAS_COSMOSDB_KEY="your-cosmos-db-key"

# For PostgreSQL
export ATLAS_PG_PASSWORD="your-postgres-password"

# Validate configuration
atlas validate-config -c atlas.toml
```

**Minimal Configuration Example (CosmosDB)**:

```toml
[openehr]
base_url = "https://your-ehrbase-server.com/ehrbase"
username = "${ATLAS_OPENEHR_USERNAME}"
password = "${ATLAS_OPENEHR_PASSWORD}"
tls_verify = true

[openehr.query]
template_ids = ["IDCR - Vital Signs.v1"]

[export]
mode = "incremental"
export_composition_format = "preserve"
database_target = "cosmosdb"

[cosmosdb]
endpoint = "https://your-account.documents.azure.com:443/"
key = "${ATLAS_COSMOSDB_KEY}"
database_name = "openehr_data"
```

**Minimal Configuration Example (PostgreSQL)**:

```toml
[openehr]
base_url = "https://your-ehrbase-server.com/ehrbase"
username = "${ATLAS_OPENEHR_USERNAME}"
password = "${ATLAS_OPENEHR_PASSWORD}"
tls_verify = true

[openehr.query]
template_ids = ["IDCR - Vital Signs.v1"]

[export]
mode = "incremental"
export_composition_format = "preserve"
database_target = "postgresql"

[postgresql]
connection_string = "postgresql://atlas_user:${ATLAS_PG_PASSWORD}@localhost:5432/openehr_data?sslmode=require"
max_connections = 20
```

See `examples/atlas.example.toml` for CosmosDB configuration and `examples/atlas.postgresql.example.toml` for PostgreSQL configuration.

#### 12-Factor App Configuration

Atlas supports comprehensive environment variable overrides for all configuration options, enabling containerized deployments and 12-factor app compliance:

```bash
# Override any configuration value with an ATLAS_-prefixed variable, as in the examples below
export ATLAS_DATABASE_TARGET=postgresql
export ATLAS_APPLICATION_LOG_LEVEL=debug
export ATLAS_OPENEHR_BASE_URL=https://prod-ehrbase.com
export ATLAS_OPENEHR_USERNAME=atlas_prod
export ATLAS_OPENEHR_PASSWORD=secret
export ATLAS_OPENEHR_QUERY_BATCH_SIZE=2000
export ATLAS_EXPORT_MODE=incremental
export ATLAS_POSTGRESQL_CONNECTION_STRING="postgresql://user:pass@postgres:5432/db"
export ATLAS_POSTGRESQL_MAX_CONNECTIONS=20

# Arrays support JSON or comma-separated format
export ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='["IDCR - Vital Signs.v1","IDCR - Lab Report.v1"]'
export ATLAS_OPENEHR_QUERY_EHR_IDS="ehr-123,ehr-456,ehr-789"

# Run with minimal TOML file (or even no TOML file with all env vars set)
atlas export -c minimal.toml
```
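
Because array-valued overrides accept either a JSON array or a comma-separated list, the parsing step can be sketched as follows. This is an illustrative, standard-library-only version with hypothetical function names; it does not handle JSON escape sequences, and Atlas's own parser may behave differently.

```rust
/// Parses a list-valued override given either as a JSON array of strings
/// or as a comma-separated list. JSON escapes are not handled in this sketch.
fn parse_list(raw: &str) -> Result<Vec<String>, String> {
    let raw = raw.trim();
    if let Some(inner) = raw.strip_prefix('[').and_then(|s| s.strip_suffix(']')) {
        parse_quoted_items(inner)
    } else {
        Ok(raw
            .split(',')
            .map(str::trim)
            .filter(|item| !item.is_empty())
            .map(String::from)
            .collect())
    }
}

/// Reads `"a","b"` style content: double-quoted items separated by commas.
fn parse_quoted_items(inner: &str) -> Result<Vec<String>, String> {
    let mut items = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut expect_separator = false;
    for ch in inner.chars() {
        match (in_quotes, ch) {
            // Closing quote: the item is complete.
            (true, '"') => {
                items.push(std::mem::take(&mut current));
                in_quotes = false;
                expect_separator = true;
            }
            (true, c) => current.push(c),
            (false, '"') if !expect_separator => in_quotes = true,
            (false, ',') if expect_separator => expect_separator = false,
            (false, c) if c.is_whitespace() => {}
            (false, c) => return Err(format!("unexpected character {c:?}")),
        }
    }
    if in_quotes {
        return Err("unterminated string".into());
    }
    Ok(items)
}
```

Both `'["IDCR - Vital Signs.v1","IDCR - Lab Report.v1"]'` and `"ehr-123,ehr-456,ehr-789"` from the example above would yield a list of plain strings.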

**Docker Example**:

```bash
docker run -d \
  -e ATLAS_DATABASE_TARGET=postgresql \
  -e ATLAS_OPENEHR_BASE_URL=https://ehrbase.example.com \
  -e ATLAS_OPENEHR_USERNAME=atlas \
  -e ATLAS_OPENEHR_PASSWORD="${OPENEHR_PASSWORD}" \
  -e ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='["IDCR - Vital Signs.v1"]' \
  -e ATLAS_POSTGRESQL_CONNECTION_STRING="${PG_CONNECTION_STRING}" \
  -e ATLAS_EXPORT_MODE=incremental \
  erikhoward/atlas:latest
```

**Kubernetes Example**:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: atlas-config
data:
  ATLAS_DATABASE_TARGET: "postgresql"
  ATLAS_OPENEHR_BASE_URL: "https://ehrbase.example.com"
  ATLAS_OPENEHR_QUERY_TEMPLATE_IDS: '["IDCR - Vital Signs.v1"]'
  ATLAS_EXPORT_MODE: "incremental"
---
apiVersion: v1
kind: Secret
metadata:
  name: atlas-secrets
type: Opaque
stringData:
  ATLAS_OPENEHR_PASSWORD: "secret"
  ATLAS_POSTGRESQL_CONNECTION_STRING: "postgresql://user:pass@postgres:5432/db"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atlas
spec:
  # apps/v1 Deployments require a selector that matches the pod template labels;
  # the `app: atlas` label is an illustrative choice.
  selector:
    matchLabels:
      app: atlas
  template:
    metadata:
      labels:
        app: atlas
    spec:
      containers:
        - name: atlas
          image: erikhoward/atlas:latest
          envFrom:
            - configMapRef:
                name: atlas-config
            - secretRef:
                name: atlas-secrets
```

See [Configuration Guide](docs/configuration.md#environment-variable-overrides) for complete list of supported environment variables.

### Basic Usage

```bash
# Run export
atlas export -c atlas.toml

# Dry run to preview (no data written)
atlas export -c atlas.toml --dry-run

# Check export status and watermarks
atlas status -c atlas.toml

# Override configuration options
atlas export -c atlas.toml --mode full --template-id "Your Template.v1"
```
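
The watermarks reported by `atlas status` drive incremental sync: Atlas tracks the last export per {template_id, ehr_id} and exports only newer data. A minimal in-memory sketch of that bookkeeping is shown below. The type and method names are hypothetical, timestamps are simplified to integer seconds, and Atlas itself persists this state in the target database rather than in a `HashMap`.

```rust
use std::collections::HashMap;

/// One watermark per (template_id, ehr_id): the newest composition
/// timestamp exported so far (integer seconds, an assumption for brevity).
#[derive(Default)]
struct Watermarks {
    last_exported: HashMap<(String, String), u64>,
}

impl Watermarks {
    /// Returns true when a composition is newer than the stored watermark,
    /// or when nothing has been exported yet for this pair.
    fn needs_export(&self, template_id: &str, ehr_id: &str, committed_at: u64) -> bool {
        let key = (template_id.to_string(), ehr_id.to_string());
        match self.last_exported.get(&key) {
            Some(&mark) => committed_at > mark,
            None => true,
        }
    }

    /// Advances the watermark after a successful write; never moves it backwards.
    fn record(&mut self, template_id: &str, ehr_id: &str, committed_at: u64) {
        let key = (template_id.to_string(), ehr_id.to_string());
        let mark = self.last_exported.entry(key).or_insert(0);
        *mark = (*mark).max(committed_at);
    }
}
```

An incremental run would call `needs_export` for each candidate composition and `record` once a batch has been written, so a later run skips everything already exported.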

### Graceful Shutdown

Atlas supports graceful shutdown for long-running exports, ensuring data integrity and allowing safe resumption:

```bash
# Start an export
atlas export -c atlas.toml

# Press Ctrl+C or send SIGTERM to gracefully stop
# Atlas will:
# 1. Complete the current batch being processed
# 2. Save watermark state to database
# 3. Display progress summary
# 4. Exit with code 130 (SIGINT) or 143 (SIGTERM)

# Resume from where it left off
atlas export -c atlas.toml
```

**Key Features:**

- ✅ **Safe Interruption**: Current batch completes before shutdown (no partial data)
- ✅ **Automatic Checkpoint**: Watermarks saved with `Interrupted` status
- ✅ **Resume Support**: Re-run the same command to continue from checkpoint
- ✅ **Configurable Timeout**: Default 30s grace period (configurable via `export.shutdown_timeout_secs`)
- ✅ **Container-Ready**: Works with Docker stop, Kubernetes pod termination, systemd
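
The "finish the current batch, then checkpoint" behavior can be sketched with a shared flag that is only checked between batches. This is a simplified, synchronous illustration with hypothetical names; wiring the flag to SIGINT/SIGTERM and enforcing the grace-period timeout are left out, and Atlas's async implementation will differ.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
enum RunStatus {
    Completed,
    Interrupted,
}

/// Processes batches in order, checking the shutdown flag only between
/// batches so a batch in flight always finishes. Returns the status and the
/// number of batches completed, which serves as the checkpoint to resume from.
fn run_batches(
    batches: &[Vec<u32>],
    start_at: usize,
    shutdown: &AtomicBool,
    mut process: impl FnMut(&[u32]),
) -> (RunStatus, usize) {
    let mut done = start_at;
    for batch in batches.iter().skip(start_at) {
        if shutdown.load(Ordering::SeqCst) {
            // A signal handler (not shown) set the flag: stop before the next batch.
            return (RunStatus::Interrupted, done);
        }
        process(batch);
        done += 1;
    }
    (RunStatus::Completed, done)
}
```

Re-running with `start_at` set to the returned checkpoint continues where the interrupted run stopped, which mirrors re-running `atlas export` after an interruption.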

**Configuration:**

```toml
[export]
# Graceful shutdown timeout in seconds (default: 30)
# Should align with container orchestration grace periods
shutdown_timeout_secs = 30
```

**Exit Codes:**

- `0` - Export completed successfully
- `1` - Partial success (some exports failed)
- `130` - Interrupted by SIGINT (Ctrl+C)
- `143` - Interrupted by SIGTERM (graceful termination signal)
- Other codes indicate configuration, authentication, or connection errors
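
The signal-related codes follow the common shell convention of 128 plus the signal number (SIGINT is 2, SIGTERM is 15). The mapping below is a small sketch of the table above; the enum and function names are hypothetical rather than taken from Atlas.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExportOutcome {
    Success,
    PartialSuccess,
    InterruptedBySigint,
    InterruptedBySigterm,
}

/// Maps an export outcome to the process exit code listed above.
fn exit_code(outcome: ExportOutcome) -> i32 {
    const SIGINT: i32 = 2;
    const SIGTERM: i32 = 15;
    match outcome {
        ExportOutcome::Success => 0,
        ExportOutcome::PartialSuccess => 1,
        // Conventional "terminated by signal N" code: 128 + N.
        ExportOutcome::InterruptedBySigint => 128 + SIGINT,
        ExportOutcome::InterruptedBySigterm => 128 + SIGTERM,
    }
}
```

A wrapper script or orchestrator can use these codes to tell an interrupted run, which is safe to resume, from a partial failure that needs attention.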

### Example Use Cases

See the [`examples/`](examples/) directory for complete configurations:

- **[Clinical Research](examples/research-export.toml)**: Full export with data verification
- **[Daily Sync](examples/incremental-sync.toml)**: Incremental sync for production
- **[ML Features](examples/ml-features.toml)**: Flattened data for machine learning

## 🔒 Anonymization

Atlas includes built-in anonymization capabilities to protect PHI/PII when exporting openEHR compositions, helping organizations comply with HIPAA and GDPR regulations.

### Quick Start

Add anonymization configuration to your `atlas.toml`:

```toml
[anonymization]
enabled = true
mode = "hipaa_safe_harbor" # or "gdpr"
strategy = "token" # or "redact"
dry_run = false

[anonymization.audit]
enabled = true
log_path = "./audit/anonymization.log"
json_format = true
```

Run export with anonymization:

```bash
# Enable anonymization
atlas export --anonymize

# Override compliance mode
atlas export --anonymize --anonymize-mode gdpr

# Dry-run to preview PII detection
atlas export --anonymize --anonymize-dry-run
```

### Features

- **Automated PII Detection**: Regex-based detection of 24+ PII categories
- **HIPAA Safe Harbor**: 18 identifiers per 45 CFR §164.514(b)(2)
- **GDPR Compliance**: HIPAA identifiers + 6 GDPR quasi-identifiers
- **Flexible Strategies**:
  - **Token**: Replace with unique random tokens (e.g., `TOKEN_NAME_a1b2c3d4`)
  - **Redact**: Replace with category markers (e.g., `[REDACTED_NAME]`)
- **Dry-Run Mode**: Preview PII detection without modifying data
- **Audit Logging**: SHA-256 hashed values, comprehensive tracking
- **Performance**: <100ms overhead per composition, <15% throughput impact
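
To make the two strategies concrete, the sketch below produces replacements in the formats shown above. It is not Atlas's implementation: detection is assumed to have already happened, the function names are hypothetical, and the random part of a token is passed in as `token_suffix` instead of being generated.

```rust
#[derive(Clone, Copy)]
enum Strategy {
    Redact,
    Token,
}

/// Builds the replacement for a detected value of the given PII category.
/// `token_suffix` stands in for the random part of a token; a real
/// implementation would generate it rather than take it as an argument.
fn anonymize(category: &str, strategy: Strategy, token_suffix: &str) -> String {
    let category = category.to_ascii_uppercase();
    match strategy {
        Strategy::Redact => format!("[REDACTED_{category}]"),
        Strategy::Token => format!("TOKEN_{category}_{token_suffix}"),
    }
}

/// Applies the strategy to every (value, category) finding in `text`.
/// Findings would come from the PII detector, which is not shown here.
fn apply(text: &str, findings: &[(&str, &str)], strategy: Strategy, token_suffix: &str) -> String {
    let mut out = text.to_string();
    for &(value, category) in findings {
        out = out.replace(value, &anonymize(category, strategy, token_suffix));
    }
    out
}
```

Redaction discards the value entirely, while tokenization keeps a stable placeholder that can still be used to tell distinct values apart.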

### Compliance Modes

**HIPAA Safe Harbor** (`hipaa_safe_harbor`):

- Detects 18 identifiers specified in 45 CFR §164.514(b)(2)
- Suitable for US healthcare organizations

**GDPR** (`gdpr`):

- Detects all HIPAA identifiers + 6 GDPR quasi-identifiers
- Suitable for European organizations or multi-region deployments

### Documentation

For complete anonymization documentation, see:

- **[Anonymization User Guide](docs/anonymization-user-guide.md)** - Comprehensive usage guide

## 🐳 Docker Deployment

Atlas provides official Docker images for easy deployment and integration with container orchestration platforms.

### Quick Start with Docker

```bash
# Pull the latest image
docker pull erikhoward/atlas:latest

# Run with configuration file and environment variables
docker run --rm \
  -v "$(pwd)/atlas.toml:/app/config/atlas.toml" \
  -v "$(pwd)/logs:/app/logs" \
  -e ATLAS_OPENEHR_USERNAME="${OPENEHR_USER}" \
  -e ATLAS_OPENEHR_PASSWORD="${OPENEHR_PASS}" \
  -e ATLAS_COSMOSDB_KEY="${COSMOS_KEY}" \
  erikhoward/atlas:latest \
  export --config /app/config/atlas.toml
```

### Using Docker Compose

Create a `docker-compose.yml` file:

```yaml
version: '3.8'

services:
  atlas:
    image: erikhoward/atlas:latest
    volumes:
      - ./atlas.toml:/app/config/atlas.toml
      - ./logs:/app/logs
    environment:
      - ATLAS_OPENEHR_USERNAME=${OPENEHR_USER}
      - ATLAS_OPENEHR_PASSWORD=${OPENEHR_PASS}
      - ATLAS_COSMOSDB_KEY=${COSMOS_KEY}
      - RUST_LOG=info
    command: export --config /app/config/atlas.toml
```

Run with:

```bash
docker-compose up
```

### Available Tags

- `latest` - Latest stable release from main branch
- `2.4.0`, `2.4`, `2` - Semantic version tags
- `main-` - Specific commit from main branch

### Multi-Platform Support

Images are built for multiple architectures:

- `linux/amd64` - Standard x86_64 servers
- `linux/arm64` - ARM64 (Apple Silicon, AWS Graviton, etc.)

### Building Custom Images

```bash
# Build locally
docker build -t atlas:custom .

# Build for specific platform
docker build --platform linux/amd64 -t atlas:custom .
```

For detailed Docker setup, configuration, and troubleshooting, see the **[Docker Setup Guide](docs/docker-setup.md)**.

## 📖 Documentation

### User Documentation

- **[User Guide](docs/user-guide.md)** - Complete usage instructions, troubleshooting, and best practices
- **[Configuration Guide](docs/configuration.md)** - Detailed configuration reference with all options
- **[Example Configurations](examples/)** - Ready-to-use configs for common scenarios

### Technical Documentation

- **[Architecture Documentation](docs/architecture.md)** - System design, components, and data flow
- **[Developer Guide](docs/developer-guide.md)** - Development setup and contribution guidelines

### Deployment Guides

- **[Standalone Deployment](docs/deployment/standalone.md)** - Binary deployment on Linux/macOS/Windows
- **[Docker Deployment](docs/deployment/docker.md)** - Containerized deployment
- **[Kubernetes Deployment](docs/deployment/kubernetes.md)** - AKS and Kubernetes deployment

## 🏗️ Architecture

Atlas follows a layered architecture with clear separation of concerns:

```text
┌─────────────────────────────────────────────────────────────────────────┐
│                                Atlas CLI                                │
│                              (Rust Binary)                              │
└──────────────┬──────────────────────────────────────┬───────────────────┘
               │                                      │
               │ REST API v1.1                        │ Database Adapters
               │                                      │
               ▼                                      ▼
┌──────────────────────────┐    ┌──────────────────────────────────────────┐
│  openEHR Server          │    │  Database Backends                       │
│  (EHRBase 0.30+)         │    │                                          │
│                          │    │  ┌────────────────────────────────────┐  │
│  ┌────────────────────┐  │    │  │ Azure Cosmos DB (NoSQL)            │  │
│  │ Compositions       │  │    │  │ - Control Container (watermarks)   │  │
│  │ (FLAT JSON)        │  │    │  │ - Data Containers (per template)   │  │
│  └────────────────────┘  │    │  │ - Partitioned by /ehr_id           │  │
│                          │    │  └────────────────────────────────────┘  │
└──────────────────────────┘    │                                          │
                                │  ┌────────────────────────────────────┐  │
                                │  │ PostgreSQL 14+ (Relational)        │  │
                                │  │ - atlas_watermarks table           │  │
                                │  │ - compositions_* tables            │  │
                                │  │ - JSONB columns for flexibility    │  │
                                │  └────────────────────────────────────┘  │
                                └──────────────────────────────────────────┘
```

**Key Components**:

- **CLI Layer**: Command-line interface with clap
- **Core Layer**: Business logic (export, transform, state, verification)
- **Adapter Layer**: External integrations (openEHR, Cosmos DB, PostgreSQL)
- **Domain Layer**: Core types and models

See [Architecture Documentation](docs/architecture.md) for details.

## 🎯 Use Cases

### Clinical Research

Export patient cohorts for research studies while preserving exact data structures for regulatory compliance.

### Machine Learning

Flatten compositions into analytics-ready format for training predictive models on clinical data.
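
As a rough picture of what flattening means, the sketch below walks a nested structure and emits single-level field names. It is an assumption-laden illustration: the `Value` enum is a stand-in for a JSON value, the `_` separator and the array-index naming are choices made for this example, and Atlas's flatten mode may name fields differently.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (objects, arrays, and text leaves only).
enum Value {
    Leaf(String),
    Object(BTreeMap<String, Value>),
    Array(Vec<Value>),
}

/// Flattens nested paths into single-level field names joined by `_`.
/// The separator is an assumption made for this example.
fn flatten(prefix: &str, value: &Value, out: &mut BTreeMap<String, String>) {
    match value {
        Value::Leaf(text) => {
            out.insert(prefix.to_string(), text.clone());
        }
        Value::Object(fields) => {
            for (key, child) in fields {
                flatten(&join(prefix, key), child, out);
            }
        }
        Value::Array(items) => {
            // Array elements are addressed by their position.
            for (index, child) in items.iter().enumerate() {
                flatten(&join(prefix, &index.to_string()), child, out);
            }
        }
    }
}

/// Appends a path segment to a prefix, avoiding a leading separator.
fn join(prefix: &str, segment: &str) -> String {
    if prefix.is_empty() {
        segment.to_string()
    } else {
        format!("{prefix}_{segment}")
    }
}
```

Under these assumptions, a pulse rate nested under `vital_signs` and the first element of a `pulse` array ends up as the single column `vital_signs_pulse_0_rate`, which is the shape most ML tooling expects.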

### Operational Analytics

Power dashboards and reports by syncing openEHR data to Cosmos DB daily.

### Data Integration

Connect openEHR data to Azure Synapse Analytics, Databricks, or Power BI for advanced analytics.

### Regulatory Reporting

Maintain comprehensive audit trails and logging for compliance requirements.

## 🔧 Configuration Options

Atlas supports extensive configuration options:

| Category | Options | Description |
|----------|---------|-------------|
| **Export Mode** | `full`, `incremental` | Full export or incremental sync |
| **Format** | `preserve`, `flatten` | Maintain structure or flatten for analytics |
| **Batch Size** | 100-5000 | Compositions per batch |
| **Parallelism** | 1-100 EHRs | Concurrent EHR processing |
| **Logging** | Local, Azure Log Analytics | Structured logging options |

See [Configuration Guide](docs/configuration.md) for complete reference.
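
The ranges in the table lend themselves to up-front validation. The following sketch checks the documented bounds and collects every problem instead of stopping at the first one. The function and constant names are hypothetical, and Atlas's own validation (run by `atlas validate-config`) may report errors differently.

```rust
use std::ops::RangeInclusive;

// Bounds taken from the table above.
const BATCH_SIZE: RangeInclusive<u32> = 100..=5000;
const PARALLEL_EHRS: RangeInclusive<u32> = 1..=100;

/// Validates a handful of export settings and returns all problems found.
fn validate(
    mode: &str,
    composition_format: &str,
    batch_size: u32,
    parallel_ehrs: u32,
) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    if !matches!(mode, "full" | "incremental") {
        errors.push(format!("mode must be `full` or `incremental`, got `{mode}`"));
    }
    if !matches!(composition_format, "preserve" | "flatten") {
        errors.push(format!(
            "format must be `preserve` or `flatten`, got `{composition_format}`"
        ));
    }
    if !BATCH_SIZE.contains(&batch_size) {
        errors.push(format!("batch size {batch_size} is outside 100-5000"));
    }
    if !PARALLEL_EHRS.contains(&parallel_ehrs) {
        errors.push(format!("parallel EHRs {parallel_ehrs} is outside 1-100"));
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Reporting all violations at once spares the user a fix-and-rerun loop when several settings are wrong.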

## 📊 Performance

**Typical Performance** (depends on composition size and network):

- **Throughput**: 1000-2000 compositions/minute
- **Memory**: 2-4 GB RAM (configurable with batch size)
- **Cosmos DB**: ~10 RU per composition write

**Example Scenarios**:

- **Daily Sync**: 1,000 compositions in ~1 minute
- **Research Export**: 50,000 compositions in ~25-50 minutes
- **ML Dataset**: 500,000 compositions in ~4-8 hours
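
For capacity planning, the typical figures above translate into simple arithmetic. The helpers below are a back-of-the-envelope sketch with hypothetical names; they use the quoted 1000-2000 compositions/minute and ~10 RU per write, and ignore startup time, retries, and network variation.

```rust
/// Estimated export duration in minutes as a (best, worst) pair,
/// based on the typical 1000-2000 compositions/minute quoted above.
fn estimate_minutes(compositions: u64) -> (f64, f64) {
    const MIN_PER_MINUTE: f64 = 1000.0;
    const MAX_PER_MINUTE: f64 = 2000.0;
    let n = compositions as f64;
    (n / MAX_PER_MINUTE, n / MIN_PER_MINUTE)
}

/// Estimated Cosmos DB request units consumed, at ~10 RU per composition write.
fn estimate_request_units(compositions: u64) -> u64 {
    const RU_PER_WRITE: u64 = 10;
    compositions * RU_PER_WRITE
}
```

For the 500,000-composition ML dataset this gives 250 to 500 minutes, in line with the 4-8 hours listed, and roughly 5 million request units when writing to Cosmos DB.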

## 🔒 Security

Atlas implements comprehensive security measures to protect sensitive healthcare data and credentials:

### Credential Protection

- **Memory Security**: All credentials (passwords, keys, secrets) are automatically zeroized in memory when no longer needed
- **No Credential Logging**: Credentials are never written to log files or exposed in debug output
- **Redacted Debug Output**: Debug representations show `Secret([REDACTED])` instead of actual values
- **Environment Variables**: Secure credential management using environment variables, never hardcoded
- **Explicit Access Control**: Code must explicitly call `expose_secret()` to access credentials, enabling easy security audits
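
The redacted-debug and explicit-access pattern described above can be illustrated with a small wrapper type. This is a simplified stand-in, not Atlas's code: the `CosmosCredentials` struct is hypothetical, and zeroizing memory on drop is deliberately omitted because doing it correctly needs more care than a short example allows (crates such as `secrecy` and `zeroize` exist for this purpose).

```rust
use std::fmt;

/// Simplified secret wrapper: the value never appears in Debug output.
struct Secret(String);

impl Secret {
    fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// The only way to read the value, which makes every use easy to find in review.
    fn expose_secret(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}

/// Hypothetical config struct: deriving Debug is safe because the key redacts itself.
#[derive(Debug)]
struct CosmosCredentials {
    endpoint: String,
    key: Secret,
}
```

Logging a `CosmosCredentials` value with `{:?}` then shows the endpoint but prints `Secret([REDACTED])` in place of the key.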

**Protected Credentials:**

- openEHR passwords
- Cosmos DB keys
- PostgreSQL connection strings (including embedded passwords)
- Azure client secrets

### Network & Access Security

- **TLS 1.2+**: All connections encrypted in transit
- **Certificate Verification**: TLS certificate validation enabled by default
- **Least Privilege**: Read-only openEHR access recommended
- **Azure RBAC**: Integrate with Azure role-based access control

### Compliance & Audit

- **Audit Logging**: All operations logged with timestamps
- **PHI/PII Protection**: Sanitized logging, compliance-ready
- **HIPAA-Ready**: Designed for healthcare compliance requirements
- **Data Verification**: Optional SHA-256 checksums for data integrity

For detailed security best practices, see the [Configuration Guide](docs/configuration.md#security-best-practices).

## 🤝 Contributing

We welcome contributions! Here's how to get started:

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/my-feature`
3. **Make your changes** following the [Developer Guide](docs/developer-guide.md)
4. **Run tests**: `cargo test`
5. **Run linter**: `cargo clippy --all-targets -- -D warnings`
6. **Format code**: `cargo fmt`
7. **Commit changes**: `git commit -m "feat: add new feature"`
8. **Push to branch**: `git push origin feature/my-feature`
9. **Open a Pull Request**

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

### Development Setup

```bash
# Clone repository
git clone https://github.com/erikhoward/atlas.git
cd atlas

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install development tools
rustup component add clippy rustfmt

# Build and test
cargo build
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt
```

## 📝 License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

### Documentation

- [User Guide](docs/user-guide.md) - Usage instructions and troubleshooting
- [PostgreSQL Setup Guide](docs/postgresql-setup.md) - PostgreSQL backend configuration
- [Docker Setup Guide](docs/docker-setup.md) - Docker deployment instructions
- [FAQ](docs/user-guide.md#faq) - Frequently asked questions

### Community

- **GitHub Issues**: [Report bugs or request features](https://github.com/erikhoward/atlas/issues)
- **Discussions**: [Ask questions and share ideas](https://github.com/erikhoward/atlas/discussions)

### Commercial Support

For enterprise support, training, or custom development, contact: erikhoward@pm.me

## 🙏 Acknowledgments

Atlas is built with these excellent open-source projects:

- [Rust](https://www.rust-lang.org/) - Systems programming language
- [Tokio](https://tokio.rs/) - Async runtime
- [Clap](https://clap.rs/) - Command-line argument parsing
- [Serde](https://serde.rs/) - Serialization framework
- [Tracing](https://tracing.rs/) - Structured logging
- [Azure SDK for Rust](https://github.com/Azure/azure-sdk-for-rust) - Azure integration
- [tokio-postgres](https://github.com/sfackler/rust-postgres) - PostgreSQL async driver
- [deadpool-postgres](https://github.com/bikeshedder/deadpool) - PostgreSQL connection pooling

## 🗺️ Roadmap

### Current Version (v2.3)

- ✅ EHRBase vendor support
- ✅ Better Platform vendor support with OIDC authentication
- ✅ Azure Cosmos DB integration
- ✅ PostgreSQL integration
- ✅ Incremental sync with watermarks
- ✅ Preserve and flatten modes
- ✅ CLI interface
- ✅ Docker and Kubernetes deployment
- ✅ HIPAA & GDPR anonymization

### Future Enhancements

- 🔄 Prometheus metrics export
- 🔄 FHIR transformation
- 🔄 Bi-directional synchronization
- 🔄 Support for other cloud providers (AWS, GCP)

## 📚 Related Projects

- [EHRBase](https://ehrbase.org/) - Open-source openEHR server
- [Better Platform](https://www.better.care/) - Enterprise openEHR platform
- [Azure Cosmos DB](https://azure.microsoft.com/en-us/services/cosmos-db/) - Globally distributed database
- [openEHR](https://www.openehr.org/) - Open standard for health data

---

**Made with ❤️ by Erik Howard & the Atlas Contributors**

If you find Atlas useful, please consider giving it a ⭐ on GitHub!