https://github.com/imjuliengaupin/bourne

A Python-based framework for modular, AI-ready data pipeline orchestration featuring plug-and-play agents, real-time terminal dashboards, and Pydantic-powered schema validation.

agentic-ai agentic-workflow agents ai etl-framework github-actions orchestration-framework pydantic-v2 python3

# Bourne

Enterprise-grade data pipeline orchestration for ETL workflows

[![Build Status](https://img.shields.io/github/actions/workflow/status/imjuliengaupin/bourne/devops.yml?branch=PROD&style=for-the-badge&logo=github&label=CI/CD)](https://github.com/imjuliengaupin/bourne/actions/workflows/devops.yml)
[![Coverage](https://img.shields.io/coveralls/github/imjuliengaupin/bourne/PROD?style=for-the-badge&logo=coveralls&label=COVERAGE)](https://coveralls.io/github/imjuliengaupin/bourne?branch=PROD)
[![Artifacts](https://img.shields.io/badge/Actions-Artifacts-6c757d?style=for-the-badge&logo=github)](https://github.com/imjuliengaupin/bourne/actions/workflows/devops.yml?query=branch%3APROD)


- Why Bourne?
- Features
- Use Cases
- Quick Start
- How It Works
- Example Workflow
- Demo
- Quality & Reliability
- License


## Why Bourne?

Modern data pipelines require flexibility, reliability, and visibility. Bourne delivers:

- **Modular agent architecture** - Build complex workflows from simple, reusable components
- **Real-time observability** - Live dashboard showing workflow progress, data transformations, and errors
- **Configuration-driven execution** - Define pipelines in JSON, no code changes needed
- **Enterprise-grade validation** - Multi-tier schema validation with automatic fallback strategies
- **Production-ready** - Comprehensive testing, retry logic, dependency resolution, and stall detection


## :gear: Features

- [x] **Multi-agent orchestration**: Coordinate complex data workflows with automatic dependency resolution
- [x] **Extensible framework**: Add custom agents and transformation logic with minimal boilerplate
- [x] **Live terminal dashboard**: Real-time workflow progress visualization with color-coded statuses
- [x] **Field-level data preview**: Compare before/after transformations with highlighted changes
- [x] **JSON-first configuration**: Build entire pipelines with external JSON files—no code required
- [x] **Flexible data formats**: Native support for JSON and NDJSON ingestion and output
- [x] **Intelligent schema validation**: Pydantic v2-backed, multi-tier cascade that falls back to more lenient tiers instead of failing on data variations
- [x] **Resilient execution**: Automatic retries, stall detection, and dependency handling
- [x] **Comprehensive test suite**: Scenario-based end-to-end tests plus unit tests across agents, connectors, and transformation modes



## :bulb: Use Cases

**Data Integration & Normalization**

- Ingest data from multiple JSON sources
- Normalize key formats (camelCase → snake_case)
- Apply consistent schema validation
- Export unified, validated output

**ETL Pipeline Management**

- Complex multi-step workflows with task dependencies
- Schema transformation and validation at each stage
- Real-time monitoring of pipeline execution
- Automatic error recovery and retry logic

**Data Quality Assurance**

- Validate incoming data against strict schemas
- Identify and log transformation changes
- Track data lineage with automatic metadata
- Fail-fast strict mode for critical pipelines

**Custom Data Processing**

- Build domain-specific agents for specialized transformations
- Compose agents into production workflows
- Configuration-driven scaling across different data sources



## :rocket: Quick Start

### Prerequisites

- Python 3.13
- Git

### Installation

```bash
# Clone the repository
git clone https://github.com/imjuliengaupin/bourne.git
cd bourne

# Create and activate a virtual environment (on Windows: .venv\Scripts\activate)
python -m venv .venv
source .venv/bin/activate

# Install core dependencies only
pip install -e .

# Run a sample workflow
python main.py \
--workflow json/workflows/default/default.json \
--connector json/connectors/default/single-record.json \
--debug
```

**Command-line Arguments**

- `--workflow` (required): Path to a JSON workflow configuration file
- `--connector` (required): Path to a JSON data connector configuration file
- `--debug` (optional): Enable the live terminal dashboard with progress visualization and real-time logging
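
The argument contract above can be sketched with the standard library's `argparse`. This is a minimal illustration of the documented flags only, not the project's `main.py`; the `build_parser` name is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags: two required paths, one switch."""
    parser = argparse.ArgumentParser(description="Run a workflow (illustrative sketch).")
    parser.add_argument("--workflow", required=True, help="Path to a JSON workflow configuration")
    parser.add_argument("--connector", required=True, help="Path to a JSON data connector configuration")
    # store_true: the flag takes no value and defaults to False when omitted
    parser.add_argument("--debug", action="store_true", help="Enable the live terminal dashboard")
    return parser
```

Parsing `["--workflow", "w.json", "--connector", "c.json"]` yields `debug=False`; omitting either required flag makes `argparse` print a usage error and exit with status 2.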



## 🏗️ How It Works

### Architecture at a Glance

[![Open in Eraser](https://img.shields.io/badge/Open%20in-Eraser-blue?logo=eraser&style=for-the-badge)](https://app.eraser.io/workspace/LDgZLTRhjaVsKpyiZU0B)


Bourne orchestrates four core agents in configurable sequences:

| 🤖 Agent | 🎯 Purpose |
| --------------------------- | ------------------------------------------------------------ |
| **DataIngestionAgent** | Reads JSON/NDJSON from files or APIs |
| **DataValidationAgent** | Validates against expected schemas with intelligent fallback |
| **DataTransformationAgent** | Applies configurable key transformations and normalizations |
| **DataStorageAgent** | Persists processed data to files or external systems |

Each agent is independent, reusable, and can be composed into complex workflows with automatic dependency resolution.
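
The composition model can be illustrated with a small abstract base class. The `Agent` contract, the two toy agents, and `compose` are hypothetical and only mirror the idea of independent, reusable agents; they are not Bourne's classes.

```python
from abc import ABC, abstractmethod
from typing import Any

Record = dict[str, Any]


class Agent(ABC):
    """Hypothetical agent contract: take a batch of records, return a batch of records."""

    name: str = "agent"

    @abstractmethod
    def run(self, records: list[Record]) -> list[Record]:
        """Process one batch; implementations should not mutate the input."""


class DropEmptyRecordsAgent(Agent):
    name = "drop_empty_records"

    def run(self, records: list[Record]) -> list[Record]:
        # Keep only records that have at least one non-None value
        return [r for r in records if any(v is not None for v in r.values())]


class LowercaseKeysAgent(Agent):
    name = "lowercase_keys"

    def run(self, records: list[Record]) -> list[Record]:
        # Build new dicts so the caller's records are left untouched
        return [{k.lower(): v for k, v in r.items()} for r in records]


def compose(agents: list[Agent], records: list[Record]) -> list[Record]:
    """Chain agents: each agent's output becomes the next agent's input."""
    for agent in agents:
        records = agent.run(records)
    return records
```

Because every agent in this sketch shares one signature, agents can be reordered or swapped without changing each other.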


### Configuration-Driven Pipelines

Define your entire workflow in JSON—no Python code needed. Typical flow:

- **Ingest**: Pull JSON/NDJSON from a source connector
- **Validate**: Enforce schemas with automatic fallback (primary → fallback → manual)
- **Transform**: Apply key normalization modes and, optionally, add lineage metadata
- **Save**: Persist processed data to the configured destination

Workflows automatically handle:

- Task dependency ordering
- Retries on transient failures
- Stall detection and recovery
- Real-time progress tracking
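
Dependency ordering and retries can be illustrated with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This is a sketch under stated assumptions, not Bourne's scheduler: tasks are zero-argument callables, dependencies are a name-to-set mapping, only a hypothetical `TransientError` is retried, and stall detection is not modelled.

```python
import time
from collections.abc import Callable
from graphlib import TopologicalSorter


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a flaky network read."""


def run_with_retries(task: Callable[[], object], retries: int = 3, delay: float = 0.0) -> object:
    """Call task, retrying only on TransientError, up to `retries` attempts in total."""
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(1, retries + 1):
        try:
            return task()
        except TransientError:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # linear backoff between attempts


def run_workflow(tasks: dict[str, Callable[[], object]], deps: dict[str, set[str]]) -> list[str]:
    """Run every task after all of its dependencies and return the execution order.

    graphlib raises CycleError if the dependencies contain a cycle.
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        run_with_retries(tasks[name])
    return order
```

`static_order()` yields one valid sequential order; a scheduler that wants to run independent tasks in parallel could use the sorter's `prepare()`, `get_ready()`, and `done()` methods instead.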


### Transformations & Validation

Apply powerful transformations without writing code:

| 🛠️ Transformation | 🎯 Use Case | 📊 Example |
| ------------------ | ------------------------------------------ | ------------------------------- |
| `lowercase_keys` | Convert all keys to lowercase | `FirstName` → `firstname` |
| `uppercase_keys` | Convert all keys to UPPERCASE | `FirstName` → `FIRSTNAME` |
| `snake_case_keys` | Convert to Python/database friendly format | `FirstName` → `first_name` |
| `camel_case_keys` | Convert all keys to camelCase format | `FirstName` → `firstName` |
| `pascal_case_keys` | Convert all keys to PascalCase format | `first_name` → `FirstName` |
| `normalize_types` | Coerce all values to string type | `{"id": 123}` → `{"id": "123"}` |

When `include_transformation_metadata` is enabled (default: off), the transformation metadata keys are cased to match the selected transform mode.
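
The key modes in the table can be approximated in a few lines of standard-library Python. This is a sketch, not Bourne's transformation code: `transform_key` and `transform_record` are hypothetical, only top-level keys are handled, and word boundaries are detected from underscores, hyphens, spaces, and lowercase-to-uppercase transitions.

```python
import re
from typing import Any


def _words(key: str) -> list[str]:
    """Split a key on _, -, whitespace, and lower->Upper boundaries ("FirstName" -> First, Name)."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", key)
    return [w for w in re.split(r"[_\-\s]+", spaced) if w]


def transform_key(key: str, mode: str) -> str:
    if mode == "lowercase_keys":
        return key.lower()
    if mode == "uppercase_keys":
        return key.upper()
    words = _words(key)
    if not words:
        return key  # nothing to re-case (e.g. an empty key)
    if mode == "snake_case_keys":
        return "_".join(w.lower() for w in words)
    if mode == "camel_case_keys":
        return words[0].lower() + "".join(w.capitalize() for w in words[1:])
    if mode == "pascal_case_keys":
        return "".join(w.capitalize() for w in words)
    raise ValueError(f"unknown mode: {mode}")


def transform_record(record: dict[str, Any], mode: str) -> dict[str, Any]:
    if mode == "normalize_types":
        # Keys are left alone; every value is coerced to str
        return {k: str(v) for k, v in record.items()}
    return {transform_key(k, mode): v for k, v in record.items()}
```

Two caveats of this simple approach: acronyms such as `HTTPServer` are not split into words, and if two input keys map to the same output key, the later one silently wins.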

Schema validation uses a cascading approach:

1. **Primary** - Full typed Pydantic validation
2. **Fallback** - Enhanced validation with type flexibility
3. **Manual** - String-normalized fallback for ambiguous data

A record that fails one tier drops to the next, more lenient tier, so unexpected data variations lower validation strictness instead of halting the pipeline.
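
The three tiers can be sketched as follows. To keep the example dependency-free, plain Python type checks stand in for Pydantic models; in Pydantic v2 terms, tier 1 resembles strict-mode validation and tier 2 resembles the default lax mode, which coerces numeric strings and strings such as `"true"`/`"false"`. The schema, the tier functions, and the `strict` flag (echoing the fail-fast strict mode listed under Use Cases) are assumptions for illustration, not Bourne's implementation.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical schema for illustration: field name -> expected Python type
SCHEMA: dict[str, type] = {"id": int, "name": str, "active": bool}


class ValidationFailed(Exception):
    """Raised by a tier that cannot accept the record."""


def primary(record: Record) -> Record:
    """Tier 1: every schema field present with exactly the expected type."""
    for field, typ in SCHEMA.items():
        # type(...) is used instead of isinstance because bool is a subclass of int
        if type(record.get(field)) is not typ:
            raise ValidationFailed(f"{field}: expected {typ.__name__}")
    return {field: record[field] for field in SCHEMA}


def fallback(record: Record) -> Record:
    """Tier 2: accept values that can be coerced, such as "42" -> 42 or "false" -> False."""
    out: Record = {}
    for field, typ in SCHEMA.items():
        if record.get(field) is None:
            raise ValidationFailed(f"{field}: missing or null")
        value = record[field]
        try:
            if typ is bool and isinstance(value, str):
                if value.lower() not in ("true", "false"):
                    raise ValueError(value)
                out[field] = value.lower() == "true"
            else:
                out[field] = typ(value)
        except (TypeError, ValueError) as exc:
            raise ValidationFailed(f"{field}: cannot coerce {value!r}") from exc
    return out


def manual(record: Record) -> Record:
    """Tier 3: keep every field, normalising values to strings (None stays None)."""
    return {k: None if v is None else str(v) for k, v in record.items()}


def validate(record: Record, strict: bool = False) -> tuple[str, Record]:
    """Return (tier_name, validated_record); strict=True fails fast after tier 1."""
    try:
        return "primary", primary(record)
    except ValidationFailed:
        if strict:
            raise
    try:
        return "fallback", fallback(record)
    except ValidationFailed:
        return "manual", manual(record)
```

With `strict=True`, the first failure is raised immediately, which suits pipelines where a silently downgraded record is worse than a stopped run.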



## 🧪 Example Workflow (Conceptual)

- **Source**: JSON/NDJSON ingested through a connector
- **Validate**: Schema-checked with graceful fallback to keep data flowing
- **Transform**: Key casing/normalization applied; transformation metadata optionally added
- **Store**: Written to the configured output target
- **Observe**: Live dashboard shows task progress, retries, and data diffs

Note: When `include_transformation_metadata` is enabled, the transformation step adds lineage metadata fields whose key casing matches the selected `transform_mode`.



## 🎬 Demo

### Live Workflow Dashboard

Real-time visualization of agent execution, task dependencies, and data flow.


### Data Transformation Preview

The preview shows exactly what changed before and after transformations are applied, with highlighted fields marking the keys and values that were modified.

The sample shown uses `lowercase_keys` with `include_transformation_metadata` enabled.
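
The before/after comparison behind such a preview can be approximated with plain dictionary set operations. This is a minimal sketch: `field_diff` and `render_diff` are hypothetical, the `-`, `+`, and `~` markers stand in for the dashboard's color highlighting, and a renamed key shows up as one removal plus one addition because the sketch does not track renames.

```python
from typing import Any


def field_diff(before: dict[str, Any], after: dict[str, Any]) -> dict[str, Any]:
    """Summarise what a transformation changed, field by field."""
    return {
        "removed": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
        # Keys present on both sides whose values differ: key -> (old, new)
        "changed": {
            k: (before[k], after[k])
            for k in sorted(before.keys() & after.keys())
            if before[k] != after[k]
        },
    }


def render_diff(diff: dict[str, Any]) -> str:
    """Plain-text rendering: - removed key, + added key, ~ changed value."""
    lines = [f"- {k}" for k in diff["removed"]]
    lines += [f"+ {k}" for k in diff["added"]]
    lines += [f"~ {k}: {old!r} -> {new!r}" for k, (old, new) in diff["changed"].items()]
    return "\n".join(lines)
```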



## 💪🏼 Quality & Reliability

- **Comprehensive testing**: 21 end-to-end scenario tests + 12 unit tests validate all agents, connectors, and transformation modes
- **Continuous verification**: Automated CI/CD pipeline (GitHub Actions) with static type checking, linting, and coverage reporting (Coveralls)
- **Production-ready resilience**: Built-in retries, automatic dependency ordering, stall detection, and graceful validation fallback
- **Observable & maintainable**: Live terminal dashboard for run-time visibility; clean, modular architecture for long-term maintenance



## :pencil: License

All rights reserved.

This source code is proprietary. Unauthorized copying, modification, distribution, or use is prohibited without explicit permission from the author.

