https://github.com/imjuliengaupin/bourne

A Python-based framework for modular, AI-ready data pipeline orchestration featuring plug-and-play agents, real-time terminal dashboards, and Pydantic-powered schema validation.

Host: GitHub
URL: https://github.com/imjuliengaupin/bourne
Owner: imjuliengaupin
License: other
Created: 2025-07-05T18:00:27.000Z (about 1 year ago)
Default Branch: PROD
Last Pushed: 2026-03-28T06:47:24.000Z (4 months ago)
Last Synced: 2026-03-28T11:32:28.588Z (4 months ago)
Topics: agentic-ai, agentic-workflow, agents, ai, etl-framework, github-actions, orchestration-framework, pydantic-v2, python3
Language: Python
Homepage:
Size: 1.36 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Bourne

Enterprise-grade data pipeline orchestration for ETL workflows

[![Build Status](https://img.shields.io/github/actions/workflow/status/imjuliengaupin/bourne/devops.yml?branch=PROD&style=for-the-badge&logo=github&label=CI/CD)](https://github.com/imjuliengaupin/bourne/actions/workflows/devops.yml)
[![Coverage](https://img.shields.io/coveralls/github/imjuliengaupin/bourne/PROD?style=for-the-badge&logo=coveralls&label=COVERAGE)](https://coveralls.io/github/imjuliengaupin/bourne?branch=PROD)
[![Artifacts](https://img.shields.io/badge/Actions-Artifacts-6c757d?style=for-the-badge&logo=github)](https://github.com/imjuliengaupin/bourne/actions/workflows/devops.yml?query=branch%3APROD)

Why Bourne? • Features • Use Cases • Quick Start • How It Works • Example Workflow • Demo • Quality & Reliability • License

## Why Bourne?

Modern data pipelines require flexibility, reliability, and visibility. Bourne delivers:

- **Modular agent architecture** - Build complex workflows from simple, reusable components
- **Real-time observability** - Live dashboard showing workflow progress, data transformations, and errors
- **Configuration-driven execution** - Define pipelines in JSON, no code changes needed
- **Enterprise-grade validation** - Multi-tier schema validation with automatic fallback strategies
- **Production-ready** - Comprehensive testing, retry logic, dependency resolution, and stall detection

## :gear: Features

- [x] **Multi-agent orchestration**: Coordinate complex data workflows with automatic dependency resolution
- [x] **Extensible framework**: Add custom agents and transformation logic with minimal boilerplate
- [x] **Live terminal dashboard**: Real-time workflow progress visualization with color-coded statuses
- [x] **Field-level data preview**: Compare before/after transformations with highlighted changes
- [x] **JSON-first configuration**: Build entire pipelines with external JSON files—no code required
- [x] **Flexible data formats**: Native support for JSON and NDJSON ingestion and output
- [x] **Intelligent schema validation**: Pydantic v2-backed, multi-tier cascade (primary → fallback → manual) that tolerates variations in incoming data
- [x] **Resilient execution**: Automatic retries, stall detection, and dependency handling
- [x] **Comprehensive test suite**: Unit tests plus scenario-based end-to-end tests covering agents, connectors, and transformation modes

(back to top)

## :bulb: Use Cases

**Data Integration & Normalization**

- Ingest data from multiple JSON sources
- Normalize key formats (camelCase → snake_case)
- Apply consistent schema validation
- Export unified, validated output

**ETL Pipeline Management**

- Complex multi-step workflows with task dependencies
- Schema transformation and validation at each stage
- Real-time monitoring of pipeline execution
- Automatic error recovery and retry logic

**Data Quality Assurance**

- Validate incoming data against strict schemas
- Identify and log transformation changes
- Track data lineage with automatic metadata
- Fail-fast strict mode for critical pipelines

**Custom Data Processing**

- Build domain-specific agents for specialized transformations
- Compose agents into production workflows
- Configuration-driven scaling across different data sources

(back to top)

## :rocket: Quick Start

### Prerequisites

- Python 3.13
- Git

### Installation

```bash
# Clone the repository
git clone https://github.com/imjuliengaupin/bourne.git
cd bourne

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install core dependencies only
pip install -e .

# Run a sample workflow
python main.py \
--workflow json/workflows/default/default.json \
--connector json/connectors/default/single-record.json \
--debug
```

**Command-line Arguments**

- `--workflow` (required): Point to any JSON workflow configuration
- `--connector` (required): Point to any JSON data connector configuration
- `--debug` (optional): Enable live terminal dashboard with progress visualization and real-time logging
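
For readers who want to see how these three flags fit together, here is a minimal `argparse` sketch. It is not taken from Bourne's `main.py`, whose contents are not shown here; the `build_parser` function and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the three documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run a pipeline workflow (sketch).")
    parser.add_argument("--workflow", required=True,
                        help="Path to a JSON workflow configuration")
    parser.add_argument("--connector", required=True,
                        help="Path to a JSON data connector configuration")
    # store_true makes --debug an optional switch that defaults to False.
    parser.add_argument("--debug", action="store_true",
                        help="Enable the live terminal dashboard")
    return parser


# Parse the same arguments as the sample command, minus --debug.
args = build_parser().parse_args([
    "--workflow", "json/workflows/default/default.json",
    "--connector", "json/connectors/default/single-record.json",
])
print(args.workflow, args.connector, args.debug)
```

Omitting either required flag makes `argparse` print a usage message and exit, which matches the "required" labels above.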

(back to top)

## 🏗️ How It Works

### Architecture at a Glance

[![Open in Eraser](https://img.shields.io/badge/Open%20in-Eraser-blue?logo=eraser&style=for-the-badge)](https://app.eraser.io/workspace/LDgZLTRhjaVsKpyiZU0B)

Bourne orchestrates four core agents in configurable sequences:

| 🤖 Agent | 🎯 Purpose |
| --------------------------- | ------------------------------------------------------------ |
| **DataIngestionAgent** | Reads JSON/NDJSON from files or APIs |
| **DataValidationAgent** | Validates against expected schemas with intelligent fallback |
| **DataTransformationAgent** | Applies configurable key transformations and normalizations |
| **DataStorageAgent** | Persists processed data to files or external systems |

Each agent is independent, reusable, and can be composed into complex workflows with automatic dependency resolution.
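
To make the idea of dependency resolution concrete, the sketch below orders four stand-in tasks with the standard library's `graphlib.TopologicalSorter`. The functions only imitate the roles of the four agents; their names, the `DEPENDS_ON` mapping, and the `run` helper are hypothetical and say nothing about how Bourne implements its agents internally.

```python
from graphlib import TopologicalSorter

Records = list[dict]


def ingest(_: Records) -> Records:
    # Stand-in for DataIngestionAgent: returns hard-coded records.
    return [{"FirstName": "Ada"}]


def validate(records: Records) -> Records:
    # Stand-in for DataValidationAgent: keep only dict-shaped records.
    return [r for r in records if isinstance(r, dict)]


def transform(records: Records) -> Records:
    # Stand-in for DataTransformationAgent: lowercase every key.
    return [{k.lower(): v for k, v in r.items()} for r in records]


def store(records: Records) -> Records:
    # Stand-in for DataStorageAgent: a real agent would write somewhere.
    return records


AGENTS = {"ingest": ingest, "validate": validate,
          "transform": transform, "store": store}

# Each task maps to the set of tasks that must finish before it starts.
# The entries are deliberately listed out of order.
DEPENDS_ON = {
    "store": {"transform"},
    "transform": {"validate"},
    "validate": {"ingest"},
    "ingest": set(),
}


def run(agents, depends_on):
    """Run tasks in dependency order, feeding each output to the next task.

    Passing one result along only works for a linear chain like this one.
    """
    order = list(TopologicalSorter(depends_on).static_order())
    records: Records = []
    for name in order:
        records = agents[name](records)
    return order, records


order, result = run(AGENTS, DEPENDS_ON)
print(order)
print(result)
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies form a loop, so a misconfigured workflow fails before any task runs.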

### Configuration-Driven Pipelines

Define your entire workflow in JSON—no Python code needed. Typical flow:

- **Ingest**: Pull JSON/NDJSON from a source connector
- **Validate**: Enforce schemas with automatic fallback (primary → fallback → manual)
- **Transform**: Apply a key normalization mode and, optionally, add lineage metadata
- **Save**: Persist processed data to the configured destination

Workflows automatically handle:

- Task dependency ordering
- Automatic retries on transient failures
- Stall detection and recovery
- Real-time progress tracking
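
As an illustration of retrying transient failures, here is a small retry loop with exponential backoff. The overall time budget is only a crude stand-in for stall detection. `TransientError`, `StalledError`, `run_with_retries`, and all default values are assumptions for this sketch, not Bourne's actual retry API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


class StalledError(RuntimeError):
    """Raised when retries exceed the overall time budget."""


def run_with_retries(task: Callable[[], T], max_attempts: int = 3,
                     base_delay: float = 0.01, time_budget: float = 5.0) -> T:
    """Call task(), retrying TransientError with exponential backoff."""
    started = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            if time.monotonic() - started > time_budget:
                raise StalledError("task exceeded its time budget") from exc
            # Wait base_delay, then 2x, then 4x, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


calls = {"count": 0}


def flaky_task() -> str:
    # Fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


result = run_with_retries(flaky_task)
print(result, calls["count"])
```

Only `TransientError` is retried here; any other exception propagates immediately, which keeps programming errors from being masked by the retry loop.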

### Transformations & Validation

Apply powerful transformations without writing code:

| 🛠️ Transformation | 🎯 Use Case | 📊 Example |
| ------------------ | ------------------------------------------ | ------------------------------- |
| `lowercase_keys` | Convert all keys to lowercase | `FirstName` → `firstname` |
| `uppercase_keys` | Convert all keys to UPPERCASE | `FirstName` → `FIRSTNAME` |
| `snake_case_keys` | Convert to Python/database friendly format | `FirstName` → `first_name` |
| `camel_case_keys` | Convert all keys to camelCase format | `FirstName` → `firstName` |
| `pascal_case_keys` | Convert all keys to PascalCase format | `first_name` → `FirstName` |
| `normalize_types` | Coerce all values to string type | `{"id": 123}` → `{"id": "123"}` |

When transformation metadata is added, its keys use the same casing as the selected transform mode. Toggle inclusion with `include_transformation_metadata` (default: off).
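
The table's examples can be reproduced with a few lines of plain Python. This is a minimal sketch for flat records only, and the helper names (`_words`, `KEY_TRANSFORMS`, `transform_keys`) are hypothetical. Bourne's own implementation may split words differently or handle nested structures.

```python
import re


def _words(key: str) -> list[str]:
    """Split a key into words at underscores, hyphens, spaces, and aB boundaries."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", key)
    return [w for w in re.split(r"[\s_\-]+", spaced) if w]


def snake_case(key: str) -> str:
    return "_".join(w.lower() for w in _words(key))


def pascal_case(key: str) -> str:
    return "".join(w.capitalize() for w in _words(key))


def camel_case(key: str) -> str:
    pascal = pascal_case(key)
    return pascal[:1].lower() + pascal[1:]


# Mode names match the table; the mapping itself is an assumption.
KEY_TRANSFORMS = {
    "lowercase_keys": str.lower,
    "uppercase_keys": str.upper,
    "snake_case_keys": snake_case,
    "camel_case_keys": camel_case,
    "pascal_case_keys": pascal_case,
}


def transform_keys(record: dict, mode: str) -> dict:
    """Rename every top-level key of a flat record using the given mode."""
    rename = KEY_TRANSFORMS[mode]
    return {rename(key): value for key, value in record.items()}


def normalize_types(record: dict) -> dict:
    """Coerce every top-level value to a string."""
    return {key: str(value) for key, value in record.items()}


print(transform_keys({"FirstName": "Ada"}, "snake_case_keys"))
print(transform_keys({"first_name": "Ada"}, "pascal_case_keys"))
print(normalize_types({"id": 123}))
```

Note that two distinct keys can collapse into one after renaming (for example `ID` and `id` under `lowercase_keys`); in this sketch the later key silently wins.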

Schema validation uses a cascading approach:

1. **Primary** - Full typed Pydantic validation
2. **Fallback** - Enhanced validation with type flexibility
3. **Manual** - String-normalized fallback for ambiguous data

This lets the pipeline keep processing records that deviate from the expected types instead of failing on the first mismatch.
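
One way to picture the three tiers is the sketch below, which requires Pydantic v2 to be installed. It maps "primary" to strict validation, "fallback" to Pydantic's default lax mode, and "manual" to stringifying every value. That mapping, the `Customer` model, and `validate_with_cascade` are assumptions for illustration. The source does not say how Bourne implements each tier.

```python
from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    # Hypothetical schema used only for this illustration.
    id: int
    name: str


def validate_with_cascade(record: dict) -> tuple[str, dict]:
    """Return the tier that accepted the record and the resulting data."""
    # Tier 1: strict validation, no type coercion allowed.
    try:
        return "primary", Customer.model_validate(record, strict=True).model_dump()
    except ValidationError:
        pass
    # Tier 2: lax validation, e.g. the string "2" is coerced to the int 2.
    try:
        return "fallback", Customer.model_validate(record).model_dump()
    except ValidationError:
        pass
    # Tier 3: give up on typing and normalize every value to a string.
    return "manual", {key: str(value) for key, value in record.items()}


print(validate_with_cascade({"id": 1, "name": "Ada"}))
print(validate_with_cascade({"id": "2", "name": "Grace"}))
print(validate_with_cascade({"id": "n/a", "name": 5}))
```

Returning the tier name alongside the data makes it possible to log how often records fall through to the weaker tiers, which is a useful data-quality signal.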

(back to top)

## 🧪 Example Workflow (Conceptual)

- **Source**: JSON/NDJSON ingested through a connector
- **Validate**: Schema-checked with graceful fallback to keep data flowing
- **Transform**: Key casing/normalization applied; transformation metadata added when enabled
- **Store**: Written to the configured output target
- **Observe**: Live dashboard shows task progress, retries, and data diffs

Note: When `include_transformation_metadata` is enabled, the transformation step adds lineage metadata fields whose key casing matches the selected `transform_mode`.
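
The conceptual flow can be sketched end to end with the standard library: parse NDJSON, lowercase the keys, optionally attach lineage fields, and serialize back to NDJSON. The lineage field names (`TransformMode`, `TransformedAt`) and the helper functions are invented for this example. The fields Bourne actually adds are not listed in this document.

```python
import json
from datetime import datetime, timezone


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def transform(record: dict, include_transformation_metadata: bool = False) -> dict:
    """Apply a lowercase-keys transform and optionally add lineage fields."""
    rename = str.lower  # stands in for the lowercase_keys mode
    out = {rename(key): value for key, value in record.items()}
    if include_transformation_metadata:
        # Hypothetical lineage fields. Their keys go through the same
        # rename function so their casing matches the transformed record.
        lineage = {
            "TransformMode": "lowercase_keys",
            "TransformedAt": datetime.now(timezone.utc).isoformat(),
        }
        out.update({rename(key): value for key, value in lineage.items()})
    return out


def to_ndjson(records: list[dict]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


source = '{"FirstName": "Ada"}\n{"FirstName": "Grace"}\n'
records = parse_ndjson(source)
output = to_ndjson([transform(record) for record in records])
print(output)

with_metadata = transform(records[0], include_transformation_metadata=True)
print(sorted(with_metadata))
```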

(back to top)

## 🎬 Demo

### Live Workflow Dashboard

Real-time visualization of agent execution, task dependencies, and data flow:

### Data Transformation Preview

See exactly what changed before and after transformations are applied. Highlighted fields show which keys and values were modified:

Sample transformation exhibited: `lowercase_keys` with `include_transformation_metadata` fields enabled.
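
A before/after preview of this kind boils down to a field-level diff of two records. The sketch below is one simple way to compute it for flat dictionaries; `diff_records` and its output shape are hypothetical and do not describe how the dashboard renders its highlights.

```python
def diff_records(before: dict, after: dict) -> dict[str, dict]:
    """Compare two flat records and report removed, added, and changed fields."""
    removed = {key: before[key] for key in before.keys() - after.keys()}
    added = {key: after[key] for key in after.keys() - before.keys()}
    changed = {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return {"removed": removed, "added": added, "changed": changed}


before = {"FirstName": "Ada", "id": 123}
after = {"firstname": "Ada", "id": "123"}

diff = diff_records(before, after)
print(diff)
```

A renamed key shows up as one removal plus one addition, while a value whose type changed (here `123` to `"123"`) appears under `changed` as a before/after pair.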

(back to top)

## 💪🏼 Quality & Reliability

- **Comprehensive testing**: 21 end-to-end scenario tests + 12 unit tests validate all agents, connectors, and transformation modes
- **Continuous verification**: Automated CI/CD pipeline (GitHub Actions) with static type checking, linting, and coverage reporting (Coveralls)
- **Production-ready resilience**: Built-in retries, automatic dependency ordering, stall detection, and graceful validation fallback
- **Observable & maintainable**: Observability through the live dashboard; clean, modular architecture for long-term maintenance

(back to top)

## :pencil: License

This source code is proprietary. Unauthorized copying, modification, distribution, or use is prohibited without explicit permission from the author.

(back to top)
