https://github.com/imjuliengaupin/bourne
A Python-based framework for modular, AI-ready data pipeline orchestration featuring plug-and-play agents, real-time terminal dashboards, and Pydantic-powered schema validation.
agentic-ai agentic-workflow agents ai etl-framework github-actions orchestration-framework pydantic-v2 python3
- Host: GitHub
- URL: https://github.com/imjuliengaupin/bourne
- Owner: imjuliengaupin
- License: other
- Created: 2025-07-05T18:00:27.000Z (12 months ago)
- Default Branch: PROD
- Last Pushed: 2026-03-28T06:47:24.000Z (3 months ago)
- Last Synced: 2026-03-28T11:32:28.588Z (3 months ago)
- Topics: agentic-ai, agentic-workflow, agents, ai, etl-framework, github-actions, orchestration-framework, pydantic-v2, python3
- Language: Python
- Homepage:
- Size: 1.36 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Bourne
Enterprise-grade data pipeline orchestration for ETL workflows
[CI workflow](https://github.com/imjuliengaupin/bourne/actions/workflows/devops.yml) •
[Coverage on Coveralls](https://coveralls.io/github/imjuliengaupin/bourne?branch=PROD) •
[CI runs on PROD](https://github.com/imjuliengaupin/bourne/actions/workflows/devops.yml?query=branch%3APROD)
Why Bourne? • Features • Use Cases • Quick Start • How It Works • Example Workflow • Demo • Quality & Reliability • License
## Why Bourne?
Modern data pipelines require flexibility, reliability, and visibility. Bourne delivers:
- **Modular agent architecture** - Build complex workflows from simple, reusable components
- **Real-time observability** - Live dashboard showing workflow progress, data transformations, and errors
- **Configuration-driven execution** - Define pipelines in JSON, no code changes needed
- **Enterprise-grade validation** - Multi-tier schema validation with automatic fallback strategies
- **Production-ready** - Comprehensive testing, retry logic, dependency resolution, and stall detection
## :gear: Features
- [x] **Multi-agent orchestration**: Coordinate complex data workflows with automatic dependency resolution
- [x] **Extensible framework**: Add custom agents and transformation logic with minimal boilerplate
- [x] **Live terminal dashboard**: Real-time workflow progress visualization with color-coded statuses
- [x] **Field-level data preview**: Compare before/after transformations with highlighted changes
- [x] **JSON-first configuration**: Build entire pipelines with external JSON files—no code required
- [x] **Flexible data formats**: Native support for JSON and NDJSON ingestion and output
- [x] **Intelligent schema validation**: Pydantic v2-backed, multi-tier cascade that falls back to more permissive validation when records deviate from the expected schema
- [x] **Resilient execution**: Automatic retries, stall detection, and dependency handling
- [x] **Comprehensive test suite**: Scenario-based end-to-end tests plus unit tests covering agents, connectors, and transformation modes
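
The JSON and NDJSON support listed above can be illustrated with a minimal sketch. It uses only the standard library, and `parse_records` is a hypothetical helper name, not part of Bourne's API; the actual ingestion agent may detect formats differently.

```python
import json
from typing import Any, Iterator


def parse_records(text: str) -> Iterator[Any]:
    """Yield records from text that is either one JSON document or NDJSON."""
    stripped = text.strip()
    if not stripped:
        return
    try:
        # A whole-document parse succeeds for plain JSON (object or array).
        document = json.loads(stripped)
    except json.JSONDecodeError:
        # Otherwise treat the input as NDJSON: one JSON value per non-empty line.
        for line in stripped.splitlines():
            if line.strip():
                yield json.loads(line)
        return
    if isinstance(document, list):
        # A top-level array is treated as a batch of records.
        yield from document
    else:
        yield document
```

A single-line NDJSON file is indistinguishable from a JSON document here, which is harmless because both yield the same record.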
## :bulb: Use Cases
**Data Integration & Normalization**
- Ingest data from multiple JSON sources
- Normalize key formats (camelCase → snake_case)
- Apply consistent schema validation
- Export unified, validated output
**ETL Pipeline Management**
- Complex multi-step workflows with task dependencies
- Schema transformation and validation at each stage
- Real-time monitoring of pipeline execution
- Automatic error recovery and retry logic
**Data Quality Assurance**
- Validate incoming data against strict schemas
- Identify and log transformation changes
- Track data lineage with automatic metadata
- Fail-fast strict mode for critical pipelines
**Custom Data Processing**
- Build domain-specific agents for specialized transformations
- Compose agents into production workflows
- Configuration-driven scaling across different data sources
## :rocket: Quick Start
### Prerequisites
- Python 3.13
- Git
### Installation
```bash
# Clone the repository
git clone https://github.com/imjuliengaupin/bourne.git
cd bourne
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install core dependencies only
pip install -e .
# Run a sample workflow
python main.py \
--workflow json/workflows/default/default.json \
--connector json/connectors/default/single-record.json \
--debug
```
**Command-line Arguments**
- `--workflow` (required): Point to any JSON workflow configuration
- `--connector` (required): Point to any JSON data connector configuration
- `--debug` (optional): Enable live terminal dashboard with progress visualization and real-time logging
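
For readers scripting around the CLI, the three arguments above map naturally onto `argparse`. This is a sketch of how such a parser could look, assuming `--debug` is a boolean switch; it is not the contents of Bourne's `main.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--workflow", required=True,
                        help="path to a JSON workflow configuration")
    parser.add_argument("--connector", required=True,
                        help="path to a JSON data connector configuration")
    parser.add_argument("--debug", action="store_true",
                        help="enable the live terminal dashboard and logging")
    return parser


# Parse the same arguments used in the sample invocation above.
args = build_parser().parse_args([
    "--workflow", "json/workflows/default/default.json",
    "--connector", "json/connectors/default/single-record.json",
    "--debug",
])
```

Omitting either required flag makes `argparse` print a usage message and exit with a non-zero status.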
## 🏗️ How It Works
### Architecture at a Glance
[Architecture diagram](https://app.eraser.io/workspace/LDgZLTRhjaVsKpyiZU0B)
Bourne orchestrates four core agents in configurable sequences:
| 🤖 Agent | 🎯 Purpose |
| --------------------------- | ------------------------------------------------------------ |
| **DataIngestionAgent** | Reads JSON/NDJSON from files or APIs |
| **DataValidationAgent** | Validates against expected schemas with intelligent fallback |
| **DataTransformationAgent** | Applies configurable key transformations and normalizations |
| **DataStorageAgent** | Persists processed data to files or external systems |
Each agent is independent, reusable, and can be composed into complex workflows with automatic dependency resolution.
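
Automatic dependency resolution of this kind is commonly implemented as a topological sort. The sketch below uses the standard library's `graphlib`; the task names and the `execution_order` helper are assumptions for illustration, and Bourne's own resolver may work differently.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return task names ordered so every task runs after its dependencies.

    `dependencies` maps a task to the set of tasks it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical four-step pipeline: each task depends on the previous one.
pipeline = {
    "validate": {"ingest"},
    "transform": {"validate"},
    "store": {"transform"},
}
order = execution_order(pipeline)
```

For this linear chain the only valid order is `ingest`, `validate`, `transform`, `store`; a cyclic configuration is rejected up front with `CycleError` rather than stalling at run time.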
### Configuration-Driven Pipelines
Define your entire workflow in JSON—no Python code needed. Typical flow:
- **Ingest**: Pull JSON/NDJSON from a source connector
- **Validate**: Enforce schemas with automatic fallback (primary → fallback → manual)
- **Transform**: Apply key normalization modes and add lineage metadata
- **Save**: Persist processed data to the configured destination
Workflows automatically handle:
- Task dependency ordering
- Automatic retries on transient failures
- Stall detection and recovery
- Real-time progress tracking
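
As an illustration of retrying on transient failures, here is a minimal sketch with exponential backoff. `TransientError`, `run_with_retries`, and the default attempt count and delay are hypothetical; the README does not document Bourne's actual retry policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last failure.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only `TransientError` is retried; any other exception propagates immediately, so genuine bugs are not masked by the retry loop.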
### Transformations & Validation
Apply powerful transformations without writing code:
| 🛠️ Transformation | 🎯 Use Case | 📊 Example |
| ------------------ | ------------------------------------------ | ------------------------------- |
| `lowercase_keys` | Convert all keys to lowercase | `FirstName` → `firstname` |
| `uppercase_keys` | Convert all keys to UPPERCASE | `FirstName` → `FIRSTNAME` |
| `snake_case_keys` | Convert to Python/database friendly format | `FirstName` → `first_name` |
| `camel_case_keys` | Convert all keys to camelCase format | `FirstName` → `firstName` |
| `pascal_case_keys` | Convert all keys to PascalCase format | `first_name` → `FirstName` |
| `normalize_types` | Coerce all values to string type | `{"id": 123}` → `{"id": "123"}` |
The casing of the transformation metadata keys follows the selected transform mode. Metadata is added only when `include_transformation_metadata` is enabled (default: off).
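
The transformation modes in the table can be approximated in a few lines of Python. This sketch only reuses the mode names from the table; the helper functions are hypothetical, it handles top-level keys only, and it does not split acronyms such as `HTTPServer`.

```python
import re
from typing import Any, Callable


def split_words(key: str) -> list[str]:
    """Split 'FirstName', 'firstName', or 'first_name' into lowercase words."""
    # Mark lower-to-upper boundaries, then split on non-alphanumeric runs.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key)
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def snake_case(key: str) -> str:
    return "_".join(split_words(key))


def pascal_case(key: str) -> str:
    return "".join(w.capitalize() for w in split_words(key))


def camel_case(key: str) -> str:
    pascal = pascal_case(key)
    return pascal[:1].lower() + pascal[1:]


# Mode names taken from the table above.
TRANSFORMS: dict[str, Callable[[str], str]] = {
    "lowercase_keys": str.lower,
    "uppercase_keys": str.upper,
    "snake_case_keys": snake_case,
    "camel_case_keys": camel_case,
    "pascal_case_keys": pascal_case,
}


def transform_keys(record: dict[str, Any], mode: str) -> dict[str, Any]:
    """Apply one key-casing mode to the top-level keys of a record."""
    convert = TRANSFORMS[mode]
    return {convert(key): value for key, value in record.items()}


def normalize_types(record: dict[str, Any]) -> dict[str, str]:
    """Coerce every top-level value to a string."""
    return {key: str(value) for key, value in record.items()}
```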
Schema validation uses a cascading approach:
1. **Primary** - Full typed Pydantic validation
2. **Fallback** - Enhanced validation with type flexibility
3. **Manual** - String-normalized fallback for ambiguous data
This cascade keeps the pipeline running when incoming data deviates from the expected types, with each later tier offering weaker type guarantees than the one before it.
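
One way to picture the three tiers is with Pydantic v2's strict and lax modes. The sketch below is an assumption about how such a cascade could be built, not Bourne's implementation: the record shape (`id`, `name`), the model names, and the mapping of tiers to strict mode, lax mode, and string normalization are all illustrative. It requires `pydantic` version 2.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictRecord(BaseModel):
    """Primary tier: types must match exactly (no coercion)."""
    model_config = ConfigDict(strict=True)
    id: int
    name: str


class LaxRecord(BaseModel):
    """Fallback tier: Pydantic's default mode coerces e.g. "1" to 1."""
    id: int
    name: str


def validate_with_cascade(raw: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return the tier that accepted the record and the resulting data."""
    try:
        return "primary", StrictRecord.model_validate(raw).model_dump()
    except ValidationError:
        pass
    try:
        return "fallback", LaxRecord.model_validate(raw).model_dump()
    except ValidationError:
        pass
    # Manual tier: give up on typing and normalize everything to strings.
    return "manual", {str(key): str(value) for key, value in raw.items()}
```

Returning the tier name alongside the data lets a caller log how often records fall through to the weaker tiers, or reject anything that is not `primary` in a fail-fast configuration.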
## 🧪 Example Workflow (Conceptual)
- **Source**: JSON/NDJSON ingested through a connector
- **Validate**: Schema-checked with graceful fallback to keep data flowing
- **Transform**: Key casing/normalization applied; transformation metadata added
- **Store**: Written to the configured output target
- **Observe**: Live dashboard shows task progress, retries, and data diffs
Note: When `include_transformation_metadata` is enabled, the transformation step adds lineage metadata fields whose key casing matches the selected `transform_mode`.
## 🎬 Demo
### Live Workflow Dashboard
Real-time visualization of agent execution, task dependencies, and data flow:

### Data Transformation Preview
See exactly what changed before and after transformations are applied. Highlighted fields show which keys and values were modified:

The sample shown uses `lowercase_keys` with `include_transformation_metadata` enabled.
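
The idea behind a field-level before/after preview can be sketched as a comparison of two records. `diff_records` is a hypothetical helper for illustration; it does not reproduce the dashboard's rendering or Bourne's internal diff logic.

```python
from typing import Any


def diff_records(before: dict[str, Any], after: dict[str, Any]) -> dict[str, list[str]]:
    """Summarize which top-level keys were added, removed, or changed."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(key for key in shared if before[key] != after[key]),
    }


# A key-casing transform shows up as removed and added keys, not changed values.
summary = diff_records(
    {"FirstName": "Ada", "Age": 36},
    {"firstname": "Ada", "age": 36},
)
```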
## 💪🏼 Quality & Reliability
- **Comprehensive testing**: 21 end-to-end scenario tests + 12 unit tests validate all agents, connectors, and transformation modes
- **Continuous verification**: Automated CI/CD pipeline (GitHub Actions) with static type checking, linting, and coverage reporting (Coveralls)
- **Production-ready resilience**: Built-in retries, automatic dependency ordering, stall detection, and graceful validation fallback
- **Observable & maintainable**: Full-stack observability through live dashboard; clean, modular architecture for long-term maintenance
## :pencil: License
All rights reserved.
This source code is proprietary. Unauthorized copying, modification, distribution, or use is prohibited without explicit permission from the author.