# Flycatcher

Define your schema once. Validate at scale. Stay columnar.


Built for DataFrames, powered across Pydantic, Polars, and SQLAlchemy.



Python 3.12+ | License: MIT

---

Flycatcher is a **DataFrame-native schema layer** for Python. Define your data model once and generate optimized representations for every part of your stack:

- 🎯 **Pydantic models** for API validation & serialization
- ⚡ **Polars validators** for blazing-fast bulk validation
- 🗄️ **SQLAlchemy tables** for typed database access

**Built for modern data workflows:** Validate millions of rows at high speed, keep schema drift at zero, and stay columnar end-to-end.

## ❓ Why Flycatcher?

Modern Python data projects need **row-level validation** (Pydantic), **efficient bulk operations** (Polars), and **typed database queries** (SQLAlchemy). Maintaining a separate schema for each layer leads to duplication and drift, and forces you to juggle row-oriented and columnar paradigms by hand.

**Flycatcher solves this:** One schema definition → three optimized outputs.

```python
from flycatcher import Schema, Field, col, model_validator

class ProductSchema(Schema):
    id: int = Field(primary_key=True)
    name: str = Field(min_length=3, max_length=100)
    price: float = Field(gt=0)
    discount_price: float | None = Field(default=None, gt=0, nullable=True)

    @model_validator
    def check_discount():
        # Cross-field validation with DSL
        return (
            col('discount_price') < col('price'),
            "Discount price must be less than regular price"
        )

# Generate three optimized representations
ProductModel = ProductSchema.to_pydantic() # → Pydantic BaseModel
ProductValidator = ProductSchema.to_polars_validator() # → Polars DataFrame validator
ProductTable = ProductSchema.to_sqlalchemy() # → SQLAlchemy Table
```

**Flycatcher lets you stay DataFrame-native without giving up the speed of Polars, the ergonomic validation of Pydantic, or the Pythonic power of SQLAlchemy**.

---

## 🚀 Quick Start

### Installation

```bash
pip install flycatcher
# or
uv add flycatcher
```

### Define Your Schema

```python
from datetime import datetime
from flycatcher import Schema, Field

class UserSchema(Schema):
    id: int = Field(primary_key=True)
    username: str = Field(min_length=3, max_length=50, unique=True)
    email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', unique=True, index=True)
    age: int = Field(ge=13, le=120)
    is_active: bool = Field(default=True)
    created_at: datetime
```

### Use Pydantic for Row-Level Validation

Perfect for APIs, forms, and single-record validation:

```python
from datetime import datetime, timezone

User = UserSchema.to_pydantic()

# Validates constraints automatically via Pydantic
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    age=25,
    created_at=datetime.now(timezone.utc),  # datetime.utcnow() is deprecated since Python 3.12
)

# Serialize to JSON/dict
print(user.model_dump_json())
```

### Use Polars for Bulk Validation

Perfect for DataFrame-level validation:

```python
import polars as pl

UserValidator = UserSchema.to_polars_validator()

# Validate 1M+ rows with blazing speed
df = pl.read_csv("users.csv")
validated_df = UserValidator.validate(df, strict=True)

validated_df.write_parquet("validated_users.parquet")
```

### Use SQLAlchemy for Database Operations

Perfect for typed queries and database interactions:

```python
from sqlalchemy import create_engine

UserTable = UserSchema.to_sqlalchemy(table_name="users")

engine = create_engine("postgresql://localhost/mydb")

# Type-safe queries
with engine.connect() as conn:
    result = conn.execute(
        UserTable.select()
        .where(UserTable.c.is_active == True)
        .where(UserTable.c.age >= 18)
    )
    for row in result:
        print(row)
```

---

## ✨ Key Features

### Rich Field Types & Constraints

Use standard Python types with `Field(...)` constraints:

| Python Type | Constraints | Example |
|-------------|-------------|---------|
| `int` | `ge`, `gt`, `le`, `lt`, `multiple_of` | `age: int = Field(ge=0, le=120)` |
| `float` | `ge`, `gt`, `le`, `lt` | `price: float = Field(gt=0)` |
| `str` | `min_length`, `max_length`, `pattern` | `email: str = Field(pattern=r'^[^@]+@...')` |
| `bool` | - | `is_active: bool = Field(default=True)` |
| `datetime` | `ge`, `gt`, `le`, `lt` | `created_at: datetime = Field(ge=datetime(2020, 1, 1))` |
| `date` | `ge`, `gt`, `le`, `lt` | `birth_date: date` |

**Validation options (all fields):** `nullable`, `default`, `description`

**SQLAlchemy-specific:** `primary_key`, `unique`, `index`, `autoincrement`

### Custom & Cross-Field Validation

Use the `col()` DSL for powerful field-level and cross-field validation that works across both Pydantic and Polars:

```python
from datetime import datetime
from flycatcher import Schema, Field, col, model_validator

class BookingSchema(Schema):
    email: str
    phone: str
    check_in: datetime = Field(ge=datetime(2024, 1, 1))
    check_out: datetime = Field(ge=datetime(2024, 1, 1))
    nights: int = Field(ge=1)

    @model_validator
    def check_dates():
        return (
            col('check_out') > col('check_in'),
            "Check-out must be after check-in"
        )

    @model_validator
    def check_phone_format():
        cleaned = col('phone').str.replace(r'[^\d]', '')
        return (cleaned.str.len_chars() == 10, "Phone must have 10 digits")

    @model_validator
    def check_minimum_stay():
        # For operations not yet in DSL (like .is_in()), use explicit Polars format
        # Note: .dt.month() is available in DSL, but .is_in() is not yet supported
        import polars as pl
        return {
            'polars': (
                (~pl.col('check_in').dt.month().is_in([7, 8])) | (pl.col('nights') >= 3),
                "Minimum stay in July and August is 3 nights"
            ),
            'pydantic': lambda v: (
                v.check_in.month not in [7, 8] or v.nights >= 3,
                "Minimum stay in July and August is 3 nights"
            )
        }
```
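Because the `'pydantic'` entry in the explicit format is an ordinary callable that returns `(passed, message)`, its logic can be exercised on its own, without Flycatcher or Pydantic. The sketch below does this with a stand-in row object. `BookingRow` and `minimum_stay_rule` are hypothetical names used only for illustration. They are not part of Flycatcher, and the generated Pydantic model may behave differently.

```python
from dataclasses import dataclass
from datetime import datetime

MESSAGE = "Minimum stay in July and August is 3 nights"


@dataclass
class BookingRow:
    """Hypothetical stand-in for one row (not a Flycatcher or Pydantic type)."""
    check_in: datetime
    nights: int


def minimum_stay_rule(v: BookingRow) -> tuple[bool, str]:
    # Same logic as the 'pydantic' branch of check_minimum_stay.
    return (v.check_in.month not in [7, 8] or v.nights >= 3, MESSAGE)


short_july_stay = BookingRow(check_in=datetime(2024, 7, 10), nights=2)
short_march_stay = BookingRow(check_in=datetime(2024, 3, 10), nights=1)

july_ok, july_msg = minimum_stay_rule(short_july_stay)
march_ok, _ = minimum_stay_rule(short_march_stay)

print(july_ok, march_ok)  # False True
```

Keeping the rule a plain function of one row makes it easy to unit-test before wiring it into a schema.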

### Validation Modes

Polars validation supports flexible error handling:

```python
# Strict mode: Raise on validation errors (default)
validated_df = UserValidator.validate(df, strict=True)

# Non-strict mode: Filter out invalid rows
valid_df = UserValidator.validate(df, strict=False)

# Show violations for debugging
validated_df = UserValidator.validate(df, strict=True, show_violations=True)
```
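The difference between the two modes can be pictured with a plain-Python toy: strict mode raises as soon as any row violates a rule, while non-strict mode returns only the rows that pass. This is an illustration of the semantics described above, not Flycatcher's implementation, which works on Polars DataFrames. `validate_rows` and `age_in_range` are hypothetical helpers.

```python
from typing import Callable

Row = dict[str, int]


def validate_rows(
    rows: list[Row],
    is_valid: Callable[[Row], bool],
    strict: bool = True,
) -> list[Row]:
    """Toy model of the two modes: raise on any violation, or drop invalid rows."""
    invalid = [row for row in rows if not is_valid(row)]
    if strict and invalid:
        raise ValueError(f"{len(invalid)} row(s) failed validation")
    return [row for row in rows if is_valid(row)]


def age_in_range(row: Row) -> bool:
    # Mirrors the UserSchema constraint: age = Field(ge=13, le=120)
    return 13 <= row["age"] <= 120


rows = [{"age": 25}, {"age": 7}, {"age": 40}]

# Non-strict: invalid rows are filtered out.
kept = validate_rows(rows, age_in_range, strict=False)

# Strict: the same input raises instead.
try:
    validate_rows(rows, age_in_range, strict=True)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(kept)
print(strict_error)
```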

---

## 🎯 Complete Example: ETL Pipeline

```python
import polars as pl
from datetime import datetime
from flycatcher import Schema, Field, col, model_validator
from sqlalchemy import create_engine

# 1. Define schema once
class OrderSchema(Schema):
    order_id: int = Field(primary_key=True)
    customer_email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    amount: float = Field(gt=0)
    tax: float = Field(ge=0)
    total: float = Field(gt=0)
    created_at: datetime

    @model_validator
    def check_total():
        return (
            col('total') == col('amount') + col('tax'),
            "Total must equal amount + tax"
        )

# 2. Extract & Validate with Polars (handles millions of rows)
OrderValidator = OrderSchema.to_polars_validator()
df = pl.read_csv("orders.csv")
validated_df = OrderValidator.validate(df, strict=True)

# 3. Load to database with SQLAlchemy
OrderTable = OrderSchema.to_sqlalchemy(table_name="orders")
engine = create_engine("postgresql://localhost/analytics")

with engine.connect() as conn:
    conn.execute(OrderTable.insert(), validated_df.to_dicts())
    conn.commit()
```

✅ **Result:** Validated millions of rows, enforced business rules, and loaded to database — all from one schema definition.

---

## 🏗️ Design Philosophy

**One schema, three representations. Each optimized for its use case.**

```
           Schema Definition
                   │
        ┌──────────┼──────────┐
        ↓          ↓          ↓
    Pydantic    Polars   SQLAlchemy
        ↓          ↓          ↓
      APIs        ETL     Database
```

### What Flycatcher Does

- ✅ Single source of truth for schema definitions
- ✅ Generate optimized representations for different use cases
- ✅ Keep runtimes separate (no ORM ↔ DataFrame conversions)
- ✅ Use stable public APIs (Pydantic, Polars, SQLAlchemy)

### What Flycatcher Doesn't Do

- ❌ Mix row-oriented and columnar paradigms
- ❌ Create a "unified runtime" (that would be slow)
- ❌ Reinvent validation logic (delegates to proven libraries when possible)
- ❌ Depend on internal APIs

---

## ⚠️ Current Limitations (v0.1.0)

Flycatcher v0.1.0 is an **alpha release**. The core functionality works, but some advanced features are planned for future versions:

### Polars DSL

The `col()` DSL supports **basic operations** (`>`, `<`, `==`, `+`, etc.),
**numeric math operations** (`.abs()`, `.round()`, `.floor()`, `.ceil()`, `.sqrt()`, `.pow()`),
**limited string operations** (`.str.contains()`, `.str.starts_with()`, `.str.len_chars()`, etc.),
and a **limited datetime accessor** (`.dt.year()`, `.dt.month()`, `.dt.total_days(other)`, etc.).

The `col()` DSL does not yet cover the full range of Polars operations. More operations are planned for future versions.

**Workaround**: Use the explicit format in `@model_validator`:

```python
import polars as pl

@model_validator
def check():
    return {
        'polars': (pl.col('field').is_null(), "Message"),
        'pydantic': lambda v: (v.field is None, "Message")
    }
```
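To see why one `col()`-style expression can serve both a row-level and a columnar backend, and why each operation has to be added to the DSL explicitly, it helps to look at how such a DSL is commonly built: operators return tree nodes instead of values, and each backend walks the tree. The toy below is a sketch of that general technique under that assumption. It is not Flycatcher's implementation, and `Expr`, `Col`, `Lit`, `BinaryOp`, and `toy_col` are hypothetical names.

```python
import operator
from typing import Any, Callable


class Expr:
    """Node in a toy expression tree; operators build nodes instead of values."""

    def evaluate(self, row: dict[str, Any]) -> Any:
        raise NotImplementedError

    def describe(self) -> str:
        raise NotImplementedError

    def __lt__(self, other: Any) -> "BinaryOp":
        return BinaryOp(self, other, operator.lt, "<")

    def __gt__(self, other: Any) -> "BinaryOp":
        return BinaryOp(self, other, operator.gt, ">")

    def __add__(self, other: Any) -> "BinaryOp":
        return BinaryOp(self, other, operator.add, "+")

    def __eq__(self, other: Any) -> "BinaryOp":  # type: ignore[override]
        return BinaryOp(self, other, operator.eq, "==")


class Col(Expr):
    """Reference to a named field."""

    def __init__(self, name: str) -> None:
        self.name = name

    def evaluate(self, row: dict[str, Any]) -> Any:
        return row[self.name]

    def describe(self) -> str:
        return self.name


class Lit(Expr):
    """Constant value appearing in an expression."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def evaluate(self, row: dict[str, Any]) -> Any:
        return self.value

    def describe(self) -> str:
        return repr(self.value)


class BinaryOp(Expr):
    """Two sub-expressions combined by an operator."""

    def __init__(self, left: Expr, right: Any, op: Callable[[Any, Any], Any], symbol: str) -> None:
        self.left = left
        self.right = right if isinstance(right, Expr) else Lit(right)
        self.op = op
        self.symbol = symbol

    def evaluate(self, row: dict[str, Any]) -> Any:
        # Row-level backend: compute the result for a single record.
        return self.op(self.left.evaluate(row), self.right.evaluate(row))

    def describe(self) -> str:
        # A second "backend": render the same tree as text.
        return f"({self.left.describe()} {self.symbol} {self.right.describe()})"


def toy_col(name: str) -> Col:
    return Col(name)


rule = toy_col("total") == toy_col("amount") + toy_col("tax")
good_row = {"total": 12.0, "amount": 10.0, "tax": 2.0}
bad_row = {"total": 15.0, "amount": 10.0, "tax": 2.0}

print(rule.describe())          # (total == (amount + tax))
print(rule.evaluate(good_row))  # True
print(rule.evaluate(bad_row))   # False
```

In a design like this, an operation that has no node type cannot be translated by any backend, which is the situation the explicit `'polars'` / `'pydantic'` format works around.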

### Pydantic Features

- ❌ `@field_validator` - Only `@model_validator` is supported (coming in v0.2.0)
- ❌ Field aliases and computed fields (coming in v0.2.0+)
- ❌ Custom serialization options (coming in v0.2.0+)

**Workaround**: Use `@model_validator` for all validation needs.

### SQLAlchemy Features

- ❌ Foreign key relationships - Must be added manually after table generation (coming in v0.3.0+)
- ❌ Composite primary keys - Only single-field primary keys supported (coming in v0.3.0+)
- ❌ Function-based defaults (e.g., `default=func.now()`) - Only literal defaults supported

**Workaround**: Add relationships and composite keys manually in SQLAlchemy after table generation.
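As a sketch of the foreign-key part of this workaround, SQLAlchemy's `Table.append_constraint()` can attach a `ForeignKeyConstraint` to a table that already exists. The two tables below are built by hand as hypothetical stand-ins for the output of `to_sqlalchemy()`. That generated tables share one `MetaData` object is an assumption here. The string target `"users.id"` only resolves if they do.

```python
from sqlalchemy import (
    Column,
    ForeignKeyConstraint,
    Integer,
    MetaData,
    Table,
    create_engine,
    inspect,
)

metadata = MetaData()

# Hypothetical stand-ins for tables returned by Schema.to_sqlalchemy().
users = Table("users", metadata, Column("id", Integer, primary_key=True))
orders = Table(
    "orders",
    metadata,
    Column("order_id", Integer, primary_key=True),
    Column("user_id", Integer, nullable=False),
)

# Add the relationship after the table object has been created.
orders.append_constraint(
    ForeignKeyConstraint(["user_id"], ["users.id"], name="fk_orders_user")
)

# Emit the DDL into an in-memory SQLite database and reflect the result.
engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)
fks = inspect(engine).get_foreign_keys("orders")
print(fks)
```

The constraint has to be appended before the DDL is emitted (`create_all` here), since it only affects tables created afterwards.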

### Field Types

- ❌ Enum, UUID, JSON, Array field types (coming in v0.3.0+)
- ❌ Numeric/Decimal field type (coming in v0.3.0+)

**Workaround**: Use `str` with `Field(pattern=...)` validation or manual handling.

---

## 📊 Comparison

| Feature | Flycatcher | SQLModel | Patito |
|---------|-----------|----------|--------|
| Pydantic support | ✅ | ✅ | ✅ |
| Polars support | ✅ | ❌ | ✅ |
| SQLAlchemy support | ✅ | ✅ | ❌ |
| DataFrame-level DB ops | 🚧 (v0.2) | ❌ | ❌ |
| Cross-field validation | ✅ | ⚠️ (Pydantic only) | ⚠️ (Polars only) |
| Single schema definition | ✅ | ⚠️ (Pydantic + ORM hybrid) | ⚠️ (Pydantic + Polars hybrid) |

**Flycatcher** is the only library that generates optimized representations for all three systems while keeping them properly separated.

---

## 📚 Documentation

- **[Getting Started](https://mrmcmullan.github.io/flycatcher/)** - Installation and basics
- **[Tutorials](https://mrmcmullan.github.io/flycatcher/tutorials/)** - Step-by-step guides
- **[How-To Guides](https://mrmcmullan.github.io/flycatcher/how-to/)** - Solve specific problems
- **[API Reference](https://mrmcmullan.github.io/flycatcher/api/)** - Complete API documentation
- **[Explanations](https://mrmcmullan.github.io/flycatcher/explanations/)** - Deep dives and concepts

---

## 🛣️ Roadmap

### v0.1.0 (Released) 🚀

- [x] Core schema definition with metaclass
- [x] Field types with constraints (Integer, String, Float, Boolean, Datetime, Date)
- [x] Pydantic model generator
- [x] Polars DataFrame validator with bulk validation
- [x] SQLAlchemy table generator
- [x] Cross-field validators with DSL (`col()`)
- [x] Test suite with 70%+ coverage
- [x] Complete documentation site
- [x] PyPI publication

### v0.2.0 (In Progress) 🚧

**Theme:** Enhanced validation and database operations

- [ ] `@field_validator` support in addition to existing `@model_validator`
- [x] Enhanced Polars DSL: `.is_null()`, `.is_not_null()`, `.str.contains()`, `.str.starts_with()`, `.dt.month()`, `.dt.year()`, `.is_in([...])`, `.is_between()`
- [ ] Pydantic enhancements: field aliases, computed fields, custom serialization
- [ ] Enable inheritance of `Schema` to create subclasses with different fields
- [ ] For more details, see the [GitHub Milestone for v0.2.0](https://github.com/mrmcmullan/flycatcher/milestone/2)

### v0.3.0 (Planned)

- [ ] DataFrame-level queries (`Schema.query()`)
- [ ] Bulk write operations (`Schema.insert()`, `Schema.update()`, `Schema.upsert()`)
- [ ] Complete ETL loop staying columnar end-to-end
- [ ] Add PascalCase metaclass
- [ ] Additional Pydantic validation modes (`mode='before'`, `mode='wrap'`)
- [ ] For more details, see the [GitHub Milestone for v0.3.0](https://github.com/mrmcmullan/flycatcher/milestone/3)

### v0.4.0+ (Future)

**Theme:** Advanced field types and relationships

- [ ] Additional field types: Enum, UUID, JSON, Array, Numeric/Decimal, Time, Binary, Interval
- [ ] SQLAlchemy relationships: Foreign keys, composite primary keys
- [ ] SQLAlchemy function-based defaults (e.g., `default=func.now()`)
- [ ] JOIN support in queries
- [ ] Aggregations (GROUP BY, COUNT, SUM)
- [ ] Schema migrations helper

## 🤝 Contributing

Contributions are welcome! Please see our [Contributing Guide] for details.

---

## 📄 License

MIT License - see [LICENSE](https://github.com/mrmcmullan/flycatcher?tab=MIT-1-ov-file) for details.

---

## 💬 Community

- **[GitHub Issues](https://github.com/mrmcmullan/flycatcher/issues)** - Bug reports and feature requests
- **[GitHub Discussions](https://github.com/mrmcmullan/flycatcher/discussions)** - Questions and community discussion
- **[Documentation](https://mrmcmullan.github.io/flycatcher)** - Full guides and API reference

---

Built with ❤️ for the DataFrame generation


⭐ Star us on GitHub | 📖 Read the docs | 🐛 Report a bug