https://github.com/go9sky/pytuck

A lightweight, pure Python document database with multi-engine support. No SQL required - manage your data through Python objects and methods.
Host: GitHub
URL: https://github.com/go9sky/pytuck
Owner: go9sky
License: mit
Created: 2026-01-10T08:57:18.000Z (6 months ago)
Default Branch: master
Last Pushed: 2026-01-16T18:33:33.000Z (6 months ago)
Last Synced: 2026-01-16T21:36:14.206Z (6 months ago)
Topics: database, orm, python, zero-dependency
Language: Python
Homepage:
Size: 642 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.EN.md
- Changelog: CHANGELOG.EN.md
- License: LICENSE

# Pytuck - Lightweight Python Document Database

[![Gitee](https://img.shields.io/badge/Gitee-go9sky%2Fpytuck-red)](https://gitee.com/go9sky/pytuck)

[![GitHub](https://img.shields.io/badge/GitHub-go9sky%2Fpytuck-blue)](https://github.com/go9sky/pytuck)

[![PyPI version](https://badge.fury.io/py/pytuck.svg)](https://badge.fury.io/py/pytuck)

[![Python Versions](https://img.shields.io/pypi/pyversions/pytuck.svg)](https://pypi.org/project/pytuck/)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

[中文](README.md) | English

A lightweight, pure Python document database with multi-engine support. No SQL required - manage your data through Python objects and methods.

## Repository Mirrors

- **GitHub**: https://github.com/go9sky/pytuck

- **Gitee**: https://gitee.com/go9sky/pytuck

## Key Features

- **No SQL Required** - Work entirely with Python objects and methods

- **Multi-Engine Support** - Binary, JSON, CSV, SQLite, Excel, XML storage formats

- **Pluggable Architecture** - Zero dependencies by default, optional engines on demand

- **SQLAlchemy 2.0 Style API** - Modern query builders (`select()`, `insert()`, `update()`, `delete()`)

- **Generic Type Hints** - Complete generic support with precise IDE type inference (`List[User]` instead of `List[PureBaseModel]`)

- **Pythonic Query Syntax** - Use native Python operators (`User.age >= 18`)

- **Index Optimization** - Hash indexes for accelerated queries

- **Type Safety** - Automatic type validation and conversion (loose/strict modes), supports 10 field types

- **Relationships** - Supports one-to-many and many-to-one with lazy loading + auto caching

- **Independent Data Models** - Accessible after session close, usable like Pydantic

- **Persistence** - Automatic or manual data persistence to disk

## Quick Start

### Installation

```bash
# Basic installation (binary engine only, zero dependencies)
pip install pytuck

# Install specific engines (quotes keep shells such as zsh from expanding the brackets)
pip install "pytuck[excel]"   # Excel engine (requires openpyxl)
pip install "pytuck[xml]"     # XML engine (requires lxml)

# Install all engines
pip install "pytuck[all]"

# Development environment
pip install "pytuck[dev]"
```

### Basic Usage

Pytuck offers two usage modes:

#### Mode 1: Pure Model (Default, Recommended)

Operate data through Session, following SQLAlchemy 2.0 style:

```python
from typing import Type

from pytuck import Storage, declarative_base, Session, Column
from pytuck import PureBaseModel, select, insert, update, delete

# Create database (default: binary engine)
db = Storage(file_path='mydb.db')
Base: Type[PureBaseModel] = declarative_base(db)

# Define model
class Student(Base):
    __tablename__ = 'students'

    id = Column('id', int, primary_key=True)
    name = Column('name', str, nullable=False, index=True)
    age = Column('age', int)
    email = Column('email', str, nullable=True)

# Create Session
session = Session(db)

# Insert records
stmt = insert(Student).values(name='Alice', age=20, email='alice@example.com')
result = session.execute(stmt)
session.commit()
print(f"Created student, ID: {result.inserted_primary_key}")

# Query records
stmt = select(Student).where(Student.id == 1)
result = session.execute(stmt)
alice = result.scalars().first()
print(f"Found: {alice.name}, {alice.age} years old")

# Conditional query (Pythonic syntax)
stmt = select(Student).where(Student.age >= 18).order_by('name')
result = session.execute(stmt)
adults = result.scalars().all()
for student in adults:
    print(f"  - {student.name}")

# Identity Map example (0.3.0 NEW, object uniqueness guarantee)
student1 = session.get(Student, 1)  # Load from database
stmt = select(Student).where(Student.id == 1)
student2 = session.execute(stmt).scalars().first()  # Get through query
print(f"Same object? {student1 is student2}")  # True, same instance

# merge() operation example (0.3.0 NEW, merge external data)
external_student = Student(id=1, name="Alice Updated", age=22)  # External data
merged = session.merge(external_student)  # Intelligently merge into Session
session.commit()  # Update takes effect

# Update records
# Method 1: Use update statement (bulk update)
stmt = update(Student).where(Student.id == 1).values(age=21)
session.execute(stmt)
session.commit()

# Method 2: Attribute assignment update (0.3.0 NEW, more intuitive)
stmt = select(Student).where(Student.id == 1)
result = session.execute(stmt)
alice = result.scalars().first()
alice.age = 21  # Attribute assignment auto-detected and updates database
session.commit()  # Automatically writes changes to database

# Delete records
stmt = delete(Student).where(Student.id == 1)
session.execute(stmt)
session.commit()

# Close
session.close()
db.close()
```
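
The Identity Map behaviour in the example above (`student1 is student2`) can be pictured as a per-session cache keyed by model class and primary key. The following is a minimal standalone sketch of that idea, not Pytuck's implementation; `ToySession`, `Row`, and the `_rows` store are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Type


@dataclass
class Row:
    """Hypothetical stand-in for a model instance."""
    id: int
    name: str


class ToySession:
    """Toy session that hands out one instance per (class, primary key)."""

    def __init__(self, rows: Dict[int, dict]) -> None:
        self._rows = rows  # pretend storage: primary key -> raw record
        self._identity_map: Dict[Tuple[Type[Row], int], Row] = {}

    def get(self, model: Type[Row], pk: int) -> Optional[Row]:
        key = (model, pk)
        if key in self._identity_map:      # already loaded: reuse the instance
            return self._identity_map[key]
        raw = self._rows.get(pk)
        if raw is None:
            return None
        obj = model(**raw)                 # first load: build and register
        self._identity_map[key] = obj
        return obj


session = ToySession({1: {"id": 1, "name": "Alice"}})
first = session.get(Row, 1)
second = session.get(Row, 1)
print(first is second)
```

Because both lookups return the same object, a change made through one reference is visible through the other, which is what makes attribute-assignment updates unambiguous.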

#### Mode 2: Active Record

Models with built-in CRUD methods for simpler operations:

```python
from typing import Type

from pytuck import Storage, declarative_base, Column
from pytuck import CRUDBaseModel

# Create database
db = Storage(file_path='mydb.db')
Base: Type[CRUDBaseModel] = declarative_base(db, crud=True)  # Note: crud=True

# Define model
class Student(Base):
    __tablename__ = 'students'

    id = Column('id', int, primary_key=True)
    name = Column('name', str, nullable=False)
    age = Column('age', int)

# Create record (auto-save)
alice = Student.create(name='Alice', age=20)
print(f"Created: {alice.name}, ID: {alice.id}")

# Or save manually
bob = Student(name='Bob', age=22)
bob.save()

# Query records
student = Student.get(1)  # Query by primary key
students = Student.filter(Student.age >= 18).all()  # Conditional query
students = Student.filter_by(name='Alice').all()  # Equality query
all_students = Student.all()  # Get all

# Update records
alice.age = 21  # Active Record mode already supports attribute assignment updates
alice.save()    # Explicitly save to database

# Delete records
alice.delete()

# Close
db.close()
```

**How to Choose?**

- **Pure Model Mode**: Suited for larger projects, team development, clear data access layer separation

- **Active Record Mode**: Suited for smaller projects, rapid prototyping, simple CRUD operations

## Storage Engines

Pytuck supports multiple storage engines, each suited for different scenarios:

### Binary Engine (Default)

**Features**: Zero dependencies, compact, high performance, encryption support

```python
from pytuck import Storage
from pytuck.common.options import BinaryBackendOptions

# Basic usage
db = Storage(file_path='data.db', engine='binary')

# Enable encryption (three levels: low/medium/high)
opts = BinaryBackendOptions(encryption='high', password='mypassword')
db = Storage(file_path='secure.db', engine='binary', backend_options=opts)

# Open encrypted database (auto-detects encryption level)
opts = BinaryBackendOptions(password='mypassword')
db = Storage(file_path='secure.db', engine='binary', backend_options=opts)
```

**Encryption Levels**:

| Level | Algorithm | Security | Use Case |
|-------|-----------|----------|----------|
| `low` | XOR obfuscation | Prevents casual viewing | Prevent accidental file opening |
| `medium` | LCG stream cipher | Prevents regular users | General protection needs |
| `high` | ChaCha20 | Cryptographically secure | Sensitive data protection |
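
To make the `low` level concrete: XOR obfuscation combines each data byte with a key byte, and applying the same operation a second time restores the original. The sketch below assumes a repeating key taken directly from the password bytes; that key handling and the function name `xor_obfuscate` are illustrative assumptions, not Pytuck's actual scheme.

```python
from itertools import cycle


def xor_obfuscate(data: bytes, password: str) -> bytes:
    """XOR every byte with a repeating key (toy example, not secure)."""
    key = password.encode("utf-8")
    if not key:
        raise ValueError("password must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


plain = b'{"name": "Alice", "age": 20}'
hidden = xor_obfuscate(plain, "mypassword")
restored = xor_obfuscate(hidden, "mypassword")  # the same call reverses it

print(hidden != plain, restored == plain)
```

With a repeating key, anyone who knows part of the plaintext can recover the key by XOR-ing it with the stored bytes, which is why this kind of scheme only deters casual viewing.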

**Encryption Performance Benchmark** (1000 records, ~100 bytes each):

| Level | Write Time | Read Time | File Size | Read Overhead |
|-------|------------|-----------|-----------|---------------|
| None | 41ms | 17ms | 183KB | (baseline) |
| low | 33ms | 33ms | 183KB | +100% |
| medium | 82ms | 86ms | 183KB | +418% |
| high | 342ms | 335ms | 183KB | +1928% |

> **Note**: Encryption uses a pure Python implementation to maintain zero dependencies. When performance matters more than cryptographic strength, consider the `low` or `medium` level.

> Run `examples/benchmark_encryption.py` to test performance in your environment.

**Use Cases**:

- Production deployment

- Embedded applications

- Sensitive data protection

- Minimum footprint required

### JSON Engine

**Features**: Human-readable, debug-friendly, standard format

```python
from pytuck import Storage
from pytuck.common.options import JsonBackendOptions

# Configure JSON options
json_opts = JsonBackendOptions(indent=2, ensure_ascii=False)
db = Storage(file_path='data.json', engine='json', backend_options=json_opts)
```

**Use Cases**:

- Development and debugging

- Configuration storage

- Data exchange

### CSV Engine

**Features**: Excel compatible, tabular format, data analysis friendly

```python
from pytuck import Storage
from pytuck.common.options import CsvBackendOptions

# Configure CSV options
csv_opts = CsvBackendOptions(encoding='utf-8', delimiter=',')
db = Storage(file_path='data_dir', engine='csv', backend_options=csv_opts)
```

**Use Cases**:

- Data analysis

- Excel import/export

- Tabular data

### SQLite Engine

**Features**: Mature, stable, ACID compliance, SQL support

```python
from pytuck import Storage
from pytuck.common.options import SqliteBackendOptions

# Configure SQLite options (optional)
sqlite_opts = SqliteBackendOptions()  # Use default config
db = Storage(file_path='data.sqlite', engine='sqlite', backend_options=sqlite_opts)
```

**Use Cases**:

- Need SQL queries

- Need transaction guarantees

- Large datasets

### Excel Engine (Optional)

**Requires**: `openpyxl>=3.0.0`

```python
from pytuck import Storage
from pytuck.common.options import ExcelBackendOptions

# Configure Excel options (optional)
excel_opts = ExcelBackendOptions(read_only=False)  # Use default config
db = Storage(file_path='data.xlsx', engine='excel', backend_options=excel_opts)
```

**Use Cases**:

- Business reports

- Visualization needs

- Office automation

### XML Engine (Optional)

**Requires**: `lxml>=4.9.0`

```python
from pytuck import Storage
from pytuck.common.options import XmlBackendOptions

# Configure XML options
xml_opts = XmlBackendOptions(encoding='utf-8', pretty_print=True)
db = Storage(file_path='data.xml', engine='xml', backend_options=xml_opts)
```

**Use Cases**:

- Enterprise integration

- Standardized exchange

- Configuration files

## Advanced Features

### Generic Type Hints

Pytuck provides complete generic type support, so IDEs and type checkers can infer the concrete model type of query results:

#### IDE Type Inference

```python
from typing import List, Optional

from pytuck import Storage, declarative_base, Session, Column
from pytuck import select, insert, update, delete

db = Storage(file_path='mydb.db')
Base = declarative_base(db)

class User(Base):
    __tablename__ = 'users'

    id = Column('id', int, primary_key=True)
    name = Column('name', str)
    age = Column('age', int)

session = Session(db)

# Statement builder type inference
stmt = select(User)  # IDE infers: Select[User] ✅
chained = stmt.where(User.age >= 18)  # IDE infers: Select[User] ✅

# Session execution type inference
result = session.execute(stmt)  # IDE infers: Result[User] ✅

# Result processing precise types
users = result.scalars().all()  # IDE infers: List[User] ✅ (no longer List[PureBaseModel])
user = result.scalars().first()  # IDE infers: Optional[User] ✅

# IDE knows specific attribute types
for user in users:
    user_name: str = user.name  # ✅ IDE knows this is str
    user_age: int = user.age    # ✅ IDE knows this is int
    # user.invalid_field        # ❌ IDE warns attribute doesn't exist
```

#### Type Safety Features

- **Precise Type Inference**: `select(User)` returns `Select[User]`, not generic `Select`

- **Smart Code Completion**: IDE accurately suggests model attributes and methods

- **Static Error Detection**: mypy can detect type errors before the code runs

- **Method Chain Type Preservation**: All chained calls maintain specific generic types

- **100% Backward Compatibility**: Existing code works unchanged and automatically gains type hint enhancement
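
How a statement builder can carry the model type through a call chain is easiest to see in a stripped-down form. The sketch below uses `typing.Generic` with hypothetical `ToySelect` and `ToyResult` classes; it mirrors the shape of the API shown above but is an assumption about the general approach, not Pytuck's source.

```python
from typing import Callable, Generic, List, Optional, Type, TypeVar

T = TypeVar("T")


class ToyResult(Generic[T]):
    def __init__(self, items: List[T]) -> None:
        self._items = items

    def all(self) -> List[T]:
        return list(self._items)

    def first(self) -> Optional[T]:
        return self._items[0] if self._items else None


class ToySelect(Generic[T]):
    def __init__(self, model: Type[T]) -> None:
        self.model = model
        self._predicates: List[Callable[[T], bool]] = []

    def where(self, predicate: Callable[[T], bool]) -> "ToySelect[T]":
        # Returning ToySelect[T] keeps the element type through chaining.
        self._predicates.append(predicate)
        return self

    def run(self, rows: List[T]) -> ToyResult[T]:
        kept = [r for r in rows if all(p(r) for p in self._predicates)]
        return ToyResult(kept)


class User:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age


rows = [User("Alice", 20), User("Bob", 15)]
adults = ToySelect(User).where(lambda u: u.age >= 18).run(rows).all()
# A type checker infers List[User] here, so adults[0].name is known to be str.
print([u.name for u in adults])
```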

#### Before and After

**Before:**

```python
users = result.scalars().all()  # IDE: List[PureBaseModel] 😞
user.name                       # IDE: doesn't know what attributes exist 😞
```

**Now:**

```python
users = result.scalars().all()  # IDE: List[User] ✅
user.name                       # IDE: knows this is str type ✅
user.age                        # IDE: knows this is int type ✅
```

### Data Persistence

Pytuck provides flexible data persistence mechanisms.

#### Pure Model Mode (Session)

```python
db = Storage(file_path='data.db')  # auto_flush=False (default)

# Data changes only in memory
session.execute(insert(User).values(name='Alice'))
session.commit()  # Commits to Storage memory

# Manually write to disk
db.flush()  # Method 1: Explicit flush
# or
db.close()  # Method 2: Auto-flush on close
```

Enable auto persistence:

```python
db = Storage(file_path='data.db', auto_flush=True)

# Each commit automatically writes to disk
session.execute(insert(User).values(name='Alice'))
session.commit()  # Automatically writes to disk, no manual flush needed
```

#### Active Record Mode (CRUDBaseModel)

CRUDBaseModel has no Session, operates directly on Storage:

```python
db = Storage(file_path='data.db')  # auto_flush=False (default)
Base = declarative_base(db, crud=True)

class User(Base):
    __tablename__ = 'users'

    id = Column('id', int, primary_key=True)
    name = Column('name', str)

# create/save/delete only modify memory
user = User.create(name='Alice')
user.name = 'Bob'
user.save()

# Manually write to disk
db.flush()  # Method 1: Explicit flush
# or
db.close()  # Method 2: Auto-flush on close
```

Enable auto persistence:

```python
db = Storage(file_path='data.db', auto_flush=True)
Base = declarative_base(db, crud=True)

# Each create/save/delete automatically writes to disk
user = User.create(name='Alice')  # Automatically writes to disk
user.name = 'Bob'
user.save()  # Automatically writes to disk
```

#### Persistence Method Summary

| Method | Mode | Description |
|--------|------|-------------|
| `session.commit()` | Pure Model | Commits transaction to Storage memory; if `auto_flush=True`, also writes to disk |
| `Model.create/save/delete()` | Active Record | Modifies Storage memory; if `auto_flush=True`, also writes to disk |
| `storage.flush()` | Both | Forces in-memory data to be written to disk |
| `storage.close()` | Both | Closes database, automatically calls `flush()` |

**Recommendations**:

- Use `auto_flush=True` in production for data safety

- Use default mode for batch operations, call `flush()` at the end for better performance
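
The trade-off behind these recommendations can be shown with a toy model that counts disk writes. This is a standalone sketch under stated assumptions: `ToyStorage`, its `commit()` method, and the `disk_writes` counter are hypothetical and only imitate the commit-to-memory versus flush-to-disk behaviour described above.

```python
from typing import Dict


class ToyStorage:
    """Toy model of commit-to-memory versus flush-to-disk."""

    def __init__(self, auto_flush: bool = False) -> None:
        self.auto_flush = auto_flush
        self.memory: Dict[int, str] = {}   # committed, in-memory state
        self.disk: Dict[int, str] = {}     # stands in for the file on disk
        self.disk_writes = 0

    def commit(self, pk: int, value: str) -> None:
        self.memory[pk] = value
        if self.auto_flush:
            self.flush()

    def flush(self) -> None:
        self.disk = dict(self.memory)      # snapshot written out
        self.disk_writes += 1


# Default mode: 1000 commits, one explicit flush at the end.
batch = ToyStorage()
for i in range(1000):
    batch.commit(i, f"user-{i}")
batch.flush()

# auto_flush=True: every commit also writes to disk.
safe = ToyStorage(auto_flush=True)
for i in range(1000):
    safe.commit(i, f"user-{i}")

print(batch.disk_writes, safe.disk_writes)
```

Both objects end with the same data on "disk"; the difference is how much could be lost if the process stopped partway through, and how many writes were paid for along the way.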

### Transaction Support

Pytuck supports memory-level transactions with automatic rollback on exceptions:

```python
# Session transaction (recommended)
with session.begin():
    session.add(User(name='Alice'))
    session.add(User(name='Bob'))
    # Auto-commit on success, auto-rollback on exception

# Storage-level transaction
with db.transaction():
    db.insert('users', {'name': 'Alice'})
    db.insert('users', {'name': 'Bob'})
    # Auto-rollback to pre-transaction state on exception
```
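
One common way to get memory-level rollback is to snapshot the in-memory tables on entry and restore them if the block raises. The sketch below illustrates that pattern with a hypothetical `ToyStore`; whether Pytuck uses snapshots or another mechanism is not stated here, so treat it as an assumption.

```python
import copy
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ToyStore:
    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {"users": []}

    @contextmanager
    def transaction(self) -> Iterator[None]:
        snapshot = copy.deepcopy(self.tables)  # remember pre-transaction state
        try:
            yield
        except Exception:
            self.tables = snapshot             # roll back, then re-raise
            raise

    def insert(self, table: str, record: dict) -> None:
        self.tables[table].append(record)


store = ToyStore()

# Succeeds: the insert is kept.
with store.transaction():
    store.insert("users", {"name": "Alice"})

# Fails: the insert made inside the block is discarded.
try:
    with store.transaction():
        store.insert("users", {"name": "Bob"})
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print([u["name"] for u in store.tables["users"]])
```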

### Session Context Manager

Session supports context manager for automatic commit/rollback:

```python
with Session(db) as session:
    stmt = insert(User).values(name='Alice')
    session.execute(stmt)
    # Auto-commit on exit, auto-rollback on exception
```

### Auto-commit Mode

```python
session = Session(db, autocommit=True)

# Each operation auto-commits
session.add(User(name='Alice'))  # Auto-committed
```

### Object State Tracking

Session provides complete object state tracking:

```python
# Add single object
session.add(user)

# Batch add
session.add_all([user1, user2, user3])

# Flush to database (without committing transaction)
session.flush()

# Commit transaction
session.commit()

# Rollback transaction
session.rollback()
```

### Auto Flush

Enable `auto_flush` for automatic disk persistence on each write:

```python
db = Storage(file_path='data.db', auto_flush=True)

# Insert automatically writes to disk
stmt = insert(Student).values(name='Bob', age=21)
session.execute(stmt)
session.commit()
```

### Index Queries

Add indexes to fields to accelerate queries:

```python
class Student(Base):
    __tablename__ = 'students'

    name = Column('name', str, index=True)  # Create index

# Index query (automatically optimized)
stmt = select(Student).filter_by(name='Bob')
result = session.execute(stmt)
bob = result.scalars().first()
```
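
Why an indexed equality lookup is so much cheaper than a scan can be seen with a plain dictionary. The following standalone sketch builds a hash index that maps a field value to primary keys; the names `name_index`, `scan`, and `indexed` are illustrative and do not come from Pytuck.

```python
from collections import defaultdict
from typing import Dict, List

records = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Alice"},
]

# Build the index once: field value -> list of primary keys.
name_index: Dict[str, List[int]] = defaultdict(list)
for rec in records:
    name_index[rec["name"]].append(rec["id"])


def scan(name: str) -> List[int]:
    """Non-indexed lookup: examine every record."""
    return [rec["id"] for rec in records if rec["name"] == name]


def indexed(name: str) -> List[int]:
    """Indexed lookup: a single dictionary access."""
    return name_index.get(name, [])


print(scan("Alice"), indexed("Alice"))
```

A hash index of this kind answers equality lookups; it does not by itself help with range conditions such as `age >= 18`, because hashing does not preserve order.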

### Query Operators

Supported Pythonic query operators:

```python
# Equal
stmt = select(Student).where(Student.age == 20)

# Not equal
stmt = select(Student).where(Student.age != 20)

# Greater than / Greater than or equal
stmt = select(Student).where(Student.age > 18)
stmt = select(Student).where(Student.age >= 18)

# Less than / Less than or equal
stmt = select(Student).where(Student.age < 30)
stmt = select(Student).where(Student.age <= 30)

# IN query
stmt = select(Student).where(Student.age.in_([18, 19, 20]))

# Multiple conditions (AND)
stmt = select(Student).where(Student.age >= 18, Student.age < 30)

# Simple equality query (filter_by)
stmt = select(Student).filter_by(name='Alice', age=20)
```
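
Expressions such as `Student.age >= 18` work because the column object overloads Python's comparison operators and returns a condition object instead of a boolean. The sketch below shows the general technique with hypothetical `ToyColumn` and `Condition` classes; it is not Pytuck's implementation.

```python
import operator
from typing import Any, Callable, Dict, List


class Condition:
    """A deferred comparison: field name, operator, and right-hand value."""

    def __init__(self, field: str, op: Callable[[Any, Any], bool], value: Any) -> None:
        self.field, self.op, self.value = field, op, value

    def matches(self, record: Dict[str, Any]) -> bool:
        return self.op(record[self.field], self.value)


class ToyColumn:
    """Hypothetical column whose comparisons build Condition objects."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> Condition:  # type: ignore[override]
        return Condition(self.name, operator.eq, other)

    def __ge__(self, other: Any) -> Condition:
        return Condition(self.name, operator.ge, other)

    def __lt__(self, other: Any) -> Condition:
        return Condition(self.name, operator.lt, other)

    def in_(self, values: List[Any]) -> Condition:
        return Condition(self.name, lambda a, b: a in b, values)


def where(records: List[Dict[str, Any]], *conditions: Condition) -> List[Dict[str, Any]]:
    # Multiple conditions are combined with AND, as in the examples above.
    return [r for r in records if all(c.matches(r) for c in conditions)]


age = ToyColumn("age")
people = [
    {"name": "Alice", "age": 20},
    {"name": "Bob", "age": 15},
    {"name": "Eve", "age": 35},
]
print([p["name"] for p in where(people, age >= 18, age < 30)])
```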

### Sorting and Pagination

```python
# Sorting
stmt = select(Student).order_by('age')
stmt = select(Student).order_by('age', desc=True)

# Pagination
stmt = select(Student).limit(10)
stmt = select(Student).offset(10).limit(10)

# Count
stmt = select(Student).where(Student.age >= 18)
result = session.execute(stmt)
adults = result.scalars().all()
count = len(adults)
```
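
For page-based navigation, a page number has to be translated into the `offset` and `limit` values used above. The helper below is a small standalone sketch of that arithmetic on an ordinary list; `page_bounds` and `paginate` are hypothetical names, not part of Pytuck.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def page_bounds(page: int, page_size: int) -> Tuple[int, int]:
    """Translate a 1-based page number into (offset, limit)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size, page_size


def paginate(rows: Sequence[T], page: int, page_size: int) -> List[T]:
    """Apply offset, then limit, to an already sorted sequence."""
    offset, limit = page_bounds(page, page_size)
    return list(rows[offset:offset + limit])


ages = sorted([31, 18, 25, 22, 40, 19, 27])
print(page_bounds(2, 3), paginate(ages, 2, 3))
```

Sorting before paginating matters: without a stable order, consecutive pages can overlap or skip rows.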

## Data Model Features

Pytuck's data models behave both like ORM models and like plain data containers.

### Independent Data Objects

Pytuck model instances are independent Python objects that are fully materialized in memory as soon as a query returns:

- ✅ **Accessible After Session Close**: No DetachedInstanceError

- ✅ **Operable After Storage Close**: Loaded objects are completely independent

- ✅ **No Lazy Loading**: All direct attributes are loaded immediately

- ✅ **Serializable**: Supports JSON, Pickle, and other serialization formats

- ✅ **Usable as Data Containers**: Use like Pydantic models

```python
from pytuck import Storage, declarative_base, Session, Column, select

db = Storage(file_path='data.db')
Base = declarative_base(db)

class User(Base):
    __tablename__ = 'users'

    id = Column('id', int, primary_key=True)
    name = Column('name', str)

session = Session(db)
stmt = select(User).where(User.id == 1)
user = session.execute(stmt).scalars().first()

# Close session and storage
session.close()
db.close()

# Still accessible!
print(user.name)  # ✅ Works
print(user.to_dict())  # ✅ Works
```

**Comparison with SQLAlchemy**:

| Feature | Pytuck | SQLAlchemy |
|---------|--------|------------|
| Access after Session close | ✅ Supported | ⚠️ Expired or unloaded attributes raise DetachedInstanceError |
| Lazy loading relationships | ✅ Supported (with cache) | ✅ Supported |
| Model as pure data container | ✅ Yes | ❌ No (bound to session) |

### Relationships

Pytuck supports one-to-many and many-to-one relationships with lazy loading and caching:

```python
from pytuck import Column
from pytuck.core.orm import Relationship

# Assumes Base = declarative_base(db, crud=True), since .get() is an Active Record method

# Define relationships
class User(Base):
    __tablename__ = 'users'

    id = Column('id', int, primary_key=True)
    name = Column('name', str)
    # One-to-many: one user has many orders
    orders = Relationship('Order', foreign_key='user_id')

class Order(Base):
    __tablename__ = 'orders'

    id = Column('id', int, primary_key=True)
    user_id = Column('user_id', int)
    amount = Column('amount', float)
    # Many-to-one: one order belongs to one user
    user = Relationship(User, foreign_key='user_id')

# Use relationships
user = User.get(1)
orders = user.orders  # Lazy loaded on first access
for order in orders:
    print(f"Order: {order.amount}")

# Reverse access
order = Order.get(1)
user = order.user  # Many-to-one query
print(f"User: {user.name}")
```

**Relationship Features**:

- ✅ **Lazy Loading**: Queries database only on first access

- ✅ **Auto Caching**: Caches results to avoid repeated queries

- ✅ **Bidirectional**: Supports back_populates parameter

- ✅ **After Storage Close**: Already loaded relationships remain accessible (uses cache)

- ⚠️ **Load Before Close**: Access a relationship at least once before closing the storage so that it is loaded and cached

```python
# Eager loading strategy
user = User.get(1)
orders = user.orders  # Access before storage close to load and cache
db.close()

# Still accessible after close (uses cache)
for order in orders:
    print(order.amount)  # ✅ Works
```
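
Lazy loading with caching is commonly built as a descriptor that runs a loader on first access and stores the result on the instance. The sketch below demonstrates that pattern in isolation; `LazyRelation`, `ToyUser`, and `load_orders` are hypothetical, and Pytuck's `Relationship` may be implemented differently.

```python
from typing import Any, Callable, List


class LazyRelation:
    """Descriptor that runs a loader on first access and caches the result."""

    def __init__(self, loader: Callable[[Any], List[Any]]) -> None:
        self.loader = loader
        self.cache_name = ""

    def __set_name__(self, owner: type, name: str) -> None:
        self.cache_name = f"_{name}_cache"

    def __get__(self, instance: Any, owner: type) -> Any:
        if instance is None:
            return self
        if self.cache_name not in instance.__dict__:
            instance.__dict__[self.cache_name] = self.loader(instance)
        return instance.__dict__[self.cache_name]


ORDERS = [{"user_id": 1, "amount": 9.5}, {"user_id": 2, "amount": 3.0}]
load_calls: List[int] = []


def load_orders(user: "ToyUser") -> List[dict]:
    load_calls.append(user.id)  # record each simulated storage query
    return [o for o in ORDERS if o["user_id"] == user.id]


class ToyUser:
    orders = LazyRelation(load_orders)

    def __init__(self, id: int) -> None:
        self.id = id


user = ToyUser(1)
first = user.orders       # triggers the loader
ORDERS.clear()            # simulate the storage going away
second = user.orders      # served from the cache
print(len(load_calls), second)
```

The second access succeeds even though the backing data is gone, which is the same reason an already loaded relationship stays usable after the storage is closed.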

### Type Validation and Conversion

Pytuck provides zero-dependency automatic type validation and conversion:

```python
class User(Base):
    __tablename__ = 'users'

    id = Column('id', int, primary_key=True)
    age = Column('age', int)  # Declared as int

# Loose mode (default): auto conversion
user = User(age='25')  # ✅ Automatically converts '25' → 25

# Strict mode: no conversion, raises error on type mismatch
class StrictUser(Base):
    __tablename__ = 'strict_users'

    id = Column('id', int, primary_key=True)
    age = Column('age', int, strict=True)  # Strict mode

user = StrictUser(age='25')  # ❌ ValidationError
```

**Type Conversion Rules (Loose Mode)**:

| Python Type | Conversion Rule | Example |
|------------|----------------|---------|
| int | int(value) | '123' → 123 |
| float | float(value) | '3.14' → 3.14 |
| str | str(value) | 123 → '123' |
| bool | Special rules* | '1', 'true', 1 → True |
| bytes | encode() if str | 'hello' → b'hello' |
| datetime | ISO 8601 parse | '2024-01-15T10:30:00' → datetime |
| date | ISO 8601 parse | '2024-01-15' → date |
| timedelta | Total seconds | 3600.0 → timedelta(hours=1) |
| list | JSON parse | '[1,2,3]' → [1, 2, 3] |
| dict | JSON parse | '{"a":1}' → {'a': 1} |
| None | Allowed if nullable=True | None → None |

*bool conversion rules:

- True: `True`, `1`, `'1'`, `'true'`, `'True'`, `'yes'`, `'Yes'`

- False: `False`, `0`, `'0'`, `'false'`, `'False'`, `'no'`, `'No'`, `''`
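
The rules above translate into a few lines of plain Python. The following sketch implements the listed bool values and the loose/strict distinction for `int`; the function names `to_bool` and `to_int` and the exception types raised are assumptions made for illustration, not Pytuck's API.

```python
from typing import Any

TRUE_STRINGS = {"1", "true", "True", "yes", "Yes"}
FALSE_STRINGS = {"0", "false", "False", "no", "No", ""}


def to_bool(value: Any) -> bool:
    """Loose-mode bool conversion following the rules listed above."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        if value in TRUE_STRINGS:
            return True
        if value in FALSE_STRINGS:
            return False
    # Rejecting unlisted values is an assumption of this sketch.
    raise ValueError(f"cannot convert {value!r} to bool")


def to_int(value: Any, strict: bool = False) -> int:
    """Loose mode calls int(value); strict mode rejects non-int input."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        raise TypeError(f"expected int, got {type(value).__name__}")
    return int(value)


print(to_bool("yes"), to_bool(""), to_int("25"))
```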

**Use Cases for Independent Data Models**:

```python
import json
from typing import List

# Web API development: return directly after query, no connection concerns
# (`app` is assumed to be a FastAPI-style application object defined elsewhere)
@app.get("/users/{id}")
def get_user(id: int):
    session = Session(db)
    stmt = select(User).where(User.id == id)
    user = session.execute(stmt).scalars().first()
    session.close()
    # Return model, no concern about closed session
    return user.to_dict()

# Data transfer: model objects can be passed freely between functions
def process_users(users: List[User]) -> List[dict]:
    return [u.to_dict() for u in users]

# JSON serialization
user_json = json.dumps(user.to_dict())
```

## Performance Benchmark

The following benchmark results were measured with the v4 version.

### Test Environment

- **System**: Windows 11, Python 3.12.10

- **Test Data**: 100,000 records

- **Mode**: Extended test (including index comparison, range queries, batch reads, lazy loading)

### Performance Comparison

| Engine | Insert | Indexed | Non-Indexed | Speedup | Range | Save | Load | Lazy | Size |
|--------|--------|---------|-------------|---------|-------|------|------|------|------|
| Binary | 794.57ms | 1.39ms | 7.13s | 5124x | 333.29ms | 869.68ms | 1.01s | 319.88ms | 11.73MB |
| JSON | 844.76ms | 1.42ms | 8.95s | 6279x | 337.01ms | 845.77ms | 319.37ms | - | 18.90MB |
| CSV | 838.89ms | 1.47ms | 7.24s | 4939x | 346.85ms | 453.50ms | 472.90ms | - | 731.9KB |
| SQLite | 879.05ms | 1.40ms | 7.21s | 5145x | 333.84ms | 325.80ms | 393.39ms | - | 6.97MB |
| Excel | 897.48ms | 1.41ms | 7.25s | 5150x | 340.40ms | 5.75s | 7.63s | - | 2.84MB |
| XML | 1.23s | 1.41ms | 7.41s | 5248x | 333.87ms | 2.49s | 2.03s | - | 34.54MB |

**Notes**:

- **Indexed**: 100 indexed field equality lookups (millisecond level)

- **Non-Indexed**: 100 non-indexed field full table scans (second level)

- **Speedup**: Index query vs non-indexed query speedup ratio

- **Range**: Range condition queries (e.g., `age >= 20 AND age < 62`)

- **Lazy**: Only Binary engine supports lazy loading (loads index only, not data)

### Engine Feature Comparison

| Engine | Query Perf | I/O Perf | Storage Eff | Human Readable | Dependencies | Recommended Use |
|--------|-----------|----------|-------------|----------------|--------------|-----------------|
| Binary | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ❌ | None | **Production First Choice** |
| JSON | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ✅ | None | Development, Config Storage |
| CSV | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ | None | Data Exchange, Minimum Size |
| SQLite | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ❌ | None | SQL Needed, ACID Guarantee |
| Excel | ⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐ | ✅ | openpyxl | Visual Editing, Reports |
| XML | ⭐⭐⭐⭐ | ⭐⭐ | ⭐ | ✅ | lxml | Enterprise Integration |

**Conclusions**:

- **Binary**: fastest insert (794ms); supports lazy loading and encryption; **production first choice**

- **JSON**: fastest load (319ms); easy to debug; suited to development and config storage

- **CSV**: smallest file (732KB, ZIP compressed); excellent I/O; suited to data exchange

- **SQLite**: best I/O (save 325ms); well balanced; suited to workloads that need ACID guarantees

- **Excel**: slowest I/O (7.63s load); suited to visual editing scenarios

- **XML**: largest file (34.54MB); suited to enterprise integration

## Installation Methods

### Install from PyPI

```bash
# Basic installation
pip install pytuck

# With specific extras (quoted so the shell does not expand the brackets)
pip install "pytuck[all]"      # All optional engines
pip install "pytuck[excel]"    # Excel support only
pip install "pytuck[xml]"      # XML support only
pip install "pytuck[dev]"      # Development tools
```

### Install from Source

```bash
# Clone repository
git clone https://github.com/go9sky/pytuck.git
cd pytuck

# Editable install
pip install -e .

# With all extras
pip install -e ".[all]"

# Development mode
pip install -e ".[dev]"
```

### Build and Publish

```bash
# Install build tools
pip install build twine

# Build wheel and source distribution
python -m build

# Upload to PyPI
python -m twine upload dist/*

# Upload to Test PyPI
python -m twine upload --repository testpypi dist/*
```

## Data Migration

Migrate data between different engines:

```python
from pytuck.tools.migrate import migrate_engine
from pytuck.common.options import JsonBackendOptions

# Configure target engine options
json_opts = JsonBackendOptions(indent=2, ensure_ascii=False)

# Migrate from binary to JSON
migrate_engine(
    source_path='data.db',
    source_engine='binary',
    target_path='data.json',
    target_engine='json',
    target_options=json_opts  # Use strongly-typed options
)
```

## Architecture

```
┌─────────────────────────────────────┐
│       Application Layer             │
│   BaseModel, Column, Query API      │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│          ORM Layer (orm.py)         │
│   Model definitions, validation,    │
│   relationship mapping              │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│     Storage Layer (storage.py)      │
│   Table management, CRUD ops,       │
│   query execution                   │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│    Backend Layer (backends/)        │
│  BinaryBackend | JSONBackend | ...  │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│      Common Layer (common/)         │
│   Exceptions, Utils, Options        │
└─────────────────────────────────────┘
```

## Roadmap

### Completed

- Core ORM and in-memory storage

- Pluggable multi-engine persistence

- SQLAlchemy 2.0 style API

- Basic transaction support

## Current Limitations

Pytuck is a lightweight embedded database designed for simplicity. Here are the current limitations:

| Limitation | Description |
|------------|-------------|
| **No JOIN support** | Single table queries only, no multi-table joins |
| **No OR conditions** | Query conditions only support AND logic |
| **No aggregate functions** | No COUNT, SUM, AVG, MIN, MAX support |
| **Limited relationship loading** | Related objects are lazy-loaded on first access only and must be accessed before the storage is closed |
| **No schema migration tools** | Schema changes require manual handling (data migration between engines is available via `migrate_engine()`) |
| **Single writer** | No concurrent write support, suitable for single-process use |
| **Full rewrite on save** | Backends other than Binary and SQLite rewrite the entire file on each save |
| **No nested transactions** | Only single-level transactions supported |
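
Because query results are ordinary Python objects, two of these limitations can be worked around in application code: OR can be emulated by merging the results of several AND-only queries, and aggregates can be computed with built-ins. The sketch below uses a stand-in `Student` dataclass and a hypothetical `union_by_id` helper rather than real query results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass
class Student:
    """Stand-in for a queried model instance."""
    id: int
    name: str
    age: int


def union_by_id(*result_sets: Iterable[Student]) -> List[Student]:
    """Emulate OR: merge several query results, de-duplicated by primary key."""
    merged: Dict[int, Student] = {}
    for rows in result_sets:
        for row in rows:
            merged.setdefault(row.id, row)
    return list(merged.values())


# Pretend these came from two separate queries (age < 18, name == 'Alice').
minors = [Student(2, "Bob", 15)]
alices = [Student(1, "Alice", 20)]

either = union_by_id(minors, alices)
ages = [s.age for s in either]
print(len(either), sum(ages), mean(ages), min(ages), max(ages))
```

This loads every matching row into memory first, so it suits small result sets better than large tables.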

## Roadmap / TODO

### Completed

- [x] **Extended Field Type Support** ✨NEW✨

  - [x] Added `datetime`, `date`, `timedelta`, `list`, `dict` five new types

  - [x] Unified TypeRegistry codec, all backends use consistent serialization interface

  - [x] JSON backend format optimization, removed redundant `_type`/`_value` wrapper

- [x] **Binary Engine v4 Format** ✨NEW✨

  - [x] WAL (Write-Ahead Log) for O(1) write latency

  - [x] Dual Header mechanism for atomic switching and crash recovery

  - [x] Index region zlib compression (saves ~81% space)

  - [x] Batch I/O and codec caching optimizations

  - [x] Three-tier encryption support (low/medium/high), pure Python implementation

- [x] **Primary Key Query Optimization** (affects ALL storage engines) ✨NEW✨

  - [x] `WHERE pk = value` queries use O(1) direct access

  - [x] Single update/delete performance improved ~1000x

- [x] Complete SQLAlchemy 2.0 Style Object State Management

  - [x] Identity Map (Object Uniqueness Management)

  - [x] Automatic Dirty Tracking (Attribute assignment auto-detected and updates database)

  - [x] merge() Operation (Merge detached objects)

  - [x] Query Instance Auto-Registration to Session

- [x] Unified database connector architecture (`pytuck/connectors/` module)

- [x] Data migration tools (`migrate_engine()`, `import_from_database()`)

- [x] Import from external relational databases feature

- [x] Unified engine version management (`pytuck/backends/versions.py`)

- [x] Table and column comment support (`comment` parameter)

- [x] Complete generic type hints system

- [x] Strongly-typed configuration options system (dataclass replaces `**kwargs`)

### Planned Features

> 📋 For detailed development plans, please refer to [TODO.md](./TODO.md)

- [ ] **Web UI Interface Support** - Provide API support for independent Web UI library

- [ ] **ORM Event Hooks System** - Complete event system based on SQLAlchemy event pattern

- [ ] **JOIN Support** - Multi-table relational queries

- [ ] **OR Condition Support** - Complex logical query conditions

- [ ] **Aggregate Functions** - COUNT, SUM, AVG, MIN, MAX, etc.

- [ ] **Relationship Lazy Loading** - Optimize associated data loading performance

- [ ] **Schema Migration Tools** - Database structure version management

- [ ] **Concurrent Access Support** - Multi-process/thread-safe access

### Planned Engines

- [ ] DuckDB - Analytical database engine

- [ ] TinyDB - Pure Python document database

- [ ] PyDbLite3 - Pure Python in-memory database

- [ ] diskcache - Disk-based cache engine

### Planned Optimizations

- [ ] Incremental save for non-binary backends (currently full rewrite on each save)

- [ ] Binary engine Compaction (space reclaim) mechanism

- [ ] Use `tempfile` module for safer temporary file handling

- [ ] Streaming read/write for large datasets

- [ ] Connection pooling for SQLite backend

- [ ] Relationship and lazy loading enhancements

## Examples

See the `examples/` directory for more examples:

- `sqlalchemy20_api_demo.py` - Complete SQLAlchemy 2.0 style API example (recommended)

- `all_engines_test.py` - All storage engine functionality tests

- `transaction_demo.py` - Transaction management example

- `type_validation_demo.py` - Type validation and conversion example

- `data_model_demo.py` - Data model independence features example

- `backend_options_demo.py` - Backend configuration options demo (new)

- `migration_tools_demo.py` - Data migration tools demo (new)

## Contributing

Issues and Pull Requests are welcome!

## License

MIT License - see [LICENSE](LICENSE) for details.

## Acknowledgments

Inspired by SQLAlchemy, Django ORM, and TinyDB.