{"id":34033074,"url":"https://github.com/thomazyujibaba/ibarrow","last_synced_at":"2026-03-17T20:37:23.118Z","repository":{"id":316300606,"uuid":"1062781881","full_name":"thomazyujibaba/ibarrow","owner":"thomazyujibaba","description":"Connect legacy ODBC databases (e.g., InterBase, Firebird) to modern Arrow/Pandas/Polars workflows via Rust and Python.","archived":false,"fork":false,"pushed_at":"2025-11-24T07:43:53.000Z","size":76,"stargazers_count":0,"open_issues_count":9,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-27T18:28:11.170Z","etag":null,"topics":["apache","arrow","interbase","odbc"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomazyujibaba.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-23T18:07:51.000Z","updated_at":"2025-09-24T01:56:40.000Z","dependencies_parsed_at":"2025-09-23T21:12:38.166Z","dependency_job_id":"e12a55be-9bce-4374-b9e0-3d5ee78da488","html_url":"https://github.com/thomazyujibaba/ibarrow","commit_stats":null,"previous_names":["thomazyujibaba/ibarrow"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/thomazyujibaba/ibarrow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomazyujibaba%2Fibarrow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomazyujibaba%2Fibarrow/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomazyujibaba%2Fibarrow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomazyujibaba%2Fibarrow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomazyujibaba","download_url":"https://codeload.github.com/thomazyujibaba/ibarrow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomazyujibaba%2Fibarrow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30631403,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T17:32:55.572Z","status":"ssl_error","status_checked_at":"2026-03-17T17:32:38.732Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","arrow","interbase","odbc"],"created_at":"2025-12-13T19:01:41.407Z","updated_at":"2026-03-17T20:37:23.112Z","avatar_url":"https://github.com/thomazyujibaba.png","language":"Rust","readme":"# ibarrow\n\nHigh-performance ODBC to Arrow data conversion for Python, built with Rust.\n\n## Features\n\n- 🚀 **High Performance**: Built with Rust for maximum speed\n- 🔄 **ODBC Integration**: Direct connection to any ODBC-compatible database\n- 📊 **Arrow Format**: Native Apache Arrow support for efficient data processing\n- 🐼 **Pandas/Polars Ready**: Seamless integration with popular Python data libraries\n- 🛡️ **Type 
Safe**: Rust-powered reliability with Python convenience\n- 🎯 **Two-Level API**: Simple wrappers for common use + raw functions for advanced control\n\n## Installation\n\n```bash\npip install ibarrow\n```\n\n## Repository\n\n- **GitHub**: https://github.com/thomazyujibaba/ibarrow\n- **PyPI**: https://pypi.org/project/ibarrow/\n- **Documentation**: https://github.com/thomazyujibaba/ibarrow#readme\n\n## Prerequisites\n\n**Important**: You need an ODBC driver installed on your system for ibarrow to work.\n\n### Windows\n- **SQL Server**: [ODBC Driver for SQL Server](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server)\n- **PostgreSQL**: [psqlODBC](https://www.postgresql.org/ftp/odbc/versions/)\n- **MySQL**: [MySQL Connector/ODBC](https://dev.mysql.com/downloads/connector/odbc/)\n- **Oracle**: [Oracle Instant Client + ODBC](https://www.oracle.com/database/technologies/instant-client/winx64-64-downloads.html)\n\n### Linux\n- **SQL Server**: [Microsoft ODBC Driver for SQL Server on Linux](https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server)\n- **PostgreSQL**: `sudo apt-get install odbc-postgresql` (Ubuntu/Debian) or `sudo yum install postgresql-odbc` (RHEL/CentOS)\n- **MySQL**: `sudo apt-get install libmyodbc` (Ubuntu/Debian) or `sudo yum install mysql-connector-odbc` (RHEL/CentOS)\n\n### macOS\n- **Note**: macOS support is currently not available. 
Please use Windows or Linux for now.\n\n### Verify ODBC Installation\n\nYou can verify your ODBC drivers are installed by checking the system:\n\n**Windows:**\n```cmd\n:: Check installed drivers\nodbcad32.exe\n```\n\n**Linux:**\n```bash\n# List available drivers\nodbcinst -q -d\n```\n\n## API Architecture\n\nibarrow provides a **two-level API** designed for different user needs:\n\n### 🎯 **High-Level API (Recommended for 95% of users)**\n- **`query_polars()`**: Direct Polars DataFrame (fast, one Arrow IPC serialization)\n- **`query_pandas()`**: Direct Pandas DataFrame (maximum compatibility)\n\n### 🔧 **Low-Level API (For advanced users)**\n- **`query_arrow_ipc()`**: Raw Arrow IPC bytes (maximum compatibility)\n- **`query_arrow_c_data()`**: Raw Arrow C Data Interface (maximum performance)\n\n### 📋 **When to Use Each Level**\n\n| User Type | Recommended Function | Use Case |\n|-----------|---------------------|----------|\n| **Beginners** | `query_polars()` | 95% of cases - simple and fast |\n| **Pandas Users** | `query_pandas()` | When you need Pandas compatibility |\n| **Advanced Users** | `query_arrow_ipc()` | When you need raw Arrow data |\n| **Performance Critical** | `query_arrow_c_data()` | When you need maximum control |\n\n## Quick Start\n\n### 🔗 **Connection Methods**\n\nibarrow supports two ways to connect to databases:\n\n#### **Method 1: DSN (Data Source Name)**\n```python\n# Requires a pre-configured DSN in ODBC Data Sources\nconn = ibarrow.connect(\n    dsn=\"my_database_dsn\",\n    user=\"username\",\n    password=\"password\"\n)\n```\n\n#### **Method 2: Direct Connection String (Recommended)**\n```python\n# Direct connection like pyodbc - no DSN configuration needed\nconn = ibarrow.connect(\n    dsn=\"DRIVER={SQL Server};SERVER=localhost;DATABASE=mydb;\",\n    user=\"username\",\n    password=\"password\"\n)\n```\n\n### 🚀 **Recommended Usage (95% of cases)**\n\n```python\nimport ibarrow\n\n# Option 1: Using DSN (Data Source Name)\nconn = ibarrow.connect(\n    
dsn=\"your_dsn\",\n    user=\"username\",\n    password=\"password\"\n)\n\n# Option 2: Using direct connection string (like pyodbc)\nconn = ibarrow.connect(\n    dsn=\"DRIVER={SQL Server};SERVER=localhost;DATABASE=mydb;\",\n    user=\"username\",\n    password=\"password\"\n)\n\n# Query and get Polars DataFrame\ndf = conn.query_polars(\"SELECT * FROM your_table\")\n\nprint(df)\n```\n\n### With Custom Batch Size\n\n```python\nimport ibarrow\n\n# Create config with custom batch size\nconfig = ibarrow.QueryConfig(batch_size=2000)\n\n# Create connection with configuration\nconn = ibarrow.connect(\n    dsn=\"your_dsn\",\n    user=\"username\",\n    password=\"password\",\n    config=config\n)\n\n# Query with custom batch size\narrow_bytes = conn.query_arrow_ipc(\"SELECT * FROM your_table\")\n```\n\n### Advanced Configuration\n\n```python\nimport ibarrow\n\n# Create custom configuration\nconfig = ibarrow.QueryConfig(\n    batch_size=2000,           # Rows per batch\n    read_only=True,            # Read-only connection\n    connection_timeout=30,     # Connection timeout in seconds\n    query_timeout=60,          # Query timeout in seconds\n    max_text_size=32768,       # Max text field size\n    max_binary_size=16384,     # Max binary field size\n    isolation_level=\"read_committed\"  # Transaction isolation\n)\n\n# Create connection with configuration\nconn = ibarrow.connect(\n    dsn=\"your_dsn\",\n    user=\"username\",\n    password=\"password\",\n    config=config\n)\n\n# Use the connection\narrow_bytes = conn.query_arrow_ipc(\"SELECT * FROM your_table\")\n```\n\n### Direct DataFrame Integration\n\n```python\nimport ibarrow\n\n# Direct conversion to Polars DataFrame (uses pl.read_ipc internally)\n# Create connection\nconn = ibarrow.connect(\n    dsn=\"your_dsn\",\n    user=\"username\",\n    password=\"password\"\n)\n\n# Get Polars DataFrame\ndf_polars = conn.query_polars(\"SELECT * FROM your_table\")\n\n# Get Pandas DataFrame\ndf_pandas = 
conn.query_pandas(\"SELECT * FROM your_table\")\n\nprint(df_polars)\nprint(df_pandas)\n```\n\n### ⚡ Zero-Copy Performance (Arrow C Data Interface)\n\nFor maximum performance, use the Arrow C Data Interface functions that completely eliminate serialization:\n\n```python\nimport ibarrow\nimport polars as pl\nimport pyarrow as pa\n\n# Zero-copy conversion to Polars DataFrame (fastest)\n# Create connection\nconn = ibarrow.connect(\n    dsn=\"your_dsn\",\n    user=\"username\",\n    password=\"password\"\n)\n\n# Get Polars DataFrame directly\ndf_polars = conn.query_arrow_c_data(\"SELECT * FROM your_table\", return_dataframe=True)\n\n# Or get raw PyCapsules for manual control\nschema_capsule, array_capsule = conn.query_arrow_c_data(\"SELECT * FROM your_table\")\n\n# Convert to a PyArrow Table using zero-copy. PyCapsules are imported via\n# the _import_from_c_capsule helpers (PyArrow 14+); _import_from_c expects\n# raw pointer addresses, not capsules.\nbatch = pa.RecordBatch._import_from_c_capsule(schema_capsule, array_capsule)\ntable = pa.Table.from_batches([batch])\n\n# Convert to Polars\ndf = pl.from_arrow(table)\n```\n\n**Arrow C Data Interface Benefits:**\n- 🚀 **Zero serialization**: Data passes directly via pointers\n- 💾 **Zero copies**: Eliminates memory overhead\n- ⚡ **Maximum speed**: Ideal for large datasets\n- 🔄 **Compatibility**: Works with PyArrow, Polars, and Pandas\n\n### Manual Arrow IPC Usage\n\n```python\nimport ibarrow\nimport polars as pl\n\n# Get raw Arrow IPC bytes\n# Create connection\nconn = ibarrow.connect(\n    dsn=\"your_dsn\",\n    user=\"username\",\n    password=\"password\"\n)\n\n# Get Arrow IPC bytes\narrow_bytes = conn.query_arrow_ipc(\"SELECT * FROM your_table\")\n\n# Convert to Polars DataFrame manually\ndf = pl.read_ipc(arrow_bytes)\nprint(df)\n```\n\n## API Reference\n\n### `ibarrow.connect(dsn, user, password, config=None)`\n\nCreates a connection object for database operations.\n\n**Parameters:**\n- `dsn` (str): ODBC Data Source Name or full connection string\n  - **DSN format**: `\"your_dsn\"` (requires pre-configured DSN)\n  - **Connection 
string format**: `\"DRIVER={SQL Server};SERVER=localhost;DATABASE=mydb;\"` (direct connection)\n- `user` (str): Database username\n- `password` (str): Database password\n- `config` (QueryConfig, optional): Configuration object\n\n**Returns:** `IbarrowConnection` object\n\n**Connection String Examples:**\n```python\n# SQL Server\ndsn = \"DRIVER={SQL Server};SERVER=localhost;DATABASE=mydb;\"\n\n# PostgreSQL\ndsn = \"DRIVER={PostgreSQL};SERVER=localhost;PORT=5432;DATABASE=mydb;\"\n\n# MySQL\ndsn = \"DRIVER={MySQL ODBC 8.0 Driver};SERVER=localhost;PORT=3306;DATABASE=mydb;\"\n\n# Oracle\ndsn = \"DRIVER={Oracle in OraClient19Home1};DBQ=localhost:1521/XE;\"\n```\n\n### `conn.query_arrow_ipc(sql)`\n\nExecute a SQL query and return Arrow IPC bytes.\n\n**Parameters:**\n- `sql` (str): SQL query to execute\n\n**Returns:** `bytes` - Arrow IPC format data\n\n**Raises:**\n- `PyConnectionError`: Database connection issues\n- `PySQLError`: SQL syntax or execution errors\n- `PyArrowError`: Arrow data processing errors\n\n### `conn.query_polars(sql)`\n\nExecute a SQL query and return a Polars DataFrame directly.\n\n**Parameters:** Same as `query_arrow_ipc`\n\n**Returns:** `polars.DataFrame` - Ready-to-use DataFrame\n\n**Note:** Uses `pl.read_ipc()` directly with bytes for optimal performance.\n\n### `conn.query_pandas(sql)`\n\nExecute a SQL query and return a Pandas DataFrame directly.\n\n**Parameters:** Same as `query_arrow_ipc`\n\n**Returns:** `pandas.DataFrame` - Ready-to-use DataFrame\n\n**Note:** Converts Arrow IPC to Pandas via PyArrow for compatibility.\n\n### `QueryConfig`\n\nConfiguration class for advanced query settings.\n\n**Parameters:**\n- `batch_size` (int, optional): Number of rows per batch for processing (default: 1000)\n- `read_only` (bool, optional): Read-only connection to avoid locks (default: True)\n- `connection_timeout` (int, optional): Connection timeout in seconds\n- `query_timeout` (int, optional): Query timeout in seconds\n- `max_text_size` (int, optional): Maximum 
text field size in bytes (default: 65536)\n- `max_binary_size` (int, optional): Maximum binary field size in bytes (default: 65536)\n- `isolation_level` (str, optional): Transaction isolation level. Supported values: \"read_uncommitted\", \"read_committed\", \"repeatable_read\", \"serializable\", \"snapshot\"\n\n### Configuration Benefits\n\n- **`batch_size`**: Controls memory usage and performance. Larger batches = more memory but faster processing\n- **`read_only`**: Prevents locks and improves performance for read-only operations (effective only if ODBC driver supports this flag)\n- **`connection_timeout`**: Protects against hanging connections\n- **`query_timeout`**: Prevents long-running queries from blocking\n- **`max_text_size`**: Handles large text fields (VARCHAR, TEXT) efficiently\n- **`max_binary_size`**: Handles large binary data (BLOB, VARBINARY) efficiently\n- **`isolation_level`**: Controls transaction isolation for concurrent access\n\n### Implementation Notes\n\n- **`read_only`**: Currently implemented via ODBC connection string (`ReadOnly=1`). \n- **`batch_size`**: Controls how many rows are fetched per batch from the database, avoiding row-by-row fetching for better performance.\n- **`query_timeout`**: Implemented via statement handle using `stmt.set_query_timeout()`, which is more reliable than connection string timeouts.\n- **`isolation_level`**: Standardized mapping from common names (e.g., \"read_committed\") to driver-specific ODBC connection string values (e.g., \"Isolation Level=ReadCommitted\").\n- **`query_polars`**: Uses Arrow IPC stream with `pl.read_ipc()` for maximum compatibility and performance.\n- **Native Types**: Always preserves ODBC native types (INT, DECIMAL, FLOAT) as Arrow native types (Int64Array, Float64Array), avoiding expensive string conversions for maximum performance.\n- **Pipelining**: Always processes data in streaming fashion, writing each batch immediately as it's fetched. 
This keeps memory usage constant (e.g., 10MB) regardless of dataset size (even 80GB+).\n\n## Performance Comparison\n\n### Serialization vs Zero-Copy\n\n| Method | Level | Serialization | Memory Copies | Performance | Ideal Use |\n|--------|-------|-------------|---------------|-------------|-----------|\n| **`query_polars`** | **High** | Arrow IPC Stream | 1x (serialization) | ⭐⭐⭐⭐ | **95% of cases (recommended)** |\n| **`query_pandas`** | **High** | Arrow IPC Stream | 1x (serialization) | ⭐⭐⭐ | Pandas compatibility |\n| `query_arrow_ipc` | Low | Arrow IPC Stream | 1x (serialization) | ⭐⭐⭐ | Maximum compatibility |\n| `query_arrow_c_data` | Low | **Zero** | **Zero** | **⭐⭐⭐⭐⭐** | Maximum performance |\n\n### Typical Benchmarks (1M rows)\n\n```\nquery_polars:         ~200ms  (Arrow IPC + polars.read_ipc) ⭐ RECOMMENDED\nquery_pandas:         ~600ms  (Arrow IPC + pyarrow + pandas)\nquery_arrow_ipc:      ~500ms  (Arrow IPC serialization)\nquery_arrow_c_data:   ~50ms   (zero-copy via pointers)\n```\n\n### 🚀 **Built-in Performance Optimizations**\n\n**Native Types (Always Enabled):**\n```\n- INT columns → Int64Array (native Arrow types)\n- DECIMAL columns → DecimalArray (native Arrow types)\n- FLOAT columns → Float64Array (native Arrow types)\n- Performance: ~200ms for 1M numeric rows (Arrow IPC)\n```\n\n**Pipelining (Always Enabled):**\n```\n- Memory usage: Constant (~10MB) regardless of dataset size\n- Processing: Streaming (fetch + write immediately)\n- Latency: Lower (Python can start consuming data before completion)\n- Example: 80GB dataset uses only ~10MB RAM\n```\n\n### When to Use Each Method\n\n#### 🎯 **High-Level API (Recommended)**\n- **`query_polars()`**: **95% of cases** - simple and fast\n- **`query_pandas()`**: When you need Pandas compatibility\n\n#### 🔧 **Low-Level API (Advanced)**\n- **`query_arrow_ipc()`**: Maximum compatibility, save to disk\n- **`query_arrow_c_data()`**: Maximum performance, full control over data\n\n## Error 
Handling\n\nThe library provides specific exception types for different error scenarios:\n\n```python\nimport ibarrow\n\ntry:\n    # Create connection\n    conn = ibarrow.connect(dsn, user, password)\n\n    # Run the query\n    df = conn.query_polars(sql)\nexcept ibarrow.PyConnectionError as e:\n    print(f\"Connection failed: {e}\")\nexcept ibarrow.PySQLError as e:\n    print(f\"SQL error: {e}\")\nexcept ibarrow.PyArrowError as e:\n    print(f\"Arrow processing error: {e}\")\n```\n\n## Requirements\n\n- Python 3.8+\n- ODBC driver for your database\n- Rust toolchain (for development)\n\n## Development\n\n### Setup\n\n```bash\n# Clone the repository\ngit clone https://github.com/thomazyujibaba/ibarrow.git\ncd ibarrow\n\n# Install maturin (quoted so the extra works in all shells)\npip install \"maturin[patchelf]\"\n\n# Install in development mode\nmaturin develop\n```\n\n### Running Tests\n\n```bash\n# Install test dependencies\npip install pytest pytest-cov\n\n# Run tests\npytest tests/ -v\n```\n\n### Building\n\n```bash\n# Build wheel\nmaturin build --release\n\n# Build and install\nmaturin develop\n```\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n\n## Contributing\n\nContributions are welcome! 
Please feel free to submit a Pull Request.\n\n## Troubleshooting\n\n### Common ODBC Issues\n\n**\"Driver not found\" errors:**\n- Ensure the ODBC driver is properly installed\n- Check that the driver name in your DSN matches exactly\n- Verify the driver architecture (32-bit vs 64-bit) matches your Python installation\n\n**Connection timeout errors:**\n- Check network connectivity to the database server\n- Verify firewall settings\n- Ensure the database server is running and accessible\n\n**Permission errors:**\n- Verify database credentials\n- Check user permissions on the database\n- Ensure the ODBC driver has necessary privileges\n\n**Performance issues:**\n- Adjust `batch_size` in `QueryConfig` for optimal memory usage\n- Use `read_only=True` for read-only operations\n- Consider connection pooling for high-frequency queries\n\n## Support\n\nFor issues and questions, please use the [GitHub Issues](https://github.com/thomazyujibaba/ibarrow/issues) page.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomazyujibaba%2Fibarrow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomazyujibaba%2Fibarrow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomazyujibaba%2Fibarrow/lists"}