{"id":29343587,"url":"https://github.com/lancedb/pglance","last_synced_at":"2025-07-08T12:09:50.692Z","repository":{"id":298478497,"uuid":"1000100096","full_name":"lancedb/pglance","owner":"lancedb","description":"PostgreSQL Lance Table Extension","archived":false,"fork":false,"pushed_at":"2025-06-11T16:15:38.000Z","size":93,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-30T14:52:11.360Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lancedb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-11T09:07:12.000Z","updated_at":"2025-06-27T16:32:04.000Z","dependencies_parsed_at":"2025-06-11T10:38:33.021Z","dependency_job_id":"2971d0f0-ce2d-4310-b5fb-a595ebe7825c","html_url":"https://github.com/lancedb/pglance","commit_stats":null,"previous_names":["xuanwo/pglance","lancedb/pglance"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lancedb/pglance","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fpglance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fpglance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fpglance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fpglance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lancedb","download_
url":"https://codeload.github.com/lancedb/pglance/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fpglance/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264267164,"owners_count":23581930,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-08T12:09:49.737Z","updated_at":"2025-07-08T12:09:50.684Z","avatar_url":"https://github.com/lancedb.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pglance - PostgreSQL Lance Table Extension\n\npglance is a PostgreSQL extension built with the [pgrx](https://github.com/pgcentralfoundation/pgrx) framework that implements full-table scanning functionality for directly reading and querying [Lance](https://lancedb.github.io/lance/) format tables within PostgreSQL.\n\nThis is the first open-source project to seamlessly integrate the modern columnar storage format Lance with PostgreSQL database.\n\n## 🎯 Project Goals\n\nBring Lance's high-performance columnar storage and vector search capabilities into the PostgreSQL ecosystem, providing users with:\n- Efficient large-scale data analytics capabilities\n- Native vector search support (planned)\n- Unified SQL interface for accessing Lance data\n\n## ✨ Core Features\n\n### Current Implementation\n- **🔍 Lance Table Scanning**: Complete table data reading and traversal\n- **📊 Schema Inspection**: Automatic parsing of Lance table structure and column types\n- **📈 Statistics**: Get table metadata including version, row count, column count\n- 
**🔄 Type Conversion**: Intelligent type mapping from Arrow/Lance to PostgreSQL\n- **📦 JSONB Output**: JSON serialization for complex data structures\n- **⚡ Async Processing**: Integration of async Lance APIs within sync PostgreSQL interface\n\n### Planned Features\n- **🎯 Vector Search**: KNN and ANN search support\n- **🔧 FDW Support**: Foreign Data Wrapper interface\n- **✏️ Write Operations**: INSERT/UPDATE/DELETE support\n- **🚀 Query Optimization**: Predicate pushdown and column projection optimization\n\n## 🛠️ Tech Stack\n\n| Component | Version | Description |\n|-----------|---------|-------------|\n| PostgreSQL | 13-17 | Support for all actively maintained versions |\n| Rust | 1.70+ | Modern systems programming language |\n| pgrx | 0.14.3 | PostgreSQL extension development framework |\n| Lance | 0.29 | Latest version of Lance storage engine |\n| Arrow | 55.1 | Latest version of Apache Arrow |\n\n## 🚀 Quick Start\n\n### Prerequisites\n\nInstall required tools:\n- **Rust** (latest stable) - https://rustup.rs/\n- **PostgreSQL** (13-17) with development headers\n- **Protocol Buffers compiler** (protoc)\n\n### Installation\n\n```bash\n# Clone the project\ngit clone \u003crepository-url\u003e\ncd pglance\n\n# Setup development environment\ncargo install cargo-pgrx --version=0.14.3 --locked\ncargo pgrx init\n\n# Build and install extension\ncargo pgrx install --features pg16\n\n# Enable extension in PostgreSQL\npsql -c \"CREATE EXTENSION pglance;\"\n```\n\n### Verify Installation\n\n```sql\n-- Test basic functionality\nSELECT hello_pglance();\n-- Should return: \"Hello, pglance\"\n```\n\n## 📖 Usage Guide\n\n### 🔍 Table Structure Exploration\n\n```sql\n-- View complete Lance table structure information\nSELECT\n    column_name,\n    data_type,\n    CASE WHEN nullable THEN 'YES' ELSE 'NO' END as is_nullable\nFROM lance_table_info('/path/to/your/lance/table')\nORDER BY column_name;\n```\n\n**Example Output:**\n```\n column_name | data_type | 
is_nullable\n-------------+-----------+-------------\n id          | int8      | NO\n embedding   | float4[]  | YES\n metadata    | jsonb     | YES\n name        | text      | YES\n```\n\n### 📊 Data Statistics Analysis\n\n```sql\n-- Get detailed table statistics\nSELECT\n    'Lance Table Version: ' || version as info,\n    'Total Rows: ' || num_rows as row_info,\n    'Total Columns: ' || num_columns as col_info\nFROM lance_table_stats('/path/to/your/lance/table');\n```\n\n### 📋 Data Content Viewing\n\n```sql\n-- View first 5 rows of data (recommended for large tables)\nSELECT\n    (row_data-\u003e\u003e'id')::bigint as id,\n    row_data-\u003e\u003e'name' as name,\n    jsonb_array_length(row_data-\u003e'embedding') as embedding_dim\nFROM lance_scan_jsonb('/path/to/your/lance/table', 5);\n\n-- Data quality statistics\nSELECT\n    COUNT(*) as total_rows,\n    COUNT(CASE WHEN row_data ? 'id' THEN 1 END) as has_id,\n    COUNT(CASE WHEN row_data ? 'embedding' THEN 1 END) as has_embedding\nFROM lance_scan_jsonb('/path/to/your/lance/table', 1000);\n```\n\n## 📚 API Reference\n\n### `hello_pglance()`\n\nReturns a simple greeting to verify extension installation.\n\n**Returns:** `TEXT` - \"Hello, pglance\"\n\n### `lance_table_info(table_path TEXT)`\n\nReturns Lance table structure information.\n\n**Parameters:**\n- `table_path`: File system path to the Lance table\n\n**Returns:**\n- `column_name`: Column name\n- `data_type`: PostgreSQL data type\n- `nullable`: Whether null values are allowed\n\n### `lance_table_stats(table_path TEXT)`\n\nReturns Lance table statistics.\n\n**Parameters:**\n- `table_path`: File system path to the Lance table\n\n**Returns:**\n- `version`: Lance table version\n- `num_rows`: Total number of rows\n- `num_columns`: Total number of columns\n\n### `lance_scan_jsonb(table_path TEXT, limit INTEGER DEFAULT NULL)`\n\nScans Lance table and returns data in JSONB format.\n\n**Parameters:**\n- `table_path`: File system path to the Lance table\n- `limit`: 
Limit number of rows returned (optional)\n\n**Returns:**\n- `row_data`: Row data in JSONB format\n\n## 🔄 Data Type Mapping\n\n| Arrow/Lance Type | PostgreSQL Type |\n|------------------|-----------------|\n| Boolean          | boolean         |\n| Int8             | char            |\n| Int16            | int2            |\n| Int32            | int4            |\n| Int64            | int8            |\n| Float32          | float4          |\n| Float64          | float8          |\n| Utf8/LargeUtf8   | text            |\n| Binary           | bytea           |\n| Date32/Date64    | date            |\n| Timestamp        | timestamp       |\n| List/Struct      | jsonb           |\n| FixedSizeList(float) | float4[]/float8[] |\n\n## 🛠️ Development\n\n### Quick Development Setup\n\n```bash\n# Setup development environment\ncargo install cargo-pgrx --version=0.14.3 --locked\ncargo pgrx init\n\n# Clone and setup project\ngit clone \u003crepository-url\u003e\ncd pglance\n\n# Run all quality checks\ncargo fmt --all -- --check\ncargo clippy --features pg16 -- -D warnings\ncargo test --features pg16\n\n# Build and install\ncargo pgrx install --features pg16\n\n# Start PostgreSQL with extension\ncargo pgrx run --features pg16\n```\n\n### Using Just Commands\n\nIf you have [just](https://github.com/casey/just) installed:\n\n```bash\n# Show all available commands\njust\n\n# Run all quality checks\njust check\n\n# Auto-format code\njust fmt\n\n# Build extension\njust build\n\n# Run tests\njust test\n\n# Start PostgreSQL with extension\njust run\n\n# Simulate CI locally\njust ci\n```\n\n### Supported PostgreSQL Versions\n\nSpecify PostgreSQL version for commands:\n```bash\ncargo pgrx install --features pg15  # PostgreSQL 15\ncargo pgrx install --features pg17  # PostgreSQL 17\n# Or with just:\njust build pg=15\njust test pg=17\n```\n\nSupported versions: 13, 14, 15, 16, 17 (default: 16)\n\nFor detailed development information, see [DEVELOPMENT.md](DEVELOPMENT.md).\n\n## 🧪 
Testing\n\npglance uses a pure Rust testing approach with comprehensive unit and integration tests.\n\n```bash\n# Run all tests\ncargo test --features pg16\n# Or with just:\njust test\n```\n\nAll tests are written in Rust using the pgrx testing framework. For detailed testing information, see [TESTING.md](TESTING.md).\n\n## 🏗️ Architecture\n\n```\npglance/\n├── src/\n│   ├── lib.rs              # Main entry, PostgreSQL function definitions\n│   ├── types/              # Type conversion module\n│   │   ├── mod.rs          # Module exports\n│   │   ├── conversion.rs   # Arrow to PostgreSQL type mapping\n│   │   └── arrow_convert.rs # Arrow value conversion utilities\n│   └── scanner/            # Lance scanner implementation\n│       ├── mod.rs          # Module exports\n│       └── lance_scanner.rs # Lance table scanning logic\n├── sql/                    # SQL scripts (if any)\n├── .github/                # GitHub workflows\n│   └── workflows/\n│       ├── rust-checks.yml # CI/CD pipeline\n│       └── release.yml     # Release automation\n├── Cargo.toml             # Rust dependency configuration\n├── justfile               # Development commands\n├── pglance.control        # PostgreSQL extension metadata\n├── README.md              # This file\n├── DEVELOPMENT.md         # Development guide\n└── TESTING.md             # Testing guide\n```\n\n## ⚠️ Limitations and Notes\n\n1. **File Paths**: Currently requires full file system path to Lance tables\n2. **Permissions**: PostgreSQL process needs read permissions for Lance files\n3. **Memory Usage**: Large table scans may consume significant memory\n4. **Type Support**: Complex nested types are converted to JSONB\n5. 
**Concurrency**: Current implementation uses synchronous access\n\n## 🔮 Future Plans\n\n- [ ] Foreign Data Wrapper (FDW) support\n- [ ] Vector search functionality (KNN/ANN)\n- [ ] Write support (INSERT/UPDATE/DELETE)\n- [ ] Partitioned table support\n- [ ] Query pushdown optimization\n- [ ] Streaming scans for large datasets\n- [ ] Custom vector types\n- [ ] Index creation and management\n\n## 🤝 Contributing\n\nIssues and Pull Requests are welcome! Please see our development guidelines in [DEVELOPMENT.md](DEVELOPMENT.md).\n\n## 📄 License\n\nApache License 2.0\n\n## 🔗 Related Projects\n\n- [Lance](https://github.com/lancedb/lance) - Modern columnar data format\n- [pgrx](https://github.com/pgcentralfoundation/pgrx) - PostgreSQL extension development framework\n- [Apache Arrow](https://arrow.apache.org/) - In-memory columnar data format\n- [LanceDB](https://lancedb.github.io/lancedb/) - Vector database built on Lance","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Fpglance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancedb%2Fpglance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Fpglance/lists"}