{"id":30896757,"url":"https://github.com/dgtlss/parqbridge","last_synced_at":"2025-10-03T14:35:03.304Z","repository":{"id":309701898,"uuid":"1037243863","full_name":"dgtlss/parqbridge","owner":"dgtlss","description":"ParqBridge focuses on zero PHP dependency bloat while still producing spec-compliant Parquet files by delegating the final write step to a tiny, embedded Python script using PyArrow (or any custom CLI you prefer). You keep full Laravel DX for configuration and Storage; we bridge your data to Parquet.","archived":false,"fork":false,"pushed_at":"2025-08-21T12:34:59.000Z","size":45,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-06T11:55:54.346Z","etag":null,"topics":["laravel","laravel-framework","laravel-package","parquet","parquet-files","parquet-generator","parquet-schema","php","php8","powerbi","python"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dgtlss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-13T09:27:40.000Z","updated_at":"2025-08-21T12:33:54.000Z","dependencies_parsed_at":"2025-08-13T11:30:54.365Z","dependency_job_id":"dddfebec-b3f2-415c-9bbd-fcfad7c97ed6","html_url":"https://github.com/dgtlss/parqbridge","commit_stats":null,"previous_names":["dgtlss/parqbridge"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/dgtlss/parqbridge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlss%2Fparqbridge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlss%2Fparqbridge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlss%2Fparqbridge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlss%2Fparqbridge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dgtlss","download_url":"https://codeload.github.com/dgtlss/parqbridge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlss%2Fparqbridge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274231139,"owners_count":25245675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["laravel","laravel-framework","laravel-package","parquet","parquet-files","parquet-generator","parquet-schema","php","php8","powerbi","python"],"created_at":"2025-09-08T23:45:03.521Z","updated_at":"2025-10-03T14:34:58.260Z","avatar_url":"https://github.com/dgtlss.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ParqBridge\n\nExport your Laravel database tables to real Apache Parquet files on any Storage disk (local, S3, etc.) with a simple artisan command.\n\nParqBridge focuses on zero PHP dependency bloat while still producing spec-compliant Parquet files by delegating the final write step to a tiny, embedded Python script using PyArrow (or any custom CLI you prefer). You keep full Laravel DX for configuration and Storage; we bridge your data to Parquet.\n\n## Installation\n\n- Require the package in your app (path repo or VCS):\n\n```bash\ncomposer require dgtlss/parqbridge\n```\n\n- Laravel will auto-discover the service provider. Alternatively, register `ParqBridge\\\\ParqBridgeServiceProvider` manually.\n\n- Publish the config if you want to customize defaults:\n\n```bash\nphp artisan vendor:publish --tag=\"parqbridge-config\"\n```\n\n## Configuration\n\nSet your export disk and options in `.env` or `config/parqbridge.php`.\n\n- `PARQUET_DISK`: which filesystem disk to use (e.g., `s3`, `local`).\n- `PARQUET_OUTPUT_DIR`: directory prefix within the disk (default `parquet-exports`).\n- `PARQUET_CHUNK_SIZE`: rows per DB chunk when exporting (default 1000).\n- `PARQUET_INFERENCE`: `database|sample|hybrid` (default `hybrid`).\n- `PARQUET_COMPRESSION`: compression codec for Parquet (`UNCOMPRESSED`/`NONE`, `SNAPPY`, `GZIP`, `ZSTD`, `BROTLI`, `LZ4_RAW`) when using PyArrow backend.\n- `PARQBRIDGE_WRITER`: `pyarrow` (default) or `custom`. If `custom`, set `PARQBRIDGE_CUSTOM_CMD`.\n- `PARQBRIDGE_PYTHON`: python executable for PyArrow (default `python3`).\n\nExample `.env`:\n\n```ini\nPARQUET_DISK=s3\nPARQUET_OUTPUT_DIR=parquet-exports\nPARQUET_CHUNK_SIZE=2000\n```\n\nEnsure your `filesystems` disk is configured (e.g., `s3`) in `config/filesystems.php`.\n\n### FTP disk configuration\n\nYou can export directly to an FTP server using Laravel's `ftp` disk. Add an FTP disk to `config/filesystems.php` and reference it via `PARQUET_DISK=ftp` or `--disk=ftp`.\n\n```php\n'disks' =\u003e [\n    'ftp' =\u003e [\n        'driver' =\u003e 'ftp',\n        'host' =\u003e env('FTP_HOST'),\n        'username' =\u003e env('FTP_USERNAME'),\n        'password' =\u003e env('FTP_PASSWORD'),\n\n        // Optional FTP settings\n        'port' =\u003e (int) env('FTP_PORT', 21),\n        'root' =\u003e env('FTP_ROOT', ''),\n        'passive' =\u003e filter_var(env('FTP_PASSIVE', true), FILTER_VALIDATE_BOOL),\n        'ssl' =\u003e filter_var(env('FTP_SSL', false), FILTER_VALIDATE_BOOL),\n        'timeout' =\u003e (int) env('FTP_TIMEOUT', 90),\n    ],\n],\n```\n\nNote: This package will coerce common FTP env values (e.g., `port`, `timeout`, `passive`, `ssl`) to the proper types before resolving the disk to avoid Flysystem type errors like \"Argument #5 ($port) must be of type int, string given\".\n\n## Usage\n\n- List tables:\n\n```bash\nphp artisan parqbridge:tables\n```\n\n- Export a table to the configured disk:\n\n```bash\nphp artisan parqbridge:export users --where=\"active = 1\" --limit=1000 --output=\"parquet-exports\" --disk=s3\n```\n\nOn success, the command prints the full path written within the disk. Files are named `{table}-{YYYYMMDD_HHMMSS}.parquet`.\n\n- Export ALL tables into one folder (timestamped subfolder inside `parqbridge.output_directory`):\n\n```bash\nphp artisan parqbridge:export-all --disk=s3 --output=\"parquet-exports\" --exclude=migrations,password_resets\n```\n\nOptions:\n- `--include=`: comma-separated allowlist of table names\n- `--exclude=`: comma-separated denylist of table names\n\n## Data types\n\nThe schema inferrer maps common DB types to a set of Parquet primitive types and logical annotations. With the PyArrow backend, an Arrow schema is constructed to faithfully write types:\n\n- Primitive: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BYTE_ARRAY`, `FIXED_LEN_BYTE_ARRAY`\n- Logical: `UTF8`, `DATE`, `TIME_MILLIS`, `TIME_MICROS`, `TIMESTAMP_MILLIS`, `TIMESTAMP_MICROS`, `DECIMAL`\n\nFor decimals we write Arrow decimal types (`decimal128`/`decimal256`) with declared `precision`/`scale`.\n\n## Testing\n\nRun the test suite:\n\n```bash\ncomposer install\nvendor/bin/phpunit\n```\n\nThe tests bootstrap a minimal container, create a SQLite database, and verify:\n- listing tables works on SQLite\n- exporting a table writes a Parquet file to the configured disk (magic `PAR1`)\n- schema inference on SQLite maps major families\n\n## Backend requirements\n\n- By default ParqBridge uses Python + PyArrow. Ensure `python3` is available and install PyArrow:\n\n```bash\npython3 -m pip install --upgrade pip\npython3 -m pip install pyarrow\n```\n\n- Alternatively set a custom converter command via `PARQBRIDGE_WRITER=custom` and `PARQBRIDGE_CUSTOM_CMD` (must read `{input}` CSV and write `{output}` Parquet).\n\nYou can automate setup via the included command:\n\n```bash\nphp artisan parqbridge:setup --write-env\n```\n\nOptions:\n- `--python=`: path/name of Python (default from config `parqbridge.pyarrow_python`)\n- `--venv=`: location for virtualenv (default `./parqbridge-venv`)\n- `--no-venv`: install into global Python instead of a venv\n- `--write-env`: append `PARQBRIDGE_PYTHON` and `PARQBRIDGE_WRITER` to `.env`\n- `--upgrade`: upgrade pip first\n- `--dry-run`: print commands without executing\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgtlss%2Fparqbridge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdgtlss%2Fparqbridge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgtlss%2Fparqbridge/lists"}