# sqlite-parquet-vtable

[![Build Status](https://travis-ci.org/cldellow/sqlite-parquet-vtable.svg?branch=master)](https://travis-ci.org/cldellow/sqlite-parquet-vtable)
[![codecov](https://codecov.io/gh/cldellow/sqlite-parquet-vtable/branch/master/graph/badge.svg)](https://codecov.io/gh/cldellow/sqlite-parquet-vtable)

A SQLite [virtual table](https://sqlite.org/vtab.html) extension to expose Parquet files as SQL tables. You may also find [csv2parquet](https://github.com/cldellow/csv2parquet/) useful.

This [blog post](https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html) provides some context on why you might use this.

## Installing

### Download

You can download an xz-compressed build for Ubuntu 16.04 from https://s3.amazonaws.com/cldellow/public/libparquet/libparquet.so.xz

### Building

```
./make-linux
```

The first run clones several dependency libraries with git, patches them so they can be linked statically, and builds them.

Subsequent builds only rebuild the Parquet virtual table extension.

### Building (release)

Run `./make-linux-pgo` to perform a profile-guided optimization (PGO) build: it builds an instrumented binary, runs the tests to collect real-life usage profiles, then builds an optimized binary. PGO seems to reduce query times by 5-10%.

### Tests

Run:

```
tests/create-queries-from-templates
tests/test-all
```

## Use

```
$ sqlite/sqlite3
sqlite> .load build/linux/libparquet
sqlite> CREATE VIRTUAL TABLE demo USING parquet('parquet-generator/99-rows-1.parquet');
sqlite> SELECT * FROM demo;
...if all goes well, you'll see data here!...
```

Note: if you get an error like:

```
sqlite> .load build/linux/libparquet
Error: parquet/libparquet.so: wrong ELF class: ELFCLASS64
```

then you have a 32-bit SQLite installed, which cannot load the 64-bit extension. To fix this, run:

```
sudo apt-get remove --purge sqlite3
sudo apt-get install sqlite3:amd64
```

## Supported features

### Row group filtering

Row group filtering is supported for string and numeric columns, as long as the SQLite type of the value in the constraint matches the column's Parquet type.

For example, if column `foo` is an INT32, this query skips row groups whose statistics prove they contain no matching rows:

```
SELECT * FROM tbl WHERE foo = 123;
```

but this query devolves to a table scan:

```
SELECT * FROM tbl WHERE foo = '123';
```

This is laziness on my part and could be fixed without too much effort.

### Row filtering

For common constraints, each row is checked against the query's constraints before control returns to SQLite's virtual machine. This minimizes the number of allocations performed when the user's criteria filter out many rows.

### Memoized slices

Individual clauses are mapped to the row groups they match.

For example, judging only by row group statistics, which store minimum and maximum values, a clause like `WHERE city = 'Dawson Creek'` may appear to match 80% of row groups. In reality, the value may be present in only one or two of them.

The row groups that actually matched are recorded in a shadow table, so future queries that contain that clause can read only the necessary row groups.

### Types

These Parquet types are supported:

* INT96 timestamps (exposed as milliseconds since the epoch)
* INT8/INT16/INT32/INT64
* UTF8 strings
* BOOLEAN
* FLOAT
* DOUBLE
* Variable- and fixed-length byte arrays

These are not currently supported:

* UINT8/UINT16/UINT32/UINT64
* DECIMAL
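The supported-types list notes that INT96 timestamps are exposed as milliseconds since the epoch. Below is a minimal sketch of that conversion. It assumes the INT96 layout commonly written by Impala, Hive, and Spark: 8 bytes of nanoseconds since midnight, followed by a 4-byte Julian day number, both little-endian. The helper `int96ToMillis` is hypothetical and is not taken from this extension's source. The extension's actual rounding behaviour may differ, since this sketch truncates the time of day to whole milliseconds.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from this extension's source: converts a
// Parquet INT96 timestamp to milliseconds since the Unix epoch.
// Assumed layout (Impala/Hive/Spark convention):
//   bytes 0-7:  nanoseconds since midnight, little-endian int64
//   bytes 8-11: Julian day number, little-endian int32
constexpr std::int64_t int96ToMillis(const std::uint8_t* bytes) {
  // Assemble the little-endian fields byte by byte so the result does not
  // depend on the host's byte order.
  std::uint64_t nanosOfDay = 0;
  for (int i = 7; i >= 0; --i) {
    nanosOfDay = (nanosOfDay << 8) | bytes[i];
  }
  std::uint32_t julianDay = 0;
  for (int i = 11; i >= 8; --i) {
    julianDay = (julianDay << 8) | bytes[i];
  }
  constexpr std::int64_t kJulianDayOfUnixEpoch = 2440588;  // 1970-01-01
  constexpr std::int64_t kMillisPerDay = 86400000;
  // Whole days relative to the epoch, plus the time of day truncated to ms.
  return (static_cast<std::int64_t>(julianDay) - kJulianDayOfUnixEpoch) * kMillisPerDay +
         static_cast<std::int64_t>(nanosOfDay / 1000000);
}
```

Under this layout, Julian day 2440588 with zero nanoseconds maps to 0. The following day at 00:00:01.5 maps to 86401500, and the day before the epoch maps to -86400000.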