{"id":47821574,"url":"https://github.com/moriyoshi/papera","last_synced_at":"2026-04-03T19:10:59.527Z","repository":{"id":347843230,"uuid":"1195314275","full_name":"moriyoshi/papera","owner":"moriyoshi","description":"Papera is a SQL transpiler that can translate Redshift / Trino dialects to DuckDB.","archived":false,"fork":false,"pushed_at":"2026-03-29T17:57:24.000Z","size":155,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-29T19:38:43.442Z","etag":null,"topics":["aws","duckdb","emulation","glue","hive","iceberg","redshift","sql","transpiler","trino"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moriyoshi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-03-29T14:17:39.000Z","updated_at":"2026-03-29T14:39:43.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/moriyoshi/papera","commit_stats":null,"previous_names":["moriyoshi/papera"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/moriyoshi/papera","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moriyoshi%2Fpapera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moriyoshi%2Fpapera/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moriyoshi%2Fpapera/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moriyoshi%2Fpapera/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moriyoshi","download_url":"https://codeload.github.com/moriyoshi/papera/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moriyoshi%2Fpapera/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31372199,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","duckdb","emulation","glue","hive","iceberg","redshift","sql","transpiler","trino"],"created_at":"2026-04-03T19:10:58.780Z","updated_at":"2026-04-03T19:10:59.518Z","avatar_url":"https://github.com/moriyoshi.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Papera\n\nA SQL compatibility layer that transpiles Trino, Redshift, and Hive SQL to target-specific analytical SQL.\n\npapera parses source SQL using [sqlparser-rs](https://github.com/apache/datafusion-sqlparser-rs), applies dialect-specific AST transformations, and emits SQL for the selected target dialect. DuckDB remains the default and most fully supported target. 
The library API also exposes `TargetDialect::DataFusion` for callers that need DataFusion-compatible output, while the current CLI remains DuckDB-targeted.\n\n## Installation\n\n### As a Library\n\nAdd to your `Cargo.toml`:\n\n```toml\n[dependencies]\npapera = \"0.1\"\n```\n\n### CLI\n\nThe CLI binary is feature-gated and not built by default:\n\n```sh\ncargo build --features cli\n# or install globally\ncargo install --path . --features cli\n```\n\n## Usage\n\n### CLI\n\n```sh\n# Pipe SQL through papera\necho \"SELECT NVL(a, b) FROM t\" | papera redshift\n# Output: SELECT coalesce(a, b) FROM t\n\necho \"SELECT approx_distinct(col) FROM t\" | papera trino\n# Output: SELECT approx_count_distinct(col) FROM t\n```\n\n```\nUsage: papera \u003ctrino|redshift|hive\u003e\n  Reads SQL from stdin and writes DuckDB-compatible SQL to stdout.\n```\n\n### Library\n\n```rust\nuse papera::{\n    transpile, transpile_with_options, SourceDialect, TargetDialect,\n    TranspileOptions, ExternalTableBehavior, IcebergTableBehavior,\n    CopyBehavior,\n};\n\n// Simple usage\nlet sql = \"SELECT NVL(a, b) FROM t\";\nlet result = transpile(sql, SourceDialect::Redshift).unwrap();\nassert_eq!(result, \"SELECT coalesce(a, b) FROM t\");\n\n// With options (e.g., convert external tables to views)\nlet sql = \"CREATE EXTERNAL TABLE t (a INT) STORED AS PARQUET LOCATION 's3://bucket/path'\";\nlet opts = TranspileOptions {\n    external_table: ExternalTableBehavior::MapToView,\n    ..Default::default()\n};\nlet result = transpile_with_options(sql, SourceDialect::Redshift, \u0026opts).unwrap();\n// Output: CREATE VIEW t (a) AS SELECT * FROM read_parquet('s3://bucket/path')\n\n// Migration mode: opt into all conversions\nlet opts = TranspileOptions {\n    external_table: ExternalTableBehavior::MapToView,\n    iceberg_table: IcebergTableBehavior::MapToView,\n    copy: CopyBehavior::MapToInsert,\n    ..Default::default()\n};\n\n// Select a non-default output target through the library API\nlet opts = 
TranspileOptions {\n    target: TargetDialect::DataFusion,\n    ..Default::default()\n};\nlet result = transpile_with_options(\"SELECT split(name, ',') FROM t\", SourceDialect::Trino, \u0026opts).unwrap();\n// Output: SELECT string_to_array(name, ',') FROM t\n```\n\nCustom SerDe class mappings for classes not covered by the built-in resolver:\n\n```rust\nuse papera::{SerdeClassResolver, TranspileOptions, ExternalTableBehavior};\n\nlet opts = TranspileOptions {\n    external_table: ExternalTableBehavior::MapToView,\n    serde_class_resolver: Some(SerdeClassResolver::new(|class| {\n        match class {\n            c if c.eq_ignore_ascii_case(\"com.example.MyParquetSerde\") =\u003e Some(\"read_parquet\".to_string()),\n            c if c.eq_ignore_ascii_case(\"com.example.MyJsonSerde\")    =\u003e Some(\"read_json\".to_string()),\n            _ =\u003e None, // fall through to built-in logic\n        }\n    })),\n    ..Default::default()\n};\n```\n\n### Multi-Statement SQL\n\nBoth `transpile` and `transpile_with_options` accept multi-statement SQL. 
Statements are parsed together, and the transpiled statements are joined with `;\\n`:\n\n```rust\nlet script = \"SELECT NVL(a, b) FROM t; SELECT GETDATE()\";\nlet result = transpile(script, SourceDialect::Redshift).unwrap();\n// result: \"SELECT coalesce(a, b) FROM t;\\nSELECT current_timestamp()\"\n```\n\nSee `cargo run --example multi_statement` for a full ETL script example.\n\n### Error Handling\n\n`transpile` and `transpile_with_options` return `papera::Result\u003cString\u003e`, where `papera::Error` has two variants:\n\n```rust\npub enum Error {\n    /// The source SQL could not be parsed by sqlparser-rs.\n    Parse(sqlparser::parser::ParserError),\n    /// The SQL uses a feature that cannot be transpiled for the configured target\n    /// (e.g., an unsupported type or a conversion that was not opted into).\n    Unsupported(String),\n}\n```\n\n`Error::Unsupported` is returned for:\n- `CREATE EXTERNAL TABLE` when `external_table` is `Error` (the default)\n- Iceberg tables when `iceberg_table` is `Error` (the default)\n- `COPY FROM` when `copy` is `Error` (the default)\n- Types with no DuckDB equivalent (`HLLSKETCH`, `GEOMETRY`)\n\nSee `examples/` for more complete usage patterns (`cargo run --example basic`, `cargo run --example migration`, `cargo run --example serde_resolver`).\n\n## Feature Coverage\n\nUnless noted otherwise, the compatibility tables in this section document the default DuckDB target. DataFusion support exists through `TranspileOptions::target`, but it is narrower and should be treated as a separate compatibility path.\n\n### Supported Dialects\n\n| Dialect | Parser | Notes |\n|---------|--------|-------|\n| Trino | `GenericDialect` | Also handles Hive-style DDL (STORED AS, TBLPROPERTIES, etc.) 
|\n| Redshift | `RedshiftSqlDialect` | Includes Redshift Spectrum external tables |\n| Hive | `HiveDialect` | ROW FORMAT, SERDE, PARTITIONED BY support |\n\n### Function Mappings: Trino to DuckDB\n\n#### Aggregate\n\n| Trino | DuckDB | Notes |\n|-------|--------|-------|\n| `approx_distinct(x)` | `approx_count_distinct(x)` | |\n| `arbitrary(x)` | `any_value(x)` | |\n| `approx_percentile(x, p)` | `approx_quantile(x, p)` | |\n| `map_agg(k, v)` | `map(list(k), list(v))` | |\n\n#### Date / Time\n\n| Trino | DuckDB | Notes |\n|-------|--------|-------|\n| `date_parse(s, fmt)` | `strptime(s, fmt)` | Java format strings converted |\n| `format_datetime(ts, fmt)` | `strftime(ts, fmt)` | Java format strings converted |\n| `date_format(ts, fmt)` | `strftime(ts, fmt)` | Java format strings converted |\n| `at_timezone(ts, tz)` | `ts AT TIME ZONE tz` | |\n| `with_timezone(ts, tz)` | `ts AT TIME ZONE tz` | |\n| `parse_datetime(s, fmt)` | `strptime(s, fmt)` | Java format strings converted |\n| `to_unixtime(ts)` | `epoch(ts)` | |\n| `current_timezone()` | `current_setting('TimeZone')` | |\n| `from_unixtime(t)` | `to_timestamp(t)` | |\n| `date_diff(unit, t1, t2)` | `date_diff(unit, t1, t2)` | |\n| `date_add(unit, n, ts)` | `date_add(unit, n, ts)` | |\n| `day_of_week(d)` | `dayofweek(d)` | |\n| `day_of_year(d)` | `dayofyear(d)` | |\n| `week_of_year(d)` | `weekofyear(d)` | |\n| `year_of_week(d)` | `yearofweek(d)` | |\n\n#### String\n\n| Trino | DuckDB | Notes |\n|-------|--------|-------|\n| `split(s, del)` | `str_split(s, del)` | |\n| `levenshtein_distance(a, b)` | `levenshtein(a, b)` | |\n| `regexp_like(s, p)` | `regexp_matches(s, p)` | |\n| `regexp_extract(s, p[, g])` | `regexp_extract(s, p[, g])` | |\n| `regexp_replace(s, p, r)` | `regexp_replace(s, p, r)` | |\n| `strpos(s, sub)` | `strpos(s, sub)` | |\n| `length(s)` | `length(s)` | |\n| `reverse(s)` | `reverse(s)` | |\n| `lpad(s, n, c)` | `lpad(s, n, c)` | |\n| `rpad(s, n, c)` | `rpad(s, n, c)` | |\n| `chr(n)` | `chr(n)` | |\n| 
`codepoint(c)` | `unicode(c)` | |\n| `from_hex(s)` | `unhex(s)` | |\n| `to_utf8(s)` | `encode(s)` | Returns BLOB |\n| `from_utf8(b)` | `decode(b)` | |\n| `url_extract_host(url)` | `regexp_extract(url, ...)` | Approximation via regex |\n| `url_extract_path(url)` | `regexp_extract(url, ...)` | Approximation via regex |\n| `url_extract_protocol(url)` | `regexp_extract(url, ...)` | Approximation via regex |\n| `url_extract_query(url)` | `regexp_extract(url, ...)` | Approximation via regex |\n| `url_extract_fragment(url)` | `regexp_extract(url, ...)` | Approximation via regex |\n| `url_extract_port(url)` | `regexp_extract(url, ...)` | Approximation via regex |\n\n#### Array / Map\n\n| Trino | DuckDB | Notes |\n|-------|--------|-------|\n| `transform(arr, fn)` | `list_transform(arr, fn)` | |\n| `sequence(start, stop)` | `generate_series(start, stop)` | |\n| `element_at(arr, i)` | `list_extract(arr, i)` | |\n| `cardinality(x)` | `len(x)` | |\n| `array_join(arr, sep)` | `array_to_string(arr, sep)` | |\n| `reduce(arr, ...)` | `list_reduce(arr, ...)` | |\n| `filter(arr, fn)` | `list_filter(arr, fn)` | |\n| `contains(arr, x)` | `list_contains(arr, x)` | |\n| `zip(a, b)` | `list_zip(a, b)` | |\n| `flatten(arr)` | `flatten(arr)` | |\n| `slice(arr, ...)` | `list_slice(arr, ...)` | |\n| `array_distinct(arr)` | `list_distinct(arr)` | |\n| `array_sort(arr)` | `list_sort(arr)` | |\n| `array_max(arr)` | `list_max(arr)` | |\n| `array_min(arr)` | `list_min(arr)` | |\n| `array_position(arr, x)` | `list_position(arr, x)` | |\n| `array_remove(arr, x)` | `list_filter(arr, x)` | Approximate |\n| `array_intersect(a, b)` | `list_intersect(a, b)` | |\n| `array_concat(a, b)` | `list_concat(a, b)` | |\n| `array_except(a, b)` | `list_except(a, b)` | |\n| `array_union(a, b)` | `list_distinct(list_concat(a, b))` | |\n| `arrays_overlap(a, b)` | `len(list_intersect(a, b)) \u003e 0` | |\n| `array_has(arr, x)` | `list_contains(arr, x)` | |\n| `array_has_all(arr, req)` | `len(list_intersect(arr, req)) 
= len(req)` | |\n| `array_has_any(arr, cands)` | `len(list_intersect(arr, cands)) \u003e 0` | |\n| `array_sum(arr)` | `list_sum(arr)` | |\n| `array_average(arr)` | `list_avg(arr)` | |\n| `map_keys(m)` | `map_keys(m)` | |\n| `map_values(m)` | `map_values(m)` | |\n| `map_concat(m1, m2)` | `map_concat(m1, m2)` | |\n\n#### JSON\n\n| Trino | DuckDB | Notes |\n|-------|--------|-------|\n| `json_extract_scalar(j, p)` | `json_extract_string(j, p)` | |\n| `json_extract(j, p)` | `json_extract(j, p)` | |\n| `json_parse(s)` | `CAST(s AS JSON)` | |\n| `json_format(j)` | `CAST(j AS VARCHAR)` | |\n| `json_array_get(j, idx)` | `json_extract_string(j, '$[idx]')` | Literal index only |\n| `json_array_length(j)` | `json_array_length(j)` | |\n| `json_object_keys(j)` | `json_keys(j)` | |\n\n#### Math / Numeric\n\n| Trino | DuckDB | Notes |\n|-------|--------|-------|\n| `is_nan(x)` | `isnan(x)` | |\n| `is_finite(x)` | `isfinite(x)` | |\n| `is_infinite(x)` | `isinf(x)` | |\n| `nan()` | `CAST('NaN' AS DOUBLE)` | |\n| `infinity()` | `CAST('Infinity' AS DOUBLE)` | |\n| `rand()` | `random()` | |\n| `typeof(x)` | `typeof(x)` | |\n\n#### Bitwise\n\n| Trino | DuckDB | Notes |\n|-------|--------|-------|\n| `bitwise_and(a, b)` | `a \u0026 b` | |\n| `bitwise_or(a, b)` | `a \\| b` | |\n| `bitwise_xor(a, b)` | `a ^ b` | |\n| `bitwise_not(a)` | `~a` | |\n| `bitwise_left_shift(a, b)` | `a \u003c\u003c b` | |\n| `bitwise_right_shift(a, b)` | `a \u003e\u003e b` | |\n\n### Function Mappings: Redshift to DuckDB\n\n#### Date / Time\n\n| Redshift | DuckDB | Notes |\n|----------|--------|-------|\n| `GETDATE()` | `current_timestamp()` | |\n| `SYSDATE` | `current_timestamp` | |\n| `DATEADD(part, n, d)` | `d + INTERVAL 'n' part` | Interval arithmetic |\n| `DATEDIFF(part, d1, d2)` | `date_diff('part', d1, d2)` | Datepart quoted |\n| `DATE_TRUNC(part, d)` | `date_trunc(part, d)` | |\n| `CONVERT_TIMEZONE(tz, ts)` | `ts AT TIME ZONE 'tz'` | 2-arg form |\n| `CONVERT_TIMEZONE(src, dst, ts)` | `ts AT TIME ZONE 
'src' AT TIME ZONE 'dst'` | 3-arg form |\n| `TO_DATE(s, fmt)` | `CAST(strptime(s, fmt) AS DATE)` | PG format strings converted |\n| `TO_TIMESTAMP(s, fmt)` | `strptime(s, fmt)` | PG format strings converted |\n| `TO_CHAR(ts, fmt)` | `strftime(ts, fmt)` | PG format strings converted |\n| `MONTHS_BETWEEN(d1, d2)` | `datediff('month', d2, d1)` | |\n| `ADD_MONTHS(d, n)` | `d + INTERVAL 'n' MONTH` | |\n\n#### String\n\n| Redshift | DuckDB | Notes |\n|----------|--------|-------|\n| `NVL(a, b)` | `coalesce(a, b)` | |\n| `NVL2(e, a, b)` | `CASE WHEN e IS NOT NULL THEN a ELSE b END` | |\n| `ISNULL(v, r)` | `coalesce(v, r)` | 2-arg form only |\n| `LEN(s)` | `length(s)` | |\n| `LCASE(s)` | `lower(s)` | |\n| `UCASE(s)` | `upper(s)` | |\n| `UPPER(s)` | `upper(s)` | |\n| `LOWER(s)` | `lower(s)` | |\n| `LEFT(s, n)` | `left(s, n)` | |\n| `RIGHT(s, n)` | `right(s, n)` | |\n| `SUBSTRING(s, ...)` | `substring(s, ...)` | |\n| `REPLACE(s, from, to)` | `replace(s, from, to)` | |\n| `BTRIM(s)` | `trim(s)` | |\n| `TRIM(s)` | `trim(s)` | |\n| `CHARINDEX(sub, str)` | `strpos(str, sub)` | Args swapped |\n| `SPACE(n)` | `repeat(' ', n)` | |\n| `REGEXP_SUBSTR(s, p)` | `regexp_extract(s, p)` | |\n| `REGEXP_COUNT(s, p)` | `len(regexp_extract_all(s, p))` | |\n\n#### Aggregate\n\n| Redshift | DuckDB | Notes |\n|----------|--------|-------|\n| `DECODE(e, s1, r1, ..., def)` | `CASE e WHEN s1 THEN r1 ... 
ELSE def END` | |\n| `LISTAGG(col, sep)` | `string_agg(col, sep)` | |\n\n#### JSON\n\n| Redshift | DuckDB | Notes |\n|----------|--------|-------|\n| `JSON_EXTRACT_PATH_TEXT(j, k1, k2, ...)` | `json_extract_string(j, '$.k1.k2...')` | Literal keys only |\n| `JSON_EXTRACT_ARRAY_ELEMENT_TEXT(j, i)` | `json_extract_string(j, '$[i]')` | Literal index only |\n| `JSON_ARRAY_LENGTH(j)` | `json_array_length(j)` | |\n| `JSON_TYPEOF(j)` | `json_type(j)` | |\n| `JSON_SERIALIZE(j)` | `CAST(j AS VARCHAR)` | |\n| `JSON_DESERIALIZE(s)` | `CAST(s AS JSON)` | |\n| `IS_VALID_JSON(s)` | `json_valid(s)` | |\n\n#### Array\n\n| Redshift | DuckDB | Notes |\n|----------|--------|-------|\n| `ARRAY_CONCAT(a, b)` | `list_concat(a, b)` | |\n\n#### Crypto / Encoding\n\n| Redshift | DuckDB | Notes |\n|----------|--------|-------|\n| `MD5(s)` | `md5(s)` | |\n| `SHA1(s)` | `sha1(s)` | |\n| `SHA2(s, 256)` | `sha256(s)` | 256-bit only |\n\n#### Emulated\n\n| Redshift | DuckDB | Notes |\n|----------|--------|-------|\n| `STRTOL(s, base)` | `CASE base WHEN 16 THEN CAST(('0x' \\|\\| s) AS BIGINT) WHEN 10 THEN CAST(s AS BIGINT) END` | Base 10 and 16 only |\n| `RATIO_TO_REPORT(col) OVER (...)` | `col / SUM(col) OVER (...)` | Window clause preserved |\n\n#### Unsupported\n\n| Redshift | Notes |\n|----------|-------|\n| `BPCHARCMP(a, b)` | No DuckDB equivalent |\n\n### Type Mappings: Trino to DuckDB\n\n| Trino | DuckDB | Context |\n|-------|--------|---------|\n| `ROW(a INT, b VARCHAR)` | `STRUCT(a INT, b VARCHAR)` | CAST, DDL |\n| `ARRAY(T)` | `T[]` | CAST, DDL |\n| `ARRAY\u003cT\u003e` | `T[]` | CAST, DDL |\n| `MAP(K, V)` | `MAP(K, V)` | Passthrough |\n| `VARBINARY` | `BLOB` | CAST, DDL |\n| `IPADDRESS` | `VARCHAR` | CAST, DDL |\n\n### Type Mappings: Redshift to DuckDB\n\n| Redshift | DuckDB | Context |\n|----------|--------|---------|\n| `VARCHAR(MAX)` | `VARCHAR` | CAST, DDL |\n| `CHARACTER VARYING(MAX)` | `VARCHAR` | CAST, DDL |\n| `NVARCHAR(MAX)` | `VARCHAR` | CAST, DDL |\n| `SUPER` | `JSON` | CAST, 
DDL |\n| `VARBINARY` | `BLOB` | CAST, DDL |\n| `HLLSKETCH` | Unsupported | |\n| `GEOMETRY` | Unsupported | |\n| `TIMETZ` | `TIMETZ` | Passthrough |\n| `TIMESTAMPTZ` | `TIMESTAMPTZ` | Passthrough |\n\n### Type Mappings: Hive to DuckDB\n\nHive uses the same type rewrite rules as Trino.\n\n| Hive | DuckDB | Context |\n|------|--------|---------|\n| `ROW(a INT, b VARCHAR)` | `STRUCT(a INT, b VARCHAR)` | CAST, DDL |\n| `ARRAY(T)` | `T[]` | CAST, DDL |\n| `ARRAY\u003cT\u003e` | `T[]` | CAST, DDL |\n| `MAP(K, V)` | `MAP(K, V)` | Passthrough |\n| `VARBINARY` | `BLOB` | CAST, DDL |\n| `IPADDRESS` | `VARCHAR` | CAST, DDL |\n\n### DDL Support\n\n#### CREATE TABLE\n\n| Feature | Behavior |\n|---------|----------|\n| Column type rewriting | Automatic (all type mappings applied) |\n| `CREATE EXTERNAL TABLE ... STORED AS ... LOCATION` | Configurable: `MapToView` or `Error` |\n| Iceberg via `TBLPROPERTIES ('table_type'='ICEBERG')` | Configurable: `MapToView` (uses `iceberg_scan()`) or `Error` |\n| Iceberg via `WITH (table_type = 'ICEBERG')` | Same as above (Trino syntax) |\n| `PARTITIONED BY` | `hive_partitioning = true` added to reader options |\n| `ROW FORMAT DELIMITED FIELDS TERMINATED BY` | `delim = '...'` added to `read_csv` options |\n| `ROW FORMAT DELIMITED ESCAPED BY` | `escape = '...'` added to `read_csv` options |\n| `ROW FORMAT DELIMITED LINES TERMINATED BY` | `new_line = '...'` added to `read_csv` options |\n| `ROW FORMAT DELIMITED NULL DEFINED AS` | `nullstr = '...'` added to `read_csv` options |\n| `ROW FORMAT SERDE 'class'` | Reader function inferred from SerDe class name |\n\n#### External Table Format Mapping\n\n| STORED AS | DuckDB Reader |\n|-----------|---------------|\n| `PARQUET` | `read_parquet()` |\n| `ORC` | `read_parquet()` |\n| `TEXTFILE` | `read_csv()` |\n| `JSONFILE` | `read_json()` |\n| `AVRO` | Unsupported |\n| `SEQUENCEFILE` | Unsupported |\n| `RCFILE` | Unsupported |\n| (none specified) | `read_parquet()` (default) |\n\n#### SerDe Class 
Mapping\n\nWhen no `STORED AS` is present, the reader function is inferred from `ROW FORMAT SERDE`:\n\n| SerDe Class | DuckDB Reader |\n|-------------|---------------|\n| `ParquetHiveSerDe` | `read_parquet()` |\n| `OrcSerde` | `read_parquet()` |\n| `JsonSerDe` (Hive or OpenX) | `read_json()` |\n| `OpenCSVSerde` | `read_csv()` |\n| `LazySimpleSerDe` | `read_csv()` |\n| `RegexSerDe` | Unsupported |\n| Unknown classes | `Error` by default; override via `serde_class_resolver` |\n\nThe built-in mapping is substring-based and case-insensitive. For classes not listed above, supply a `SerdeClassResolver` in `TranspileOptions` (see the Library section for an example). The resolver is called first; returning `None` falls through to the built-in logic.\n\n#### ALTER TABLE\n\n| Operation | Behavior |\n|-----------|----------|\n| `ADD COLUMN` | Column type rewritten |\n| `ALTER COLUMN ... SET DATA TYPE` | Data type rewritten |\n| Other operations | Passthrough |\n\n### DML Support\n\n| Statement | Behavior |\n|-----------|----------|\n| `INSERT INTO ... SELECT` | Passthrough (expressions rewritten) |\n| `INSERT INTO ... VALUES` | Passthrough |\n| `UPDATE ... SET ... FROM` (Redshift) | Passthrough (DuckDB supports this) |\n| `DELETE ... USING` (Redshift) | Passthrough (DuckDB supports this) |\n| `MERGE` | Passthrough |\n| `COPY FROM` (Redshift) | Configurable: `MapToInsert` (uses `read_parquet`/`read_csv`/`read_json`) or `Error` |\n\n### SHOW Commands\n\n| Command | DuckDB Output |\n|---------|---------------|\n| `SHOW TABLES` | Passthrough |\n| `SHOW DATABASES` | Passthrough |\n| `SHOW SCHEMAS` | Passthrough |\n| `SHOW COLUMNS FROM t` | Passthrough |\n| `SHOW VIEWS` | Passthrough |\n| `SHOW CREATE TABLE t` | Emulated via `information_schema.columns` (reconstructs DDL) |\n| `SHOW CREATE VIEW v` | Emulated via `duckdb_views()` (retrieves view SQL) |\n| `SHOW variable` | `SELECT current_setting('variable')` |\n| `SHOW FUNCTIONS` | `SELECT ... 
FROM information_schema.routines` |\n\n### UNNEST and Lateral Joins\n\n| Source Syntax | DuckDB Output |\n|---------------|---------------|\n| `CROSS JOIN UNNEST(arr) AS t(x)` | Passthrough |\n| `CROSS JOIN UNNEST(arr) WITH ORDINALITY AS t(x, n)` | Passthrough |\n| `LATERAL VIEW explode(arr) t AS x` | `CROSS JOIN UNNEST(arr) AS t(x)` |\n| `LATERAL VIEW posexplode(arr) t AS x` | `CROSS JOIN UNNEST(arr) AS t(x)` |\n| `CROSS JOIN LATERAL (subquery)` | Passthrough |\n\n### Parameterized Queries\n\nParameterized queries (prepared statement placeholders) are passed through unchanged by papera.\n\n| Dialect | Supported Styles | Example |\n|---------|-----------------|---------|\n| Trino | `?`, `$1` | `SELECT * FROM t WHERE x = ?` |\n| Redshift | `$1` | `SELECT * FROM t WHERE x = $1` |\n| Hive | `?`, `$1` | `SELECT * FROM t WHERE x = ?` |\n\nOn the default DuckDB path, `$1`-style positional parameters are native. The `?` style is also accepted by DuckDB in its client APIs.\n\n## Configuration\n\n### TranspileOptions\n\n| Option | Values | Default | Description |\n|--------|--------|---------|-------------|\n| `target` | `DuckDB`, `DataFusion` | `DuckDB` | The target SQL dialect to emit |\n| `external_table` | `MapToView`, `Error` | `Error` | How to handle `CREATE EXTERNAL TABLE` |\n| `iceberg_table` | `MapToView`, `Error` | `Error` | How to handle Iceberg tables (detected via TBLPROPERTIES) |\n| `copy` | `MapToInsert`, `Error` | `Error` | How to handle Redshift `COPY FROM` |\n| `serde_class_resolver` | `Some(SerdeClassResolver)`, `None` | `None` | Custom resolver for `ROW FORMAT SERDE` class names not covered by the built-in mapping. Return `Some(reader_fn)` to override, `None` to fall through. 
|\n\n## Project Structure\n\n```\nsrc/\n  lib.rs                  Public API, TranspileOptions, crate-root exports\n  error.rs                Error types\n  main.rs                 CLI entry point\n  transpiler/\n    mod.rs                Transpiler trait\n    rewrite.rs            ExprRewriter (VisitorMut-based AST walker)\n  dialect/\n    mod.rs                SourceDialect and TargetDialect enums\n    trino.rs              Trino transpiler\n    redshift.rs           Redshift transpiler\n    hive.rs               Hive transpiler\n  transforms/\n    mod.rs                Module re-exports\n    types.rs              Data type rewriting\n    functions.rs          Function name/signature mapping\n    format_strings.rs     Format string conversion (PG/Java → strftime)\n    ddl.rs                CREATE TABLE, ALTER TABLE, external/iceberg tables\n    dml.rs                INSERT, UPDATE, DELETE, MERGE\n    show.rs               SHOW command translation\n    unnest.rs             UNNEST syntax normalization\n    lateral.rs            LATERAL VIEW to CROSS JOIN UNNEST\ntests/\n  common/mod.rs           Test helpers\n  integration.rs          End-to-end tests\n  duckdb_integration.rs   DuckDB execution tests\nexamples/\n  basic.rs                Function, type, and syntax rewrites\n  migration.rs            External tables, Iceberg, COPY with TranspileOptions\n  multi_statement.rs      Multi-statement ETL script transpilation\n  serde_resolver.rs       Custom SerDe class resolver with built-in fallthrough\n```\n\n## Architecture and Design\n\n### Transpilation Pipeline\n\npapera runs a four-step pipeline built around two AST transformation stages (steps 2 and 3):\n\n1. **Parse** source SQL with the dialect-specific parser from sqlparser-rs.\n2. **Statement-level transforms** restructure top-level `Statement` variants (e.g., converting `CREATE EXTERNAL TABLE` into `CREATE VIEW`, or rewriting `SHOW CREATE TABLE` into catalog queries).\n3. 
**Expression-level rewrites** via `ExprRewriter` (a `VisitorMut`-based AST walker) handle cross-cutting concerns such as function renaming, type casting, and table-factor normalization.\n4. **Emit** SQL for the selected target dialect from the rewritten AST.\n\nThis split is intentional: statement handlers own structural changes that may replace one statement kind with another, while `ExprRewriter` handles expression-level rewrites that apply uniformly across statement types. Source dialect and target dialect are separate dimensions in the design: source dialect controls parsing and dialect-specific preprocessing, while target dialect controls function mappings, type rewrites, selected SHOW behavior, and some DDL lowering decisions.\n\n### Library-First Design\n\nThe crate is designed as a library first. The CLI binary is feature-gated behind `cli` and is not built by default. Internal rewrite machinery under `src/transforms` is `pub(crate)`, keeping the stable public API surface small: `transpile`, `transpile_with_options`, source and target dialect selection, option types, the `SerdeClassResolver` extension hook, and shared error types.\n\n### Opt-In Semantics for Risky Conversions\n\nFeatures that can silently change semantics or storage assumptions are controlled by `TranspileOptions` and default to `Error`. External-table-to-view, Iceberg-table-to-view, and Redshift `COPY` lowering are examples: they are useful but alter storage or ingestion behavior, so callers must explicitly opt in.\n\n### Parser as Feature Boundary\n\npapera's feature coverage is bounded not only by the rewrite logic but also by what sqlparser-rs can parse and represent. 
For example:\n\n- DDL column `DataType` nodes are not visited by `VisitorMut`, so `CREATE TABLE` column types must be rewritten in statement handlers rather than in the expression walker.\n- Trino `ROW(a INT, b VARCHAR)` is exposed as flattened custom type data under `GenericDialect`, making nested `ROW` handling fragile.\n- Some source syntax cannot be supported cleanly until the upstream parser exposes a usable AST for it.\n\n### Rewrite Strategy\n\nFunction rewrites are classified by complexity:\n\n- **Rename**: simple name substitution (e.g., `approx_distinct` to `approx_count_distinct`).\n- **RenameReorder**: same function with reordered arguments (e.g., `CHARINDEX(sub, str)` to `strpos(str, sub)`).\n- **Custom**: the rewrite must produce a different AST shape entirely (e.g., `NVL2(e, a, b)` becomes a `CASE WHEN` expression, bitwise functions become infix operators).\n\nDeclarative mappings are preferred where possible, with custom rewrites reserved for cases where the selected target requires a structurally different expression.\n\n### Compatibility Model\n\npapera targets engine-correct output, not just syntactically valid SQL. DuckDB execution remains the strongest validation path in the current test strategy, and some mappings are approximations rather than exact semantic matches (e.g., `url_extract_*` functions use regex approximations). String-level rewrite success alone is not considered sufficient evidence of compatibility, which is why the test suite includes DuckDB execution tests alongside string-comparison tests.\n\n### Target Dialect Notes\n\nDuckDB is the mature target and the one documented by most compatibility tables in this README. 
The library also supports `TargetDialect::DataFusion`, but that path has narrower coverage and different unsupported cases, especially for reader-backed external-table and Iceberg rewrites that currently rely on DuckDB-specific functions.\n\n## Known Limitations\n\n### Nested ROW types (sqlparser-rs 0.61)\n\nNested `ROW` types such as `ROW(x BIGINT, y ROW(i DOUBLE, j DOUBLE))` fail to parse in sqlparser-rs 0.61. The root cause is `parse_optional_type_modifiers()` in the parser, which cannot handle nested parentheses. This affects all dialects (including `HiveDialect`). Flat `ROW(a INT, b VARCHAR)` works correctly.\n\n### ARRAY(T) vs ARRAY\\\u003cT\\\u003e\n\n`ARRAY(T)` (Trino parenthesis syntax) is rewritten to `T[]`, but type inference inside `ARRAY(T)` depends on the parser recognizing the inner type. `ARRAY\u003cT\u003e` and `T[]` are fully supported.\n\n### Approximate mappings\n\nSome function mappings are approximations rather than exact semantic matches. For example, the `url_extract_*` functions are rewritten to regex-based `regexp_extract` calls. Always validate output against DuckDB execution for compatibility-sensitive queries.\n\n### DataFusion target scope\n\n`TargetDialect::DataFusion` is available through the library API, but it is not feature-equivalent with DuckDB. In particular, reader-backed external-table and Iceberg rewrites remain DuckDB-specific, and some DataFusion-specific mappings are still better treated as explicit unsupported cases than as silent approximations.\n\n### Redshift COPY options\n\nWhen `copy` is set to `MapToInsert`, Redshift-specific options such as `IAM_ROLE`, `IGNOREHEADER`, and `GZIP` are silently dropped. The generated `INSERT INTO ... SELECT * FROM read_parquet/read_csv/read_json` reflects only the format and location.\n\n## Building\n\n```sh\ncargo build\ncargo test\n```\n\n## License\n\nMIT License. Copyright (c) 2026 Moriyoshi Koizumi. 
See [LICENSE](./LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoriyoshi%2Fpapera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoriyoshi%2Fpapera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoriyoshi%2Fpapera/lists"}