{"id":24456241,"url":"https://github.com/abc3/ex_iceberg_port","last_synced_at":"2025-09-01T22:36:39.960Z","repository":{"id":272474372,"uuid":"916714159","full_name":"abc3/ex_iceberg_port","owner":"abc3","description":"Elixir bindings for Apache Iceberg via Apache Spark","archived":false,"fork":false,"pushed_at":"2025-05-24T09:22:16.000Z","size":25,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-24T10:27:15.267Z","etag":null,"topics":["elixir","iceberg","scala","spark"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abc3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-14T16:20:28.000Z","updated_at":"2025-05-24T09:22:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"886f2550-269f-43d9-9ba1-e6ec732861bd","html_url":"https://github.com/abc3/ex_iceberg_port","commit_stats":null,"previous_names":["abc3/ex_iceberg_port"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abc3/ex_iceberg_port","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abc3%2Fex_iceberg_port","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abc3%2Fex_iceberg_port/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abc3%2Fex_iceberg_port/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/abc3%2Fex_iceberg_port/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abc3","download_url":"https://codeload.github.com/abc3/ex_iceberg_port/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abc3%2Fex_iceberg_port/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273203216,"owners_count":25063275,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir","iceberg","scala","spark"],"created_at":"2025-01-21T02:16:00.054Z","updated_at":"2025-09-01T22:36:39.955Z","avatar_url":"https://github.com/abc3.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ExIcebergPort\n\nApache Iceberg port for Elixir applications, providing SQL interface through Apache Spark\n\n## Requirements\n\n- Java 11 (OpenJDK 11 recommended)\n- Apache Spark 3.4.1\n- Scala 2.13.12\n- SBT (Scala Build Tool)\n\nBefore running the application, ensure you have Java 11 set up correctly:\n\n```bash\nexport JAVA_HOME=/path/to/your/java11\nexport PATH=\"$JAVA_HOME/bin:$PATH\"\n```\n\nYou can verify your Java version with:\n\n```bash\njava -version\n```\n\n## Building\n\nBefore using the library, you need to build the JVM component:\n\n```bash\nmake jvm\n```\n\n## 
Development\n\nTo start the application in development mode with IEx shell:\n\n```bash\nmake dev\n```\n\n## Usage\n\n### Start the Port\n\nTo start the port, use the following command:\n\n```elixir\nExIcebergPort.start_link(\n  warehouse_path: \"LOCAL_PATH\",\n  catalog_name: \"local\"\n)\n```\n\n### Create table\n\nTo create a table, use the following command:\n\n```elixir\nExIcebergPort.query(\"CREATE TABLE IF NOT EXISTS local.db.my_table (id INT, name STRING, age INT) USING iceberg\")\n```\n\n### Insert data\n\nTo insert data, use the following command:\n\n```elixir\nExIcebergPort.query(\"insert into local.db.my_table values (1, 'John', 30), (2, 'Jane', 25), (3, 'Bob', 35)\")\n```\n\n### Select rows\n\nTo select rows, use the following command:\n\n```elixir\nExIcebergPort.query(\"SELECT * FROM local.db.my_table\")\n```\n\n### Table Maintenance Operations\n\n#### List Snapshots\n\nView all snapshots (versions) of a table:\n\n```elixir\nExIcebergPort.query(\"SELECT * FROM local.db.my_table.snapshots\")\n```\n\n#### Compact Table Files\n\nTo optimize table performance by rewriting and compacting the table data:\n\n```elixir\nExIcebergPort.query(\"\"\"\n  INSERT OVERWRITE TABLE local.db.my_table\n  SELECT * FROM local.db.my_table\n\"\"\")\n```\n\nThis operation will rewrite the table data, which helps to:\n\n- Compact small files into larger ones\n- Remove deleted records\n- Optimize the table's physical layout\n\n#### Expire Old Snapshots\n\nRemove old snapshots to free up storage space. This operation is safe as it only removes snapshots that are no longer needed for time travel queries:\n\n```elixir\nExIcebergPort.query(\"\"\"\n  CALL catalog_name.system.expire_snapshots(\n    table =\u003e 'local.db.my_table',\n    older_than =\u003e TIMESTAMP '2025-05-24 00:00:00'\n  )\n\"\"\")\n```\n\n#### Remove Orphan Files\n\nAfter expiring snapshots, you can physically delete unreferenced files to reclaim storage space. 
This operation should be run after `expire_snapshots`:\n\n```elixir\nExIcebergPort.query(\"\"\"\n  CALL catalog_name.system.remove_orphan_files(\n    table =\u003e 'local.db.my_table',\n    older_than =\u003e TIMESTAMP '2025-05-24 00:00:00'\n  )\n\"\"\")\n```\n\nNote: Always ensure you have a backup before running maintenance operations, and verify the `older_than` timestamp carefully to avoid removing data that might still be needed.\n\n## Catalog Support\n\n| Catalog Type   | Status | Description                                    |\n| -------------- | ------ | ---------------------------------------------- |\n| Local Catalog  | ✅     | Hadoop-based local filesystem catalog          |\n| AWS S3 Catalog | 🔄     | Store tables in S3 buckets (Coming soon)       |\n| REST Catalog   | 🔄     | Use Iceberg REST catalog service (Coming soon) |\n\n### DataFrame Operations\n\nCreate and write a DataFrame to an Iceberg table:\n\n```elixir\niex\u003e ExIcebergPort.dummy_df\n{:ok,\n %ExIcebergPort.Result{\n   columns: [\"id\", \"name\", \"age\"],\n   rows: [],\n   num_rows: 2,\n   exec_time_ms: 207\n }}\n```\n\n### SQL Queries\n\nQuery data from Iceberg tables using SQL:\n\n```elixir\niex\u003e ExIcebergPort.query(\"select * from local.db.my_table\")\n{:ok,\n %ExIcebergPort.Result{\n   sql: \"select * from local.db.my_table\",\n   columns: [\"id\", \"name\", \"age\"],\n   rows: [[1, \"John_529\", 18], [2, \"Jane_595\", 81]],\n   num_rows: 2,\n   exec_time_ms: 84\n }}\n```\n\nThe result includes:\n\n- `columns`: List of column names\n- `rows`: List of data rows\n- `num_rows`: Number of rows returned\n- `exec_time_ms`: Query execution time in milliseconds\n- `sql`: The SQL query that was executed (for SQL queries 
only)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabc3%2Fex_iceberg_port","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabc3%2Fex_iceberg_port","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabc3%2Fex_iceberg_port/lists"}