{"id":30565621,"url":"https://github.com/grouzen/zio-apache-parquet","last_synced_at":"2025-08-28T16:04:38.052Z","repository":{"id":209346533,"uuid":"723798303","full_name":"grouzen/zio-apache-parquet","owner":"grouzen","description":"Scala ZIO-powered Apache Parquet library","archived":false,"fork":false,"pushed_at":"2025-07-25T07:35:13.000Z","size":436,"stargazers_count":26,"open_issues_count":10,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-25T11:38:42.261Z","etag":null,"topics":["apache-parquet","big-data","bigdata","parquet","parquet-files","parquet-format","parquet-tools","scala","zio","zio-streams","zio2"],"latest_commit_sha":null,"homepage":"http://mnedokushev.me/zio-apache-parquet/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grouzen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-11-26T19:41:21.000Z","updated_at":"2025-07-25T07:30:12.000Z","dependencies_parsed_at":"2024-01-28T03:23:19.989Z","dependency_job_id":"60fca4fe-fb25-418e-a740-1f6e06865c59","html_url":"https://github.com/grouzen/zio-apache-parquet","commit_stats":null,"previous_names":["grouzen/zio-apache-parquet"],"tags_count":23,"template":false,"template_full_name":null,"purl":"pkg:github/grouzen/zio-apache-parquet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grouzen%2Fzio-apache-parquet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grouzen%2Fzio-apache-parquet/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/grouzen%2Fzio-apache-parquet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grouzen%2Fzio-apache-parquet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grouzen","download_url":"https://codeload.github.com/grouzen/zio-apache-parquet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grouzen%2Fzio-apache-parquet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272519516,"owners_count":24948520,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-28T02:00:10.768Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-parquet","big-data","bigdata","parquet","parquet-files","parquet-format","parquet-tools","scala","zio","zio-streams","zio2"],"created_at":"2025-08-28T16:01:40.889Z","updated_at":"2025-08-28T16:04:38.040Z","avatar_url":"https://github.com/grouzen.png","language":"Scala","funding_links":[],"categories":["\u003ca name=\"Scala\"\u003e\u003c/a\u003eScala"],"sub_categories":[],"readme":"![](docs/logo.png)\n\n![Build status](https://github.com/grouzen/zio-apache-parquet/actions/workflows/ci.yml/badge.svg)\n![Maven Central](https://img.shields.io/maven-central/v/me.mnedokushev/zio-apache-parquet-core_2.13.svg?label=Maven%20central)\n[![Scala Steward 
badge](https://img.shields.io/badge/Scala_Steward-helping-blue.svg?style=flat\u0026logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAAQCAMAAAARSr4IAAAAVFBMVEUAAACHjojlOy5NWlrKzcYRKjGFjIbp293YycuLa3pYY2LSqql4f3pCUFTgSjNodYRmcXUsPD/NTTbjRS+2jomhgnzNc223cGvZS0HaSD0XLjbaSjElhIr+AAAAAXRSTlMAQObYZgAAAHlJREFUCNdNyosOwyAIhWHAQS1Vt7a77/3fcxxdmv0xwmckutAR1nkm4ggbyEcg/wWmlGLDAA3oL50xi6fk5ffZ3E2E3QfZDCcCN2YtbEWZt+Drc6u6rlqv7Uk0LdKqqr5rk2UCRXOk0vmQKGfc94nOJyQjouF9H/wCc9gECEYfONoAAAAASUVORK5CYII=)](https://scala-steward.org)\n\n# ZIO Apache Parquet\n\nA ZIO-powered wrapper for [Apache Parquet's Java implementation](https://github.com/apache/parquet-mr), leveraging [ZIO Schema](https://zio.dev/zio-schema/) to automatically derive codecs and provide type-safe filter predicates. Operate your parquet files easily using a top-notch ZIO-powered ecosystem without running a Spark cluster.\n\nReady for more? Check out my other game-changing library that makes working with Apache Arrow format a breeze - [ZIO Apache Arrow](https://github.com/grouzen/zio-apache-arrow).\n\n## Why?\n\n- **No Spark required** - you don't need to run a Spark cluster to read/write Parquet files.\n- **ZIO native** - utilizes various ZIO features to offer a FP-oriented way of working with the Parquet API.\n- **ZIO Schema** - the backbone that powers all the cool features of this library such as type-safe filter predicates and codecs derivation.\n\n\n## Contents\n\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Codecs](#codecs)\n    - [Schema](#schema)\n    - [Value](#value)\n  - [Reading \u0026 Writing files](#reading--writing-files)\n    - [Filtering](#filtering)\n- [Resources](#resources)\n\n## Installation\n\n```scala\nlibraryDependencies += \"me.mnedokushev\" %% \"zio-apache-parquet-core\" % \"@VERSION@\"\n```\n\n## Usage\n\nAll examples are self-contained [Scala CLI](https://scala-cli.virtuslab.org) snippets. 
You can find copies of them in `docs/scala-cli`.\n\n### Codecs\n\nTo write data to and read data from Parquet files, you need to define the following schema and value codecs for your case classes:\n`SchemaEncoder`, `ValueEncoder`, and `ValueDecoder`.\n\n#### Schema\n\nYou can get the Java SDK's `Type` by using a `SchemaEncoder` generated by the `SchemaEncoderDeriver.default` ZIO Schema deriver:\n\n```scala\n//\u003e using scala \"3.7.1\"\n//\u003e using dep me.mnedokushev::zio-apache-parquet-core:0.3.1\n\nimport zio.schema.*\nimport me.mnedokushev.zio.apache.parquet.core.codec.*\n\nobject Schema extends App:\n\n  case class MyRecord(a: Int, b: String, c: Option[Long])\n\n  object MyRecord:\n    given schema: Schema[MyRecord]               =\n      DeriveSchema.gen[MyRecord]\n    given schemaEncoder: SchemaEncoder[MyRecord] =\n      Derive.derive[SchemaEncoder, MyRecord](SchemaEncoderDeriver.default)\n\n  val parquetSchema = MyRecord.schemaEncoder.encode(MyRecord.schema, \"my_record\", optional = false)\n\n  println(parquetSchema)\n  // Outputs:\n  // required group my_record {\n  //   required int32 a (INTEGER(32,true));\n  //   required binary b (STRING);\n  //   optional int64 c (INTEGER(64,true));\n  // }\n```\n\nAlternatively, you can customize the schemas of [primitive](https://zio.dev/zio-schema/standard-type-reference) fields within your record by defining a custom `SchemaEncoder` \nand using the `SchemaEncoderDeriver.summoned` deriver.\n\n```scala\n//\u003e using scala \"3.7.1\"\n//\u003e using dep me.mnedokushev::zio-apache-parquet-core:0.3.1\n\nimport me.mnedokushev.zio.apache.parquet.core.Schemas\nimport zio.schema.*\nimport me.mnedokushev.zio.apache.parquet.core.codec.*\n\nobject SchemaSummoned extends App:\n\n  case class MyRecord(a: Int, b: String, c: Option[Long])\n\n  object MyRecord:\n    given schema: Schema[MyRecord] =\n      DeriveSchema.gen[MyRecord]\n    // The custom encoder must be defined before the definition for your record type.\n    given 
SchemaEncoder[Int] with {\n      override def encode(schema: Schema[Int], name: String, optional: Boolean) =\n        Schemas.uuid.optionality(optional).named(name)\n    }\n    given schemaEncoder: SchemaEncoder[MyRecord] =\n      Derive.derive[SchemaEncoder, MyRecord](SchemaEncoderDeriver.summoned)\n\n  val parquetSchema = MyRecord.schemaEncoder.encode(MyRecord.schema, \"my_record\", optional = false)\n\n  println(parquetSchema)\n  // Outputs:\n  // required group my_record {\n  //   required fixed_len_byte_array(16) a (UUID);\n  //   required binary b (STRING);\n  //   optional int64 c (INTEGER(64,true));\n  // }\n```\n\nCase classes with an arity greater than 22 are supported too. Check out the [SchemaArity23.scala Scala CLI example](docs/scala-cli/SchemaArity23.scala)!\n\n#### Value\n\n`Value` is a sealed hierarchy of types for interop between Scala values and Parquet readers/writers.\nTo convert Scala values into `Value` and back, we need to define instances of the `ValueEncoder` and `ValueDecoder`\ntype classes. 
This can be done with the `ValueEncoderDeriver.default` and `ValueDecoderDeriver.default` ZIO Schema derivers.\n\n```scala\n//\u003e using scala \"3.7.1\"\n//\u003e using dep me.mnedokushev::zio-apache-parquet-core:0.3.1\n\nimport zio.schema.*\nimport me.mnedokushev.zio.apache.parquet.core.codec.*\n\nobject Value extends App:\n\n  case class MyRecord(a: Int, b: String, c: Option[Long])\n\n  object MyRecord:\n    given Schema[MyRecord]                =\n      DeriveSchema.gen[MyRecord]\n    given encoder: ValueEncoder[MyRecord] =\n      Derive.derive[ValueEncoder, MyRecord](ValueEncoderDeriver.default)\n    given decoder: ValueDecoder[MyRecord] =\n      Derive.derive[ValueDecoder, MyRecord](ValueDecoderDeriver.default)\n\n  val value  = MyRecord.encoder.encode(MyRecord(3, \"zio\", None))\n  val record = MyRecord.decoder.decode(value)\n\n  println(value)\n  // Outputs:\n  // RecordValue(Map(a -\u003e Int32Value(3), b -\u003e BinaryValue(Binary{\"zio\"}), c -\u003e NullValue))\n  println(record)\n  // Outputs:\n  // MyRecord(3,zio,None)\n```\n\nAs with `SchemaEncoder`, you can customize the codecs of primitive types by defining a custom \n`ValueEncoder`/`ValueDecoder` and using the `ValueEncoderDeriver.summoned`/`ValueDecoderDeriver.summoned` derivers accordingly.\n\n```scala\n//\u003e using scala \"3.7.1\"\n//\u003e using dep me.mnedokushev::zio-apache-parquet-core:0.3.1\n\nimport me.mnedokushev.zio.apache.parquet.core.Value\nimport zio.schema.*\nimport me.mnedokushev.zio.apache.parquet.core.codec.*\n\nimport java.nio.charset.StandardCharsets\n\nobject ValueSummoned extends App:\n\n  case class MyRecord(a: Int, b: String, c: Option[Long])\n\n  object MyRecord:\n    given Schema[MyRecord] =\n      DeriveSchema.gen[MyRecord]\n    given ValueEncoder[Int] with {\n      override def encode(value: Int): Value =\n        Value.string(value.toString)\n    }\n    given ValueDecoder[Int] with {\n      override def decode(value: Value): Int =\n        value match {\n          case 
Value.PrimitiveValue.BinaryValue(v) =\u003e\n            new String(v.getBytes, StandardCharsets.UTF_8).toInt\n          case other                               =\u003e\n            throw DecoderError(s\"Wrong value: $other\")\n        }\n    }\n    given encoder: ValueEncoder[MyRecord] =\n      Derive.derive[ValueEncoder, MyRecord](ValueEncoderDeriver.summoned)\n    given decoder: ValueDecoder[MyRecord] =\n      Derive.derive[ValueDecoder, MyRecord](ValueDecoderDeriver.summoned)\n\n  val value  = MyRecord.encoder.encode(MyRecord(3, \"zio\", None))\n  val record = MyRecord.decoder.decode(value)\n\n  println(value)\n  // Outputs:\n  // RecordValue(Map(a -\u003e BinaryValue(Binary{\"3\"}), b -\u003e BinaryValue(Binary{\"zio\"}), c -\u003e NullValue))\n  println(record)\n  // Outputs:\n  // MyRecord(3,zio,None)\n```\n\n### Reading \u0026 Writing files\n\nFinally, to perform some IO operations we need to initialize `ParquetWriter` and `ParquetReader` and use either\n`writeChunk`/`readChunk` or `writeStream`/`readStream` methods. 
\n\n```scala\n//\u003e using scala \"3.7.1\"\n//\u003e using dep me.mnedokushev::zio-apache-parquet-hadoop:0.3.1\n\nimport zio.schema.*\nimport me.mnedokushev.zio.apache.parquet.core.codec.*\nimport me.mnedokushev.zio.apache.parquet.hadoop.{ ParquetReader, ParquetWriter, Path }\nimport zio.*\n\nimport java.nio.file.Files\n\nobject ParquetIO extends ZIOAppDefault:\n\n  case class MyRecord(a: Int, b: String, c: Option[Long])\n\n  object MyRecord:\n    given Schema[MyRecord]        =\n      DeriveSchema.gen[MyRecord]\n    given SchemaEncoder[MyRecord] =\n      Derive.derive[SchemaEncoder, MyRecord](SchemaEncoderDeriver.default)\n    given ValueEncoder[MyRecord]  =\n      Derive.derive[ValueEncoder, MyRecord](ValueEncoderDeriver.default)\n    given ValueDecoder[MyRecord]  =\n      Derive.derive[ValueDecoder, MyRecord](ValueDecoderDeriver.default)\n\n  val data =\n    Chunk(\n      MyRecord(1, \"first\", Some(11)),\n      MyRecord(3, \"third\", None)\n    )\n\n  val recordsFile = Path(Files.createTempDirectory(\"records\")) / \"records.parquet\"\n\n  override def run =\n    (for {\n      writer   \u003c- ZIO.service[ParquetWriter[MyRecord]]\n      reader   \u003c- ZIO.service[ParquetReader[MyRecord]]\n      _        \u003c- writer.writeChunk(recordsFile, data)\n      fromFile \u003c- reader.readChunk(recordsFile)\n      _        \u003c- Console.printLine(fromFile)\n    } yield ()).provide(\n      ParquetWriter.configured[MyRecord](),\n      ParquetReader.configured[MyRecord]()\n    )\n  // Outputs:\n  // Chunk(MyRecord(1,first,Some(11)),MyRecord(3,third,None))\n```\n\nIn the previous code snippet we used `ParquetReader.configured[A]()` to initialize a reader that uses a parquet schema taken from a given file. Such a reader will always try to read all columns from a given file. \n\nIn case you need to read only part of the columns, use `ParquetReader.projected[A]()`. 
This skips columns that are not present in the schema and reads only those that are, saving precious CPU cycles and time.\n\n#### Filtering\n\nSay goodbye to type-unsafe filter predicates such as `Col(\"foo\") != \"bar\"`. The library takes advantage of an underdocumented feature in ZIO Schema - [Accessors](https://github.com/zio/zio-schema/blob/main/zio-schema/shared/src/main/scala/zio/schema/Schema.scala#L38) - the hidden pearl that allows extracting type-level information about the fields of case classes. In addition to the already provided codecs, you need to provide an instance of `TypeTag` for your record type. For this, use the `TypeTagDeriver.default` deriver.\n\n```scala\n//\u003e using scala \"3.7.1\"\n//\u003e using dep me.mnedokushev::zio-apache-parquet-hadoop:0.3.1\n\nimport zio.*\nimport zio.schema.*\nimport me.mnedokushev.zio.apache.parquet.core.codec.*\nimport me.mnedokushev.zio.apache.parquet.hadoop.{ ParquetReader, ParquetWriter, Path }\nimport me.mnedokushev.zio.apache.parquet.core.filter.syntax.*\nimport me.mnedokushev.zio.apache.parquet.core.filter.*\n\nimport java.nio.file.Files\n\nobject Filtering extends ZIOAppDefault:\n\n  case class MyRecord(a: Int, b: String, c: Option[Long])\n\n  object MyRecord:\n    // We need to provide field names using singleton types\n    given Schema.CaseClass3.WithFields[\"a\", \"b\", \"c\", Int, String, Option[Long], MyRecord] =\n      DeriveSchema.gen[MyRecord]\n    given SchemaEncoder[MyRecord]                                                          =\n      Derive.derive[SchemaEncoder, MyRecord](SchemaEncoderDeriver.default)\n    given ValueEncoder[MyRecord]                                                           =\n      Derive.derive[ValueEncoder, MyRecord](ValueEncoderDeriver.default)\n    given ValueDecoder[MyRecord]                                                           =\n      Derive.derive[ValueDecoder, MyRecord](ValueDecoderDeriver.default)\n    given TypeTag[MyRecord]                               
                                 =\n      Derive.derive[TypeTag, MyRecord](TypeTagDeriver.default)\n\n    // Define accessors to use them later in the filter predicate.\n    // You can give any names to the accessors as we demonstrate here.\n    val (id, name, age) = Filter[MyRecord].columns\n\n  val data =\n    Chunk(\n      MyRecord(1, \"bob\", Some(10L)),\n      MyRecord(2, \"bob\", Some(12L)),\n      MyRecord(3, \"alice\", Some(13L)),\n      MyRecord(4, \"john\", None)\n    )\n\n  val recordsFile = Path(Files.createTempDirectory(\"records\")) / \"records.parquet\"\n\n  override def run =\n    (\n      for {\n        writer   \u003c- ZIO.service[ParquetWriter[MyRecord]]\n        reader   \u003c- ZIO.service[ParquetReader[MyRecord]]\n        _        \u003c- writer.writeChunk(recordsFile, data)\n        fromFile \u003c- reader.readChunkFiltered(\n                      recordsFile,\n                      filter(\n                        MyRecord.id \u003e 1 `and` (\n                          MyRecord.name =!= \"bob\" `or`\n                            // Use .nullable syntax for optional fields.\n                            MyRecord.age.nullable \u003e 10L\n                        )\n                      )\n                    )\n        _        \u003c- Console.printLine(fromFile)\n      } yield ()\n    ).provide(\n      ParquetWriter.configured[MyRecord](),\n      ParquetReader.configured[MyRecord]()\n    )\n  // Outputs:\n  // Chunk(MyRecord(2,bob,Some(12)),MyRecord(3,alice,Some(13)),MyRecord(4,john,None))\n```\n\n## Resources\n\n- [Unpacking ZIO Schema's Accessors](https://mnedokushev.me/2024/09/05/unpacking-zio-schema-accessors.html) - Explore how ZIO Schema enables type-safe filtering through its underdocumented feature on my personal blog.\n- [Scala's Hidden Treasures: Five ZIO-Compatible Libraries you didn't know you needed!](https://jorgevasquez.blog/scalas-hidden-treasures-five-zio-compatible-libraries-you-didnt-know-you-needed) - This article, featured 
on Jorge Vásquez's blog, accompanies his presentation at the [Functional Scala 2024 Conference](https://www.functionalscala.com). You can find more information in the [slides](https://jorge-vasquez-2301.github.io/scalas-hidden-treasures/24). The recording is available on Ziverge's YouTube channel [here](https://www.youtube.com/watch?v=iFhQibDdqT0\u0026list=PLvdARMfvom9CuM40p_Yr3UAtlADSKC2Js).\n- [Overview page on ZIO's official community ecosystem website](https://zio.dev/ecosystem/community/zio-apache-parquet/) - For a brief overview, visit this page on ZIO's official community ecosystem website.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrouzen%2Fzio-apache-parquet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrouzen%2Fzio-apache-parquet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrouzen%2Fzio-apache-parquet/lists"}