{"id":21068963,"url":"https://github.com/univalence/zio-spark","last_synced_at":"2025-07-20T19:32:34.301Z","repository":{"id":37859670,"uuid":"259306396","full_name":"univalence/zio-spark","owner":"univalence","description":"A functional wrapper around Spark to make it works with ZIO","archived":false,"fork":false,"pushed_at":"2025-07-18T00:36:32.000Z","size":3793,"stargazers_count":46,"open_issues_count":18,"forks_count":12,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-18T05:18:14.842Z","etag":null,"topics":["scala","spark","zio","zio-spark"],"latest_commit_sha":null,"homepage":"https://univalence.github.io/zio-spark/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/univalence.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-04-27T12:17:07.000Z","updated_at":"2025-07-18T00:36:29.000Z","dependencies_parsed_at":"2024-05-11T22:28:39.742Z","dependency_job_id":"7c549c6b-6158-4dd8-915f-76bf23bcf997","html_url":"https://github.com/univalence/zio-spark","commit_stats":{"total_commits":539,"total_committers":11,"mean_commits":49.0,"dds":0.7272727272727273,"last_synced_commit":"0028141da0d2fa65edf8494c6ca84fbd636f2ce6"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/univalence/zio-spark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/univalence%2Fzio-spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/univalence%2Fzio-spark/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/univalence%2Fzio-spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/univalence%2Fzio-spark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/univalence","download_url":"https://codeload.github.com/univalence/zio-spark/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/univalence%2Fzio-spark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266187162,"owners_count":23889924,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scala","spark","zio","zio-spark"],"created_at":"2024-11-19T18:29:50.771Z","updated_at":"2025-07-20T19:32:34.281Z","avatar_url":"https://github.com/univalence.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003ezio-spark\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://img.shields.io/badge/Project%20Stage-Development-yellowgreen.svg\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Project%20Stage-Development-yellowgreen.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/univalence/zio-spark/actions\"\u003e\n    \u003cimg src=\"https://github.com/univalence/zio-spark/actions/workflows/ci.yml/badge.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/univalence/zio-spark\"\u003e\n    \u003cimg src=\"https://codecov.io/gh/univalence/zio-spark/branch/master/graph/badge.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca 
href=\"https://scala-steward.org\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Scala_Steward-helping-blue.svg?style=flat\u0026logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAAQCAMAAAARSr4IAAAAVFBMVEUAAACHjojlOy5NWlrKzcYRKjGFjIbp293YycuLa3pYY2LSqql4f3pCUFTgSjNodYRmcXUsPD/NTTbjRS+2jomhgnzNc223cGvZS0HaSD0XLjbaSjElhIr+AAAAAXRSTlMAQObYZgAAAHlJREFUCNdNyosOwyAIhWHAQS1Vt7a77/3fcxxdmv0xwmckutAR1nkm4ggbyEcg/wWmlGLDAA3oL50xi6fk5ffZ3E2E3QfZDCcCN2YtbEWZt+Drc6u6rlqv7Uk0LdKqqr5rk2UCRXOk0vmQKGfc94nOJyQjouF9H/wCc9gECEYfONoAAAAASUVORK5CYII=\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://index.scala-lang.org/univalence/zio-spark/zio-spark\"\u003e\n    \u003cimg src=\"https://index.scala-lang.org/univalence/zio-spark/zio-spark/latest-by-scala-version.svg?platform=jvm\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n   A functional wrapper around Spark to make it work with ZIO, \n \u003cbr\u003e improve error management and increase performances.\n\n\u003c/p\u003e\n\n## Documentation\n\nYou can find the documentation of zio-spark [here](https://univalence.github.io/zio-spark/).\nThe documentation covers additional subjects like `CancellableJobs`, code generation, ...\n\nYou can find the scaladoc of zio-spark [here](https://javadoc.io/doc/io.univalence/zio-spark_2.13/0.10.0/index.html).\n\n## Roadmap\n\n- [x] Exhaustive support of the apache.spark API (released april 2022, using ScalaMeta and code generation)\n- [x] Support for Scala 3 (released early december 2022)\n- [X] complete wrapper for SparkContext (small update)\n- [ ] port of spark-test to zio-spark (creation of zio-spark-test)\n- [ ] integration of typed-queries\n\n## Help\n\nYou can ask us (Dylan, Jonathan) for some help if you want to use the lib or have questions around it : \nhttps://calendly.com/zio-spark/help\n\n\n## Latest version\n\nIf you want to get the very last version of this library you can still download it using:\n\n```scala\nlibraryDependencies += 
\"io.univalence\" %% \"zio-spark\" % \"0.13.0\"\n```\n\n## Quickstart\n\n### Giter8\n\nYou can use Giter8 to create an example application with all the dependencies.\n\nFor Scala 2.13\n```bash\nsbt new univalence/zio-spark.g8\n```\n\nFor Scala 3\n```bash\nsbt new univalence/zio-spark.g8 --useScala3=true\n```\n\n\n### Snapshots\n\nIf you want the latest snapshot (the version associated with the last commit on master), you can get\nit using:\n\n```scala\nresolvers += Resolver.sonatypeRepo(\"snapshots\"),\nlibraryDependencies += \"io.univalence\" %% \"zio-spark\" % \"\u003cSNAPSHOT-VERSION\u003e\"\n```\n\nYou can find the latest version on the \n[Nexus repository manager](https://oss.sonatype.org/#nexus-search;gav~io.univalence~zio-spark_2.13~~~~kw,versionexpand).\n\n### Spark Version\n\nZIO-Spark is compatible with Scala 2.11, 2.12 and 2.13. Spark is a provided dependency, so you must add your own Spark version in \nbuild.sbt (as you usually would). \n\n```scala\nlibraryDependencies ++= Seq(\n  \"io.univalence\"    %% \"zio-spark\"  % \"0.10.0\",\n  \"org.apache.spark\" %% \"spark-core\" % \"3.3.1\" % Provided,\n  \"org.apache.spark\" %% \"spark-sql\"  % \"3.3.1\" % Provided\n)\n```\n\n\nWe advise you to use the latest version of Spark for your Scala version.\n\n## News 🎉 zio-direct support 🎉\n\nWe worked to make zio-spark available for Scala 3, so it works with [zio-direct](https://github.com/zio/zio-direct).\n\n```scala\nimport zio.*\nimport zio.direct.*\nimport zio.spark.sql.*\n\n//import for syntax + spark encoders\nimport zio.spark.sql.implicits.*\nimport scala3encoders.given\n\n//throw AnalysisException directly\nimport zio.spark.sql.TryAnalysis.syntax.throwAnalysisException\n\nobject Main extends ZIOAppDefault {\n  val sparkSession = SparkSession.builder.master(\"local\").asLayer\n\n  override def run = {\n    defer {\n      val readBuild: RIO[SparkSession,DataFrame] = SparkSession.read.text(\"./build.sbt\")\n      val text: Dataset[String] = 
readBuild.run.as[String]\n\n      text.filter(_.contains(\"zio\")).show(truncate = false).run\n      \n      Console.printLine(\"what a time to be alive!\").run\n    }.provideLayer(sparkSession)\n  }\n}\n```\n\nbuild.sbt\n```scala\nscalaVersion := \"3.2.1\"\n\nlibraryDependencies ++= Seq(\n  \"dev.zio\" %% \"zio\" % \"2.0.5\",\n  \"dev.zio\" % \"zio-direct_3\" % \"1.0.0-RC1\",\n  \"io.univalence\" %% \"zio-spark\" % \"0.13.0\",\n  (\"org.apache.spark\" %% \"spark-sql\" % \"3.3.1\" % Provided).cross(CrossVersion.for3Use2_13),\n  (\"org.apache.hadoop\" % \"hadoop-client\" % \"3.3.1\" % Provided),\n  \"dev.zio\" %% \"zio-test\" % \"2.0.5\" % Test\n)\n```\n\n## Why?\n\nThere are many reasons why we decided to build this library, such as:\n* allowing users to build Spark pipelines with ZIO easily.\n* writing *better code*: pure FP, more composable, more readable Spark code.\n* stopping the propagation of ```implicit SparkSessions```.\n* improving performance in some cases.\n* taking advantage of ZIO to let our jobs retry and run in parallel.\n\n\n## Design\n\"What if Spark was using better functional programming and an effect system?\"\n\nzio-spark is built with this main idea in mind: rewriting the existing Spark API using better \nfunctional programming principles. You will find a corresponding type for the types of the existing API: \n\n\n| org.apache.spark | zio.spark        |\n|------------------|------------------|\n| sql.Dataset      | sql.Dataset      |\n| sql.SparkSession | sql.SparkSession |\n| ...              | ...              
|\n\n\n\nIt comes with a different API, for example: \n\n```scala\n/**\n * Returns the number of rows in the Dataset.\n * @group action\n * @since 1.6.0\n */\ndef count(implicit trace: Trace): Task[Long]\n```\ncompared to\n```scala\n/**\n * Returns the number of rows in the Dataset.\n * @group action\n * @since 1.6.0\n */\ndef count(): Long\n```\n\nAnother example, with errors, which allows you to handle the case where the column does not exist: \n```scala\n/**\n * Selects column based on the column name and returns it as a\n * [[Column]].\n *\n * @note\n *   The column name can also reference to a nested column like `a.b`.\n *\n * @group untypedrel\n * @since 2.0.0\n */\ndef col(colName: String): TryAnalysis[Column]\n```\ncompared to\n```scala\ndef col(colName: String): Column\n```\n\n### Existing code\n\nzio-spark can be used with existing Spark code, without modifications: \n\n```scala\ndef existingCode(implicit ss:org.apache.spark.sql.SparkSession):org.apache.spark.sql.Dataset[String] = {\n  import ss.implicits._\n  ss.read.parquet(\"toto.parquet\").as[String]\n}\n\n//...\n\nval out= \n  zio.spark.sql.fromSpark(existingCode).flatMap(ds =\u003e ZIO.attempt(ds.count()))\n\n//or lift using .zioSpark to start using the new API\n\nval out = \n  zio.spark.sql.fromSpark(existingCode).flatMap(_.zioSpark.count)\n\n```\n\nOne of the core principles is that you should be able to integrate zio-spark into an existing codebase without\nmajor modifications. In most cases you can even just change the imports and fix the compilation errors related \nto effects (dataset reads, job launches, ...).\n\n\n## Is it production ready?\nIt's not as battle-tested as it should be at the moment; \nwe are progressively migrating existing projects to this new version.\n\n## Why didn't we hear about it before?\nWe gave a conference talk about it at the end of 2019 ( https://www.youtube.com/watch?v=1ttsi0YwMkI ), but in French.\n\u003cbr\u003e Strangely there have been fewer conferences in 2020 - 2021 - ... 
or we have been very busy at work.\n\nFollowing the rewrite in 2022, we will present the project and its new design at conferences in 2023, in French and in English.\n\n\n## Alternatives\n\n- [ZparkIO](https://github.com/leobenkel/ZparkIO), a framework for Spark and ZIO\n\n\n## Spark with Scala 3\n- [iskra](https://github.com/VirtusLab/iskra) from VirtusLab, an interesting take on type safety for Spark, without compromising on performance.\n- [spark-scala3](https://github.com/vincenzobaz/spark-scala3), one of our dependencies, used to support encoders for Spark in Scala 3.\n\n\n## Contributions\n\nPull requests are welcome. We are open to organizing pair-programming sessions to tackle improvements. If you want to add\nnew things in `zio-spark`, don't hesitate to open an issue!\n\nYou can also talk to us directly using this link if you are interested in contributing: \nhttps://calendly.com/zio-spark/contribution.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funivalence%2Fzio-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funivalence%2Fzio-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funivalence%2Fzio-spark/lists"}