{"id":28446776,"url":"https://github.com/hablapps/sparkoptics","last_synced_at":"2025-06-30T06:32:37.805Z","repository":{"id":54174657,"uuid":"173307399","full_name":"hablapps/sparkOptics","owner":"hablapps","description":"Optics for Spark DataFrames","archived":false,"fork":false,"pushed_at":"2021-03-05T09:39:29.000Z","size":60,"stargazers_count":47,"open_issues_count":1,"forks_count":6,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-06-06T11:08:08.157Z","etag":null,"topics":["dataframe","dataframes","optics","scala","spark","spark-sql"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hablapps.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-01T13:34:29.000Z","updated_at":"2024-08-18T09:16:51.000Z","dependencies_parsed_at":"2022-08-13T08:20:40.208Z","dependency_job_id":null,"html_url":"https://github.com/hablapps/sparkOptics","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/hablapps/sparkOptics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hablapps%2FsparkOptics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hablapps%2FsparkOptics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hablapps%2FsparkOptics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hablapps%2FsparkOptics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hablapps","download_url":"https://codeload.github.com/hablapps/sparkOptics/tar.gz/refs/head
s/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hablapps%2FsparkOptics/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262723770,"owners_count":23354107,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataframe","dataframes","optics","scala","spark","spark-sql"],"created_at":"2025-06-06T11:08:07.764Z","updated_at":"2025-06-30T06:32:37.771Z","avatar_url":"https://github.com/hablapps.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.com/hablapps/sparkOptics.svg?token=pvJZNjJ8hxxoMyPVvQ8u\u0026branch=master)](https://travis-ci.com/hablapps/sparkOptics)\n[![Maven Central](https://img.shields.io/maven-central/v/org.hablapps/spark-optics_2.11.svg)](https://maven-badges.herokuapp.com/maven-central/org.hablapps/spark-optics_2.11)\n[![Gitter](https://badges.gitter.im/hablapps/sparkOptics.svg)](https://gitter.im/hablapps/sparkOptics?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge)\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hablapps/sparkOptics/binder?filepath=%2Fnotebooks%2FSparkLenses.ipynb)\n\n# Spark-optics\nModify your complex structures in spark-sql dataframes with optics.\n\n## Getting Started\n\nNeed to set an inner element in a complex structure?\n\n```scala\nimport org.apache.spark.sql.DataFrame\nimport org.apache.spark.sql.functions.lit\nval df: DataFrame = ???\n\nimport 
org.hablapps.sparkOptics._\ndf.select(Lens(\"field.subfield\")(df.schema).set(lit(13)):_*)\n```\n\nWant to try it right now? Click on the Binder badge to launch an interactive notebook.\n\n### Installing\n\nCompiled for Scala 2.11 with Spark 2.3 and Scala 2.12 with Spark 2.4. Tested with Spark 2.3 and 2.4 on Scala 2.11, and with Spark 3.0.1 on Scala 2.12.\n\n```sbtshell\nlibraryDependencies += \"org.hablapps\" %% \"spark-optics\" % \"1.0.0\"\n```\n\nSpark-optics has no dependencies beyond Spark itself.\n\n### Implemented optics\n\n#### Lens\nOptic that focuses on a column of a provided schema.\n\n#### Protolens\nOptic equivalent to a lens, but without a specific schema, so that it can be applied to any DataFrame that contains the specified column.\n\n### Motivation and larger example\nWorking with complex structures in Spark SQL DataFrames can be hard.\nA common case when working with complex structures is modifying their inner elements. For instance:\n\n```scala\ncase class Street(number: Int, name: String)\ncase class Address(city: String, street: Street)\ncase class Company(name: String, address: Address)\ncase class Employee(name: String, company: Company)\n\nval employee = Employee(\"john\", Company(\"awesome inc\", Address(\"london\", Street(23, \"high street\"))))\nval df = List(employee).toDS.toDF\n```\n```\nroot\n |-- name: string (nullable = true)\n |-- company: struct (nullable = true)\n |    |-- name: string (nullable = true)\n |    |-- address: struct (nullable = true)\n |    |    |-- city: string (nullable = true)\n |    |    |-- street: struct (nullable = true)\n |    |    |    |-- number: integer (nullable = false)\n |    |    |    |-- name: string (nullable = true)\n ```\n \nTo modify an inner element, for example to change the name of the street, we need to write something like this:\n\n```scala\nval mDF = df.select(df(\"name\"),struct(\n   df(\"company.name\").as(\"name\"),\n   struct(\n     
df(\"company.address.city\").as(\"city\"),\n     struct(\n       df(\"company.address.street.number\").as(\"number\"),\n       upper(df(\"company.address.street.name\")).as(\"name\")\n     ).as(\"street\")\n   ).as(\"address\")\n ).as(\"company\"))\nmDF.printSchema\nval longCodeEmployee = mDF.as[Employee].head\n```\n```\nroot\n  |-- name: string (nullable = true)\n  |-- company: struct (nullable = false)\n  |    |-- name: string (nullable = true)\n  |    |-- address: struct (nullable = false)\n  |    |    |-- city: string (nullable = true)\n  |    |    |-- street: struct (nullable = false)\n  |    |    |    |-- number: integer (nullable = true)\n  |    |    |    |-- name: string (nullable = true)\n \nmDF: DataFrame = [name: string, company: struct\u003cname: string, address: struct\u003ccity: string, street: struct\u003cnumber: int, name: string\u003e\u003e\u003e]\nlongCodeEmployee: Employee = Employee(\n\"john\",\nCompany(\"awesome inc\", Address(\"london\", Street(23, \"HIGH STREET\"))))\n```\n \nThis can be simplified by using spark-optics. 
It lets you focus on the element that you want to modify,\nand the optic recreates the surrounding structure for you.\n\n```scala\nimport org.hablapps.sparkOptics._\nimport org.apache.spark.sql.functions._\n\nval df: DataFrame = List(employee).toDF\n\nval streetNameLens = Lens(\"company.address.street.name\")(df.schema)\nval modifiedDF = df.select(streetNameLens.modify(upper):_*)\nmodifiedDF.printSchema\nmodifiedDF.as[Employee].head\n```\n```\nroot\n|-- name: string (nullable = true)\n|-- company: struct (nullable = false)\n|    |-- name: string (nullable = true)\n|    |-- address: struct (nullable = false)\n|    |    |-- city: string (nullable = true)\n|    |    |-- street: struct (nullable = false)\n|    |    |    |-- number: integer (nullable = true)\n|    |    |    |-- name: string (nullable = true)\n\nstreetNameLens: Lens = Lens(company.address.street.name)\nmodifiedDF: DataFrame = [name: string, company: struct\u003cname: string, address: struct\u003ccity: string, street: struct\u003cnumber: int, name: string\u003e\u003e\u003e]\nres19_3: Employee = Employee(\n\"john\",\nCompany(\"awesome inc\", Address(\"london\", Street(23, \"HIGH STREET\")))\n)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhablapps%2Fsparkoptics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhablapps%2Fsparkoptics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhablapps%2Fsparkoptics/lists"}