{"id":13806839,"url":"https://github.com/dotnet/spark","last_synced_at":"2025-05-11T03:46:44.653Z","repository":{"id":34742309,"uuid":"182849051","full_name":"dotnet/spark","owner":"dotnet","description":".NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.","archived":false,"fork":false,"pushed_at":"2025-05-07T12:28:52.000Z","size":5143,"stargazers_count":2058,"open_issues_count":193,"forks_count":326,"subscribers_count":84,"default_branch":"main","last_synced_at":"2025-05-11T03:46:36.875Z","etag":null,"topics":["analytics","apache-spark","azure","bigdata","csharp","databricks","dotnet","dotnet-core","dotnet-standard","emr","fsharp","hdinsight","machine-learning","microsoft","spark","spark-sql","spark-streaming","streaming","tpcds","tpch"],"latest_commit_sha":null,"homepage":"https://dot.net/spark","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dotnet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-04-22T18:55:55.000Z","updated_at":"2025-05-07T20:49:21.000Z","dependencies_parsed_at":"2023-01-15T09:01:26.032Z","dependency_job_id":"4ac8ced1-9edb-4d8b-b038-7f5f5e2b7df6","html_url":"https://github.com/dotnet/spark","commit_stats":{"total_commits":372,"total_committers":61,"mean_commits":6.098360655737705,"dds":0.7204301075268817,"last_synced_commit":"b63c08b87a060e5100392bcc0069b53d3a607fcf"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/dotnet%2Fspark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dotnet%2Fspark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dotnet%2Fspark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dotnet%2Fspark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dotnet","download_url":"https://codeload.github.com/dotnet/spark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253514555,"owners_count":21920334,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","apache-spark","azure","bigdata","csharp","databricks","dotnet","dotnet-core","dotnet-standard","emr","fsharp","hdinsight","machine-learning","microsoft","spark","spark-sql","spark-streaming","streaming","tpcds","tpch"],"created_at":"2024-08-04T01:01:16.902Z","updated_at":"2025-05-11T03:46:44.623Z","avatar_url":"https://github.com/dotnet.png","language":"C#","readme":"[![NuGet Badge](https://buildstats.info/nuget/Microsoft.Spark)](https://www.nuget.org/packages/Microsoft.Spark)\n\n![Icon](docs/img/dotnetsparklogo-6.png)\n\n# .NET for Apache® Spark™\n\n.NET for Apache Spark provides high performance APIs for using [Apache Spark](https://spark.apache.org/) from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. 
\n\n.NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code, allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer. \n\n.NET for Apache Spark runs on Windows, Linux, and macOS using .NET 8, or Windows using .NET Framework. It also runs on all major cloud providers including [Azure HDInsight Spark](deployment/README.md#azure-hdinsight-spark), [Amazon EMR Spark](deployment/README.md#amazon-emr-spark), [AWS](deployment/README.md#databricks) \u0026 [Azure](deployment/README.md#databricks) Databricks.\n\n**Note**: We currently have a Spark Project Improvement Proposal JIRA at [SPIP: .NET bindings for Apache Spark](https://issues.apache.org/jira/browse/SPARK-27006) to work with the community towards getting .NET support by default into Apache Spark. We highly encourage you to participate in the discussion. 
\n\n## Table of Contents\n\n- [Supported Apache Spark](#supported-apache-spark)\n- [Releases](#releases)\n- [Get Started](#get-started)\n- [Build Status](#build-status)\n- [Building from Source](#building-from-source)\n- [Samples](#samples)\n- [Contributing](#contributing)\n- [Inspiration and Special Thanks](#inspiration-and-special-thanks)\n- [How to Engage, Contribute and Provide Feedback](#how-to-engage-contribute-and-provide-feedback)\n- [Support](#support)\n- [.NET Foundation](#net-foundation)\n- [Code of Conduct](#code-of-conduct)\n- [License](#license)\n\n## Supported Apache Spark\n\n\u003ctable\u003e\n    \u003cthead\u003e\n        \u003ctr\u003e\n            \u003cth\u003eApache Spark\u003c/th\u003e\n            \u003cth\u003e.NET for Apache Spark\u003c/th\u003e\n        \u003c/tr\u003e\n    \u003c/thead\u003e\n    \u003ctbody align=\"center\"\u003e\n        \u003ctr\u003e\n            \u003ctd\u003e2.4*\u003c/td\u003e\n            \u003ctd rowspan=5\u003e\u003ca href=\"https://github.com/dotnet/spark/releases/tag/v2.3.0-rc1\"\u003ev2.3.0-rc1\u003c/a\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003e3.0\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003e3.1\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003e3.2\u003c/td\u003e\n        \u003c/tr\u003e        \n        \u003ctr\u003e\n            \u003ctd\u003e3.5\u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\n*2.4.2 is \u003ca href=\"https://github.com/dotnet/spark/issues/60\"\u003enot supported\u003c/a\u003e.\n\n## Releases\n\n.NET for Apache Spark releases are available [here](https://github.com/dotnet/spark/releases) and NuGet packages are available [here](https://www.nuget.org/packages/Microsoft.Spark).\n\n## Get Started\nThese instructions will show you how to run a .NET for Apache Spark app using .NET 8.\n- [Windows 
Instructions](docs/getting-started/windows-instructions.md)\n- [Ubuntu Instructions](docs/getting-started/ubuntu-instructions.md)\n- [macOS Instructions](docs/getting-started/macos-instructions.md)\n\n## Build Status\n\n| ![Ubuntu icon](docs/img/ubuntu-icon-32.png) | ![Windows icon](docs/img/windows-icon-32.png) |\n| :---:         |          :---: |\n| Ubuntu | Windows |\n| | [![Build Status](https://dnceng.visualstudio.com/public/_apis/build/status/dotnet.spark?branchName=main)](https://dev.azure.com/dnceng/public/_build?definitionId=459\u0026branchName=main)|\n\n## Building from Source\n\nBuilding from source is very easy and the whole process (from cloning to being able to run your app) should take less than 15 minutes!\n\n| |  | Instructions |\n| :---: | :---         |      :--- |\n| ![Windows icon](docs/img/windows-icon-32.png) | **Windows**    | \u003cul\u003e\u003cli\u003eLocal - [.NET Framework 4.8](docs/building/windows-instructions.md#using-visual-studio-for-net-framework)\u003c/li\u003e\u003cli\u003eLocal - [.NET 8](docs/building/windows-instructions.md#using-net-core-cli-for-net-core)\u003c/li\u003e\u003c/ul\u003e    |\n| ![Ubuntu icon](docs/img/ubuntu-icon-32.png) | **Ubuntu**     | \u003cul\u003e\u003cli\u003eLocal - [.NET 8](docs/building/ubuntu-instructions.md)\u003c/li\u003e\u003cli\u003e[Azure HDInsight Spark - .NET 8](deployment/README.md)\u003c/li\u003e\u003c/ul\u003e      |\n\n\u003ca name=\"samples\"\u003e\u003c/a\u003e\n## Samples\n\nThere are two types of samples/apps in the .NET for Apache Spark repo:\n\n* ![Icon](docs/img/app-type-getting-started.png) Getting Started - .NET for Apache Spark code focused on simple and minimalistic scenarios.\n\n* ![Icon](docs/img/app-type-e2e.png)  End-to-End apps/scenarios - Real-world examples of industry-standard benchmarks, use cases, and business applications implemented using .NET for Apache Spark. 
\n\nWe welcome contributions to both categories!\n\n\u003ctable\u003e\n \u003ctr\u003e\n   \u003ctd width=\"25%\"\u003e\n      \u003ch4\u003e\u003cb\u003eAnalytics Scenario\u003c/b\u003e\u003c/h4\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n      \u003ch4 width=\"35%\"\u003e\u003cb\u003eDescription\u003c/b\u003e\u003c/h4\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n      \u003ch4\u003e\u003cb\u003eScenarios\u003c/b\u003e\u003c/h4\u003e\n  \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n   \u003ctd width=\"25%\"\u003e\n      \u003ch5\u003eDataframes and SparkSQL\u003c/h5\u003e\n  \u003c/td\u003e\n  \u003ctd width=\"35%\"\u003e\n  Simple code snippets to help you get familiarized with the programmability experience of .NET for Apache Spark.\n  \u003c/td\u003e\n    \u003ctd\u003e\n      \u003ch5\u003eBasic \u0026nbsp;\u0026nbsp;\u0026nbsp;\n      \u003ca href=\"examples/Microsoft.Spark.CSharp.Examples/Sql/Batch/Basic.cs\"\u003eC#\u003c/a\u003e \u0026nbsp; \u0026nbsp; \u003ca href=\"examples/Microsoft.Spark.FSharp.Examples/Sql/Basic.fs\"\u003eF#\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"#\"\u003e\u003cimg src=\"docs/img/app-type-getting-started.png\" alt=\"Getting started icon\"\u003e\u003c/a\u003e\u003c/h5\u003e\n  \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n   \u003ctd width=\"25%\"\u003e\n      \u003ch5\u003eStructured Streaming\u003c/h5\u003e\n  \u003c/td\u003e\n  \u003ctd width=\"35%\"\u003e\n      Code snippets to show you how to utilize Apache Spark's Structured Streaming (\u003ca href=\"https://spark.apache.org/docs/2.3.1/structured-streaming-programming-guide.html\"\u003e2.3.1\u003c/a\u003e, \u003ca href=\"https://spark.apache.org/docs/2.3.2/structured-streaming-programming-guide.html\"\u003e2.3.2\u003c/a\u003e, \u003ca href=\"https://spark.apache.org/docs/2.4.1/structured-streaming-programming-guide.html\"\u003e2.4.1\u003c/a\u003e, \u003ca 
href=\"https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html\"\u003eLatest\u003c/a\u003e)\n  \u003c/td\u003e\n  \u003ctd\u003e\n      \u003ch5\u003eWord Count \u0026nbsp;\u0026nbsp;\u0026nbsp;\n      \u003ca href=\"examples/Microsoft.Spark.CSharp.Examples/Sql/Streaming/StructuredNetworkWordCount.cs\"\u003eC#\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"examples/Microsoft.Spark.FSharp.Examples/Sql/Streaming/StructuredNetworkWordCount.fs\"\u003eF#\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"#\"\u003e\u003cimg src=\"docs/img/app-type-getting-started.png\" alt=\"Getting started icon\"\u003e\u003c/a\u003e\u003c/h5\u003e\n      \u003ch5\u003eWindowed Word Count \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"examples/Microsoft.Spark.CSharp.Examples/Sql/Streaming/StructuredNetworkWordCountWindowed.cs\"\u003eC#\u003c/a\u003e \u0026nbsp; \u0026nbsp;\u003ca href=\"examples/Microsoft.Spark.FSharp.Examples/Sql/Streaming/StructuredNetworkWordCountWindowed.fs\"\u003eF#\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"#\"\u003e\u003cimg src=\"docs/img/app-type-getting-started.png\" alt=\"Getting started icon\"\u003e\u003c/a\u003e\u003c/h5\u003e      \n      \u003ch5\u003eWord Count on data from \u003ca href=\"https://kafka.apache.org/\"\u003eKafka\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"examples/Microsoft.Spark.CSharp.Examples/Sql/Streaming/StructuredKafkaWordCount.cs\"\u003eC#\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"examples/Microsoft.Spark.FSharp.Examples/Sql/Streaming/StructuredKafkaWordCount.fs\"\u003eF#\u003c/a\u003e \u0026nbsp; \u0026nbsp;\u0026nbsp;\u003ca href=\"#\"\u003e\u003cimg src=\"docs/img/app-type-getting-started.png\" alt=\"Getting started icon\"\u003e\u003c/a\u003e\u003c/h5\u003e\n  \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n   \u003ctd width=\"25%\"\u003e\n      \u003ch4\u003eTPC-H Queries\u003c/h4\u003e\n  \u003c/td\u003e\n  \u003ctd 
width=\"35%\"\u003e\n  Code to show you how to author complex queries using .NET for Apache Spark.\n  \u003c/td\u003e\n  \u003ctd\u003e\n      \u003ch5\u003eTPC-H Functional \u0026nbsp;\u0026nbsp;\u0026nbsp;\n      \u003ca href=\"benchmark/csharp/Tpch/TpchFunctionalQueries.cs\"\u003eC#\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"#\"\u003e\u003cimg src=\"docs/img/app-type-e2e.png\" alt=\"End-to-end app icon\"\u003e\u003c/a\u003e\u003c/h5\u003e\n      \u003ch5\u003eTPC-H SparkSQL \u0026nbsp;\u0026nbsp;\u0026nbsp;\n      \u003ca href=\"benchmark/csharp/Tpch/TpchSqlQueries.cs\"\u003eC#\u003c/a\u003e  \u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"#\"\u003e\u003cimg src=\"docs/img/app-type-e2e.png\" alt=\"End-to-end app icon\"\u003e\u003c/a\u003e\u003c/h5\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n \u003c/table\u003e\n\n## Contributing\n\nWe welcome contributions! Please review our [contribution guide](CONTRIBUTING.md).\n\n## Inspiration and Special Thanks\n\nThis project would not have been possible without the outstanding work from the following communities:\n\n- [Apache Spark](https://spark.apache.org/): Unified Analytics Engine for Big Data, the underlying backend execution engine for .NET for Apache Spark.\n- [Mobius](https://github.com/Microsoft/Mobius): C# and F# language binding and extensions to Apache Spark, a precursor project to .NET for Apache Spark from the same Microsoft group.\n- [PySpark](https://spark.apache.org/docs/latest/api/python/index.html): Python bindings for Apache Spark, one of the implementations .NET for Apache Spark derives inspiration from. \n- [SparkR](https://spark.apache.org/docs/latest/sparkr.html): one of the implementations .NET for Apache Spark derives inspiration from.\n- [Apache Arrow](https://arrow.apache.org/): A cross-language development platform for in-memory data. 
This library provides .NET for Apache Spark with efficient ways to transfer column-major data between the JVM and .NET CLR.\n- [Pyrolite](https://github.com/irmen/Pyrolite) - Java and .NET interface to Python's pickle and Pyro protocols. This library provides .NET for Apache Spark with efficient ways to transfer row-major data between the JVM and .NET CLR. \n- [Databricks](https://databricks.com/): Unified analytics platform. Many thanks for all their suggestions toward making .NET for Apache Spark run on Azure and AWS Databricks.\n\n## How to Engage, Contribute and Provide Feedback\n\nThe .NET for Apache Spark team encourages [contributions](docs/contributing.md), both issues and PRs. The first step is to find an [existing issue](https://github.com/dotnet/spark/issues) you want to contribute to or, if you cannot find one, to [open an issue](https://github.com/dotnet/spark/issues?utf8=%E2%9C%93\u0026q=is%3Aissue+is%3Aopen+).\n\n## Support\n\n[.NET for Apache Spark](https://github.com/dotnet/spark) is an open source project under the [.NET Foundation](https://dotnetfoundation.org/) and \ndoes not come with Microsoft Support unless otherwise noted by the specific product. For issues with or questions about .NET for Apache Spark, please [create an issue](https://github.com/dotnet/spark/issues). 
The community is active and is monitoring submissions.\n\n## .NET Foundation\n\nThe .NET for Apache Spark project is part of the [.NET Foundation](http://www.dotnetfoundation.org).\n\n## Code of Conduct\n\nThis project has adopted the code of conduct defined by the [Contributor Covenant](https://contributor-covenant.org/)\nto clarify expected behavior in our community.\nFor more information, see the [.NET Foundation Code of Conduct](https://dotnetfoundation.org/code-of-conduct).\n\n\u003ca name=\"license\"\u003e\u003c/a\u003e\n## License\n\n.NET for Apache Spark is licensed under the [MIT license](LICENSE).\n","funding_links":[],"categories":["C# #","Packages","Big Data","🗒️ Cheatsheets"],"sub_categories":["Language Bindings","LINQ","📦 Libraries"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdotnet%2Fspark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdotnet%2Fspark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdotnet%2Fspark/lists"}