{"id":13407707,"url":"https://github.com/Microsoft/Mobius","last_synced_at":"2025-03-14T12:31:19.733Z","repository":{"id":37953173,"uuid":"45064427","full_name":"microsoft/Mobius","owner":"microsoft","description":"C# and F# language binding and extensions to Apache Spark","archived":false,"fork":false,"pushed_at":"2024-01-31T02:51:58.000Z","size":6758,"stargazers_count":940,"open_issues_count":54,"forks_count":211,"subscribers_count":141,"default_branch":"master","last_synced_at":"2025-03-13T01:21:20.965Z","etag":null,"topics":["apache-spark","bigdata","csharp","dataframe","dataset","dstream","eventhubs","fsharp","kafka-streaming","mapreduce","mobius","near-real-time","rdd","spark","spark-streaming","streaming"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-10-27T19:21:55.000Z","updated_at":"2025-02-16T19:57:50.000Z","dependencies_parsed_at":"2024-05-05T06:32:48.733Z","dependency_job_id":"4435723d-4987-4bf8-b9e7-04c6047ec3b5","html_url":"https://github.com/microsoft/Mobius","commit_stats":null,"previous_names":["microsoft/sparkclr"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FMobius","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FMobius/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FMobius/releases","manifests_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FMobius/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/Mobius/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243577910,"owners_count":20313718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","bigdata","csharp","dataframe","dataset","dstream","eventhubs","fsharp","kafka-streaming","mapreduce","mobius","near-real-time","rdd","spark","spark-streaming","streaming"],"created_at":"2024-07-30T20:00:47.929Z","updated_at":"2025-03-14T12:31:19.697Z","avatar_url":"https://github.com/microsoft.png","language":"C#","readme":"# Mobius development is deprecated and has been superseded by a more recent version '.NET for Apache Spark' from Microsoft ([Website](https://dot.net/spark) | [GitHub](https://github.com/dotnet/spark)) that runs on Azure HDInsight Spark, Amazon EMR Spark, Azure \u0026 AWS Databricks.\n\n\u003cimg src='logo/mobius-star-200.png' width='125px' alt='Mobius logo' /\u003e\n\n# Mobius: C# API for Spark\n\n[Mobius](https://github.com/Microsoft/Mobius) provides C# language binding to [Apache Spark](https://spark.apache.org/) enabling the implementation of Spark driver program and data processing operations in the languages supported in the .NET framework like C# or F#.\n\nFor example, the word count sample in Apache Spark can be implemented in C# as follows :\n\n```c#\nvar lines = sparkContext.TextFile(@\"hdfs://path/to/input.txt\");  
\nvar words = lines.FlatMap(s =\u003e s.Split(' '));\nvar wordCounts = words.Map(w =\u003e new Tuple\u003cstring, int\u003e(w.Trim(), 1))  \n                      .ReduceByKey((x, y) =\u003e x + y);  \nvar wordCountCollection = wordCounts.Collect();  \nwordCounts.SaveAsTextFile(@\"hdfs://path/to/wordcount.txt\");  \n```\n\nA simple DataFrame application using TempTable may look like the following:\n\n```c#\nvar reqDataFrame = sqlContext.TextFile(@\"hdfs://path/to/requests.csv\");\nvar metricDataFrame = sqlContext.TextFile(@\"hdfs://path/to/metrics.csv\");\nreqDataFrame.RegisterTempTable(\"requests\");\nmetricDataFrame.RegisterTempTable(\"metrics\");\n// C0 - guid in requests DataFrame, C3 - guid in metrics DataFrame  \nvar joinDataFrame = GetSqlContext().Sql(  \n    \"SELECT joinedtable.datacenter\" +\n         \", MAX(joinedtable.latency) maxlatency\" +\n         \", AVG(joinedtable.latency) avglatency \" +\n    \"FROM (\" +\n       \"SELECT a.C1 as datacenter, b.C6 as latency \" +  \n       \"FROM requests a JOIN metrics b ON a.C0  = b.C3) joinedtable \" +   \n    \"GROUP BY datacenter\");\njoinDataFrame.ShowSchema();\njoinDataFrame.Show();\n```\n\nA simple DataFrame application using DataFrame DSL may look like the following:\n\n```  c#\n// C0 - guid, C1 - datacenter\nvar reqDataFrame = sqlContext.TextFile(@\"hdfs://path/to/requests.csv\")  \n                             .Select(\"C0\", \"C1\");    \n// C3 - guid, C6 - latency   \nvar metricDataFrame = sqlContext.TextFile(@\"hdfs://path/to/metrics.csv\", \",\", false, true)\n                                .Select(\"C3\", \"C6\"); //override delimiter, hasHeader \u0026 inferSchema\nvar joinDataFrame = reqDataFrame.Join(metricDataFrame, reqDataFrame[\"C0\"] == metricDataFrame[\"C3\"])\n                                .GroupBy(\"C1\");\nvar maxLatencyByDcDataFrame = joinDataFrame.Agg(new Dictionary\u003cstring, string\u003e { { \"C6\", \"max\" } 
});\nmaxLatencyByDcDataFrame.ShowSchema();\nmaxLatencyByDcDataFrame.Show();\n```\n\nA simple Spark Streaming application that processes messages from Kafka using C# may be implemented using the following code:\n\n```  c#\nStreamingContext sparkStreamingContext = StreamingContext.GetOrCreate(checkpointPath, () =\u003e\n    {\n      var ssc = new StreamingContext(sparkContext, slideDurationInMillis);\n      ssc.Checkpoint(checkpointPath);\n      var stream = KafkaUtils.CreateDirectStream(ssc, topicList, kafkaParams, perTopicPartitionKafkaOffsets);\n      //message format: [timestamp],[loglevel],[logmessage]\n      var countByLogLevelAndTime = stream\n                                    .Map(kvp =\u003e Encoding.UTF8.GetString(kvp.Value))\n                                    .Filter(line =\u003e line.Contains(\",\"))\n                                    .Map(line =\u003e line.Split(','))\n                                    .Map(columns =\u003e new Tuple\u003cstring, int\u003e(\n                                                          string.Format(\"{0},{1}\", columns[0], columns[1]), 1))\n                                    .ReduceByKeyAndWindow((x, y) =\u003e x + y, (x, y) =\u003e x - y,\n                                                          windowDurationInSecs, slideDurationInSecs, 3)\n                                    .Map(logLevelCountPair =\u003e string.Format(\"{0},{1}\",\n                                                          logLevelCountPair.Key, logLevelCountPair.Value));\n      countByLogLevelAndTime.ForeachRDD(countByLogLevel =\u003e\n      {\n          foreach (var logCount in countByLogLevel.Collect())\n              Console.WriteLine(logCount);\n      });\n      return ssc;\n    });\nsparkStreamingContext.Start();\nsparkStreamingContext.AwaitTermination();\n```\nFor more code samples, refer to [Mobius\\examples](./examples) directory or [Mobius\\csharp\\Samples](./csharp/Samples) directory.\n\n## API Documentation\n\nRefer to [Mobius C# API 
documentation](./csharp/Adapter/documentation/Mobius_API_Documentation.md) for the list of Spark's data processing operations supported in Mobius.\n\n## API Usage\n\nMobius API usage samples are available at:\n\n* [Examples folder](./examples) which contains standalone [C# and F# projects](./notes/running-mobius-app.md#running-mobius-examples-in-local-mode) that can be used as templates to start developing Mobius applications\n\n* [Samples project](./csharp/Samples/Microsoft.Spark.CSharp/) which uses a comprehensive set of Mobius APIs to implement samples that are also used for functional validation of APIs\n\n* Mobius performance test scenarios implemented in [C#](./csharp/Perf/Microsoft.Spark.CSharp) and [Scala](./scala/perf) for side by side comparison of Spark driver code\n\n## Documents\n\nRefer to the [docs folder](docs) for design overview and other info on Mobius\n\n## Build Status\n\n|Ubuntu 14.04.3 LTS |Windows |Unit test coverage |\n|-------------------|:------:|:-----------------:|\n|[![Build status](https://travis-ci.org/Microsoft/Mobius.svg?branch=master)](https://travis-ci.org/Microsoft/Mobius) |[![Build status](https://ci.appveyor.com/api/projects/status/lflkua81gg0swv6i/branch/master?svg=true)](https://ci.appveyor.com/project/SparkCLR/sparkclr/branch/master) |[![codecov.io](https://codecov.io/github/Microsoft/Mobius/coverage.svg?branch=master)](https://codecov.io/github/Microsoft/Mobius?branch=master)\n\n## Getting Started\n\n| |Windows |Linux |\n|---|:------|:----|\n|Build \u0026 run unit tests |[Build in Windows](notes/windows-instructions.md#building-mobius) |[Build in Linux](notes/linux-instructions.md#building-mobius-in-linux) |\n|Run samples (functional tests) in local mode |[Samples in Windows](notes/windows-instructions.md#running-samples) |[Samples in Linux](notes/linux-instructions.md#running-mobius-samples-in-linux) |\n|Run examples in local mode |[Examples in Windows](/notes/running-mobius-app.md#running-mobius-examples-in-local-mode) 
|[Examples in Linux](notes/linux-instructions.md#running-mobius-examples-in-linux) |\n|Run Mobius app |\u003cul\u003e\u003cli\u003e[Standalone cluster](notes/running-mobius-app.md#standalone-cluster)\u003c/li\u003e\u003cli\u003e[YARN cluster](notes/running-mobius-app.md#yarn-cluster)\u003c/li\u003e\u003c/ul\u003e |\u003cul\u003e\u003cli\u003e[Linux cluster](notes/linux-instructions.md#running-mobius-applications-in-linux)\u003c/li\u003e\u003cli\u003e[Azure HDInsight Spark Cluster](/notes/mobius-in-hdinsight.md)\u003c/li\u003e\u003cli\u003e[AWS EMR Spark Cluster](/notes/linux-instructions.md#mobius-in-amazon-web-services-emr-spark-cluster)\u003c/li\u003e\u003c/ul\u003e |\n|Run Mobius Shell |\u003cul\u003e\u003cli\u003e[Local](notes/mobius-shell.md#run-shell)\u003c/li\u003e\u003cli\u003e[YARN](notes/mobius-shell.md#run-shell)\u003c/li\u003e\u003c/ul\u003e | Not supported yet |\n\n### Useful Links\n* [Configuration parameters in Mobius](./notes/configuration-mobius.md)\n* [Troubleshoot errors in Mobius](./notes/troubleshooting-mobius.md)\n* [Debug Mobius apps](./notes/running-mobius-app.md#debug-mode)\n* [Implementing Spark Apps in F# using Mobius](./notes/spark-fsharp-mobius.md)\n\n## Supported Spark Versions\n\nMobius is built and tested with Apache Spark [1.4.1](https://github.com/Microsoft/Mobius/tree/branch-1.4), [1.5.2](https://github.com/Microsoft/Mobius/tree/branch-1.5), [1.6.*](https://github.com/Microsoft/Mobius/tree/branch-1.6) and [2.0](https://github.com/Microsoft/Mobius/tree/branch-2.0).\n\n## Releases\n\nMobius releases are available at https://github.com/Microsoft/Mobius/releases. 
References needed to build a C# Spark driver application using Mobius are also available in [NuGet](https://www.nuget.org/packages/Microsoft.SparkCLR).\n\n[![NuGet Badge](https://buildstats.info/nuget/Microsoft.SparkCLR)](https://www.nuget.org/packages/Microsoft.SparkCLR)\n\nRefer to [mobius-release-info.md](./notes/mobius-release-info.md) for the details on versioning policy and the contents of the release.\n\n## License\n\n[![License](https://img.shields.io/badge/license-MIT-blue.svg?style=plastic)](https://github.com/Microsoft/Mobius/blob/master/LICENSE)\n\nMobius is licensed under the MIT license. See [LICENSE](LICENSE) file for full license information.\n\n\n## Community\n\n[![Issue Stats](http://issuestats.com/github/Microsoft/Mobius/badge/pr)](http://issuestats.com/github/Microsoft/Mobius)\n[![Issue Stats](http://issuestats.com/github/Microsoft/Mobius/badge/issue)](http://issuestats.com/github/Microsoft/Mobius)\n[![Join the chat at https://gitter.im/Microsoft/Mobius](https://badges.gitter.im/Microsoft/Mobius.svg)](https://gitter.im/Microsoft/Mobius?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[![Twitter](https://img.shields.io/twitter/url/http/twitter.com/MobiusForSpark.svg?style=social)](https://twitter.com/intent/tweet?text=%40MobiusForSpark%20%5Byour%20tweet%5D%20via%20%40GitHub)\n\n* Mobius project welcomes contributions. 
To contribute, follow the instructions in [CONTRIBUTING.md](./notes/CONTRIBUTING.md)\n\n* Options to ask your question to the Mobius community\n  * create issue on [GitHub](https://github.com/Microsoft/Mobius)\n  * create post with \"sparkclr\" tag in [Stack Overflow](https://stackoverflow.com/questions/tagged/sparkclr)\n  * join chat at [Mobius room in Gitter](https://gitter.im/Microsoft/Mobius)\n  * tweet [@MobiusForSpark](http://twitter.com/MobiusForSpark)\n\n## Code of Conduct\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n","funding_links":[],"categories":["API","Packages"],"sub_categories":["Language Bindings"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMicrosoft%2FMobius","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMicrosoft%2FMobius","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMicrosoft%2FMobius/lists"}