{"id":25290595,"url":"https://github.com/wesslen/code-tutorials-for-sophi","last_synced_at":"2025-04-06T18:22:33.735Z","repository":{"id":81790225,"uuid":"73866319","full_name":"wesslen/Code-Tutorials-for-SOPHI","owner":"wesslen","description":"Tutorials and templates for running Spark on UNCC's SOPHI platform","archived":false,"fork":false,"pushed_at":"2017-02-15T01:01:11.000Z","size":18,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-13T00:30:26.436Z","etag":null,"topics":["pyspark","scala","spark-sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wesslen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-16T00:01:54.000Z","updated_at":"2022-08-05T17:46:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"2aeec77e-a29a-4ddc-bd23-6d229426b09e","html_url":"https://github.com/wesslen/Code-Tutorials-for-SOPHI","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wesslen%2FCode-Tutorials-for-SOPHI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wesslen%2FCode-Tutorials-for-SOPHI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wesslen%2FCode-Tutorials-for-SOPHI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wesslen%2FCode-Tutorials-for-SOPHI/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/wesslen","download_url":"https://codeload.github.com/wesslen/Code-Tutorials-for-SOPHI/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247527341,"owners_count":20953235,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pyspark","scala","spark-sql"],"created_at":"2025-02-13T00:26:53.717Z","updated_at":"2025-04-06T18:22:33.728Z","avatar_url":"https://github.com/wesslen.png","language":null,"readme":"# SOPHI Code\n\n## Introduction\n\nThis repository provides template code for running Spark on [SOPHI](http://sophi.uncc.edu). The code will include a mixture of Scala, PySpark and SparkR.\n\n## Code\n\n| Topics                                                          |\n| --------------------------------------------------------------- |\n| [Twitter Gnip SQL-DataFrame Manipulation with PySpark](/code/PySpark-Dataframe-Processing.md)              |\n| [Twitter Gnip Summary Count Files with PySpark](/code/PySpark-Gnip-Twitter-Summary-Files.md)    |\n| [Twitter Gnip Latent Dirichlet Allocation with Scala](/code/Scala-LDA.md)    |\n\n## How to access SOPHI\n\nTo access SOPHI, you must have an active UNCC ID username (student, faculty or staff) and be connected to the UNCC network either directly (eduroam) or through VPN. 
See this [link](https://faq.uncc.edu/pages/viewpage.action?pageId=6653379) on how to set up VPN access.\n\nThis link ([https://cci-hadoopm3.uncc.edu](https://cci-hadoopm3.uncc.edu)) provides access to SOPHI's Hue Interface.\n\nTo start, click this link and, when prompted, enter your UNCC ID and password.\n\n## How to open a Notebook\n\nWithin SOPHI, click the \"Notebook\" button on the top ribbon and click the \"+ Notebook\" button to create a new Notebook.\n\nOnce within a new Notebook, create a PySpark, Scala or SparkR (not available yet) session.\n\n## Further Links\n\n* [Spark Programming Guide](https://spark.apache.org/docs/latest/programming-guide.html)\n\n* [Spark SQL and DataFrames Tutorial](http://spark.apache.org/docs/latest/sql-programming-guide.html)\n\n* [Spark Machine Learning Library Tutorial](http://spark.apache.org/docs/latest/ml-guide.html)\n\n* [Databricks Spark Guides](https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html)\n\n* [Automating PySpark Code through YARN and Oozie](http://gethue.com/how-to-schedule-spark-jobs-with-spark-on-yarn-and-oozie/)\n\n* [PySpark and nltk (Anaconda)](https://docs.continuum.io/anaconda-cluster/howto/spark-nltk)\n\n* [CY Lin's Big Data Analytics PySpark Tutorial](https://www.ee.columbia.edu/~cylin/course/bigdata/EECS6893-BigDataAnalytics-Lecture6.pdf)\n\n* [Matteo Redaelli's PySpark Twitter GitHub Repository](https://github.com/matteoredaelli/pyspark-examples)\n\n* [Duke Computational Statistics PySpark Tutorial](http://people.duke.edu/~ccc14/sta-663-2016/21A_Introduction_To_Spark.html)\n\n* [XD-Deng's GitHub Tutorial on PySpark](https://github.com/XD-DENG/Spark-practice)\n\n* [Charles Rawles' Using Apache Spark for Sports 
Analytics](https://content.pivotal.io/blog/how-data-science-assists-sports)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwesslen%2Fcode-tutorials-for-sophi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwesslen%2Fcode-tutorials-for-sophi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwesslen%2Fcode-tutorials-for-sophi/lists"}