{"id":15908093,"url":"https://github.com/ignalina/spark312","last_synced_at":"2025-04-03T00:21:49.019Z","repository":{"id":131256566,"uuid":"444245867","full_name":"Ignalina/spark312","owner":"Ignalina","description":"Cloned spark312","archived":false,"fork":false,"pushed_at":"2022-01-04T02:18:06.000Z","size":26936,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-08T14:29:12.303Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ignalina.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/security.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-04T01:13:11.000Z","updated_at":"2022-01-12T01:21:23.000Z","dependencies_parsed_at":"2023-07-08T23:31:29.270Z","dependency_job_id":null,"html_url":"https://github.com/Ignalina/spark312","commit_stats":{"total_commits":9,"total_committers":2,"mean_commits":4.5,"dds":"0.11111111111111116","last_synced_commit":"afcda3244e69b66a7d3137a3445eb459564a6536"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ignalina%2Fspark312","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ignalina%2Fspark312/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ignalina%2Fspark312/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ignalina%2Fspark312/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ignalina","download_url":"https://codeload.github.com/Ignalina/spark312/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246912446,"owners_count":20853861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T14:09:37.686Z","updated_at":"2025-04-03T00:21:44.007Z","avatar_url":"https://github.com/Ignalina.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Apache Spark\n\nSpark is a unified analytics engine for large-scale data processing. It provides\nhigh-level APIs in Scala, Java, Python, and R, and an optimized engine that\nsupports general computation graphs for data analysis. 
It also supports a\nrich set of higher-level tools including Spark SQL for SQL and DataFrames,\nMLlib for machine learning, GraphX for graph processing,\nand Structured Streaming for stream processing.\n\n\u003chttps://spark.apache.org/\u003e\n\n[![Jenkins Build](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/badge/icon)](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3)\n[![AppVeyor Build](https://img.shields.io/appveyor/ci/ApacheSoftwareFoundation/spark/master.svg?style=plastic\u0026logo=appveyor)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark)\n[![PySpark Coverage](https://img.shields.io/badge/dynamic/xml.svg?label=pyspark%20coverage\u0026url=https%3A%2F%2Fspark-test.github.io%2Fpyspark-coverage-site\u0026query=%2Fhtml%2Fbody%2Fdiv%5B1%5D%2Fdiv%2Fh1%2Fspan\u0026colorB=brightgreen\u0026style=plastic)](https://spark-test.github.io/pyspark-coverage-site)\n\n\n## Online Documentation\n\nYou can find the latest Spark documentation, including a programming\nguide, on the [project web page](https://spark.apache.org/documentation.html).\nThis README file only contains basic setup instructions.\n\n## Building Spark\n\nSpark is built using [Apache Maven](https://maven.apache.org/).\nTo build Spark and its example programs, run:\n\n    ./build/mvn -DskipTests clean package\n\n(You do not need to do this if you downloaded a pre-built package.)\n\nMore detailed documentation is available from the project site, at\n[\"Building Spark\"](https://spark.apache.org/docs/latest/building-spark.html).\n\nFor general development tips, including info on developing Spark using an IDE, see [\"Useful Developer Tools\"](https://spark.apache.org/developer-tools.html).\n\n## Interactive Scala Shell\n\nThe easiest way to start using Spark is through the Scala shell:\n\n    ./bin/spark-shell\n\nTry the following command, which should return 1,000,000,000:\n\n    scala\u003e spark.range(1000 * 1000 * 
1000).count()\n\n## Interactive Python Shell\n\nAlternatively, if you prefer Python, you can use the Python shell:\n\n    ./bin/pyspark\n\nAnd run the following command, which should also return 1,000,000,000:\n\n    \u003e\u003e\u003e spark.range(1000 * 1000 * 1000).count()\n\n## Example Programs\n\nSpark also comes with several sample programs in the `examples` directory.\nTo run one of them, use `./bin/run-example \u003cclass\u003e [params]`. For example:\n\n    ./bin/run-example SparkPi\n\nwill run the Pi example locally.\n\nYou can set the MASTER environment variable when running examples to submit\nexamples to a cluster. This can be a mesos:// or spark:// URL,\n\"yarn\" to run on YARN, \"local\" to run\nlocally with one thread, or \"local[N]\" to run locally with N threads. You\ncan also use an abbreviated class name if the class is in the `examples`\npackage. For instance:\n\n    MASTER=spark://host:7077 ./bin/run-example SparkPi\n\nMany of the example programs print usage help if no params are given.\n\n## Running Tests\n\nTesting first requires [building Spark](#building-spark). Once Spark is built, tests\ncan be run using:\n\n    ./dev/run-tests\n\nPlease see the guidance on how to\n[run tests for a module, or individual tests](https://spark.apache.org/developer-tools.html#individual-tests).\n\nThere is also a Kubernetes integration test; see resource-managers/kubernetes/integration-tests/README.md.\n\n## A Note About Hadoop Versions\n\nSpark uses the Hadoop core library to talk to HDFS and other Hadoop-supported\nstorage systems. 
Because the protocols have changed in different versions of\nHadoop, you must build Spark against the same version that your cluster runs.\n\nPlease refer to the build documentation at\n[\"Specifying the Hadoop Version and Enabling YARN\"](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn)\nfor detailed guidance on building for a particular distribution of Hadoop, including\nbuilding for particular Hive and Hive Thriftserver distributions.\n\n## Configuration\n\nPlease refer to the [Configuration Guide](https://spark.apache.org/docs/latest/configuration.html)\nin the online documentation for an overview of how to configure Spark.\n\n## Contributing\n\nPlease review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)\nfor information on how to get started contributing to the project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fignalina%2Fspark312","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fignalina%2Fspark312","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fignalina%2Fspark312/lists"}