{"id":13749959,"url":"https://github.com/Stratio/sparta","last_synced_at":"2025-05-09T13:31:43.711Z","repository":{"id":26828882,"uuid":"30288112","full_name":"Stratio/sparta","owner":"Stratio","description":"Real Time Analytics and Data Pipelines based on Spark Streaming","archived":false,"fork":false,"pushed_at":"2019-10-24T06:32:21.000Z","size":128883,"stargazers_count":524,"open_issues_count":8,"forks_count":197,"subscribers_count":138,"default_branch":"master","last_synced_at":"2024-05-21T01:01:19.263Z","etag":null,"topics":["analytics","hdfs","kafka","lambda","olap","real-time","scala","spark","spark-streaming","sparksql","sparta","stratio","stratio-sparta","streaming","streaming-data","triggers","workflow"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Stratio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-04T08:04:40.000Z","updated_at":"2024-04-03T10:18:07.000Z","dependencies_parsed_at":"2022-09-01T16:10:59.827Z","dependency_job_id":null,"html_url":"https://github.com/Stratio/sparta","commit_stats":null,"previous_names":[],"tags_count":72,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stratio%2Fsparta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stratio%2Fsparta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stratio%2Fsparta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stratio%2Fsparta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/Stratio","download_url":"https://codeload.github.com/Stratio/sparta/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253258274,"owners_count":21879619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","hdfs","kafka","lambda","olap","real-time","scala","spark","spark-streaming","sparksql","sparta","stratio","stratio-sparta","streaming","streaming-data","triggers","workflow"],"created_at":"2024-08-03T07:01:19.916Z","updated_at":"2025-05-09T13:31:42.566Z","avatar_url":"https://github.com/Stratio.png","language":"Scala","funding_links":[],"categories":["Scala","大数据"],"sub_categories":[],"readme":"Discontinued\r\n============\r\nAfter around two years of development, we have decided to discontinue this project due to a major refactor of its structure. We will launch Sparta 2.0 in the near future.\r\n\r\nWe would like to thank the open source community for their contributions.\r\nYou can continue using this repository as a basis for your own development: it contains the latest stable version as of today, and minor issues will be addressed.\r\n\r\nIf you are interested in the new Sparta 2.0 with pipelines and workflows, please contact us at sparta@stratio.com\r\n\r\n\r\nAbout Stratio Sparta\r\n============\r\n\r\nAt Stratio, we have implemented several real-time analytics projects based on Apache Spark, Kafka, Flume, Cassandra, ElasticSearch or MongoDB.\r\nThese technologies were always a perfect fit, but soon we found ourselves writing the same pieces of 
integration code over and over again.\r\nStratio Sparta is the easiest way to make use of Apache Spark Streaming and its whole ecosystem.\r\nChoose your inputs, operations and outputs, and start extracting insights from your data in real time.\r\n\r\n\u003cimg src=\"./images/StrataKibana.jpg\" width=\"600\" height=\"300\" alt=\"Strata Twitter Analytics with Kibana\"/\u003e\r\n\r\nMain Features\r\n============\r\n\r\n- Pure Spark\r\n- No coding needed, only declarative analytical workflows\r\n- Data continuously streamed in \u0026 processed in near real-time\r\n- Ready to use out-of-the-box\r\n- Plug \u0026 play: flexible workflows (inputs, outputs, transformations, etc…)\r\n- High performance and Fault Tolerance\r\n- Scalable and High Availability\r\n- Big Data OLAP on real-time to small data\r\n- ETLs\r\n- Triggers over streaming data\r\n- Spark SQL language with streaming and batch data\r\n- Kerberos and CAS compatible\r\n\r\n\u003cimg src=\"./images/mainFeatures.jpg\"  alt=\"Main Features\"/\u003e\r\n\r\nArchitecture\r\n============\r\n\r\nSend a workflow as JSON to the Sparta API and run your own real-time plugins on a single Spark cluster\r\n![Architecture](./images/architecture.jpg)\r\n\r\nSparta as a Job Manager\r\n------------\r\n\r\nSubmit multiple streaming jobs to the Spark cluster and manage them with a simple UI\r\n\r\n\u003cimg src=\"./images/jobManager.jpg\" alt=\"Job Manager\"/\u003e\r\n\r\nRun workflows on Mesos, YARN or Spark Standalone\r\n\r\n\u003cimg src=\"./images/architectureJobs.jpg\" alt=\"Job Manager Architecture\"/\u003e\r\n\r\nSparta as an SDK\r\n------------\r\n\r\nModular components, extensible with a simple SDK\r\n- You can extend several points of the platform to fulfill your needs, such as adding new inputs, outputs, operators and transformations.\r\n- Add new functions to the Kite SDK to extend the data cleaning, enrichment and normalization capabilities.\r\n![Architecture 
Detail](./images/architectureDetail.jpg)\r\n\r\nComponents\r\n========\r\n\r\nEach workflow can define multiple components, but they all currently share the following architecture\r\n![workflow](./images/workflow.jpg)\r\n![Components](./images/components.jpg)\r\n\r\nCore components\r\n------------\r\n\r\nSeveral plugins have been implemented by the Stratio Sparta team\r\n![Main plugins](./images/plugins.jpg)\r\n\r\nTrigger component\r\n------------\r\n\r\nWith Sparta it is possible to run queries over streaming data and to execute ETL, aggregations and Simple Event \r\nProcessing, mixing streaming data with batch data in the trigger process.\r\n![triggers](./images/triggers.jpg)\r\n\r\nAggregation component\r\n------------\r\n\r\nThe aggregation process in Sparta makes it possible to generate efficient OLAP processes with \r\nstreaming data\r\n![OLAP](./images/OLAPintegration.jpg)\r\n\r\nAdvanced features have been implemented to optimize stateful operations over Spark Streaming\r\n![Aggregations](./images/aggregation.jpg)\r\n\r\nInputs\r\n------------\r\n\r\n- Twitter\r\n- Kafka\r\n- Flume\r\n- RabbitMQ\r\n- Socket\r\n- WebSocket\r\n- HDFS/S3\r\n\r\nOutputs\r\n------------\r\n\r\n- MongoDB\r\n- Cassandra\r\n- ElasticSearch\r\n- Redis\r\n- JDBC\r\n- CSV\r\n- Parquet\r\n- Http\r\n- Kafka\r\n- HDFS/S3\r\n- Http Rest\r\n- Avro\r\n- Logger\r\n\r\n![Outputs](./images/outputs.png)\r\n\r\nKey technologies\r\n========\r\n\r\n- [Spark Streaming \u0026 Spark](http://spark.apache.org)\r\n- [SparkSQL](https://spark.apache.org/sql)\r\n- [Akka](http://akka.io)\r\n- [MongoDB](http://www.mongodb.org/)\r\n- [Apache Cassandra](http://cassandra.apache.org)\r\n- [ElasticSearch](https://www.elastic.co)\r\n- [Redis](http://redis.io)\r\n- [Apache Parquet](http://parquet.apache.org/)\r\n- [HDFS](http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html)\r\n- [Apache Kafka](http://kafka.apache.org)\r\n- [Apache Flume](https://flume.apache.org/)\r\n
- [RabbitMQ](https://www.rabbitmq.com/)\r\n- [Spray](http://spray.io/)\r\n- [KiteSDK (morphlines)](http://kitesdk.org/docs/current)\r\n- [Apache Avro](https://avro.apache.org/)\r\n\r\nAdvantages\r\n========\r\n\r\nSparta provides several advantages to end users\r\n![Advantages](./images/features.jpg)\r\n\r\nBuild\r\n========\r\n\r\nYou can generate rpm and deb packages by running:\r\n\r\n`mvn clean package -Ppackage`\r\n\r\n**Note:** the following programs must be installed in order to build these packages:\r\n\r\nOn a Debian distribution:\r\n\r\n  - fakeroot\r\n  - dpkg-dev\r\n  - rpm\r\n  - jq\r\n  \r\nOn a CentOS distribution:\r\n\r\n  - fakeroot\r\n  - dpkg-dev\r\n  - rpmdevtools\r\n  - jq\r\n  \r\nOn all distributions:\r\n\r\n  - Java 8\r\n  - Maven 3\r\n\r\nLicense\r\n========\r\n\r\nLicensed to STRATIO (C) under one or more contributor license agreements.\r\nSee the NOTICE file distributed with this work for additional information\r\nregarding copyright ownership.  The STRATIO (C) licenses this file\r\nto you under the Apache License, Version 2.0 (the\r\n\"License\"); you may not use this file except in compliance\r\nwith the License.  You may obtain a copy of the License at\r\n\r\n  http://www.apache.org/licenses/LICENSE-2.0\r\n\r\nUnless required by applicable law or agreed to in writing,\r\nsoftware distributed under the License is distributed on an\r\n\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\r\nKIND, either express or implied.  See the License for the\r\nspecific language governing permissions and limitations\r\nunder the License.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStratio%2Fsparta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FStratio%2Fsparta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStratio%2Fsparta/lists"}