{"id":27786175,"url":"https://github.com/angel-ml/sona","last_synced_at":"2025-04-30T15:57:42.151Z","repository":{"id":57727503,"uuid":"177914133","full_name":"Angel-ML/sona","owner":"Angel-ML","description":"Spark On Angel, arming Spark with a powerful Parameter Server, which enable Spark to train very big models","archived":false,"fork":false,"pushed_at":"2023-01-02T22:13:35.000Z","size":40223,"stargazers_count":82,"open_issues_count":20,"forks_count":50,"subscribers_count":11,"default_branch":"master","last_synced_at":"2023-07-26T22:48:41.501Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Angel-ML.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-27T03:49:43.000Z","updated_at":"2023-05-22T02:24:00.000Z","dependencies_parsed_at":"2023-02-01T04:31:48.501Z","dependency_job_id":null,"html_url":"https://github.com/Angel-ML/sona","commit_stats":null,"previous_names":[],"tags_count":1,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Angel-ML%2Fsona","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Angel-ML%2Fsona/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Angel-ML%2Fsona/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Angel-ML%2Fsona/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Angel-ML","download_url":"https://codeload.github.com/Angel-ML/sona/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"gith
ub","repositories_count":251737469,"owners_count":21635603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-30T15:57:41.301Z","updated_at":"2025-04-30T15:57:42.135Z","avatar_url":"https://github.com/Angel-ML.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SONA Overview\r\nSpark On Angel (SONA) arms Spark with a powerful Parameter Server, which enables Spark to train very large models.\r\n\r\nSimilar to Spark MLlib, Spark on Angel is a standalone machine learning library built on Spark (yet it does not rely on Spark MLlib, Figure 1). \r\nIn previous versions, SONA was based on RDD APIs and included only the model training step. In Angel 3.0, we introduce various new features to SONA:\r\n- Integration of feature engineering into SONA. Instead of simply borrowing Spark’s feature engineering operators, we add support for long-index vectors to all the operators to enable training of high-dimensional sparse models. \r\n- Seamless connection with automatic hyperparameter tuning.\r\n- Spark-style APIs that let Spark users switch to Angel at no cost.\r\n- Support for two new data formats: LibFFM and Dummy.\r\n\r\n| ![sona_fig00](docs/imgs/sona_fig00.png) |\r\n|  :----:    |\r\n| *Figure 1: SONA is another machine learning \u0026 graph library on Spark Core*   |\r\n\r\nFigure 2 shows the runtime architecture of SONA.\r\n\r\n| ![sona_fig01](docs/imgs/sona_fig01.png) |\r\n|  :----:    |\r\n| *Figure 2: Architecture of SONA*   |\r\n\r\n- There is an `AngelClient` on the Spark driver. 
`AngelClient` is used to start the Angel parameter server and to create, load, initialize, and save the matrices of the model. \r\n- There is a `PSClient/PSAgent` on each Spark executor. Algorithms can pull parameters and push gradients through `PSAgent`.\r\n- The Angel *MLcore* runs in each `Task`.\r\n\r\nCompared to the previous version, a variety of new algorithms were added to SONA, such as Deep \u0026 Cross Network (DCN) and \r\nAttention Factorization Machines (AFM). As can be seen from Figure 3, there are significant differences \r\nbetween algorithms on SONA and those on Spark: algorithms on SONA are mainly designed for recommendations \r\nand graph embedding, while algorithms on Spark tend to be more general-purpose. \r\n\r\n| ![sona_fig02](docs/imgs/sona_fig02.png) |\r\n|  :----:    |\r\n| *Figure 3: Algorithms comparison of Spark and Angel*   |\r\n\r\nAs a result, SONA can serve as a supplement to Spark.\r\n\r\n| ![sparkonangel](docs/imgs/sparkonangel.gif) |\r\n|  :----:    |\r\n| *Figure 4: Programming Example of SONA*   |\r\n\r\n\r\nFigure 4 provides an example of running distributed machine learning algorithms on SONA, including the following steps:\r\n- Start the parameter server at the beginning and stop it at the end.\r\n- Load training and test data as Spark DataFrames.\r\n- Define an Angel model and set parameters in Spark fashion. In this example, the algorithm is defined as a computing graph via JSON.\r\n- Use the “fit” method to train the model. \r\n- Use the “evaluate” method to evaluate the trained model. \r\n\r\n\r\n## Quick Start\r\nSONA supports three runtime modes: YARN, K8s and Local. Local mode makes debugging easy. 
\r\n[sona quick start](./docs/tutorials/sona_quick_start.md)\r\n \r\n## Algorithms\r\n- machine learning algorithms:\r\n    + Traditional Machine Learning Methods\r\n        - [Logistic Regression(LR)](docs/algo/lr_sona_en.md)\r\n        - [Support Vector Machine(SVM)](docs/algo/svm_sona_en.md)\r\n        - [Factorization Machine(FM)](docs/algo/fm_sona_en.md)\r\n        - [Linear Regression](docs/algo/linreg_sona_en.md)\r\n        - [Robust Regression](docs/algo/robust_sona_en.md)\r\n        - [Gradient Boosting Decision Tree](docs/GBDT.md)\r\n        - [Hyper-Parameter Tuning](docs/AutoML.md)\r\n        - [FTRL](docs/algo/ftrl_lr_sona_en.md)\r\n        - [FTRL-FM](docs/algo/ftrl_fm_sona_en.md)\r\n    + Deep Learning Methods\r\n        - [Deep Neural Network(DNN)](docs/algo/dnn_sona_en.md)\r\n        - [Mix Logistic Regression(MLR)](docs/algo/mlr_sona_en.md)\r\n        - [Deep And Wide(DAW)](docs/algo/daw_sona_en.md)\r\n        - [Deep Factorization Machine(DeepFM)](docs/algo/deepfm_sona_en.md)\r\n        - [Neural Factorization Machine(NFM)](docs/algo/nfm_sona_en.md)\r\n        - [Product Neural Network(PNN)](docs/algo/pnn_sona_en.md)\r\n        - [Attention Factorization Machine(AFM)](docs/algo/afm_sona_en.md)\r\n        - [Deep Cross Network(DCN)](docs/algo/dcn_sona_en.md)\r\n- graph algorithms:\r\n    + [Word2Vec](docs/algo/word2vec_sona_en.md)\r\n    + [LINE](docs/algo/line_sona_en.md)\r\n    + [KCore](docs/algo/kcore_sona_en.md)\r\n    + [Louvain](docs/algo/louvain_sona_en.md)\r\n\r\n## Deployment\r\n\r\n## Support\r\n- QQ account: 20171688\r\n\r\n## References\r\n\r\n## Other Resources\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fangel-ml%2Fsona","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fangel-ml%2Fsona","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fangel-ml%2Fsona/lists"}