{"id":20020268,"url":"https://github.com/multivacplatform/multivac-ml","last_synced_at":"2026-04-11T21:38:26.448Z","repository":{"id":91345027,"uuid":"157762632","full_name":"multivacplatform/multivac-ml","owner":"multivacplatform","description":"Pre-trained ML models for Apache Spark","archived":false,"fork":false,"pushed_at":"2019-02-23T15:13:45.000Z","size":970,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-05T05:50:12.366Z","etag":null,"topics":["machine-learning","nlp","spark","spark-ml"],"latest_commit_sha":null,"homepage":"https://multivac.iscpif.fr","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/multivacplatform.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-15T19:35:45.000Z","updated_at":"2020-12-19T08:54:41.000Z","dependencies_parsed_at":"2024-04-22T05:47:47.365Z","dependency_job_id":null,"html_url":"https://github.com/multivacplatform/multivac-ml","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/multivacplatform/multivac-ml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multivacplatform%2Fmultivac-ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multivacplatform%2Fmultivac-ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multivacplatform%2Fmultivac-ml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multivacplatfor
m%2Fmultivac-ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/multivacplatform","download_url":"https://codeload.github.com/multivacplatform/multivac-ml/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multivacplatform%2Fmultivac-ml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31696743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-11T21:17:31.016Z","status":"ssl_error","status_checked_at":"2026-04-11T21:17:24.556Z","response_time":54,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","nlp","spark","spark-ml"],"created_at":"2024-11-13T08:30:48.579Z","updated_at":"2026-04-11T21:38:26.428Z","avatar_url":"https://github.com/multivacplatform.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# multivac-ml [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/multivacplatform/multivac-ml/blob/master/LICENSE) [![Build Status](https://travis-ci.org/multivacplatform/multivac-ml.svg?branch=master)](https://travis-ci.org/multivacplatform/multivac-ml) [![multivac discuss](https://img.shields.io/badge/multivac-discuss-ff69b4.svg)](https://discourse.iscpif.fr/c/multivac) [![multivac 
channel](https://img.shields.io/badge/multivac-chat-ff69b4.svg)](https://chat.iscpif.fr/channel/multivac) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/0df6364b08e84dadadf83e1bc902a58b)](https://app.codacy.com/app/maziyarpanahi/multivac-ml?utm_source=github.com\u0026utm_medium=referral\u0026utm_content=multivacplatform/multivac-ml\u0026utm_campaign=Badge_Grade_Dashboard)\nPre-trained Apache Spark ML pipelines for NLP, classification, etc.\n\n## Project Structure\n-   [models](models) Offline ML Models (for downloads)\n    -   [models/word2vec](models/word2vec) (Word2Vec Model)\n    -   [models/nlp](models/nlp) (Part of Speech Models)\n-   [demo](demo) Demo project\n\n\n## Facts and Figures\n### POS Tagger models\n\n**English POS tagger model (UD_English-EWT)**\nOnly the `en_ewt-ud-train.conllu` file was used to train the model:\n\nPrecision, Recall and F1-Score against the test dataset `en_ewt-ud-test.conllu`\n\n|Tokens |Precision  |Recall |F1-Score |\n|-------|-----------|-------|---------|\n| 25831 |0.93       |0.91   |0.92     |\n\n\nPrecision, Recall and F1-Score against the training dataset `en_ewt-ud-train.conllu`\n\n|Tokens |Precision  |Recall |F1-Score |\n|-------|-----------|-------|---------|\n| 63785 |0.98       |0.98   |0.98     |\n\n\n\u003e **Precision** is \"how useful the POS results are\", and **Recall** is \"how complete the results are\". Precision can be seen as a measure of **exactness or quality**, whereas recall is a measure of **completeness or quantity**. https://en.wikipedia.org/wiki/Precision_and_recall\n\n\u003e The **F1 score** is the harmonic average of the precision and recall, where an F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0. 
https://en.wikipedia.org/wiki/F1_score\n\n![Precision](https://wikimedia.org/api/rest_v1/media/math/render/svg/26106935459abe7c266f7b1ebfa2a824b334c807)\n\n![Recall](https://wikimedia.org/api/rest_v1/media/math/render/svg/4c233366865312bc99c832d1475e152c5074891b)\n\n![F1 Score](https://wikimedia.org/api/rest_v1/media/math/render/svg/057ffc6b4fa80dc1c0e1f2f1f6b598c38cdd7c23)\n\n[Read more on evaluation of the models](models/nlp)\n\n## Open Data\n**Multivac ML data**: [https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WSWU7K](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WSWU7K)\n\n**Multivac Open Data**: [https://dataverse.harvard.edu/dataverse/multivac](https://dataverse.harvard.edu/dataverse/multivac)\n\n## Dataset Citation\n\u003e Panahi, Maziyar;Chavalarias, David, 2018, \"Multivac Machine Learning Models\", https://doi.org/10.7910/DVN/WSWU7K, Harvard Dataverse, V2\n\n## Code of Conduct\nThis, and all github.com/multivacplatform projects, are under the [Multivac Platform Open Source Code of Conduct](https://github.com/multivacplatform/code-of-conduct/blob/master/code-of-conduct.md). Additionally, see the [Typelevel Code of Conduct](http://typelevel.org/conduct) for specific examples of harassing behavior that are not tolerated.\n\n## Copyright and License\nCode and documentation copyright (c) 2018-2019 [ISCPIF - CNRS](http://iscpif.fr). Code released under the [MIT license](https://github.com/multivacplatform/multivac-ml/blob/master/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmultivacplatform%2Fmultivac-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmultivacplatform%2Fmultivac-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmultivacplatform%2Fmultivac-ml/lists"}