{"id":21159058,"url":"https://github.com/360er0/combo","last_synced_at":"2025-07-09T14:30:27.815Z","repository":{"id":37605421,"uuid":"139895577","full_name":"360er0/COMBO","owner":"360er0","description":"COMBO is jointly trained tagger, lemmatizer and dependency parser.","archived":false,"fork":false,"pushed_at":"2023-03-24T23:30:33.000Z","size":39,"stargazers_count":36,"open_issues_count":2,"forks_count":8,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-15T03:52:19.910Z","etag":null,"topics":["dependency-parser","keras","lemmatizer","tagger","universal-dependencies"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/360er0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-05T20:10:43.000Z","updated_at":"2024-01-04T16:24:24.000Z","dependencies_parsed_at":"2023-01-20T21:19:02.770Z","dependency_job_id":null,"html_url":"https://github.com/360er0/COMBO","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/360er0%2FCOMBO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/360er0%2FCOMBO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/360er0%2FCOMBO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/360er0%2FCOMBO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/360er0","download_url":"https://codeload.github.com/360er0/COMBO/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"g
ithub","repositories_count":225562184,"owners_count":17488565,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dependency-parser","keras","lemmatizer","tagger","universal-dependencies"],"created_at":"2024-11-20T12:58:47.049Z","updated_at":"2024-11-20T12:58:48.283Z","avatar_url":"https://github.com/360er0.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# COMBO\nCOMBO is a jointly trained neural tagger, lemmatizer, and dependency parser implemented in Python 3 using the Keras framework. It took part in the [*CoNLL 2018 Universal Dependencies shared task*](http://universaldependencies.org/conll18/) and ranked 3rd/4th in the [*official evaluation*](http://universaldependencies.org/conll18/results.html).\n\n## Paper\nCOMBO is described in the paper [*Semi-Supervised Neural System for Tagging, Parsing and Lematization*](http://universaldependencies.org/conll18/proceedings/pdf/K18-2004.pdf).\n\n## Usage\nTraining your own model:\n```\npython main.py --mode autotrain --train train_data.conllu --valid valid_data.conllu --embed external_embedding.txt --model model_name.pkl --force_trees\n```\n\nMaking predictions:\n```\npython main.py --mode predict --test test_data.conllu --pred output_path.conllu --model model_name.pkl\n```\n\n## Trained models\nModels trained on Universal Dependencies (UD) treebanks:\n\n| Language | Treebank | LAS | MLAS | BLEX | Model |\n|-|-|-|-|-|-|\n| Afrikaans | af_afribooms | 84.72 | 72.91 | 74.98 | [*377 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.af_afribooms.pkl) |\n| Ancient Greek | grc_perseus | 74.20 | 53.30 | 54.29 | 
[*101 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.grc_perseus.pkl) |\n| Ancient Greek | grc_proiel | 76.45 | 59.95 | 67.47 | [*101 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.grc_proiel.pkl) |\n| Arabic | ar_padt | 71.95 | 62.75 | 64.38 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ar_padt.pkl) |\n| Armenian | hy_armtdp | 28.15 | 5.02 | 11.25 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.hy_armtdp.pkl) |\n| Basque | eu_bdt | 83.12 | 68.82 | 77.96 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.eu_bdt.pkl) |\n| Bulgarian | bg_btb | 89.36 | 81.10 | 79.98 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.bg_btb.pkl) |\n| Buryat | bxr_bdt | 15.16 | 1.09 | 1.92 | [*90 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.bxr_bdt.pkl) |\n| Catalan | ca_ancora | 90.54 | 83.11 | 85.20 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ca_ancora.pkl) |\n| Chinese | zh_gsd | 63.92 | 53.48 | 57.84 | [*744 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.zh_gsd.pkl) |\n| Croatian | hr_set | 86.32 | 71.12 | 79.74 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.hr_set.pkl) |\n| Czech | cs_cac | 90.72 | 83.27 | 86.69 | [*740 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.cs_cac.pkl) |\n| Czech | cs_fictree | 91.83 | 84.23 | 87.81 | [*740 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.cs_fictree.pkl) |\n| Czech | cs_pdt | 90.34 | 84.04 | 86.96 | [*740 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.cs_pdt.pkl) |\n| Danish | da_ddt | 83.43 | 74.22 | 77.58 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.da_ddt.pkl) |\n| Dutch | nl_alpino | 87.15 | 74.93 | 77.06 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.nl_alpino.pkl) |\n| Dutch | nl_lassysmall | 84.27 | 72.65 | 75.44 | [*737 
MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.nl_lassysmall.pkl) |\n| English | en_ewt | 82.31 | 73.33 | 76.52 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.en_ewt.pkl) |\n| English | en_gum | 82.82 | 73.24 | 73.57 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.en_gum.pkl) |\n| English | en_lines | 80.33 | 72.25 | 74.01 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.en_lines.pkl) |\n| Estonian | et_edt | 83.46 | 75.79 | 72.07 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.et_edt.pkl) |\n| Finnish | fi_ftb | 86.89 | 78.42 | 81.06 | [*739 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.fi_ftb.pkl) |\n| Finnish | fi_tdt | 85.93 | 78.65 | 72.39 | [*739 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.fi_tdt.pkl) |\n| French | fr_gsd | 85.42 | 77.08 | 79.72 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.fr_gsd.pkl) |\n| French | fr_sequoia | 88.99 | 81.48 | 84.67 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.fr_sequoia.pkl) |\n| French | fr_spoken | 74.31 | 63.43 | 65.34 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.fr_spoken.pkl) |\n| Galician | gl_ctg | 81.17 | 68.15 | 73.60 | [*736 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.gl_ctg.pkl) |\n| Galician | gl_treegal | 73.21 | 52.88 | 62.86 | [*736 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.gl_treegal.pkl) |\n| German | de_gsd | 77.43 | 54.28 | 68.59 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.de_gsd.pkl) |\n| Gothic | got_proiel | 65.87 | 50.81 | 59.30 | [*48 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.got_proiel.pkl) |\n| Greek | el_gdt | 88.49 | 76.15 | 78.57 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.el_gdt.pkl) |\n| Hebrew | he_htb | 63.69 | 50.26 | 53.58 | [*737 
MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.he_htb.pkl) |\n| Hindi | hi_hdtb | 91.43 | 76.23 | 86.29 | [*593 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.hi_hdtb.pkl) |\n| Hungarian | hu_szeged | 79.47 | 66.09 | 72.51 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.hu_szeged.pkl) |\n| Indonesian | id_gsd | 78.40 | 67.30 | 75.10 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.id_gsd.pkl) |\n| Irish | ga_idt | 69.24 | 37.31 | 47.32 | [*206 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ga_idt.pkl) |\n| Italian | it_isdt | 91.03 | 83.18 | 84.76 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.it_isdt.pkl) |\n| Italian | it_postwita | 73.99 | 61.14 | 62.98 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.it_postwita.pkl) |\n| Japanese | ja_gsd | 73.69 | 57.82 | 60.62 | [*743 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ja_gsd.pkl) |\n| Kazakh | kk_ktb | 22.38 | 4.40 | 7.86 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.kk_ktb.pkl) |\n| Korean | ko_gsd | 80.66 | 74.49 | 66.13 | [*741 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ko_gsd.pkl) |\n| Korean | ko_kaist | 84.88 | 76.92 | 72.40 | [*743 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ko_kaist.pkl) |\n| Kurmanji | kmr_mg | 21.95 | 2.26 | 5.01 | [*45 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.kmr_mg.pkl) |\n| Latin | la_ittb | 85.54 | 79.84 | 83.51 | [*526 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.la_ittb.pkl) |\n| Latin | la_perseus | 68.07 | 49.77 | 52.75 | [*526 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.la_perseus.pkl) |\n| Latin | la_proiel | 70.08 | 56.82 | 64.94 | [*526 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.la_proiel.pkl) |\n| Latvian | lv_lvtb | 80.71 | 66.22 | 71.80 | [*637 
MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.lv_lvtb.pkl) |\n| North Sámi | sme_giella | 57.16 | 39.66 | 45.03 | [*47 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.sme_giella.pkl) |\n| Norwegian | no_bokmaal | 89.33 | 79.51 | 84.68 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.no_bokmaal.pkl) |\n| Norwegian | no_nynorsk | 88.36 | 79.32 | 82.89 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.no_nynorsk.pkl) |\n| Norwegian | no_nynorsklia | 68.26 | 57.51 | 60.98 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.no_nynorsklia.pkl) |\n| Old Church Slavonic | cu_proiel | 71.14 | 56.52 | 66.04 | [*48 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.cu_proiel.pkl) |\n| Old French | fro_srcmf | 84.81 | 76.75 | 81.20 | [*52 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.fro_srcmf.pkl) |\n| Persian | fa_seraji | 86.14 | 80.30 | 76.29 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.fa_seraji.pkl) |\n| Polish | pl_lfg | 94.62 | 86.44 | 89.31 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.pl_lfg.pkl) |\n| Polish | pl_sz | 91.38 | 80.45 | 85.59 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.pl_sz.pkl) |\n| Polish | poleval2018 | 86.11 | 76.18 | 79.86 | [*115 MB*](http://mozart.ipipan.waw.pl/~prybak/model_poleval2018/model_A_semi.pkl) |\n| Portuguese | pt_bosque | 87.57 | 74.31 | 80.31 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.pt_bosque.pkl) |\n| Romanian | ro_rrt | 85.31 | 76.84 | 79.54 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ro_rrt.pkl) |\n| Russian | ru_syntagrus | 91.10 | 85.37 | 87.16 | [*741 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ru_syntagrus.pkl) |\n| Russian | ru_taiga | 74.24 | 61.59 | 64.36 | [*741 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ru_taiga.pkl) |\n| Serbian 
| sr_set | 87.27 | 73.79 | 79.92 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.sr_set.pkl) |\n| Slovak | sk_snk | 83.76 | 63.97 | 75.34 | [*54 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.sk_snk.pkl) |\n| Slovenian | sl_ssj | 85.72 | 75.07 | 81.11 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.sl_ssj.pkl) |\n| Slovenian | sl_sst | 58.12 | 45.93 | 50.94 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.sl_sst.pkl) |\n| Spanish | es_ancora | 89.68 | 82.60 | 84.51 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.es_ancora.pkl) |\n| Swedish | sv_lines | 81.97 | 66.26 | 77.01 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.sv_lines.pkl) |\n| Swedish | sv_talbanken | 85.89 | 77.68 | 80.74 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.sv_talbanken.pkl) |\n| Turkish | tr_imst | 63.54 | 52.51 | 58.89 | [*737 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.tr_imst.pkl) |\n| Ukrainian | uk_iu | 84.71 | 69.88 | 77.97 | [*738 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.uk_iu.pkl) |\n| Upper Sorbian | hsb_ufal | 21.30 | 1.45 | 4.53 | [*139 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.hsb_ufal.pkl) |\n| Urdu | ur_udtb | 81.53 | 55.70 | 72.49 | [*485 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ur_udtb.pkl) |\n| Uyghur | ug_udt | 63.10 | 40.71 | 52.76 | [*165 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.ug_udt.pkl) |\n| Vietnamese | vi_vtb | 42.53 | 35.11 | 38.47 | [*736 MB*](http://mozart.ipipan.waw.pl/~prybak/model_conll2018/model.vi_vtb.pkl) |\n\n\n## License\nCC BY-NC-SA 4.0\n\n## Citation\n\n```\n@InProceedings{rybak-wrblewska:2018:K18-2,\n  author    = {Rybak, Piotr  and  Wr{\\'{o}}blewska, Alina},\n  title     = {Semi-Supervised Neural System for Tagging, Parsing and Lematization},\n  booktitle = {Proceedings of the {CoNLL} 2018 
Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},\n  month     = {October},\n  year      = {2018},\n  address   = {Brussels, Belgium},\n  publisher = {Association for Computational Linguistics},\n  pages     = {45--54},\n  url       = {http://www.aclweb.org/anthology/K18-2004}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F360er0%2Fcombo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F360er0%2Fcombo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F360er0%2Fcombo/lists"}