{"id":25837309,"url":"https://github.com/lifeweb-ir/lm","last_synced_at":"2026-01-31T10:33:49.226Z","repository":{"id":226861285,"uuid":"766897470","full_name":"lifeweb-ir/LM","owner":"lifeweb-ir","description":"Lifeweb AI team Language Models","archived":false,"fork":false,"pushed_at":"2024-03-11T17:56:11.000Z","size":35,"stargazers_count":16,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-04T16:33:50.272Z","etag":null,"topics":["language-model","mobilebert","persian-nlp","roberta"],"latest_commit_sha":null,"homepage":"https://lifewebco.com/ai","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lifeweb-ir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-03-04T10:35:55.000Z","updated_at":"2025-10-06T09:24:58.000Z","dependencies_parsed_at":"2024-03-10T08:24:52.359Z","dependency_job_id":"d505324a-4d98-44e3-8590-700a1d236d06","html_url":"https://github.com/lifeweb-ir/LM","commit_stats":null,"previous_names":["lifeweb-ir/lm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lifeweb-ir/LM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifeweb-ir%2FLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifeweb-ir%2FLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifeweb-ir%2FLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifeweb-ir%2FLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/lifeweb-ir","download_url":"https://codeload.github.com/lifeweb-ir/LM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifeweb-ir%2FLM/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28938619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T10:18:23.202Z","status":"ssl_error","status_checked_at":"2026-01-31T10:18:22.693Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-model","mobilebert","persian-nlp","roberta"],"created_at":"2025-03-01T02:48:13.571Z","updated_at":"2026-01-31T10:33:49.208Z","avatar_url":"https://github.com/lifeweb-ir.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cdiv align=\"center\"\u003e\n\n[\u003cimg src=\"./assets/logo_en.png\"\u003e](https://lifewebco.com)\n\n# Lifeweb language models\n\n\u003c/div\u003e\n\nWelcome to the Lifeweb Language Models repository.\nOur goal is to train a range of Persian language models and release them publicly as our contribution to Persian-language AI.\nThe first versions of our models are all trained on our own dataset, **Divan**, which contains more than **164 million documents** and more than **10B tokens**. The dataset has been meticulously normalized and deduplicated to ensure its richness and comprehensiveness. A better dataset leads to a better model.\n\n# Using the Models\nThe models are available on the Hugging Face Model Hub through the links in the table below.\n\n| Model Name | Base Model | Vocabulary Size | Evaluation |\n|---|---|---|---|\n| [Tehran](https://huggingface.co/lifeweb-ai/tehran) | [RoBERTa](https://huggingface.co/HooshvareLab/roberta-fa-zwnj-base) | 50,000 | [Results](#results) |\n| [Shiraz](https://huggingface.co/lifeweb-ai/shiraz) | [MobileBERT](https://huggingface.co/google/mobilebert-uncased) | 50,000 | [Results](#results) |\n\nBoth models are masked language models and can be loaded with the `transformers` library. For example, the following code predicts a masked word with **Shiraz**:\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM, FillMaskPipeline\n\n# The same code should also work with \"lifeweb-ai/tehran\".\nmodel_name = \"lifeweb-ai/shiraz\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForMaskedLM.from_pretrained(model_name)\n\n# English: \"At this very moment, while you are busy [MASK] this text, millions of pieces of data\n# are being generated online. At Lifeweb, we collect, process, and analyze this big data.\"\n# tokenizer.mask_token is used instead of a hard-coded \"[MASK]\" because mask tokens differ between tokenizers.\ntext = f\"در همین لحظه که شما مشغول {tokenizer.mask_token} این متن هستید، میلیون‌ها دیتا در فضای آنلاین در حال تولید است. ما در لایف وب به جمع‌آوری، پردازش و تحلیل این کلان داده (Big Data) می‌پردازیم.\"\n\nfill_mask = FillMaskPipeline(model=model, tokenizer=tokenizer)\npredictions = fill_mask(text)  # candidate tokens for the mask, highest score first\nprint(predictions[0])\n# {'score': 0.3584367036819458, 'token': 5764, 'token_str': 'خواندن', 'sequence': 'در همین لحظه که شما مشغول خواندن این متن هستید، میلیون ها دیتا در فضای انلاین در حال تولید است. ما در لایف وب به جمع اوری، پردازش و تحلیل این کلان داده ( big data ) می پردازیم.'}\n# The top prediction, 'خواندن', means \"reading\".\n```\n\n# Results\n\nThe Lifeweb models are evaluated on three downstream NLP tasks: **NER**, **Sentiment Analysis**, and **Emotion Detection**. **Tehran** achieves the highest macro F1 of all the Persian language models compared below, on every benchmark. **Shiraz** is considerably faster, and its scores remain highly competitive.\n
According to the [**MobileBERT paper**](https://arxiv.org/pdf/2004.02984.pdf), MobileBERT is 4.3× smaller and 5.5× faster than BERT-base.\nOn these benchmarks, **Tehran** achieves new state-of-the-art results. [**ParsBERT**](https://arxiv.org/abs/2005.12515), [**AriaBERT**](https://assets.researchsquare.com/files/rs-3558473/v1_covered_d230d5de-50d1-42d5-ba1a-ef400ede52e3.pdf?c=1699474771), and [**FaBERT**](https://arxiv.org/abs/2402.06617) each reported outperforming the models they were compared against, and **Tehran** scores higher than all of them in the evaluation below.\n\nThe table below reports the macro F1 score for each task. For **Shiraz** and the baseline models, each score comes with its Colab notebook, which can also serve as a tutorial. All notebooks were run on the same hardware: 4× NVIDIA GeForce RTX 2080 Ti GPUs.\n\n\u003ctable class=\"tg\"\u003e\n\u003cthead\u003e\n  \u003ctr\u003e\n    \u003cth class=\"tg-c3ow\"\u003eModel\u003c/th\u003e\n    \u003cth class=\"tg-c3ow\" colspan=\"2\"\u003eNER\u003c/th\u003e\n    \u003cth class=\"tg-c3ow\" colspan=\"2\"\u003eSentiment\u003c/th\u003e\n    \u003cth class=\"tg-c3ow\" colspan=\"1\"\u003eEmotion\u003c/th\u003e\n  \u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd class=\"tg-0pky\"\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003eArman\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003ePeyma\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e Sentipers (multi) \u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e Snappfood \u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e Arman \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd class=\"tg-0pky\"\u003elifeweb-ai/tehran\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e\u003cstrong\u003e71.87%\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e\u003cstrong\u003e90.79%\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e\u003cstrong\u003e63.75%\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd\n
class=\"tg-c3ow\"\u003e\u003cstrong\u003e88.74%\u003c/strong\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e\u003cstrong\u003e77.73%\u003c/strong\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd class=\"tg-0pky\"\u003elifeweb-ai/shiraz\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 67.62% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/15PUAGy9MUSBO3LPdMJ4h9DVKibREv9oY\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 86.24% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1lzVsDpl6_WhxsW8mtUNjhXzQPBMNL6Q2\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 59.17% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1L87oYYDBY1Fi0GGvjRGSdSk2rZ5vshUV\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 88.01% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1-S-VE83IGGGS9lZVydVKa4SnxshFSvT6\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 66.97% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/12SpUEsOP1I2cCp-gQsifONyu9yDUGuKG\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd class=\"tg-0pky\"\u003esbunlp/fabert\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 71.23% \u003cbr\u003e\u003ca\n
href=\"https://colab.research.google.com/drive/1NHUG8GdGEx1R76jr1MBC8sqDFWdsAxQk\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 88.53% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1I6Nl9W_Br-WVV4odUcw0um_-dypjFyrp\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 58.51% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1jdLotilq7hedyQ8x9aTUdgJ2IP-EDLWv\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 88.60% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1DsIFzDrC_HNDaQyltJtiT3DjGA9blg_B\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 72.65% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/12H95pFpFUSYfxpRHWuS-gOQFi81hZhX-\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd class=\"tg-0pky\"\u003eViraIntelligentDataMining/AriaBERT\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 69.12% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1s0aSjPYntinkupgaAiGZIvwzKXWjNHgA\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 87.15% \u003cbr\u003e\u003ca\n
href=\"https://colab.research.google.com/drive/1qPy0nFHC8bYj9OskUyksF0gQRQ6hRgbT\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 59.26% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1P9YaP9Fem5pSlJqPxP2jG2IBq9TsLbaz\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 87.96% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1wuGFELbqx0eE1cvmPZRgfklTTa3SkpyW\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 69.11% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1UINarSRMy4yKbSeXKgSUf84IvJh-JC4q\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd class=\"tg-0pky\"\u003eHooshvareLab/bert-fa-zwnj-base\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 67.49% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1HApEhtOm2p0ra1NwHLbptaxNeKqXC_TM\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 85.73% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1e67UzkbX1HPgayfi8Z1rNNy79AACr1lV\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 59.61% \u003cbr\u003e\u003ca\n
href=\"https://colab.research.google.com/drive/1pub2tq2Qvb08s2w4cE-AfOwzWYXH6rsM\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 87.58% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1PyjCTXFB-SXfrG8Bjjpr9py39Q9J8oGZ\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 59.27% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/13jUeb2694W9SHWNYa1KMbvmeCAhnDpv0\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd class=\"tg-0pky\"\u003eHooshvareLab/roberta-fa-zwnj-base\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 69.73% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1a0o6Mx3jlK8ItWdIQgThM81hlSTE6sur\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 86.21% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1fMXN5OeWmeLlLnG1gdznvq9ruBmP3UTv\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 56.23% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/18OzPDKH1mB6-uDVmN0WWZz_etwrsZ_A3\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 87.19% \u003cbr\u003e\u003ca\n
href=\"https://colab.research.google.com/drive/1E-rfJYZmid3a-bEpskU_j_3S4q_SQmGH\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd class=\"tg-c3ow\"\u003e 57.96% \u003cbr\u003e\u003ca href=\"https://colab.research.google.com/drive/1NRphgik9y0fmZP_7MDUjMq6zTP2AfTMj\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Colab Code\" width=\"87\" height=\"15\"\u003e\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\nIf you have evaluated our models on a public dataset and would like your results added to the table above, please open a pull request or contact us. Please also make your code publicly available so that we can link to it.\n\n# Contributors\n\n- Mehrdad Azizi: [**LinkedIn**](https://www.linkedin.com/in/mehrdad-azizi-50839489/), [**GitHub**](https://github.com/mehrazi)\n- Reza Salehi Chegeni: [**LinkedIn**](https://www.linkedin.com/in/reza-salehi-chegeni-6988ba271/), [**GitHub**](https://github.com/rezasalehichegeni)\n- Parisa Mousavi: [**LinkedIn**](https://www.linkedin.com/in/seyede-parisa-mousavi/), [**GitHub**](https://github.com/Mousavi-Parisa)\n- Iman Hashemi: [**LinkedIn**](https://www.linkedin.com/in/iman-hashemi-403738a5), [**GitHub**](https://github.com/hashemiiman)\n\n# Releases\n\n**v1.0 (2024-03-09)**\n\nFirst release of the **Tehran** and **Shiraz** models, trained on **Divan**.\n\n# License\n\nBy contributing to this project, you agree that your contributions will be licensed under the [**Apache License 2.0**](https://www.apache.org/licenses/LICENSE-2.0).\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flifeweb-ir%2Flm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flifeweb-ir%2Flm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flifeweb-ir%2Flm/lists"}