{"id":17498380,"url":"https://github.com/simonepri/lm-scorer","last_synced_at":"2025-07-13T23:35:16.551Z","repository":{"id":38409392,"uuid":"253547077","full_name":"simonepri/lm-scorer","owner":"simonepri","description":"📃Language Model based sentences scoring library","archived":false,"fork":false,"pushed_at":"2022-02-09T22:28:13.000Z","size":4800,"stargazers_count":308,"open_issues_count":8,"forks_count":36,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-06-26T23:26:04.908Z","etag":null,"topics":["language-model","lm","ml","probability","sentence"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simonepri.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-06T15:58:49.000Z","updated_at":"2025-04-27T18:52:52.000Z","dependencies_parsed_at":"2022-07-18T01:30:47.230Z","dependency_job_id":null,"html_url":"https://github.com/simonepri/lm-scorer","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/simonepri/lm-scorer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonepri%2Flm-scorer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonepri%2Flm-scorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonepri%2Flm-scorer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonepri%2Flm-scorer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simonepri","download_url":"https://codeload.github.com/simonepri/lm-scorer/tar.gz/refs/
heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonepri%2Flm-scorer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265222950,"owners_count":23730320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-model","lm","ml","probability","sentence"],"created_at":"2024-10-19T16:54:21.914Z","updated_at":"2025-07-13T23:35:16.512Z","avatar_url":"https://github.com/simonepri.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  \u003cb\u003elm-scorer\u003c/b\u003e\n\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  \u003c!-- PyPi --\u003e\n  \u003ca href=\"https://pypi.org/project/lm-scorer\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/lm-scorer.svg\" alt=\"PyPi version\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://colab.research.google.com/github/simonepri/lm-scorer/blob/master/examples/lm_scorer.ipynb\"\u003e\n    \u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" /\u003e\n  \u003c/a\u003e\n  \u003cbr /\u003e\n  \u003c!-- Lint --\u003e\n  \u003ca href=\"https://github.com/simonepri/lm-scorer/actions?query=workflow:lint+branch:master\"\u003e\n    \u003cimg src=\"https://github.com/simonepri/lm-scorer/workflows/lint/badge.svg?branch=master\" alt=\"Lint status\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Test - macOS --\u003e\n  \u003ca href=\"https://github.com/simonepri/lm-scorer/actions?query=workflow:test-macos+branch:master\"\u003e\n    \u003cimg 
src=\"https://github.com/simonepri/lm-scorer/workflows/test-macos/badge.svg?branch=master\" alt=\"Test macOS status\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Test - Ubuntu --\u003e\n  \u003ca href=\"https://github.com/simonepri/lm-scorer/actions?query=workflow:test-ubuntu+branch:master\"\u003e\n    \u003cimg src=\"https://github.com/simonepri/lm-scorer/workflows/test-ubuntu/badge.svg?branch=master\" alt=\"Test Ubuntu status\" /\u003e\n  \u003c/a\u003e\n  \u003cbr /\u003e\n  \u003c!-- Code style --\u003e\n  \u003ca href=\"https://github.com/ambv/black\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg\" alt=\"Code style\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Linter --\u003e\n  \u003ca href=\"https://github.com/PyCQA/pylint\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/linter-pylint-ce963f.svg\" alt=\"Linter\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Types checker --\u003e\n  \u003ca href=\"https://github.com/python/mypy\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/types%20checker-mypy-296db2.svg\" alt=\"Types checker\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Test runner --\u003e\n  \u003ca href=\"https://github.com/pytest-dev/pytest\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/test%20runner-pytest-449bd6.svg\" alt=\"Test runner\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Task runner --\u003e\n  \u003ca href=\"https://github.com/illBeRoy/taskipy\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/task%20runner-taskipy-abe63e.svg\" alt=\"Task runner\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Build tool --\u003e\n  \u003ca href=\"https://github.com/python-poetry/poetry\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/build%20system-poetry-4e5dc8.svg\" alt=\"Build tool\" /\u003e\n  \u003c/a\u003e\n  \u003cbr /\u003e\n  \u003c!-- License --\u003e\n  \u003ca href=\"https://github.com/simonepri/lm-scorer/tree/master/license\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/github/license/simonepri/lm-scorer.svg\" alt=\"Project license\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  📃 Language Model based sentences scoring library\n\u003c/p\u003e\n\n## Synopsis\n\nThis package provides a simple programming interface to score sentences using different ML [language models][wiki:language-model].\n\nA simple [CLI](#cli) is also available for quick prototyping.  \nYou can run it locally or directly on Colab using [this notebook][colab:lm-scorer].\n\nDo you believe that this is *useful*?\nHas it *saved you time*?\nOr maybe you simply *like it*?  \nIf so, [support this work with a Star ⭐️][start].\n\n## Install\n\n```bash\npip install lm-scorer\n```\n\n## Usage\n\n```python\nimport torch\nfrom lm_scorer.models.auto import AutoLMScorer as LMScorer\n\n# Available models\nlist(LMScorer.supported_model_names())\n# =\u003e [\"gpt2\", \"gpt2-medium\", \"gpt2-large\", \"gpt2-xl\", \"distilgpt2\"]\n\n# Load model to cpu or cuda\ndevice = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\nbatch_size = 1\nscorer = LMScorer.from_pretrained(\"gpt2\", device=device, batch_size=batch_size)\n\n# Return token probabilities (provide log=True to return log probabilities)\nscorer.tokens_score(\"I like this package.\")\n# =\u003e (scores, ids, tokens)\n# scores = [0.018321, 0.0066431, 0.080633, 0.00060745, 0.27772, 0.0036381]\n# ids    = [40,       588,       428,      5301,       13,      50256]\n# tokens = [\"I\",      \"Ġlike\",   \"Ġthis\",  \"Ġpackage\", \".\",     \"\u003c|endoftext|\u003e\"]\n\n# Compute sentence score as the product of tokens' probabilities\nscorer.sentence_score(\"I like this package.\", reduce=\"prod\")\n# =\u003e 6.0231e-12\n\n# Compute sentence score as the mean of tokens' probabilities\nscorer.sentence_score(\"I like this package.\", reduce=\"mean\")\n# =\u003e 0.064593\n\n# Compute sentence score as the geometric mean of tokens' 
probabilities\nscorer.sentence_score(\"I like this package.\", reduce=\"gmean\")\n# =\u003e 0.013489\n\n# Compute sentence score as the harmonic mean of tokens' probabilities\nscorer.sentence_score(\"I like this package.\", reduce=\"hmean\")\n# =\u003e 0.0028008\n\n# Get the log of the sentence score.\nscorer.sentence_score(\"I like this package.\", log=True)\n# =\u003e -25.835\n\n# Score multiple sentences.\nscorer.sentence_score([\"Sentence 1\", \"Sentence 2\"])\n# =\u003e [1.1508e-11, 5.6645e-12]\n\n# NB: Computations are done in log space so they should be numerically stable.\n```\n\n## CLI\n\n\u003cimg src=\"https://github.com/simonepri/lm-scorer/raw/master/media/cli.gif\" alt=\"lm-scorer cli\" width=\"225\" align=\"right\"/\u003e\n\nThe pip package includes a CLI that you can use to score sentences.\n\n```\nusage: lm-scorer [-h] [--model-name MODEL_NAME] [--tokens] [--log-prob]\n                 [--reduce REDUCE] [--batch-size BATCH_SIZE]\n                 [--significant-figures SIGNIFICANT_FIGURES] [--cuda CUDA]\n                 [--debug]\n                 sentences-file-path\n\nGet sentences probability using a language model.\n\npositional arguments:\n  sentences-file-path   A file containing sentences to score, one per line. If\n                        - is given as filename it reads from stdin instead.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --model-name MODEL_NAME, -m MODEL_NAME\n                        The pretrained language model to use. Can be one of:\n                        gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2.\n  --tokens, -t          If provided it provides the probability of each token\n                        of each sentence.\n  --log-prob, -lp       If provided log probabilities are returned instead.\n  --reduce REDUCE, -r REDUCE\n                        Reduce strategy applied on token probabilities to get\n                        the sentence score. 
Available strategies are: prod,\n                        mean, gmean, hmean.\n  --batch-size BATCH_SIZE, -b BATCH_SIZE\n                        Number of sentences to process in parallel.\n  --significant-figures SIGNIFICANT_FIGURES, -sf SIGNIFICANT_FIGURES\n                        Number of significant figures to use when printing\n                        numbers.\n  --cuda CUDA           If provided it runs the model on the given cuda\n                        device.\n  --debug               If provided it provides additional logging in case of\n                        errors.\n```\n\n\n## Development\n\nYou can install this library locally for development using the commands below.\nIf you don't have it already, you need to install [poetry](https://python-poetry.org/docs/#installation) first.\n\n```bash\n# Clone the repo\ngit clone https://github.com/simonepri/lm-scorer\n# CD into the created folder\ncd lm-scorer\n# Create a virtualenv and install the required dependencies using poetry\npoetry install\n```\n\nYou can then run commands inside the virtualenv by using `poetry run COMMAND`.  
\nAlternatively, you can open a shell inside the virtualenv using `poetry shell`.\n\n\nIf you wish to contribute to this project, run the following commands locally before opening a PR and check that no error is reported (warnings are fine).\n\n```bash\n# Run the code formatter\npoetry run task format\n# Run the linter\npoetry run task lint\n# Run the static type checker\npoetry run task types\n# Run the tests\npoetry run task test\n```\n\n\n## Authors\n\n- **Simone Primarosa** - [simonepri][github:simonepri]\n\nSee also the list of [contributors][contributors] who participated in this project.\n\n\n## License\n\nThis project is licensed under the MIT License - see the [license][license] file for details.\n\n\n\n\u003c!-- Links --\u003e\n\n[start]: https://github.com/simonepri/lm-scorer#start-of-content\n[license]: https://github.com/simonepri/lm-scorer/tree/master/license\n[contributors]: https://github.com/simonepri/lm-scorer/contributors\n\n[colab:lm-scorer]: https://colab.research.google.com/github/simonepri/lm-scorer/blob/master/examples/lm_scorer.ipynb\n\n[wiki:language-model]: https://en.wikipedia.org/wiki/Language_model\n\n[github:simonepri]: https://github.com/simonepri\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonepri%2Flm-scorer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonepri%2Flm-scorer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonepri%2Flm-scorer/lists"}