{"id":21169770,"url":"https://github.com/andreasgrv/unargmaxable","last_synced_at":"2026-03-06T07:03:06.405Z","repository":{"id":72077041,"uuid":"471480067","full_name":"andreasgrv/unargmaxable","owner":"andreasgrv","description":"Tools for detecting unargmaxable classes in low-rank classifiers","archived":false,"fork":false,"pushed_at":"2023-02-07T19:26:22.000Z","size":43,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-30T00:34:20.309Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andreasgrv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-18T18:42:55.000Z","updated_at":"2024-03-07T16:43:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"d5dd14f2-4015-48d7-8856-98ffe3763deb","html_url":"https://github.com/andreasgrv/unargmaxable","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andreasgrv/unargmaxable","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreasgrv%2Funargmaxable","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreasgrv%2Funargmaxable/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreasgrv%2Funargmaxable/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreasgrv%2Funargmaxable/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/owners/andreasgrv","download_url":"https://codeload.github.com/andreasgrv/unargmaxable/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreasgrv%2Funargmaxable/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30164901,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T04:43:31.446Z","status":"ssl_error","status_checked_at":"2026-03-06T04:40:30.133Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-20T15:53:37.994Z","updated_at":"2026-03-06T07:03:06.364Z","avatar_url":"https://github.com/andreasgrv.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Unargmaxable\n\n\n## Definition\n### un·argmax·able Adjective\n\n* An item that cannot be assigned the largest score by a function assigning scores to items.\n\n\n## Contents\n\nThis repository contains algorithms for detecting **unargmaxable** classes in low-rank softmax layers.\nA softmax layer is by construction low-rank if we have C \u003e d + 1, where C is the number of classes\nand d is the dimensionality of the input feature vector.\n\n### [Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice](https://arxiv.org/abs/2203.06462)\nThe repository also contains code to 
reproduce the results, tables and figures from our paper, which was accepted to ACL 2022.\n\n# Installation\n\n## Install Python dependencies\n```bash\npython3.7 -m venv .env\nsource .env/bin/activate\npip install -r requirements.txt\npip install -e .\n```\n\n\n## Set environment variables\n\n```bash\nexport OMP_NUM_THREADS=1\nexport STOLLEN_NUM_PROCESSES=4\n\n# Adapt below as needed\nexport FLASK_APP=\"$PWD/stollen/server\"\n# Adapt below if you would rather install models elsewhere\nmkdir models\nexport TRANSFORMERS_CACHE=\"$PWD/models\"\n```\n\n\n### Details on environment variables\n* `export OMP_NUM_THREADS=1` is needed, as otherwise numpy hogs all threads and we do not benefit from running the search in parallel.\n* You can set `STOLLEN_NUM_PROCESSES` if you want to run the search on multiple CPUs/threads. Each thread processes a single vocabulary item in parallel. We used `export STOLLEN_NUM_PROCESSES=10` on an AMD 3900X CPU with 64 GB of RAM.\n\n\n## Install [Gurobi](https://www.gurobi.com/academia/academic-program-and-licenses/)\n\nThe linear programming algorithm depends on Gurobi.\nIt requires a license; see the link above.\n\n\n# Example Usage\n\n## Verify a randomly initialised softmax layer\n\nThis script exists as a sanity check for our algorithms.\nWe assert that we can detect which points are internal to the convex hull.\nTo make this assertion we compare results to [Qhull](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.ConvexHull.html).\n\n\nAre any of the 20 class weight vectors randomly initialised in 2 and 3 dimensions unargmaxable?\n\n```bash\nstollen_random --num-classes 20 --dim 2\nstollen_random --num-classes 20 --dim 3\nstollen_random --help   # For more details / options\n```\n\nIf the dimension is 2 or 3 we also plot the resulting convex hull for visualisation purposes.\nThe result of the algorithm is also compared to the exact Qhull result if `dim \u003c 10`.\nThe approximate algorithm will have 100% recall but may have lower 
precision.\n```bash\nstollen_random --num-classes 300 --dim 8 --seed 3  --patience 50\n```\n\nBelow we run the exact algorithm; this should always return 100% for both precision and recall unless the input range is too large.\n```bash\nstollen_random --num-classes 300 --dim 8 --seed 3  --patience 50 --exact-algorithm lp_chebyshev\n```\n\n\n## Verify that unargmaxable tokens can be avoided by using weight normalisation\n\nAs a sanity check we verify that all classes are argmaxable when we normalise the weights or set the bias term as mentioned in Appendix D of the paper.\n\n```bash\nstollen_prevention --num-classes 500 --dim 10\nstollen_prevention --num-classes 500 --dim 10 --use-bias\n```\n\nWe can also see that the script would raise an assertion error if we did not follow the normalisation step.\n\n```bash\nstollen_prevention --num-classes 500 --dim 10 --do-not-prevent\nstollen_prevention --num-classes 500 --dim 10 --use-bias --do-not-prevent\n```\n\nNote that in high dimensions unargmaxable tokens are not expected to exist if we randomly initialise the weight vectors.\n\n\n## Verify a model stored in numpy.npz format\n\nExpects the weight matrix to be in the **decoder_Wemb** attribute.\nTakes the transpose, since it expects the matrix in [dim, num_classes] format.\n\n```bash\nstollen_numpy --numpy-file path-to-numpy-model.npz\n```\n\n\n## Verify a model from HuggingFace\n\n```bash\nstollen_hugging --url https://huggingface.co/bert-base-cased --patience 2500 --exact-algorithm lp_chebyshev\n```\n\nNB: The script does not work with arbitrary models: it needs to be adapted if the softmax weights and bias are stored in an unforeseen variable.\n\n\n# Reproducing the Paper Results\n\n\n## Run experiments\n\nScripts to reproduce experiments can be found [here](experiments/stollen_search), see the README.md file for details.\nThe scripts generally write to a postgres database, but the ``save-db`` parameter can be toggled within the script to change that.\n\n## Recreate tables 
and figures from the original experiments\n\n### Additional installation steps\n\n#### OS dependencies\n\n* wget\n* gunzip\n* psql\n\n\n#### Install database\n```bash\nexport FLASK_APP=\"$PWD/stollen/server\"\ncd db\n\nexport DB_FOLDER=\"$PWD/stollen_data\"\nexport DB_PORT=5436\nexport DB_USER=`whoami`\nexport DB_NAME=stollenprob\n# fun times\nexport DB_PASSWD=\"cov1d\"\nexport PGPASSWORD=$DB_PASSWD\nexport DB_HOST=\"localhost\"\nexport DB_SSL=\"prefer\"\n\n# Creates the database, tables etc.\n./install.sh\n\n# Will download the tables in CSV format from AWS S3\n# and populate the psql database\n# (the csv files are saved in the data folder - e.g. if you want to use pandas)\n./download_and_populate_db.sh\n```\n\n#### Deleting the database when done\n\nFrom the `db` folder, run:\n\n```bash\n# IMPORTANT: Run stop before deleting any files\n./stop.sh\nrm -r stollen_data\nrm -r migrations\n```\n\nThe following scripts generally accept a file with experiment ids to plot/aggregate.\nFor example:\n\n```\ncd ../paper/plots\npython plot_bounded.py  --ids-file datafiles/bounded.txt --title \"bounded models\"\n```\n\n```\npaper/\n├── appendix\n│   ├── braid-slice-regions\n│   └── check_quantiles\n├── plots\n│   ├── plot_bounded.py\n│   ├── plot_random_iterations.py\n│   ├── plot_row_iterations.py\n│   ├── plot.sh\n│   ├── stolen_probability.py\n│   └── stolen_probability_with_convex.py\n└── tables\n    ├── plot_iterations.py\n    └── print_bounded_table.py\n```\n\nYou can use the above with the experiment ids generated from your own experiments, assuming you save them to the database.\n\n### Generating plots from the paper\n\nFrom the `paper/plots` folder run:\n```bash\n./plot.sh\n```\nThis assumes you have installed and populated the database mentioned above.\n\n# Related Work\n\n* [Demeter et al. (2020)](https://arxiv.org/abs/2005.02433) identified that unargmaxable classes can arise in classification layers and named the more general phenomenon Stolen Probability.\n* Warren D. 
Smith comprehensively summarises [the history of the problem](https://rangevoting.org/WilsonOrder.html).\n\n\n# Citation\n\nPlease cite our work as:\n\n\n```\n@inproceedings{grivas-etal-2022-low,\n    title = \"Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice\",\n    author = \"Grivas, Andreas  and\n      Bogoychev, Nikolay  and\n      Lopez, Adam\",\n    booktitle = \"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    month = may,\n    year = \"2022\",\n    address = \"Dublin, Ireland\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2022.acl-long.465\",\n    doi = \"10.18653/v1/2022.acl-long.465\",\n    pages = \"6738--6758\",\n    abstract = \"Classifiers in natural language processing (NLP) often have a large number of output classes. For example, neural language models (LMs) and machine translation (MT) models both predict tokens from a vocabulary of thousands. The Softmax output layer of these models typically receives as input a dense feature representation, which has much lower dimensionality than the output. In theory, the result is some words may be impossible to be predicted via argmax, irrespective of input features, and empirically, there is evidence this happens in small language models (Demeter et al., 2020). In this paper we ask whether it can happen in practical large language models and translation models. To do so, we develop algorithms to detect such unargmaxable tokens in public models. We find that 13 out of 150 models do indeed have such tokens; however, they are very infrequent and unlikely to impact model quality. 
We release our algorithms and code to the public.\",\n}\n```\n\n# Trivia\nAs we get closer to Christmas, [stollen](https://en.wikipedia.org/wiki/Stollen) probability increases.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreasgrv%2Funargmaxable","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandreasgrv%2Funargmaxable","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreasgrv%2Funargmaxable/lists"}