{"id":19505661,"url":"https://github.com/xnought/symax","last_synced_at":"2026-04-19T01:02:21.041Z","repository":{"id":183744715,"uuid":"670678341","full_name":"xnought/symax","owner":"xnought","description":"Replacement for softmax in attention. Symmetric conversion values [0, 1] with favorable limits inspired by softmax_1. ","archived":false,"fork":false,"pushed_at":"2023-07-25T20:58:28.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-08T10:46:42.221Z","etag":null,"topics":["attention","softmax"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xnought.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-07-25T15:26:45.000Z","updated_at":"2023-07-25T17:28:14.000Z","dependencies_parsed_at":"2023-09-25T06:46:58.659Z","dependency_job_id":null,"html_url":"https://github.com/xnought/symax","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"8843644f27ab0709de05e16c4133de012b3123dc"},"previous_names":["xnought/selectmax","xnought/symax"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xnought%2Fsymax","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xnought%2Fsymax/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xnought%2Fsymax/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xnought%2Fsymax/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xnought","download_url":"https://codeload.github.com/xnought/symax/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240754363,"owners_count":19852189,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","softmax"],"created_at":"2024-11-10T22:33:10.580Z","updated_at":"2026-04-19T01:02:20.979Z","avatar_url":"https://github.com/xnought.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `symax`\n\nSymmetric selection of values in attention that is not shift invariant. (This is an experiment.) If I'm not using softmax for probabilities, why would I care about shift invariance? That's one assumption. The other came from this Evan Miller dude, which is that softmax outputs always sum to one, so they can't all go to zero for extreme values, which creates issues.\n\nInspired by this one dude's tweet on why softmax is giving weirdness in attention:\n\n$$\\text{Attention}\\left(Q, K, V\\right) = \\text{softmax}\\left(\\frac{QK^T}{\\sqrt{d}}\\right)V$$\n\nIn summary, the fact that softmax has to turn a choice between discrete entities into a probability distribution forces you to weigh stuff high even if it isn't pertinent.\n\nIn the case where the query and key shouldn't weigh any values, what do you do? 
Apparently it was found that certain tokens have extreme spikes (like space tokens), which mess things up.\n\nThe dude thinks a +1 in the softmax denominator fixes a lot of this.\n\nhttps://www.evanmiller.org/attention-is-off-by-one.html\n\n## Why not just remove e at this point\n\nIntuitively, why not make this symmetric by taking the `abs` instead of exponentiating the vector for softmax?\n\nThen, the attention mechanism would be sensitive to values in either direction and could still output weights near 0 when nothing is pertinent.\n\n$$\\text{symax}(x, \\eta)_i = \\frac{|x_i|}{\\eta + \\sum_j |x_j|}$$\n\n-   Where $|s|$ represents the absolute value of a number $s$. I guess you could interpret the sum as the 1-norm $\\|x\\|_1$, so you could probably extend this to other norms.\n-   Where $\\eta$ is a scalar that defaults to 1 (so the outputs go to 0 as all the $x$s go to 0), or a learnable parameter (haven't tested this).\n-   Note this is not a probability distribution since the sum is \u003c 1 for any $\\eta \u003e 0$.\n\n(This also raises the question of whether other norms would be more favorable and what type of behavior you might get.)\n\n```python\nimport torch\n\ndef symax(x: torch.Tensor, eta=1, dim=0):\n    sizes = torch.abs(x)\n    # keepdim=True so the sum broadcasts along `dim`\n    return sizes / (eta + sizes.sum(dim=dim, keepdim=True))\n```\n\n## Limits\n\nFor reasonably computable values, symax tends towards 0 when all the $x$s are close to 0, since the numerators vanish while $\\eta$ keeps the denominator away from 0. 
Extreme values behave differently: if all values in $x$ are very, very negative (or positive), $\\eta$ becomes negligible and symax tends towards $|x_i| / \\sum_j |x_j|$, so the outputs sum to nearly 1 instead of going to 0.\n\n## Sym Attention\n\nWith a default of $\\eta=1$ (not shown):\n\n$$\\text{SymAttention}\\left(Q, K, V\\right) = \\text{symax}\\left(\\frac{QK^T}{\\sqrt{d}}\\right)V$$\n\n## TODO\n\n-   Train a model with `SymAttention` and see what kind of values I get!\n-   Compare with regular `Attention` and with a modified $\\text{softmax}_1$ version too\n\nTest performance too!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxnought%2Fsymax","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxnought%2Fsymax","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxnought%2Fsymax/lists"}