{"id":13869740,"url":"https://github.com/OATML/non-parametric-transformers","last_synced_at":"2025-07-15T18:31:52.185Z","repository":{"id":37713616,"uuid":"374410217","full_name":"OATML/non-parametric-transformers","owner":"OATML","description":"Code for \"Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning\"","archived":false,"fork":false,"pushed_at":"2024-03-21T14:59:30.000Z","size":134,"stargazers_count":404,"open_issues_count":0,"forks_count":41,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-11-23T15:35:38.202Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OATML.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-06T16:31:16.000Z","updated_at":"2024-11-23T04:11:01.000Z","dependencies_parsed_at":"2024-11-23T15:42:04.329Z","dependency_job_id":null,"html_url":"https://github.com/OATML/non-parametric-transformers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OATML/non-parametric-transformers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OATML%2Fnon-parametric-transformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OATML%2Fnon-parametric-transformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OATML%2Fnon-parametric-transformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OATML%2Fnon-parametric-transformers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OATML","download_url":"https://codeload.github.com/OATML/non-parametric-transformers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OATML%2Fnon-parametric-transformers/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265451443,"owners_count":23767768,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T20:01:14.724Z","updated_at":"2025-07-15T18:31:52.177Z","avatar_url":"https://github.com/OATML.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning\n\n  **[Overview](#overview)**\n| **[Abstract](#abstract)**\n| **[Installation](#installation)**\n| **[Examples](#examples)**\n| **[Citation](#citation)**\n\n[![arXiv](https://img.shields.io/badge/arXiv-2106.02584-b31b1b.svg)](https://arxiv.org/abs/2106.02584)\n[![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)\n[![Pytorch](https://img.shields.io/badge/Pytorch-1.7-red.svg)](https://shields.io/)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity)\n\n\n## Overview\n\nHi, good to see you here! 
👋\n\nThanks for checking out the code for Non-Parametric Transformers (NPTs).\n\nThis codebase will allow you to reproduce experiments from the paper as well as use NPTs for your own research.\n\n## Abstract\n\nWe challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. 
We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.\n\n## Installation\n\nSet up and activate the Python environment by executing:\n\n```\nconda env create -f environment.yml\nconda activate npt\n```\n\nFor now, we recommend installing CUDA \u003c= 10.2; see this [issue with CUDA \u003e= 11.0](https://github.com/pytorch/pytorch/issues/47908).\n\nIf you are running this on a system without a GPU, use the commands above with `environment_no_gpu.yml` instead.\n\n## Examples\n\nWe now give some basic examples of running NPT.\n\nNPT downloads all supported datasets automatically, so you do not need to download them yourself.\n\nWe use [wandb](http://wandb.com/) to log experimental results.\nWandb allows us to conveniently track run progress online.\nIf you do not want wandb enabled, you can run `wandb off` in the shell where you execute NPT.\n\nFor example, run the following to explore NPT with the default configuration on the Breast Cancer dataset:\n\n```\npython run.py --data_set breast-cancer\n```\n\nAs another example, a run on the poker-hand dataset may look like this:\n\n```\npython run.py --data_set poker-hand \\\n--exp_batch_size 4096 \\\n--exp_print_every_nth_forward 100\n```\n\nYou can find all possible config arguments and their descriptions in `NPT/configs.py` or by running `python run.py --help`.\n\nIn `scripts/`, we provide the commands and hyperparameter configurations for the runs presented in the paper.\n\nWe hope you enjoy using the code. Please feel free to reach out with any questions 😊\n\n\n## Citation\n\nIf you find this code helpful for your work, please cite our [paper](https://arxiv.org/abs/2106.02584) as follows:\n\n```bibtex\n@article{kossen2021self,\n  title={Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning},\n  author={Kossen, Jannik and Band, Neil and Gomez, Aidan N. 
and Lyle, Clare and Rainforth, Tom and Gal, Yarin},\n  journal={arXiv:2106.02584},\n  year={2021}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOATML%2Fnon-parametric-transformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOATML%2Fnon-parametric-transformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOATML%2Fnon-parametric-transformers/lists"}