{"id":19611913,"url":"https://github.com/princeton-nlp/semsup","last_synced_at":"2025-04-27T22:33:47.068Z","repository":{"id":41337953,"uuid":"462907586","full_name":"princeton-nlp/semsup","owner":"princeton-nlp","description":"Semantic Supervision: Enabling Generalization over Output Spaces","archived":false,"fork":false,"pushed_at":"2023-01-04T22:00:25.000Z","size":8614,"stargazers_count":16,"open_issues_count":1,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-05T04:31:47.384Z","etag":null,"topics":["deep-learning","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/princeton-nlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-23T21:13:15.000Z","updated_at":"2023-09-15T14:02:36.000Z","dependencies_parsed_at":"2023-02-02T21:16:49.777Z","dependency_job_id":null,"html_url":"https://github.com/princeton-nlp/semsup","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2Fsemsup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2Fsemsup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2Fsemsup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2Fsemsup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/princeton-nlp","download_url":"https://codeload.github.com/princeton-nlp/semsup/tar.gz/refs/heads/main","host":{"name":
"GitHub","url":"https://github.com","kind":"github","repositories_count":251219600,"owners_count":21554444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning"],"created_at":"2024-11-11T10:44:58.254Z","updated_at":"2025-04-27T22:33:42.049Z","avatar_url":"https://github.com/princeton-nlp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Semantic Supervision: Enabling Generalization over Output Spaces\n\n[**Website**](https://sites.google.com/princeton.edu/semantic-supervision/) |\n[**Paper**](https://arxiv.org/abs/2202.13100)\n\n![semsup](https://user-images.githubusercontent.com/8196527/200193827-9f1fb40b-35f3-4e6b-a685-0005319b6bbc.gif)\n\n## Setup\nFirst clone the repository and install dependencies:\n```\ngit clone https://github.com/princeton-nlp/semsup.git\npip install -e semsup\n```\n\nInside the `semsup` folder, download class descriptions and make the cache folder:\n```\ncd semsup\nbash download.sh\nmkdir data_cache\n```\n\nExperiments in our paper were run using `python 3.9.7` and `CUDA toolkit 11.1`.\n\n## How to run the code\nScripts and config files to train and evaluate models on AWA2, CIFAR and 20Newsgroups are found in the folders `run_{awa, cifar, ng}`. Training commands are found in the bash scripts `run_{dataset}_{model}.sh` in each folder. Each config file has a `scen{1,2,3}` in the file name to indicate which scenario the config is for (see the paper for details on the scenarios). Run each script inside the respective folder. 
For example:\n```\ncd run_cifar\nbash run_cifar_semsup.sh\n```\n\nDataset downloading to the `data_cache` directory is automatically handled. Test commands are provided in the `test_{awa, cifar, ng}.sh` scripts. Note that you will need to change the `--checkpoints` argument to point to where the model weights are stored.\n\n### RCV1\nObtaining RCV1 data and running RCV1 code require additional instructions that can be found [here](run_rcv1/RCV1_README.md).\n\n## Notes\n### Building on this code\nWe use `pytorch-lightning` (except for RCV1). The `SemSupDataModule` class in `semsup/data/core.py` inherits from `pl.LightningDataModule` and handles the tokenization, caching, and sampling of class descriptions. All data modules in `semsup/data/{awa, cifar, newsgroups}` inherit from `SemSupDataModule`.\n\nThe `SemSupModel` in `semsup/model/core.py` inherits from `pl.LightningModule` and constructs the output matrix. All SemSup models inherit from this class. Together, the two base `SemSup` classes allow for easy conversion of any supervised problem into a `SemSup` approach.\n\nThe class description `.labels` files are lists of JSON objects with `\"text\"` and `\"label\"` keys. The label must be one of the labels provided in the `train_classes` and `val_classes` variables in `SemSupDataArgs`.\n\n### Running in Colab\n(For AWA, CIFAR, Newsgroups). 
To run the code in Google Colab, make sure to run the following cell first to set up your environment.\n```\n!git clone https://github.com/princeton-nlp/semsup.git\n%cd semsup/\n!pip uninstall --yes torchtext torchaudio # not compatible with our version of torch\n!pip install -e .\n!bash download.sh\n!mkdir data_cache\n# RESTART RUNTIME TO UPDATE NEWLY INSTALLED MODULES\n```\nYou should now be able to run the scripts (make sure to `cd` to the folder), for example:\n```\n%cd /content/semsup/run_ng\n!bash unit_test_ng.sh\n```\n### License\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprinceton-nlp%2Fsemsup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprinceton-nlp%2Fsemsup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprinceton-nlp%2Fsemsup/lists"}