{"id":13595060,"url":"https://github.com/nyu-mll/GLUE-baselines","last_synced_at":"2025-04-09T10:32:49.032Z","repository":{"id":39705916,"uuid":"109272230","full_name":"nyu-mll/GLUE-baselines","owner":"nyu-mll","description":"[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations","archived":false,"fork":false,"pushed_at":"2021-08-03T22:12:41.000Z","size":1510,"stargazers_count":791,"open_issues_count":16,"forks_count":165,"subscribers_count":25,"default_branch":"master","last_synced_at":"2025-04-04T07:08:04.294Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://gluebenchmark.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nyu-mll.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-02T14:02:38.000Z","updated_at":"2025-03-22T09:31:24.000Z","dependencies_parsed_at":"2022-08-09T15:24:49.063Z","dependency_job_id":null,"html_url":"https://github.com/nyu-mll/GLUE-baselines","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyu-mll%2FGLUE-baselines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyu-mll%2FGLUE-baselines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyu-mll%2FGLUE-baselines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyu-mll%2FGLUE-baselines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nyu-mll","download_url":"https://codeload.github.com/nyu-mll/GLUE-baselines/tar.gz/refs/he
ads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248020593,"owners_count":21034459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:43.208Z","updated_at":"2025-04-09T10:32:44.024Z","avatar_url":"https://github.com/nyu-mll.png","language":"Python","funding_links":[],"categories":["📂 Benchmarks \u0026 Datasets","Python","Evaluation","Anthropomorphic-Taxonomy"],"sub_categories":["Typical Intelligence Quotient (IQ)-General Intelligence evaluation benchmarks"],"readme":"# GLUE Baselines\nThis repo contains the code for baselines for the [General Language Understanding Evaluation](https://gluebenchmark.com/) (GLUE) benchmark.\nSee [our paper](https://openreview.net/pdf?id=rJ4km2R5t7) for more details about GLUE or the baselines.\n\n# Deprecation Warning\n\nUse this code to reproduce our baselines. 
If you want code to use as a starting point for new development, though, we strongly recommend using [jiant](https://github.com/jsalt18-sentence-repl/jiant) instead; it's a much more extensive and much better-documented toolkit built around the same goals.\n\n## Dependencies\n\nMake sure you have installed the packages listed in environment.yml.\nWhere a specific package version is listed, that version is required.\nIf you use conda, you can create an environment from this file with the following command:\n\n```\nconda env create -f environment.yml\n```\n\nNote: The version of AllenNLP available on pip may not be compatible with PyTorch 0.4, in which case we recommend installing from [source](https://github.com/allenai/allennlp).\n\n## Downloading GLUE\n\nWe provide a convenience Python script for downloading all GLUE data and standard splits.\n\n```\npython download_glue_data.py --data_dir glue_data --tasks all\n```\n\nAfter downloading GLUE, point ``PATH_PREFIX`` in ``src/preprocess.py`` to the directory containing the data.\n\nIf you are blocked from s3.amazonaws.com (as may be the case in China), downloading MRPC will fail. Instead, you can run the commands below:\n\n```\ngit clone https://github.com/wasiahmad/paraphrase_identification.git\npython download_glue_data.py --data_dir glue_data --tasks all --path_to_mrpc=paraphrase_identification/dataset/msr-paraphrase-corpus\n```\n\n## Running\n\nTo run our baselines, use ``src/main.py``.\nBecause preprocessing is expensive (particularly for ELMo) and we often want to run multiple experiments using the same preprocessing, we use an argument ``--exp_dir`` for sharing preprocessing between experiments. 
We use the ``--run_dir`` argument to save information specific to a particular run, with ``run_dir`` usually nested within ``exp_dir``.\n\n\n```\npython main.py --exp_dir EXP_DIR --run_dir RUN_DIR --train_tasks all --word_embs_file PATH_TO_GLOVE\n```\n\nNB: The version of AllenNLP used has [issues](https://github.com/allenai/allennlp/issues/342) with tensorboard. You may need to replace ``from tensorboard import SummaryWriter`` with ``from tensorboardX import SummaryWriter`` in your AllenNLP source files.\n\n\n## GloVe, CoVe, and ELMo\n\nMany of our models make use of [GloVe pretrained word embeddings](https://nlp.stanford.edu/projects/glove/), in particular the 300-dimensional, 840B version.\nTo use GloVe vectors, download and extract the relevant files and set ``--word_embs_file`` to the GloVe file.\nTo learn embeddings from scratch, set ``--glove`` to 0.\n\nWe use the CoVe implementation provided [here](https://github.com/salesforce/cove).\nTo use CoVe, clone the repo, fill in ``PATH_TO_COVE`` in ``src/models.py``, and set ``--cove`` to 1.\n\nWe use the ELMo implementation provided by [AllenNLP](https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md).\nTo use ELMo, set ``--elmo`` to 1. 
To use ELMo without GloVe, additionally set ``--elmo_no_glove`` to 1.\n\n## Reference\n\nIf you use this code or GLUE, please consider citing us.\n\n```\n @unpublished{wang2018glue,\n     title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for\n             Natural Language Understanding},\n     author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill,\n             Felix and Levy, Omer and Bowman, Samuel R.},\n     note={arXiv preprint 1804.07461},\n     year={2018}\n }\n```\n\nFeel free to contact alexwang _at_ nyu.edu with any questions or comments.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyu-mll%2FGLUE-baselines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnyu-mll%2FGLUE-baselines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyu-mll%2FGLUE-baselines/lists"}