{"id":13738105,"url":"https://github.com/jayelm/emergent-generalization","last_synced_at":"2026-01-23T18:20:36.621Z","repository":{"id":62986472,"uuid":"373944198","full_name":"jayelm/emergent-generalization","owner":"jayelm","description":"Emergent Communication of Generalizations, NeurIPS 2021","archived":false,"fork":false,"pushed_at":"2021-09-29T04:47:22.000Z","size":264,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-15T06:33:23.690Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2106.02668","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jayelm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-04T19:43:37.000Z","updated_at":"2024-09-03T08:33:52.000Z","dependencies_parsed_at":"2022-11-10T11:16:34.646Z","dependency_job_id":null,"html_url":"https://github.com/jayelm/emergent-generalization","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Femergent-generalization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Femergent-generalization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Femergent-generalization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Femergent-generalization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jayelm","download_url":"https://codeload.github.com/jayelm/emergent-gen
eralization/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253096424,"owners_count":21853600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T03:02:11.392Z","updated_at":"2026-01-23T18:20:36.612Z","avatar_url":"https://github.com/jayelm.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Emergent Communication of Generalizations\n\nhttps://arxiv.org/abs/2106.02668\n\nNeurIPS 2021\n\n## Setup\n\n- Download the birds (CUB) data [here](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), unzip it into the `data/cub` directory (i.e. the filepath should be `data/cub/CUB_200_2011/*`), then run `python save_cub_np.py` in the `data` directory to save the CUB images in an easily accessible npz format.\n- Download and process ShapeWorld data: use `data/download_shapeworld.sh`. 
There are 3 datasets:\n    - [shapeworld](http://nlp.stanford.edu/data/muj/emergent-generalization/shapeworld/shapeworld.tar.gz): 20k games over the 312 conjunctive concepts\n    - [shapeworld_ref](http://nlp.stanford.edu/data/muj/emergent-generalization/shapeworld/shapeworld_ref.tar.gz): 20k games over the 30 conjunctive concepts possible for\n        reference games only\n    - [shapeworld_all](http://nlp.stanford.edu/data/muj/emergent-generalization/shapeworld/shapeworld_all.tar.gz): 20k games over the 312 conjunctive concepts, *but no\n        compositional split*.\n\nNote that running concept/setref game agents loads the shapeworld ref dataset\n(and vice versa) because we do zero shot eval, so download everything.\n\nCode used for generating the ShapeWorld data is located [here](https://github.com/jayelm/minishapeworld/tree/neurips2021).\n\n## Running experiments\n\n### Quickstart\n\n`./run_cub.sh` and `./run_sw.sh` contain ready-to-go commands for running\nreference, set reference, and concept agents for birds and ShapeWorld,\nrespectively. See those scripts for more details. If you need more info, read\non for the basic workflow.\n\n### 1. Train model\n\nRun `python code/train.py --cuda --name NAME --dataset DATASET` where `NAME` is\nan experiment name (results saved in `exp/NAME`) and `DATASET` is the dataset\n(`data/cub`, `data/shapeworld`, or `data/shapeworld_ref`).\nTo specify the type of game use the following additional flags:\n\n- `--percent_novel 1.0`: runs concept game (i.e. percent novel indicates what\n    % of images are novel to the student; note you can try values in between if\n    you'd like)\n- `--percent_novel 0.0`: runs setref game\n- `--percent_novel 0.0 --reference_game`: runs reference game. **For\n    ShapeWorld, be sure to pass in the 30-concept reference game dataset `shapeworld_ref`,\n    not the standard 312-concept dataset `shapeworld`!**\n\nAdditional flags relevant for experiments:\n- `--max_lang_length`: maximum message length. 
**This includes sos/eos token,\n    so the true length is this value minus 2**\n- `--vocab_size`: vocab size of the agents.\n- `--n_examples`: number of examples given to agents\n- `--uniform_weight`: add uniform noise to gumbel softmax exploration policy\n- `--wandb`: activate wandb logging (run `wandb init` yourself)\n\nThere are other options documented in `code/io_util.py`.\n\n\n#### Metrics\n\nMetrics are logged into `exp/NAME/metrics.csv` and logged to wandb (if\n`--wandb` is enabled). The relevant ones are:\n\n- `train_acc`:\n- `test`. For shapeworld, this metric is split into `{test,val}_acc` and `{test,val}_same_acc` to denote unseen and seen splits, respectively, where `{test,val}_avg_acc` averages the two.\n- `{train,test}_langts`/`{train,test}_ts`: edit-distance based topographic similarity. For Birds (`cub`), the metric is `ts`; for ShapeWorld the metric is `langts`.\n- `{train,test}_hausdorff`: Hausdorff distance based topographic similarity.\n\nThere are many other metrics, most of which should be reasonably intuitive,\nthough contact authors for clarifications.\n\nThere are also all of the above metrics split by game type (we eval ref agents on setref, concept, etc).\n\nMutual information and entropy are measured later (see below).\n\n### 2. Sample language from model\n\nThe above command produces a `metrics.csv` with most metrics, but I measure\nentropy and AMI at the end by sampling a bunch of language from the model and\nanalyzing that corpus. 
To do so, run\n\n```\n# (no --cuda flag needed; will use whatever flag was set at train time)\npython code/sample.py exp/NAME\n```\n\nwhich by default samples 200k messages from a trained model into\n`exp/NAME/sampled_lang.csv` and some summary statistics into\n`exp/NAME/sampled_stats.json`.\n\nNow, if you just want the information-theoretic systematicity metrics, for both\nBirds and ShapeWorld run\n`python code/acre.py exp/NAME/sampled_lang.csv --dataset DATASET --cuda --stats_only`\nwhich **does not run ACRe**, but rather just dumps some summary statistics:\n\n- `exp/NAME/sampled_lang_overall_stats.json`: this contains entropy,\n    unnormalized mutual information, and adjusted mutual information\n- `exp/NAME/sampled_lang_stats.csv`: this is a list of utterances generated for\n    each concept, with their counts. It also includes entropy information. This can be used\n    to plot the sunburst (i.e. nested pie) plots in the paper. See \"5.\n    Visualizing model outputs\".\n\nAgain, this does not actually run ACRe. If you want to run ACRe, read on:\n\n### 3. Train ACRe\n\nIf you actually want to train an ACRe model you should train your model with\nthe `shapeworld_all` dataset, which doesn't involve the compositional split\n(though you can still do ACRe analysis on models trained normally).\n\nRun ACRe without the `--stats_only` flag, i.e. run\n\n`python code/acre.py exp/NAME/sampled_lang.csv --dataset DATASET --cuda`\n\nwhich trains an ACRe model to reconstruct the agent language according to the\nconcepts of `DATASET`. 
This prints out some top1 acc/loss metrics and the\nfollowing files:\n\n- `exp/NAME/sampled_lang_{train,test}_acre_metrics.csv`: overall loss/top1 acc\n    for ACRe reconstruction compared to the ground truth language only (i.e.\n    not evaluating a listener model yet), as well as these metrics broken down\n    by concept\n- `exp/NAME/sampled_lang_{train,test}_sampled_lang.pt`: Contains ground truth\n    model language for both train/test ACRe splits, as well as ACRe\n    reconstructions. This gets used to evaluate a listener in the next section.\n- `exp/NAME/acre_split.json`: The split of train/test concepts used for ACRe.\n\n### 4. Evaluate ACRe on Listener\n\nRun\n\n`python code/eval_zero_shot.py exp/NAME --cuda`.\n\nwhich evaluates across `--epochs` epochs (default 5), categorizing concepts by\nwhether they belong to the ACRe train or test split, and evaluates several\ntypes of language on the listener:\n\n- `ground_truth_1`: the model lang located in `exp/NAME/sampled_lang_{train,test}_sampled_lang.pt`.\n- `same_concept`: language sampled from other model utterances from the same concept\n- `acre`: ACRe reconstructed language.\n- `random`: random language uniformly sampled from the possible set of\n    utterances (not reported in paper; worse than `any_concept`)\n- `any_concept`: random language sampled from utterances from any concept (the random baseline in the paper)\n- `closest_concept`: language sampled from utterances for the \"closest\" concept as measured by edit distance\n- `ground_truth_2`: (sanity check) re-sample language from the teacher; should be close to `ground_truth_1` performance.\n\nThese results are saved into\n\n- `exp/NAME/zero_shot_{train,test}.json`: BLEU-1 and listener acc aggregated\n    across all concepts, and for each concept individually\n- `exp/NAME/zero_shot_lang_type_stats.csv`: a lang stats file similar to\n    `exp/NAME/sampled_lang_stats.csv` described above, which can be used to\n    visualize outputs for the various 
language distributions as described in\n    the next section.\n\n### 5. Visualizing model outputs\n\nThis requires `R` and the `sunburstR` package, as well as a generated\n`sampled_lang_stats.csv` which is produced by `acre.py` (just the\n`--stats_only` flag will do). An example usage is located in lines\n425--441 of `analysis/analysis.Rmd`.\n\n### 6. Evaluating across different games\n\nAccuracy and topographic similarity metrics are evaluated zero-shot across\ndifferent games in the main train script, though entropy/AMI metrics aren't\ncollected. To obtain those, and to get all the results in one place, sample\nlanguage while using a `--force_*` flag to force the game to be ref, setref, or\nconcept. This adds a `_force_{ref,setref,concept}` suffix to every file\nwritten by `sample.py`, e.g. `sampled_lang_force_ref.csv`. For example:\n\n```\npython code/sample.py exp/NAME --force_reference_game\npython code/acre.py exp/NAME/sampled_lang_force_ref.csv --dataset data/shapeworld_ref\n```\n\nwhich now produces `exp/NAME/sampled_stats_force_ref.json`,\n`exp/NAME/sampled_lang_force_ref_overall_stats.json`,\n`exp/NAME/sampled_lang_force_ref_stats.csv`, etc.\n\n**If you're analyzing ShapeWorld, remember to specify the right dataset - either\nref, or setref/concept - when printing summary statistics via `acre.py`**.\n\n## Dependencies\n\nThis code was tested with Python 3.8 and `torch==1.8.1`. A specific\nenvironment file is located in `requirements.txt`, but other common package\nversions are likely to be compatible as well.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayelm%2Femergent-generalization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjayelm%2Femergent-generalization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayelm%2Femergent-generalization/lists"}