{"id":21295631,"url":"https://github.com/eric11eca/common-bench","last_synced_at":"2025-03-15T17:14:03.632Z","repository":{"id":105984455,"uuid":"575852685","full_name":"eric11eca/common-bench","owner":"eric11eca","description":"EPFL Machine Learning course project 2. Associated with NLP Lab. Commonsense reasoning benchmark and probing for large language models.","archived":false,"fork":false,"pushed_at":"2022-12-22T12:48:19.000Z","size":25281,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-22T06:48:20.519Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eric11eca.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-08T12:55:48.000Z","updated_at":"2023-04-13T08:41:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"bffbd8df-339a-4e83-9d52-e0c696a2a83f","html_url":"https://github.com/eric11eca/common-bench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric11eca%2Fcommon-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric11eca%2Fcommon-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric11eca%2Fcommon-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric11eca%2Fcommon-bench/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/eric11eca","download_url":"https://codeload.github.com/eric11eca/common-bench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243762267,"owners_count":20343979,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-21T14:05:51.834Z","updated_at":"2025-03-15T17:14:03.627Z","avatar_url":"https://github.com/eric11eca.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n#  ML Project 2: Human-centered Commonsense Benchmark\n\nEPFL Machine Learning course project 2. Associated with NLP Lab. 
Commonsense reasoning benchmark and probing for large language models.\n\n###  Baseline Models\n\nWe employed [T5](https://arxiv.org/pdf/1910.10683.pdf)-based models.\n\n* [UnifiedQA](https://arxiv.org/abs/2005.00700)\n\n* [Macaw](https://arxiv.org/abs/2109.02593)\n\n* [FLAN](https://ai.googleblog.com/2021/10/introducing-flan-more-generalizable.html)\n\n* [T0++](https://huggingface.co/bigscience/T0pp)\n\nWe also employed large language models.\n\n* [OPT66B](https://huggingface.co/facebook/opt-66b/tree/main)\n\n* [GPT3](https://openai.com/api/)\n\n###  Human-centered Commonsense Benchmark\n\nWe employed five commonsense benchmarks, ranging from social interaction to ethical judgment, that cover situations humans face in everyday life.\n\n* [Theory of Mind Task Dataset](https://arxiv.org/abs/1808.09352)\n\n* [Social Interaction QA](https://arxiv.org/abs/1904.09728)\n\n* [Complementary Commonsense](https://arxiv.org/abs/2106.00969)\n\n* [SCRUPLES](https://paperswithcode.com/paper/scruples-a-corpus-of-community-ethical)\n\n* [COmmonsense Dataset Adversarially-authored by Humans](https://arxiv.org/abs/1904.04365)\n\n## Download Datasets from: [https://drive.google.com/drive/folders/1eSjhEyg7w4wZJS39ptEimIi-4H2stT7h?usp=sharing](https://drive.google.com/drive/folders/1eSjhEyg7w4wZJS39ptEimIi-4H2stT7h?usp=sharing)\n\n##  Installation\n\n```\n\npip install -r requirement.txt\n\n```\n\nWe tested our Python code in the interactive mode of RunAI on the EPFL cluster. If you are a new user of [RunAI](https://github.com/sori424/runLLM), please read through the linked guide first.\n\n####  WANDB dataset/model versioning and loading\n\nThis repo is designed to work with wandb for dataset and model versioning, experiment visualization, etc. 
Assuming that you have a [**wandb**](https://wandb.ai/home) account, you first need to set your *WANDB_API_KEY*:\n\n```bash\n\nexport WANDB_API_KEY=XXXXXXXXXXXXXXXX\n\n```\n\nIn the run commands below, you can then specify: `--wandb_entity`, `--wandb_project` (the target project), `--wandb_name` (name of experiment), `--wandb_data` (for automatic loading of data), `--wandb_model` (for automatic loading of models). In **RunAI**, wandb can be used by adding `WANDB_API_KEY` to the `env` variables. \n\n##  Quickstart\nTo run the code, simply execute the main bash script:\n```\n\nbash run.sh\n\n```\n\nTo adjust the run setup, change the configuration variables below.\n\n```\n\nDATASET=\"socialiqa\"\n\nTASK=\"socialiqa\"\n\nMODEL_TYPE=\"opt\" # select from [\"t5\", \"opt\", \"bloom\", \"gpt\"]\n\nMODEL_NAME_OR_PATH=\"facebook/opt-66b\" # volume directory with model checkpoints (.bin) or a Hugging Face model ID ('facebook/opt-66b')\n\nTRAIN_BATCH_SIZE=4   # training batch size\n\nPREDICT_BATCH_SIZE=1 # prediction batch size\n\nN_GPU=8 # number of GPUs to use\n\n```\n\n## In-context Learning\n\nTo run the code for vanilla **In-context Learning**, first modify the running command in `run.sh`:\n```\n\n# Add --do_icl to enable in-context learning; --num_examples sets the number of demonstrations used\naccelerate launch main.py \\\n\t--do_inference \\\n\t--dataset ${DATASET} \\\n\t--task ${TASK} \\\n\t--model_type ${MODEL_TYPE} \\\n\t--model_name_or_path ${MODEL_NAME_OR_PATH} \\\n\t--predict_batch_size ${PREDICT_BATCH_SIZE} \\\n\t--wandb_name ${MODEL_NAME_OR_PATH}-${DATASET}-icl-4-rand \\\n\t--n_gpu ${N_GPU} \\\n\t--max_data 0 \\\n\t--do_icl \\\n\t--num_examples 2\n```\nThen, execute the script. 
To use examples pre-selected by the KNN method, modify the running command:\n```\n\n# Add --search to use the pre-selected examples; --encoder is the name of the sentence encoder for embedding\naccelerate launch main.py \\\n\t--do_inference \\\n\t--dataset ${DATASET} \\\n\t--task ${TASK} \\\n\t--model_type ${MODEL_TYPE} \\\n\t--model_name_or_path ${MODEL_NAME_OR_PATH} \\\n\t--predict_batch_size ${PREDICT_BATCH_SIZE} \\\n\t--wandb_name ${MODEL_NAME_OR_PATH}-${DATASET}-icl-4-rand \\\n\t--n_gpu ${N_GPU} \\\n\t--max_data 0 \\\n\t--do_icl \\\n\t--num_examples 2 \\\n\t--search \\\n\t--encoder simcse\n```\nThen, execute the script.\n\n## KNN Example Selection\n```\n\n# --encoder_name: nli_mean or simcse; --metric: cosine or euclidean\npython dynamic_icl.py \\\n\t--dataset $DATASET_NAME \\\n\t--task $TASK_NAME \\\n\t--encoder_name simcse \\\n\t--metric cosine \\\n\t--num_neighbors 16\n```\nThe output file is written to `$DATA_DIR/$DATASET/train_$ENCODER_NAME.json`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric11eca%2Fcommon-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feric11eca%2Fcommon-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric11eca%2Fcommon-bench/lists"}