{"id":13590406,"url":"https://github.com/allenai/OpenBookQA","last_synced_at":"2025-04-08T13:31:03.606Z","repository":{"id":75564571,"uuid":"151313893","full_name":"allenai/OpenBookQA","owner":"allenai","description":"Code for experiments on OpenBookQA from the EMNLP 2018 paper  \"Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering\"","archived":false,"fork":false,"pushed_at":"2021-03-12T22:05:32.000Z","size":132,"stargazers_count":121,"open_issues_count":3,"forks_count":29,"subscribers_count":13,"default_branch":"main","last_synced_at":"2024-11-06T10:45:21.270Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/allenai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-10-02T19:49:12.000Z","updated_at":"2024-09-19T01:42:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"ba8dfd5f-a350-4021-be42-884d8621d4f7","html_url":"https://github.com/allenai/OpenBookQA","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2FOpenBookQA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2FOpenBookQA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2FOpenBookQA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2FOpenBookQA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/allenai","download_ur
l":"https://codeload.github.com/allenai/OpenBookQA/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247851472,"owners_count":21006764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:00:44.944Z","updated_at":"2025-04-08T13:31:00.382Z","avatar_url":"https://github.com/allenai.png","language":"Python","readme":"# OpenBookQA Models\n\nThis repository provides code for various baseline models reported in the EMNLP-2018 paper\nintroducing the OpenBookQA dataset:\n[Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering](https://www.semanticscholar.org/paper/24c8adb9895b581c441b97e97d33227730ebfdab)\n\n```bib\n@inproceedings{OpenBookQA2018,\n title={Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering},\n author={Todor Mihaylov and Peter Clark and Tushar Khot and Ashish Sabharwal},\n booktitle={EMNLP},\n year={2018}\n}\n```\n\nPlease visit the [OpenBookQA Leaderboard](https://leaderboard.allenai.org/open_book_qa) for the latest on this challenge!\n\n\n# Setting Up the Environment\n\n1. Create the `obqa` environment using Anaconda\n\n   ```\n   conda create -n obqa python=3.6\n   ```\n\n2. Activate the environment\n\n   ```\n   source activate obqa\n   ```\n\n3. Install the requirements in the environment:\n\n   Note: The script below installs Pytorch 0.4.0 for CUDA 8 only. 
If you are using a different CUDA version,\n   please visit http://pytorch.org/ and install the relevant version.\n\n   ```\n   bash scripts/install_requirements.sh\n   ```\n\n\n# Downloading and Preparing Data\n\nDownload the OpenBookQA dataset and embeddings using the script below.\nNote that this includes downloading `glove.840B.300d.txt.gz`, a 2GB file\ncontaining 300-dimensional [GloVe word embeddings](https://nlp.stanford.edu/projects/glove)\ntrained on 840B tokens, which can take several minutes.\nIf you already have this file, you might consider altering the script.\n\n ```\n bash scripts/download_and_prepare_data.sh\n ```\n\n# Downloading Pre-trained Models\n\nIf you are interested in using the pre-trained models from the paper,\nyou can download them using the command below.\n\nNote: Some of the models that use ELMo are more than 700MB.\nIf you do not plan to use them or have a slow internet connection,\nyou might want to modify the download script and exclude them from downloading.\n\nNote: The downloaded models are from the best-performing run on Dev.\n\n ```\n bash scripts/download_trained_models.sh\n ```\n\n\n# Training/Evaluating Neural Baselines for OpenBookQA\n\nIf you use the script below, you might want to first look at\n``scripts/experiments/qa/run_experiment_openbookqa.sh`` and set the\n``EXPERIMENTS_OUTPUT_DIR_BASE`` environment variable to a directory\nwhere you want to save the output of the experiments.\nDefault is ``_experiments``.\n\nNote: If you want to use a GPU for the experiments, make sure to change the\n`trainer.cuda_device` setting to the desired CUDA device id. Default is `-1` (no GPU).\nYou can also use `scripts/experiments/qa/run_experiment_openbookqa_gpu.sh` (automatically sets `trainer.cuda_device` to CUDA device `0`)\ninstead of `scripts/experiments/qa/run_experiment_openbookqa.sh` in the experiments commands below.\n\n## 1. 
Without External Knowledge\n\nTable: Comparison between models with GloVe (default) and ELMo.\nThe comparison is mentioned in the text of the paper.\nThe results displayed here are the average accuracy (equivalent to exam score)\nand the standard deviation across 5 runs with different random seeds, along with the result for the best run on Dev.\n\n| Model                               | Dev (5 runs) | Test (5 runs) | Dev (Best run)| Test  |\n|-------------------------------------|:------------:|:-------------:|:-------------:|:-----:|\n| Question-to-Choice (Question Match) | 54.6±1.2     | 50.2±0.9      | 56.8          | 49.8  |\n| Question-to-Choice + ELMo           | 57.1±1.1     | 50.6±1.2      | 58.4          | 50.0  |\n| ESIM                                | 53.9±0.4     | 48.9±1.1      | 54.4          | 47.4  |\n| ESIM + ELMo                         | 55.5±0.6     | 50.7±0.7      | 56.4          | 49.6  |\n\n### 1.1 Question-to-Choice Model (Question Match)\n\nExperiments with pre-trained [GloVe](https://nlp.stanford.edu/projects/glove) embedding vectors:\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/reader_mc_qa_question_to_choice.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n*Evaluate on the pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_q2ch_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\nNote: In the `--output_file` argument, `${MODEL_ARCHIVE##*/}` and `${EVALUATION_DATA_FILE##*/}` use shell parameter expansion to strip everything up to the last `/`, so the predictions file is named after the model archive and the evaluation file.\n\n\nExperiments with [ELMo](https://allennlp.org/elmo) contextual word representations:\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/reader_mc_qa_question_to_choice_elmo.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n*Evaluate on the 
pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_q2ch_elmo_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\n### 1.2 ESIM Model\n\nExperiments with pre-trained GloVe embedding vectors:\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/reader_mc_qa_esim.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n*Evaluate on the pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_esim_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\nExperiments with ELMo contextual representations:\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/reader_mc_qa_esim_elmo.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n*Evaluate on the pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_esim_elmo_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\n## 2. Knowledge-Enhanced Models\n\n### 2.1. Retrieve external knowledge\n\n#### 2.1.1. 
Open Book Knowledge (1326 Science facts)\n\nRank OpenBook (Science) knowledge facts for the given question:\n```\nDATA_DIR_ROOT=data/\nKNOWLEDGE_DIR_ROOT=data/knowledge\nOPENBOOKQA_DIR=${DATA_DIR_ROOT}/OpenBookQA-V1-Sep2018\n\nranking_out_dir=${OPENBOOKQA_DIR}/Data/Main/ranked_knowledge/openbook\nmkdir -p ${ranking_out_dir}\ndata_file=${OPENBOOKQA_DIR}/Data/Main/full.jsonl\nknow_file=${KNOWLEDGE_DIR_ROOT}/openbook.csv\n\nPYTHONPATH=. python obqa/data/retrieval/knowledge/rank_knowledge_for_mc_qa.py \\\n                     -o ${ranking_out_dir} -i ${data_file} \\\n                     -k ${know_file} -n tfidf --max_facts_per_choice 100 \\\n                     --limit_items 0\n```\n\n#### 2.1.2. Commonsense Knowledge\n\n##### Open Mind Common Sense part of ConceptNet (cn5omcs)\n\n```\nDATA_DIR_ROOT=data/\nKNOWLEDGE_DIR_ROOT=data/knowledge\nOPENBOOKQA_DIR=${DATA_DIR_ROOT}/OpenBookQA-V1-Sep2018\n\nranking_out_dir=${OPENBOOKQA_DIR}/Data/Main/ranked_knowledge/cn5omcs\nmkdir -p ${ranking_out_dir}\ndata_file=${OPENBOOKQA_DIR}/Data/Main/full.jsonl\nknow_file=${KNOWLEDGE_DIR_ROOT}/CN5/cn5_omcs.json\n\nPYTHONPATH=. python obqa/data/retrieval/knowledge/rank_knowledge_for_mc_qa.py \\\n                     -o ${ranking_out_dir} -i ${data_file} \\\n                     -k ${know_file} -n tfidf --max_facts_per_choice 100 \\\n                     --limit_items 0\n```\n\n\n##### WordNet part of ConceptNet (cn5wordnet)\n\n```\nDATA_DIR_ROOT=data/\nKNOWLEDGE_DIR_ROOT=data/knowledge\nOPENBOOKQA_DIR=${DATA_DIR_ROOT}/OpenBookQA-V1-Sep2018\n\nranking_out_dir=${OPENBOOKQA_DIR}/Data/Main/ranked_knowledge/cn5wordnet\nmkdir -p ${ranking_out_dir}\ndata_file=${OPENBOOKQA_DIR}/Data/Main/full.jsonl\nknow_file=${KNOWLEDGE_DIR_ROOT}/CN5/cn5_wordnet.json\n\nPYTHONPATH=. 
python obqa/data/retrieval/knowledge/rank_knowledge_for_mc_qa.py \\\n                     -o ${ranking_out_dir} -i ${data_file} \\\n                     -k ${know_file} -n tfidf --max_facts_per_choice 100 \\\n                     --limit_items 0\n```\n\n#### 2.1.3. Retrieve \"Gold\" Fact from the Open Book (Oracle)\n\nNote: This is Oracle knowledge -- a hypothetical setting that *assumes access to\nthe gold science fact*. The goal here is to allow research effort to focus on\nthe sub-challenges of retrieving the missing commonsense knowledge, and reasoning\nwith both facts in order to answer the question. A full model for OpenBookQA should,\nof course, not rely on such Oracle knowledge.\n\n```\nDATA_DIR_ROOT=data/\nKNOWLEDGE_DIR_ROOT=data/knowledge\nOPENBOOKQA_DIR=${DATA_DIR_ROOT}/OpenBookQA-V1-Sep2018\n\nranking_out_dir=${OPENBOOKQA_DIR}/Data/Main/ranked_knowledge/openbook_oracle\nmkdir -p ${ranking_out_dir}\ndata_file=${OPENBOOKQA_DIR}/Data/Main/full.jsonl\nknow_file=${OPENBOOKQA_DIR}/Data/Additional/full_complete.jsonl\n\nPYTHONPATH=. python obqa/data/retrieval/knowledge/rank_knowledge_for_mc_qa.py \\\n                    -o ${ranking_out_dir} -i ${data_file} \\\n                    -k ${know_file} -n tfidf  --max_facts_per_choice 1 \\\n                    --limit_items 0 \\\n                    --knowledge_reader reader_gold_facts_arc_mc_qa_2 \\\n                    --dataset_reader reader_arc_qa_question_choice_facts\n```\n\n### 2.2. Train Knowledge-Enhanced Reader With Above Knowledge\n\nVarious baselines that adapt and train the\n[Knowledge-Enhanced Reader](https://www.semanticscholar.org/paper/21da1c528d055a134f22e0f8a0b4011fe825a5e7)\nmodel from ACL-2018 for the OpenBookQA setting, using various sources of knowledge.\n\n#### 2.2.1. 
Oracle Setting\n\n* Oracle Open Book fact + ConceptNet OMCS\n(referred to as the `f + ConceptNet` Oracle setup in the paper)\n\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/knowreader_v1_mc_qa_multi_source_oracle_openbook_plus_cn5omcs.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n* Oracle Open Book fact + WordNet\n(referred to as the `f + WordNet` Oracle setup in the paper)\n\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/knowreader_v1_mc_qa_multi_source_oracle_openbook_plus_cn5wordnet.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n#### 2.2.2. Normal (Non-Oracle) Setting\n\nNote: These experiments are **not** reported in the main paper! These are additional\nbaseline models whose Dev and Test scores are listed below for reference.\n\n\nTable: Additional (Non-Oracle) experiments with external knowledge.\nThe results displayed here are the average accuracy (equivalent to exam score)\nand the standard deviation across 5 runs with different random seeds, along with the result for the best run on Dev.\n\n| Model                     | Dev (5 runs) | Test (5 runs) | Dev (Best run)| Test  |\n|---------------------------|:------------:|:-------------:|:-------------:|:-----:|\n| ConceptNet only (cn5omcs) | 54.0±0.6     | 51.1±2.1      | 54.4          | 52.2  |\n| WordNet only (cn5wordnet) | 54.9±0.4     | 49.4±1.5      | 55.6          | 51.4  |\n| OpenBook + ConceptNet     | 53.8±1.0     | 51.2±1.1      | 54.6          | 50.8  |\n| OpenBook + WordNet        | 53.3±0.7     | 50.6±0.6      | 54.2          | 51.2  |\n\nBelow are commands for training new models or evaluating on the pre-trained models from the EMNLP paper.\nNote that even if you *just* evaluate on pre-trained models, you still\nneed to run the knowledge retrieval from [2.1. 
Retrieve external knowledge](#21-retrieve-external-knowledge).\n\n* Open Mind Common Sense part of ConceptNet only (cn5omcs)\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/knowreader_v1_mc_qa_multi_source_cn5omcs.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n*Evaluate on the pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_kn_conceptnet5_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc_know_visualize \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\n* WordNet part of ConceptNet only (cn5wordnet)\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/knowreader_v1_mc_qa_multi_source_cn5wordnet.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n*Evaluate on the pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_kn_wordnet_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc_know_visualize \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\n* Open Book + Open Mind Common Sense part of ConceptNet\n(Note: this is **not** the Oracle setup from the paper; instead, science facts from\nthe Open Book are retrieved based on a TF-IDF similarity measure with the question\nand answer choices)\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/knowreader_v1_mc_qa_multi_source_openbook_plus_cn5omcs.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh 
${config_file}\n```\n\n*Evaluate on the pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_kn_conceptnet5_and_openbook_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc_know_visualize \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\n* Open Book + WordNet part of ConceptNet\n(Note: Similar to above, this is **not** the Oracle setup from the paper)\n\n*Train a new model*\n```\nconfig_file=training_config/qa/multi_choice/openbookqa/knowreader_v1_mc_qa_multi_source_openbook_plus_cn5wordnet.json\nbash scripts/experiments/qa/run_experiment_openbookqa.sh ${config_file}\n```\n\n*Evaluate on the pre-trained model*\n```\nMODEL_ARCHIVE=data/trained_models/model_kn_wordnet_and_openbook_best_run.tar.gz\nEVALUATION_DATA_FILE=data/OpenBookQA-V1-Sep2018/Data/Main/test.jsonl\npython obqa/run.py evaluate_predictions_qa_mc_know_visualize \\\n                  --archive_file ${MODEL_ARCHIVE} \\\n                  --evaluation_data_file ${EVALUATION_DATA_FILE} \\\n                  --output_file ${MODEL_ARCHIVE##*/}_pred_${EVALUATION_DATA_FILE##*/}\n```\n\n\n# Appendix\n\n## A. Experiments with SciTail, using BiLSTM max-out model\n\nIf you are also interested in the SciTail entailment task\n([Khot et 
al. 2017](https://www.semanticscholar.org/paper/3ce2e40571fe1f4ac4016426c0606df6824bf619)),\nhere is a simple BiLSTM max-out model that attains an accuracy of\n87% and 85% on the Dev and Test sets, respectively\n(without extensive hyper-parameter tuning).\n\n### A.1 Download SciTail Dataset\n\n```\nbash scripts/download_and_prepare_data_scitail.sh\n```\n\n### A.2 Train the Entailment Model\n\n```\npython obqa/run.py train \\\n    -s _experiments/scitail_bilstm_maxout/ \\\n    training_config/entailment/scitail/stacked_nn_aggregate_custom_bilstm_maxout_scitail.json\n```\n\n\n## B. Experiments with ARC, using Question-to-Choice BiLSTM max-out model\n\nIf you are also interested in the [ARC Challenge](http://data.allenai.org/arc/),\nour Question-to-Choice BiLSTM max-out model obtains an\naccuracy of 33.9% on the Test set (without extensive hyper-parameter tuning).\n\n### B.1 Download ARC Dataset\n\n```\nbash scripts/download_and_prepare_data_arc.sh\n```\n\n### B.2 Train the QA Model\n\n```\npython obqa/run.py train \\\n    -s _experiments/qa_multi_question_to_choices/ \\\n    training_config/qa/multi_choice/arc/reader_qa_multi_choice_max_att_ARC_Chellenge_full.json\n```\n\n\n# Contact\n\nIf you have any questions or comments about the code, data, or models, please\ncontact Todor Mihaylov, Ashish Sabharwal, Tushar Khot, or Peter Clark.\n\n---\n","funding_links":[],"categories":["Document Question Answering","Anthropomorphic-Taxonomy"],"sub_categories":["English","Typical Intelligence Quotient (IQ)-General Intelligence evaluation benchmarks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallenai%2FOpenBookQA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fallenai%2FOpenBookQA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallenai%2FOpenBookQA/lists"}