{"id":13754281,"url":"https://github.com/THUDM/CogQA","last_synced_at":"2025-05-09T22:31:43.609Z","repository":{"id":35851244,"uuid":"189153000","full_name":"THUDM/CogQA","owner":"THUDM","description":"Source code and dataset for ACL 2019 paper \"Cognitive Graph for Multi-Hop Reading Comprehension at Scale\"","archived":false,"fork":false,"pushed_at":"2023-03-31T14:41:37.000Z","size":36682,"stargazers_count":457,"open_issues_count":13,"forks_count":82,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-04-05T08:08:30.027Z","etag":null,"topics":["bert","graph-neural-networks","question-answering"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-29T04:54:51.000Z","updated_at":"2025-02-16T09:53:43.000Z","dependencies_parsed_at":"2022-09-18T01:43:05.078Z","dependency_job_id":"ce0e2c80-903b-4068-8fc1-cff813519efb","html_url":"https://github.com/THUDM/CogQA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCogQA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCogQA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCogQA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCogQA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM","download_url":"https://codeload.github.com/THUDM/CogQA/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253335731,"owners_count":21892720,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","graph-neural-networks","question-answering"],"created_at":"2024-08-03T09:01:53.054Z","updated_at":"2025-05-09T22:31:38.586Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":["知识图谱问答KBQA、多跳推理"],"sub_categories":["其他_文本生成、文本对话"],"readme":"# CogQA\n\n### [Project](https://sites.google.com/view/cognitivegraph/) | [arXiv](https://arxiv.org/abs/1905.05460)\n\nSource code for the paper **Cognitive Graph for Multi-Hop Reading Comprehension at Scale.** *(ACL 2019 Oral)*\n\nIn addition to the [paper](https://arxiv.org/abs/1905.05460), we have a [Chinese blog post](https://zhuanlan.zhihu.com/p/72981392) about CogQA on Zhihu (知乎).\n\n## Introduction\n\nCogQA is a novel framework for multi-hop question answering in **web-scale** documents. Founded on the dual process theory in cognitive science, CogQA gradually builds a *cognitive graph* in an iterative process by coordinating an implicit extraction module (System 1) and an explicit reasoning module (System 2). While giving accurate answers, our framework further provides **explainable** reasoning paths.\n\n## Preprocess\n\n1. Download and set up a Redis database following https://redis.io/download\n2. Download the dataset, the evaluation script, and the fullwiki data (enwiki-20171001-pages-meta-current-withlinks-abstracts) from https://hotpotqa.github.io. Unzip `improved_retrieval.zip` in this repo.\n3. 
``pip install -r requirements.txt``\n4. Run ``python read_fullwiki.py`` to load the Wikipedia documents into Redis (check that `dump.rdb` in the Redis folder is about 2.4GB).\n5. Run ``python process_train.py`` to generate `hotpot_train_v1.1_refined.json`, which contains the edges of the gold-only cognitive graphs.\n6. ``mkdir models``\n\n## Training\n\nThe code automatically distributes the work across all available devices, each handling `batch_size / num_gpu` samples. We recommend that each GPU have at least 11GB of memory to hold 2 batches.\n\n1. Run `python train.py` to train Task #1 (span extraction).\n2. Run `python train.py --load=True --mode='bundle'` to train Task #2 (answer prediction).\n\n## Evaluation\n\n`cogqa.py` implements the algorithm that answers questions with a trained model. We saved the 1-hop nodes found by another, similar model in `improved_retrieval.zip` so that other algorithms can reuse them. Replacing the original input with this file can **directly** improve your results in the fullwiki setting.\n\n1. Unzip `improved_retrieval.zip`.\n2. `python cogqa.py --data_file='hotpot_dev_fullwiki_v1_merge.json'`\n3. `python hotpot_evaluate_v1.py hotpot_dev_fullwiki_v1_merge_pred.json hotpot_dev_fullwiki_v1_merge.json`\n4. You can inspect the cognitive graph (the reasoning process) in the `cg` field of the predicted JSON file.\n\n## Notes\n\n1. The main change from the preview version is the addition of **detailed comments**.\n2. The relatively sensitive hyperparameters include the number of negative samples, top K, the learning rate of Task #2, the scale factors between different parts...\n3. If our work is useful to you, please cite our paper or star 🌟 our repo~~\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTHUDM%2FCogQA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTHUDM%2FCogQA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTHUDM%2FCogQA/lists"}