{"id":13425975,"url":"https://github.com/victordibia/neuralqa","last_synced_at":"2025-04-05T22:04:24.691Z","repository":{"id":39024027,"uuid":"265138409","full_name":"victordibia/neuralqa","owner":"victordibia","description":"NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT","archived":false,"fork":false,"pushed_at":"2023-06-07T20:12:03.000Z","size":31784,"stargazers_count":231,"open_issues_count":43,"forks_count":31,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-29T21:03:52.782Z","etag":null,"topics":["bert-model","deep-learning","elastic-search","information-retrieval","natural-language-processing"],"latest_commit_sha":null,"homepage":"https://victordibia.github.io/neuralqa/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/victordibia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-19T03:55:56.000Z","updated_at":"2024-10-03T13:02:36.000Z","dependencies_parsed_at":"2024-11-11T17:33:23.434Z","dependency_job_id":"682f3a56-7cd0-4089-924a-471d8f0fece0","html_url":"https://github.com/victordibia/neuralqa","commit_stats":{"total_commits":272,"total_committers":4,"mean_commits":68.0,"dds":"0.018382352941176516","last_synced_commit":"fb48f4d45d5856195baef25b4707e7b282cc364d"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fneuralqa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fneuralqa/tags","r
eleases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fneuralqa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fneuralqa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/victordibia","download_url":"https://codeload.github.com/victordibia/neuralqa/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247406085,"owners_count":20933803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-model","deep-learning","elastic-search","information-retrieval","natural-language-processing"],"created_at":"2024-07-31T00:01:23.324Z","updated_at":"2025-04-05T22:04:24.671Z","avatar_url":"https://github.com/victordibia.png","language":"JavaScript","funding_links":[],"categories":["JavaScript","文本数据和NLP"],"sub_categories":[],"readme":"## NeuralQA: A Usable Library for (Extractive) Question Answering on Large Datasets with BERT\n\n[![License: MIT](https://img.shields.io/github/license/victordibia/neuralqa)](https://opensource.org/licenses/MIT)\n![docs](https://github.com/victordibia/neuralqa/workflows/docs/badge.svg?style=flat-square)\n\n\u003e Still in **alpha**, lots of changes anticipated. View demo on [neuralqa.fastforwardlabs.com](https://neuralqa.fastforwardlabs.com/#/).\n\n\u003cimg width=\"100%\" src=\"https://raw.githubusercontent.com/victordibia/neuralqa/master/docs/images/manual.jpg\"\u003e\n\n`NeuralQA` provides an easy to use api and visual interface for Extractive Question Answering (QA),\non large datasets. 
The QA process consists of two main stages: **Passage retrieval (Retriever)** is implemented using [ElasticSearch](https://www.elastic.co/downloads/elasticsearch)\nand **Document Reading (Reader)** is implemented using pretrained BERT models via the\nHuggingface [Transformers](https://github.com/huggingface/transformers) api.\n\n## Usage\n\n```shell\npip3 install neuralqa\n```\n\nCreate (or navigate to) a folder you would like to use with NeuralQA. Run the following command line instruction within that folder.\n\n```shell\nneuralqa ui --port 4000\n```\n\nNavigate to [http://localhost:4000/#/](http://localhost:4000/#/) to view the NeuralQA interface. Learn about other command line options in the documentation [here](https://victordibia.github.io/neuralqa/usage.html#command-line-options) or how to [configure](https://victordibia.github.io/neuralqa/configuration.html) NeuralQA to use your own reader models or retriever instances.\n\n\u003e Note: To use NeuralQA with a retriever such as ElasticSearch, follow the [instructions here](https://www.elastic.co/downloads/elasticsearch) to download, install, and launch a local elasticsearch instance and add it to your config.yaml file.\n\n### How Does it Work?\n\n\u003cimg width=\"100%\" src=\"https://raw.githubusercontent.com/victordibia/neuralqa/master/docs/images/architecture.png\"\u003e\n\nNeuralQA consists of several high level modules:\n\n- **Retriever**: For each search query (question), scan an index (elasticsearch), and retrieve a list of candidate matched passages.\n\n- **Reader**: For each retrieved passage, a BERT-based model predicts a span that contains the answer to the question. In practice, retrieved passages may be lengthy and BERT-based models can process a maximum of 512 tokens at a time. NeuralQA handles this in two ways. First, lengthy passages are chunked into smaller sections with a configurable stride. 
Secondly, NeuralQA offers the option of extracting a subset of relevant snippets (RelSnip) which a BERT reader can then scan to find answers. Relevant snippets are portions of the retrieved document that contain exact match results for the search query.\n\n- **Expander**: Methods for generating additional (relevant) query terms to improve recall. Currently, we implement Contextual Query Expansion using finetuned Masked Language Models. This is implemented via a user-in-the-loop flow where the user can choose to include any suggested expansion terms.\n\n\u003cimg width=\"100%\" src=\"https://raw.githubusercontent.com/victordibia/neuralqa/master/docs/images/expand.jpg\"\u003e\n\n- **User Interface**: NeuralQA provides a visual user interface for performing queries (manual queries where question and context are provided as well as queries over a search index), viewing results and also sensemaking of results (reranking of passages based on answer scores, highlighting keyword match, model explanations).\n\n## Configuration\n\nProperties of modules within NeuralQA (ui, retriever, reader, expander) can be specified via a [yaml configuration](neuralqa/config_default.yaml) file. When you launch the ui, you can specify the path to your config file with the `--config-path` option. If this is not provided, NeuralQA will search for a config.yaml in the current folder or create a [default copy](neuralqa/config_default.yaml) in the current folder. 
Sample configuration shown below:\n\n```yaml\nui:\n  queryview:\n    intro:\n      title: \"NeuralQA: Question Answering on Large Datasets\"\n      subtitle: \"Subtitle of your choice\"\n    views: # select sections of the ui to hide or show\n      intro: True\n      advanced: True\n      samples: False\n      passages: True\n      explanations: True\n      allanswers: True\n    options: # values for advanced options\n      stride: ..\n      maxpassages: ..\n      highlightspan: ..\n\n  header: # header tile for ui\n    appname: NeuralQA\n    appdescription: Question Answering on Large Datasets\n\nreader:\n  title: Reader\n  selected: twmkn9/distilbert-base-uncased-squad2\n  options:\n    - name: DistilBERT SQUAD2\n      value: twmkn9/distilbert-base-uncased-squad2\n      type: distilbert\n    - name: BERT SQUAD2\n      value: deepset/bert-base-cased-squad2\n      type: bert\n```\n\n## Documentation\n\nAn attempt is being made to better document NeuralQA here - [https://victordibia.github.io/neuralqa/](https://victordibia.github.io/neuralqa/).\n\n## Citation\n\nA paper introducing NeuralQA and its components can be [found here](https://arxiv.org/abs/2007.15211).\n\n```\n@article{dibia2020neuralqa,\n    title={NeuralQA: A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large Datasets},\n    author={Victor Dibia},\n    year={2020},\n    journal={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictordibia%2Fneuralqa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvictordibia%2Fneuralqa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictordibia%2Fneuralqa/lists"}