{"id":21474535,"url":"https://github.com/kristiyanvachev/leaf-question-generation","last_synced_at":"2025-07-15T08:32:40.982Z","repository":{"id":55432871,"uuid":"413578825","full_name":"KristiyanVachev/Leaf-Question-Generation","owner":"KristiyanVachev","description":"Easy to use and understand multiple-choice question generation algorithm using T5 Transformers.","archived":false,"fork":false,"pushed_at":"2022-03-07T06:45:58.000Z","size":189,"stargazers_count":92,"open_issues_count":3,"forks_count":18,"subscribers_count":4,"default_branch":"main","last_synced_at":"2023-08-06T17:52:49.785Z","etag":null,"topics":["ai","distractors","mcq","ml","multiple-choice","neural-networks","nlp","question-generation","quiz","sense2vec","t5","test","transformers"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KristiyanVachev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-04T20:48:52.000Z","updated_at":"2023-07-10T17:37:08.000Z","dependencies_parsed_at":"2022-08-15T00:10:49.478Z","dependency_job_id":null,"html_url":"https://github.com/KristiyanVachev/Leaf-Question-Generation","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KristiyanVachev%2FLeaf-Question-Generation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KristiyanVachev%2FLeaf-Question-Generation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KristiyanVachev%2FLeaf-Question-Generation/releases","manifests_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KristiyanVachev%2FLeaf-Question-Generation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KristiyanVachev","download_url":"https://codeload.github.com/KristiyanVachev/Leaf-Question-Generation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226028494,"owners_count":17562267,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","distractors","mcq","ml","multiple-choice","neural-networks","nlp","question-generation","quiz","sense2vec","t5","test","transformers"],"created_at":"2024-11-23T10:23:38.922Z","updated_at":"2024-11-23T10:23:39.418Z","avatar_url":"https://github.com/KristiyanVachev.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Leaf: Multiple-Choice Question Generation\n\nEasy to use and understand multiple-choice question generation algorithm using  [T5 Transformers](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html).  
The application accepts a short passage of text and uses two fine-tuned T5 Transformer models to first generate multiple **question-answer pairs** corresponding to the given text, after which it uses them to generate ***distractors***  -  additional options used to confuse the test taker.\n\n\n\n![question generation process](https://i.ibb.co/fQwPZZv/qg-process.jpg \"question generation process\")\n\nOriginally inspired by a Bachelor's machine learning course ([github link](https://github.com/KristiyanVachev/Question-Generation)) and then continued as a topic for my Master's thesis at Sofia University, Bulgaria. \n\n## ECIR 2022 Demonstration paper\nThis work has been accepted as a demo paper for the [ECIR 2022 conference.](https://ecir2022.org/) \n\n**Video demonstration:** [here](https://www.youtube.com/watch?v=tpxl-UnfmQc)\n\n**Live demo:** *coming soon*\n\n**Paper:** [here](https://arxiv.org/abs/2201.09012)\n\n*Abstract:*\nTesting with quiz questions has proven to be an effective strategy for better educational processes. However, manually creating quizzes is a tedious and time-consuming task.  To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for classroom settings, Leaf could be also used in an industrial setup, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs).\n\n## Generating question and answer pairs\nTo generate the question-answer pairs we have fine-tuned a T5 transformer model from [huggingface](https://huggingface.co/transformers/model_doc/t5.html) on the [SQuAD1.1. 
dataset](https://rajpurkar.github.io/SQuAD-explorer/), a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles.\n\nThe model accepts the target answer and context as input:\n\n    'answer' + '\u003csep\u003e' + 'context' \n\nand outputs the answer together with a question that it answers for the given text:\n\n    'answer' + '\u003csep\u003e' + 'question' \n\n\nTo generate question-answer pairs without providing a target answer, the model has also been trained to produce both when the '[MASK]' token is passed in place of the target answer. \n\n    '[MASK]' + '\u003csep\u003e' + 'context' \n\nThe full training script can be found in the `training` directory or accessed directly in [Google Colab](https://colab.research.google.com/drive/15GAaD-33jw81sugeBFj_Bp9GkbE_N6E1?usp=sharing). \n\n\n## Generating incorrect options  (distractors) \nTo generate the distractors, another [T5 transformer model](https://huggingface.co/transformers/model_doc/t5.html) has been fine-tuned, this time on the [RACE dataset](https://huggingface.co/datasets/race), which consists of more than 28,000 passages and nearly 100,000 questions. The dataset is collected from English examinations in China, which are designed for middle school and high school students.\n\nThe model accepts the target answer, question and context as input:\n\n    'answer' + '\u003csep\u003e' + 'question' + 'context' \n\nand outputs 3 distractors separated by the `'\u003csep\u003e'` token.\n\n    'distractor1' + '\u003csep\u003e' + 'distractor2' + '\u003csep\u003e' + 'distractor3' \n\n\nThe full training script can be found in the `training` directory or accessed directly in [Google Colab](https://colab.research.google.com/drive/1kWZviQVx1BbelWp0rwZX7H3GIPS7_ZrP?usp=sharing). 
\n\nTo extend the variety of distractors with simple words that are not so closely related to the context, we have also used [sense2vec](https://pypi.org/project/sense2vec/) word embeddings in cases where the T5 model does not produce good enough distractors. \n\n\n## Web application\nTo demonstrate the algorithm, a simple Angular web application has been created. It accepts the given paragraph along with the desired number of questions and outputs each generated question, which can then be edited (shown below). The algorithm exposes a simple REST API, built with *flask*, which the web app consumes.\n\n\n![question generation process](https://i.ibb.co/WFJjCgH/1-edited-fullscreen.png \"Web application \")\n\nThe code for the web application is located in a separate repository [here](https://github.com/KristiyanVachev/QGT-FrontEnd). \n\n\n\n\n\n## Installation guide\n\n### Creating a virtual environment *(optional)*\nTo avoid any conflicts with Python packages from other projects, it is good practice to create a [virtual environment](https://docs.python.org/3/library/venv.html) in which the packages will be installed. If you do not want to do this, you can skip the next commands and directly install the packages from the requirements.txt file. \n\nCreate a virtual environment:\n\n    python -m venv venv\n\nEnter the virtual environment:\n\n*Windows:*\n\n    . 
.\\venv\\Scripts\\activate\n\n*Linux or macOS:*\n\n    source venv/bin/activate\n\n### Installing packages\n\n    pip install -r requirements.txt \n\n### Downloading data\n\n#### Question-answer model\nDownload the [multitask-qg-ag model](https://drive.google.com/file/d/1-vqF9olcYOT1hk4HgNSYEdRORq-OD5CF/view?usp=sharing) checkpoint and place it in the  `app/ml_models/question_generation/models/` directory.\n\n#### Distractor generation \nDownload the [race-distractors model](https://drive.google.com/file/d/1jKdcbc_cPkOnjhDoX4jMjljMkboF-5Jv/view?usp=sharing) checkpoint and place it in the  `app/ml_models/distractor_generation/models/` directory.\n\nDownload [sense2vec](https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz), extract it and place the `s2v_old` folder in the `app/ml_models/sense2vec_distractor_generation/models/` directory.\n\n## Training on your own\nThe training scripts are available in the `training` directory.  You can download the notebooks directly from there or open the [Question-Answer Generation](https://colab.research.google.com/drive/15GAaD-33jw81sugeBFj_Bp9GkbE_N6E1?usp=sharing) and [Distractor Generation](https://colab.research.google.com/drive/1kWZviQVx1BbelWp0rwZX7H3GIPS7_ZrP?usp=sharing) notebooks in Google Colab. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkristiyanvachev%2Fleaf-question-generation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkristiyanvachev%2Fleaf-question-generation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkristiyanvachev%2Fleaf-question-generation/lists"}