https://github.com/thunlp/OpenPrompt

An Open-Source Framework for Prompt-Learning.

ai deep-learning natural-language-processing natural-language-understanding nlp nlp-library nlp-machine-learning pre-trained-language-models pre-trained-model prompt prompt-based-tuning prompt-learning prompt-toolkit prompts pytorch transformer

Last synced: over 1 year ago


Host: GitHub
URL: https://github.com/thunlp/OpenPrompt
Owner: thunlp
License: apache-2.0
Created: 2021-09-30T09:38:45.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2024-07-16T03:48:08.000Z (almost 2 years ago)
Last Synced: 2024-10-29T15:03:24.653Z (over 1 year ago)
Topics: ai, deep-learning, natural-language-processing, natural-language-understanding, nlp, nlp-library, nlp-machine-learning, pre-trained-language-models, pre-trained-model, prompt, prompt-based-tuning, prompt-learning, prompt-toolkit, prompts, pytorch, transformer
Language: Python
Homepage: https://thunlp.github.io/OpenPrompt/
Size: 14.4 MB
Stars: 4,348
Watchers: 44
Forks: 449
Open Issues: 96
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff

Awesome Lists containing this project

awesome-list - OpenPrompt - An Open-Source Framework for Prompt-Learning. (Natural Language Processing / Conversation & Translation)
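StarryDivineSky - thunlp/OpenPrompt - Prompt-learning is the latest paradigm for applying pre-trained language models to downstream NLP tasks; it modifies the input text with a textual template and directly uses the PLM to perform pre-training tasks. This library provides a standard, flexible, and extensible framework for deploying prompt-learning pipelines. OpenPrompt supports loading PLMs directly from Huggingface Transformers. In the future, PLMs implemented by other libraries will also be supported. (Pre-trained models)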
Awesome-Prompt-Engineering - [Github
awesome-prompt-engineering-zh-cn - [Github
awesome-sentiment-attitude-extraction - [github
awesome-opensource-ai - OpenPrompt - Unified framework for prompt-learning with pre-trained language models. Supports templating, verbalizing, and optimization strategies. Winner of ACL 2022 Best Demo Paper Award. Apache 2.0 licensed. (📋 Contents / 🧪 13. Developer Tools & Integrations)

README

          





**An Open-Source Framework for Prompt-learning.**

------



Overview • Installation • How To Use • Docs • Paper • Citation • Performance





![version](https://img.shields.io/badge/version-v1.0.1-blue)

## What's New?

 

- ❗️ April 2023: **Want to build your Chat AI?** We are releasing [UltraChat](https://github.com/thunlp/UltraChat). Use OpenPrompt and UltraChat to conduct supervised instruction tuning; see [`./tutorial/9_UltraChat.py`](./tutorial/9_UltraChat.py).

- Aug 2022: Thanks to contributor [zhiyongLiu1114](https://github.com/zhiyongLiu1114), OpenPrompt now supports [ERNIE 1.0](https://github.com/thunlp/OpenPrompt/tree/main/tutorial/7_ernie_paddlepaddle) in PaddlePaddle.

- July 2022: OpenPrompt supports OPT now.

- June 2022: OpenPrompt wins ACL 2022 Best Demo Paper Award.

- Mar 2022: We added a [tutorial](https://github.com/thunlp/OpenPrompt/blob/main/tutorial/6.1_chinese_dataset_uer_t5.py) in response to [issue 124](https://github.com/thunlp/OpenPrompt/issues/124), which uses a customized tokenizer_wrapper to perform tasks that are not in the default configuration of OpenPrompt (e.g., BERT tokenizer + T5 model).

- Feb 2022: Check out our sister repo [OpenDelta](https://github.com/thunlp/OpenDelta)!

- Dec 2021: `pip install openprompt`

- Dec 2021: [SuperGLUE performance](https://github.com/thunlp/OpenPrompt/tree/main/results) results have been added.

- Dec 2021: We support the **generation paradigm for all tasks** by adding a new verbalizer, [GenerationVerbalizer](https://github.com/thunlp/OpenPrompt/blob/main/openprompt/prompts/generation_verbalizer.py), and a [tutorial: 4.1_all_tasks_are_generation.py](https://github.com/thunlp/OpenPrompt/blob/main/tutorial/4.1_all_tasks_are_generation.py).

- Nov 2021: We have released a paper, [OpenPrompt: An Open-source Framework for Prompt-learning](https://arxiv.org/abs/2111.01998).

- Nov 2021: PrefixTuning now supports T5.

- Nov 2021: We made some major changes from the last version, where a flexible template language is newly introduced! Part of the docs is outdated and we will fix it soon.

## Overview

**Prompt-learning** is the latest paradigm for adapting pre-trained language models (PLMs) to downstream NLP tasks: it modifies the input text with a textual template and directly uses the PLM to perform its pre-training task. This library provides a standard, flexible, and extensible framework for deploying the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from [huggingface transformers](https://github.com/huggingface/transformers). In the future, we will also support PLMs implemented by other libraries. For more resources about prompt-learning, please check our [paper list](https://github.com/thunlp/PromptPapers).







## What Can You Do via OpenPrompt?

![demo](docs/source/demo.gif)

- *Use the implementations of current prompt-learning approaches.* We have implemented various prompting methods, including templating, verbalizing, and optimization strategies, under a unified standard. You can easily call and understand these methods.

- *Design your own prompt-learning work.* With the extensibility of OpenPrompt, you can quickly practice your prompt-learning ideas.

## Installation

**Note: Please use Python 3.8+ for OpenPrompt**

### Using Pip

Our repo is tested on Python **3.8+** and PyTorch **1.8.1+**. Install OpenPrompt using pip as follows:

```shell

pip install openprompt

```

To play with the latest features, you can also install OpenPrompt from the source.

### Using Git

Clone the repository from GitHub:

```shell

git clone https://github.com/thunlp/OpenPrompt.git
cd OpenPrompt
pip install -r requirements.txt
python setup.py install

```

To modify the code, install the package in development mode:

```

python setup.py develop

```

## Use OpenPrompt

### Base Concepts

A `PromptModel` object contains a `PLM`, one or more `Template` objects, and one or more `Verbalizer` objects. The `Template` class wraps the original input with a template, and the `Verbalizer` class constructs a projection between labels and target words in the current vocabulary. The `PromptModel` object is what actually participates in training and inference.

### Introduction by a Simple Example

With the modularity and flexibility of OpenPrompt, you can easily develop a prompt-learning pipeline.

#### Step 1: Define a task

The first step is to determine the current NLP task: think about what your data looks like and what you want from the data. That is, the essence of this step is to determine the `classes` and the `InputExample` of the task. For simplicity, we use sentiment analysis as an example.

```python

from openprompt.data_utils import InputExample

classes = [ # Sentiment analysis has two classes, one for negative and one for positive
    "negative",
    "positive"
]

dataset = [ # For simplicity, there are only two examples
    # text_a is the input text of the data; some other datasets may have multiple input sentences in one example.
    InputExample(
        guid = 0,
        text_a = "Albert Einstein was one of the greatest intellects of his time.",
    ),
    InputExample(
        guid = 1,
        text_a = "The film was badly made.",
    ),
]

```

#### Step 2: Define a Pre-trained Language Model (PLM) as backbone.

Choose a PLM to support your task. Different models have different attributes, and we encourage you to use OpenPrompt to explore the potential of various PLMs. OpenPrompt is compatible with models on [huggingface](https://huggingface.co/transformers/).

```python

from openprompt.plms import load_plm
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

```

#### Step 3: Define a Template.

A `Template` is a modifier of the original input text, which is also one of the most important modules in prompt-learning. 

We have defined `text_a` in Step 1.

```python

from openprompt.prompts import ManualTemplate
promptTemplate = ManualTemplate(
    text = '{"placeholder":"text_a"} It was {"mask"}',
    tokenizer = tokenizer,
)

```
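To make the effect of the template concrete, here is a minimal plain-Python sketch of what wrapping an input conceptually does. It does not use OpenPrompt: `fill_template`, `SLOT`, and the `[MASK]` default are hypothetical names chosen for illustration, and the real `ManualTemplate` also handles tokenization, which this sketch ignores.

```python
import re

# Matches the two slot forms used in the example template:
# {"placeholder":"text_a"} and {"mask"}.
SLOT = re.compile(r'\{"(\w+)"(?::"(\w+)")?\}')

def fill_template(template, example, mask_token="[MASK]"):
    """Illustrative only: fill placeholder slots from `example` and insert the mask token."""
    def substitute(match):
        kind, field = match.group(1), match.group(2)
        if kind == "placeholder":
            return example[field]   # copy the named input field into the prompt
        if kind == "mask":
            return mask_token       # position the language model is asked to fill
        raise ValueError(f"unsupported slot: {kind}")
    return SLOT.sub(substitute, template)

wrapped = fill_template(
    '{"placeholder":"text_a"} It was {"mask"}',
    {"text_a": "The film was badly made."},
)
print(wrapped)  # The film was badly made. It was [MASK]
```

The masked position is where the PLM predicts a word, which is what turns classification into the model's own pre-training task.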

#### Step 4: Define a Verbalizer

A `Verbalizer` is another important (but not necessary) module in prompt-learning, which projects the original labels (we have defined them as `classes`, remember?) to a set of label words. Here is an example in which we project the `negative` class to the word bad, and the `positive` class to the words good, wonderful, and great.

```python

from openprompt.prompts import ManualVerbalizer
promptVerbalizer = ManualVerbalizer(
    classes = classes,
    label_words = {
        "negative": ["bad"],
        "positive": ["good", "wonderful", "great"],
    },
    tokenizer = tokenizer,
)

```
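The projection from label words to classes can be pictured with a small plain-Python sketch. It is a simplification under stated assumptions: the probabilities are made up, `class_scores` is a hypothetical helper, and averaging over a class's label words is just one simple aggregation choice. The library's `ManualVerbalizer` works on model logits and may differ in detail.

```python
def class_scores(word_probs, label_words):
    """Illustrative only: score each class by the mean probability of its label words."""
    return {
        label: sum(word_probs[word] for word in words) / len(words)
        for label, words in label_words.items()
    }

label_words = {
    "negative": ["bad"],
    "positive": ["good", "wonderful", "great"],
}

# Made-up probabilities a masked language model might assign at the mask position.
word_probs = {"bad": 0.40, "good": 0.05, "wonderful": 0.01, "great": 0.04}

scores = class_scores(word_probs, label_words)
predicted = max(scores, key=scores.get)
print(predicted)  # negative
```

Averaging keeps a class with many label words from winning simply because it has more words to sum over.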

#### Step 5: Combine them into a PromptModel

Given the task, we now have a `PLM`, a `Template`, and a `Verbalizer`, and we combine them into a `PromptModel`. Note that although the example naively combines the three modules, you can actually define more complicated interactions among them.

```python

from openprompt import PromptForClassification
promptModel = PromptForClassification(
    template = promptTemplate,
    plm = plm,
    verbalizer = promptVerbalizer,
)

```

#### Step 6: Define a DataLoader

A `PromptDataLoader` is basically a prompt version of the PyTorch `DataLoader`, which also includes a `Tokenizer`, a `Template`, and a `TokenizerWrapper`.

```python

from openprompt import PromptDataLoader
data_loader = PromptDataLoader(
    dataset = dataset,
    tokenizer = tokenizer,
    template = promptTemplate,
    tokenizer_wrapper_class=WrapperClass,
)

```

#### Step 7: Training and inference

Done! We can conduct training and inference the same way as any other process in PyTorch.

```python

import torch

# Zero-shot inference using the pre-trained MLM with the prompt
promptModel.eval()
with torch.no_grad():
    for batch in data_loader:
        logits = promptModel(batch)
        preds = torch.argmax(logits, dim = -1)
        # preds holds one class index per example in the batch
        print([classes[i] for i in preds.tolist()])
# predictions would be 1, 0 for classes 'positive', 'negative'

```
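When a batch holds more than one example, the argmax yields one index per row, and each index is mapped to its class name separately. A plain-Python sketch of that mapping, using made-up logits and a hypothetical helper `decode_predictions` (not part of OpenPrompt):

```python
def decode_predictions(logits, classes):
    """Illustrative only: return the highest-scoring class name for each row of logits."""
    names = []
    for row in logits:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over class scores
        names.append(classes[best])
    return names

classes = ["negative", "positive"]

# Made-up class scores for a batch of two examples (rows) and two classes (columns).
batch_logits = [[-1.2, 0.7], [2.1, -0.3]]

print(decode_predictions(batch_logits, classes))  # ['positive', 'negative']
```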

Please refer to our [tutorial scripts](https://github.com/thunlp/OpenPrompt/tree/main/tutorial) and [documentation](https://thunlp.github.io/OpenPrompt/) for more details.

## Datasets

We provide a series of download scripts in the `dataset/` folder. Feel free to use them to download benchmarks.

## Performance Report

OpenPrompt enables too many possible combinations to test all at once. We are trying our best to test the performance of different methods as soon as possible, and the results will be continually updated in the [Tables](https://github.com/thunlp/OpenPrompt/tree/main/results/).

We also encourage users to find the best hyper-parameters for their own tasks and report the results by opening a pull request.

## Known Issues

Major improvements and enhancements are planned for the future.

- We made some major changes from the last version, so part of the docs is outdated. We will fix it soon.

## Citation

Please cite our paper if you use OpenPrompt in your work.

```bibtex

@article{ding2021openprompt,
  title={OpenPrompt: An Open-source Framework for Prompt-learning},
  author={Ding, Ning and Hu, Shengding and Zhao, Weilin and Chen, Yulin and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong},
  journal={arXiv preprint arXiv:2111.01998},
  year={2021}
}

```

## Contributors



  



We thank all the contributors to this project. More contributors are welcome!
