https://github.com/joeornstein/promptr

Format and Complete Few-Shot LLM Prompts

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# promptr

We developed the `promptr` package so that researchers could easily format and submit LLM prompts using the R programming language. It provides a handful of convenient functions to query the OpenAI API and return the output as a tidy R dataframe. The package is intended to be particularly useful for social scientists using LLMs for text classification and scaling tasks.

## Installation

You can install the release version of `promptr` from CRAN with:

```{r, eval = FALSE}
install.packages('promptr')
```

You can install the development version of `promptr` from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("joeornstein/promptr")
```

You will also need a developer account with OpenAI and an API key. For best performance, you may also want to provide credit card information (this significantly boosts your API rate limit, even if you’re not spending money).

Once your account is created, copy-paste your API key into the following line of R code.

``` r
library(promptr)

openai_api_key('YOUR API KEY GOES HERE', install = TRUE)
```

Now you're all set up!

## Completing Prompts

The workhorse function of the `promptr` package is `complete_prompt()`. This function submits a prompt to the OpenAI API and returns a dataframe with the five most likely next-word predictions and their associated probabilities.

```{r}
library(promptr)

complete_prompt('I feel like a')
```
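
As a rough sketch of where such probabilities come from: completion APIs typically report log-probabilities for the top candidate tokens, and exponentiating them gives values on the 0 to 1 scale. The helper `logprobs_to_df()` below is hypothetical (it is not a `promptr` function), and the log-probabilities are made up for illustration. Only the `token` and `probability` column names follow the usage later in this README.

```r
# Hypothetical helper (not part of promptr): convert a named vector of
# token log-probabilities into a dataframe of tokens and probabilities.
logprobs_to_df <- function(logprobs) {
  out <- data.frame(
    token = names(logprobs),
    probability = exp(unname(logprobs)),
    stringsAsFactors = FALSE
  )
  # Sort from most to least likely
  out <- out[order(out$probability, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Made-up log-probabilities, for illustration only
toy_logprobs <- c(' little' = log(0.10), ' lot' = log(0.25), ' bit' = log(0.05))
toy_df <- logprobs_to_df(toy_logprobs)
toy_df
```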

If you prefer the model to autoregressively generate text instead of outputting the next-word probabilities, you can set the `max_tokens` input greater than 1. The function will return a character object with the most likely completion.

```{r}
complete_prompt('I feel like a', max_tokens = 18)
```

Note that by default, the `temperature` input is set to 0, which means the model will always return the most likely completion for your prompt. Increasing temperature allows the model to randomly select words from its estimated probability distribution (see the API reference for more on these parameters).
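
To build intuition for what temperature does, here is a minimal base R sketch. It assumes the standard formulation in which temperature rescales the model's log-probabilities before sampling. The helper `apply_temperature()` and the probabilities are hypothetical and are not part of `promptr` or the OpenAI API.

```r
# Hypothetical helper (not part of promptr): rescale a probability
# distribution by a sampling temperature.
apply_temperature <- function(probs, temperature) {
  if (temperature == 0) {
    # Limit case: all mass on the most likely token (greedy decoding)
    out <- as.numeric(seq_along(probs) == which.max(probs))
  } else {
    # Equivalent to softmax(log(probs) / temperature)
    out <- probs^(1 / temperature)
    out <- out / sum(out)
  }
  names(out) <- names(probs)
  out
}

# Made-up next-word probabilities, for illustration only
probs <- c(happy = 0.6, sad = 0.3, tired = 0.1)

apply_temperature(probs, 0) # all mass on "happy"
apply_temperature(probs, 1) # leaves the distribution unchanged
apply_temperature(probs, 2) # flattens the distribution

# Sampling one token from the rescaled distribution
set.seed(42)
sample(names(probs), size = 1, prob = apply_temperature(probs, 2))
```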

You can also change which model variant the function calls using the `model` input. By default, it is set to "gpt-3.5-turbo-instruct", the RLHF variant of GPT-3.5. For the base GPT-3 variants, try "davinci-002" (175 billion parameters) or "babbage-002" (1.3 billion parameters).

## Formatting Prompts

Manually typing prompts can be tedious and error-prone, particularly if you want to include context-specific instructions or multiple few-shot examples. We include the `format_prompt()` function to aid in that process.

The function is designed with classification problems in mind. If you input the text you would like to classify along with a set of instructions, the default prompt template looks like this:

```{r}
prompt <- format_prompt(text = 'I feel positively morose today.',
                        instructions = 'Decide whether this statement is happy or sad.')
prompt
```

You can customize the template using `glue` syntax, with placeholders for `{text}` and `{label}`.

```{r}
format_prompt(text = 'I feel positively morose today.',
              instructions = 'Decide whether this statement is happy or sad.',
              template = 'Statement: {text}\nSentiment: {label}')
```

This function is particularly useful when including few-shot examples in the prompt. If you input these examples as a tidy dataframe, the `format_prompt()` function will paste them into the prompt according to the template. The `examples` dataframe must have at least two columns, one called "text" and the other called "label".

```{r}
examples <- data.frame(
  text = c('What a pleasant day!',
           'Oh bother.',
           'Merry Christmas!',
           ':-('),
  label = c('happy', 'sad', 'happy', 'sad')
)

examples

prompt <- format_prompt(text = 'I feel positively morose today.',
                        instructions = 'Decide whether this statement is happy or sad.',
                        examples = examples,
                        template = 'Statement: {text}\nSentiment: {label}')

prompt
```
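
For readers curious about the mechanics, the sketch below shows one way to paste few-shot examples into a template using only base R. It illustrates the idea and is not `promptr`'s implementation: the helpers `fill_template()` and `build_few_shot_prompt()` are hypothetical, and the exact layout that `format_prompt()` produces may differ.

```r
# Hypothetical helper (not part of promptr): fill the {text} and {label}
# placeholders of a template using literal string substitution.
fill_template <- function(template, text, label = '') {
  out <- gsub('{text}', text, template, fixed = TRUE)
  gsub('{label}', label, out, fixed = TRUE)
}

# Hypothetical helper: instructions, then one filled template per example,
# then the text to classify with its label left blank.
build_few_shot_prompt <- function(text, instructions, examples, template) {
  # The examples dataframe needs "text" and "label" columns
  stopifnot(all(c('text', 'label') %in% names(examples)))
  shots <- mapply(fill_template,
                  text = as.character(examples$text),
                  label = as.character(examples$label),
                  MoreArgs = list(template = template),
                  USE.NAMES = FALSE)
  # Drop trailing whitespace so the prompt ends right after "Sentiment:"
  query <- trimws(fill_template(template, text), which = 'right')
  paste(c(instructions, shots, query), collapse = '\n\n')
}

toy_examples <- data.frame(
  text = c('What a pleasant day!', 'Oh bother.'),
  label = c('happy', 'sad')
)

toy_prompt <- build_few_shot_prompt(
  text = 'I feel positively morose today.',
  instructions = 'Decide whether this statement is happy or sad.',
  examples = toy_examples,
  template = 'Statement: {text}\nSentiment: {label}'
)

cat(toy_prompt)
```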

Once you're satisfied with the format of the prompt, you can submit it with `complete_prompt()`:

```{r}
complete_prompt(prompt)
```

The full pipeline---first formatting the text into a prompt, then submitting the prompt for completion---looks like this:

```{r}
'What a joyous day for our adversaries.' |>
  format_prompt(instructions = 'Classify this text as happy or sad.',
                examples = examples) |>
  complete_prompt()
```

The biggest advantage of using text prompts like these is **efficiency**. One can request up to 2,048 next-word probability distributions in a single API call, whereas ChatGPT prompts (see the Chat Completions section below) can only be submitted one at a time. Both `format_prompt()` and `complete_prompt()` are vectorized, so users can submit multiple texts for classification at once.

```{r}
texts <- c('What a wonderful world??? As if!', 'Things are looking up.', 'Me gusta mi vida.')

texts |>
  format_prompt(instructions = 'Classify these texts as happy or sad.',
                examples = examples) |>
  complete_prompt()
```
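
If you have more prompts than fit in a single request, one option is to split them into batches yourself. The sketch below is a hedged illustration: `batch_prompts()` is a hypothetical helper, not a `promptr` function, and it takes the 2,048 limit from the paragraph above as its default. Whether `complete_prompt()` already handles batching internally is not covered here, so treat this as optional.

```r
# Hypothetical helper (not part of promptr): split a character vector of
# prompts into batches no larger than `batch_size`.
batch_prompts <- function(prompts, batch_size = 2048) {
  batch_id <- ceiling(seq_along(prompts) / batch_size)
  unname(split(prompts, batch_id))
}

# Made-up prompts, for illustration only
toy_prompts <- paste('Prompt number', 1:5000)
batches <- batch_prompts(toy_prompts)

# Number of prompts in each batch
lengths(batches)
```

Each element of `batches` could then be passed to `complete_prompt()` in turn.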

## Example: Supreme Court Tweets

To illustrate the entire workflow, let's classify the sentiment of social media posts from the Supreme Court Tweets dataset included in the package.

```{r}
data(scotus_tweets) # the full dataset
data(scotus_tweets_examples) # a dataframe with few-shot examples
```

Let's focus on tweets posted following the *Masterpiece Cakeshop v. Colorado* (2018) decision, formatting the prompts with a set of instructions and few-shot examples tailored to that context.

```{r}
library(tidyverse)

masterpiece_tweets <- scotus_tweets |>
  filter(case == 'masterpiece')

instructions <- 'Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative.'

masterpiece_examples <- scotus_tweets_examples |>
  filter(case == 'masterpiece')

masterpiece_tweets$prompt <- format_prompt(text = masterpiece_tweets$text,
                                           instructions = instructions,
                                           examples = masterpiece_examples)

masterpiece_tweets$prompt[3]
```

Then we can submit this list of prompts using `complete_prompt()`:

```{r, echo=TRUE, eval=FALSE}
masterpiece_tweets$out <- complete_prompt(masterpiece_tweets$prompt)
```

```{r, echo = FALSE, eval = TRUE}
load('data-raw/masterpiece_tweets.RData')
```

The estimated probability distribution for each completion is now stored as a list of dataframes in the `out` column. We can compute a simple sentiment score by taking the estimated probability that each tweet is Positive minus the estimated probability that it is Negative:

```{r}
masterpiece_tweets$score <- masterpiece_tweets$out |>
  lapply(mutate, token = str_to_lower(token)) |>
  lapply(summarize,
         positive = sum(probability[token == 'positive']),
         negative = sum(probability[token == 'negative'])) |>
  lapply(summarize, score = positive - negative) |>
  unlist()
```
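
To see what this computation does for a single tweet, here is a small base R example. The dataframe below is made up for illustration and only mimics the `token` and `probability` columns used above. `sentiment_score()` is a hypothetical helper, not a `promptr` function.

```r
# Made-up next-word predictions for one tweet, for illustration only.
# The same label can appear under more than one capitalization.
toy_out <- data.frame(
  token = c('Positive', 'positive', 'Negative', 'Neutral'),
  probability = c(0.60, 0.10, 0.20, 0.05)
)

# Hypothetical helper: P(positive) minus P(negative), ignoring case
sentiment_score <- function(df) {
  token <- tolower(df$token)
  sum(df$probability[token == 'positive']) -
    sum(df$probability[token == 'negative'])
}

toy_score <- sentiment_score(toy_out)
toy_score
```

Lowercasing the tokens first pools the probability mass of "Positive" and "positive", which is why the pipeline above calls `str_to_lower()` before summing.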

Finally, let's compare those scores from GPT-3.5 with the authors' hand-coded sentiment scores (-1 for Negative, 0 for Neutral, and +1 for Positive).

```{r}
ggplot(data = masterpiece_tweets,
       mapping = aes(
         x = (expert1 + expert2 + expert3) / 3,
         y = score
       )) +
  geom_jitter(width = 0.1) +
  labs(x = 'Hand-Coded Sentiment',
       y = 'GPT-3.5 Sentiment Score') +
  theme_bw()
```

## Chat Completions

The most recent OpenAI language models---including ChatGPT and GPT-4---have been fine-tuned to function as "chat" models, and interacting with them through the API requires a slightly different format for the inputs. Instead of a single text prompt, few-shot prompts are expressed in the form of a "dialogue" between the user and the model, which we can represent in `R` as a "list of lists".

```{r}
prompt <- list(
  list(role = 'user',
       content = 'Hello can you help me with a homework problem?'),
  list(role = 'assistant',
       content = 'Sure thing! What is the problem?'),
  list(role = 'user',
       content = 'I need to explain why Frederick the Great was so fond of potatoes?')
)
```

Users can submit a chat prompt to the API using the `complete_chat()` function. The default model is "gpt-3.5-turbo" (the most cost-effective chat model offered through the API as of February 2024).

```{r}
complete_chat(prompt, max_tokens = 300)
```

The `format_chat()` function allows users to create a chat prompt using the same syntax as `format_prompt()`.

```{r}
tweet <- masterpiece_tweets$text[4]
cat(tweet)

prompt <- format_chat(tweet,
                      instructions = 'Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative.',
                      examples = masterpiece_examples)

prompt
```
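
As a rough illustration of how few-shot examples can be expressed as a dialogue, the base R sketch below builds a "list of lists" by hand. It is not `promptr`'s implementation: `build_chat()` is a hypothetical helper, and placing the instructions in a `system` message is an assumption, so the structure `format_chat()` returns may differ.

```r
# Hypothetical helper (not part of promptr): express few-shot examples as
# alternating user/assistant turns, followed by the text to classify.
build_chat <- function(text, instructions, examples) {
  # The examples dataframe needs "text" and "label" columns
  stopifnot(all(c('text', 'label') %in% names(examples)))
  # Assumption: the instructions go in a leading system message
  messages <- list(list(role = 'system', content = instructions))
  for (i in seq_len(nrow(examples))) {
    messages <- c(messages, list(
      list(role = 'user', content = as.character(examples$text[i])),
      list(role = 'assistant', content = as.character(examples$label[i]))
    ))
  }
  # The final user turn is the one the model is asked to answer
  c(messages, list(list(role = 'user', content = text)))
}

toy_chat_examples <- data.frame(
  text = c('What a pleasant day!', 'Oh bother.'),
  label = c('happy', 'sad')
)

toy_chat <- build_chat(
  text = 'I feel positively morose today.',
  instructions = 'Decide whether this statement is happy or sad.',
  examples = toy_chat_examples
)

# Inspect the sequence of roles in the dialogue
vapply(toy_chat, function(m) m$role, character(1))
```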

One advantage of these chat models is that they typically do not require as many few-shot examples to perform well, but their big practical disadvantage is that we can only submit one chat to the API at a time.

```{r}
response <- complete_chat(prompt)
response
```

```{r, echo=FALSE, eval=FALSE}
'Rate the sentiment of this tweet on a scale from 0 to 100, where 0 means "Extremely Negative" and 100 means "Extremely Positive".'
```