https://github.com/krea-ai/open-prompts


Host: GitHub
URL: https://github.com/krea-ai/open-prompts
Owner: krea-ai
Created: 2022-08-20T15:43:23.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2022-09-22T21:01:49.000Z (over 2 years ago)
Last Synced: 2024-08-02T14:11:48.637Z (7 months ago)
Language: Python
Size: 8.96 MB
Stars: 754
Watchers: 10
Forks: 46
Open Issues: 3
Metadata Files:
- Readme: README.md

open prompt knowledge.

# Open Prompts

Open Prompts contains the data we use to build [krea.ai](https://krea.ai). Now, you can get access to this data too.

You can either download a (large) CSV file with image links and metadata of >10M generations, or access it through [our free API](https://devapi.krea.ai) (still in development).

Everyone is welcome to contribute their own prompts and [ideas](https://discord.gg/K8anVEWbQC).

If you want to use this data to implement a semantic search engine with CLIP (like we did), check out [prompt-search](https://github.com/krea-ai/prompt-search).

# About

AI models like Stable Diffusion, DALL-E, or Midjourney are capable of creating stunning images from text descriptions. They give us the freedom to produce an image of almost anything we can imagine.

Platforms like Lexica, OpenArt, and Krea.ai let us explore millions of AI-generated images, as well as the prompts that produced them. They help you see which words work for generating certain styles and assess how each AI model interprets different concepts.

We are just starting to explore the possibilities of text-to-image models, and we do not necessarily need to re-train them to dramatically improve their results; we can also learn how to prompt them effectively.

We hope this repository serves anyone who wants to analyze large datasets of prompts, create datasets to train new models, or build tools that help people create better prompts.

# Data

There are three main data sources that you can use.

1. **Prompts API**. We released an (experimental) REST-based API that you can query to find and paginate through prompts and their generations.

1. **CSV dataset**. This is a large CSV file that contains more than 10 million generations extracted from the Stability AI Discord during the beta testing of Stable Diffusion v1.3.
   **For now, this dataset only includes URLs that are served from the official Discord CDN**. If you want to download compressed images in `webp` format, use the Prompts API, where the data has been parsed and stored on our internal servers.

1. **This repository**. This repository contains a small but structured set of data that we created for the category section of [krea.ai](https://www.krea.ai). Anyone can contribute.

## Prompts API

You can query data from the dataset using our (experimental) [Prompts API](https://devapi.krea.ai).

## CSV file

To download the dataset click **[this link](https://drive.google.com/file/d/1c4WHxtlzvHYd0UY5WCMJNn2EO-Aiv2A0/view)** or execute the following `wget` command:

```bash

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1c4WHxtlzvHYd0UY5WCMJNn2EO-Aiv2A0' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1c4WHxtlzvHYd0UY5WCMJNn2EO-Aiv2A0" -O openprompts.csv && rm /tmp/cookies.txt

# See: https://stackoverflow.com/a/39087286/10391569
```

### Lite dataset

Since the file is large (>3 GB), you may want to download a “lite” version of it first so you can experiment with the data. You can find the mini-dataset in the `data` subfolder ([the dataset file](./data/1k.csv) is named `1k.csv`).

### Structure of the dataset

The CSV file has a simple and raw structure. There are two columns: `prompt` and `raw_data`.

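```python
import csv
import json

from pprint import pprint

# Replace "dataset.csv" with the path to your copy,
# e.g. "openprompts.csv" or "data/1k.csv".
with open("dataset.csv", newline="") as f:
    csv_reader = csv.DictReader(f)
    # Keep only the first row, then stop reading.
    for row_number, row in enumerate(csv_reader):
        if row_number > 0:
            break
        datum = row

pprint(datum['prompt'])
pprint(datum['raw_data'])
```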
```python
# OUTPUT
('A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses '
 'standing on the grass in front of the Sydney Opera House holding a sign on '
 'the chest that says Welcome Friends, subject: kangaroo, subject detail: '
 'wearing orange hoodie, wearing blue sunglasses, subject location: sydney '
 'opera house, subject action: holding sign')

{'image_uri': 'PENDING',
 'modifiers': ['portrait photo',
               'kangaroo wearing',
               'orange hoodie',
               'blue sunglasses standing',
               'grass',
               'sydney opera house holding',
               'sign',
               'chest',
               'says welcome friends',
               'subject kangaroo',
               'subject detail wearing orange hoodie',
               'wearing blue sunglasses',
               'subject location sydney opera house',
               'subject action holding sign'],
 'raw_discord_data': {'cfg_scale': 15.0,
                      'content': '',
                      'content_type': 'image/png',
                      'height': 512,
                      'image_proxy_uri': '',
                      'image_uri': 'https://cdn.discordapp.com/attachments/1005543895024812062/1006343074768769054/A_portrait_photo_of_a_kangaroo_wearing_an_orange_hoodie_and_blue_sunglasses_standing_on_the_grass_in_front_of_the_Sydney_Opera_House_holding_a_sign_-C_15.0_-n_9_-i_-S_556046175_ts-1660001285_idx-4.png',
                      'is_grid': 0,
                      'num_generations': 9,
                      'num_step': 50,
                      'seed': 556046175,
                      'timestamp': 1660001285,
                      'width': 512},
 'thumbnail_uri': 'PENDING'}
```
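
When read with `csv.DictReader`, the `raw_data` column arrives as text. The following is a minimal sketch of turning it into a dictionary and picking out the sampling settings. It assumes the cell holds either JSON or a Python-literal dict (the serialization is not documented here, so both are tried). `parse_raw_data` and `generation_params` are hypothetical helper names, not part of this repository.

```python
import ast
import json


def parse_raw_data(text):
    """Turn the text of a `raw_data` cell into a dict.

    Assumption: the cell holds JSON or a Python-literal dict; the
    actual serialization is not documented, so both are tried.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax (single quotes, etc.).
        return ast.literal_eval(text)


def generation_params(raw):
    """Pick the sampling settings out of `raw_discord_data`, if present."""
    discord = raw.get("raw_discord_data", {})
    keys = ("seed", "cfg_scale", "num_step", "width", "height")
    return {key: discord.get(key) for key in keys}


# Abbreviated sample cell, modelled on the output shown above.
sample = (
    '{"modifiers": ["portrait photo"], '
    '"raw_discord_data": {"seed": 556046175, "cfg_scale": 15.0, '
    '"num_step": 50, "width": 512, "height": 512}}'
)
params = generation_params(parse_raw_data(sample))
print(params)
```

Missing keys come back as `None` instead of raising, which may be convenient when scanning millions of rows where some entries are incomplete.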

We may publish parsing scripts in the future, but we are focused on building more features for [krea.ai](https://www.krea.ai) for now. If you know Python, we would love to feature your parsing scripts here. To do so, simply [fork the repo and submit a PR](https://github.com/krea-ai/open-prompts/fork).
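
As one illustration of what such a parsing script might look like, the sketch below counts how often each modifier appears across rows. It assumes each row's `raw_data` has already been parsed into a dictionary with a `modifiers` list, as in the output above. `count_modifiers` is a hypothetical name, and the rows are hand-made stand-ins for real entries.

```python
from collections import Counter


def count_modifiers(parsed_rows):
    """Count modifier occurrences across already-parsed `raw_data` dicts."""
    counts = Counter()
    for raw in parsed_rows:
        # Rows without a `modifiers` key are skipped rather than failing.
        counts.update(raw.get("modifiers", []))
    return counts


# Tiny hand-made rows standing in for real dataset entries.
rows = [
    {"modifiers": ["portrait photo", "orange hoodie"]},
    {"modifiers": ["portrait photo", "grass"]},
    {"image_uri": "PENDING"},
]
top = count_modifiers(rows).most_common(1)
print(top)
```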

### How was the dataset made?

This dataset was created using a crawler and one-time use parsing scripts that mixed our own crawled generations with the [dataset](https://github.com/paperdave/stable-diffusion-sqlite) published by [paperdave](https://github.com/paperdave/) (thanks Dave!).

## In-repository dataset

This dataset started as manual work we did to create the modifiers in [krea.ai](https://www.krea.ai).

It is smaller than the previous dataset, but also simpler: just plain-text files that anyone can edit.

We want the best prompt engineers out there to grow it for the benefit of everyone else. For now the instructions for contributing can be found here, but in the future we will look for a cleaner way to upload prompts to this dataset—ideally including images too!

This dataset differentiates between two different kinds of elements: _modifiers_ and _presets_.

### Modifiers

Modifiers are the parts of a text prompt that carry its stylistic information. For example, if we want a generation to look like a 3D render, we could use `octane render`, `unreal engine`, or `ray tracing` to enhance the style of our generations.

Modifiers vary widely, from very precise colors and shapes to very abstract concepts and emotions; some people even find it useful to use emojis! The following is a tree representation of how we have organized the modifiers in this project:

```
├── README.md
├── modifiers
│ ├── modifier-category-1
│ │ ├── modifier-subcategory-1.txt
│ │ ├── modifier-subcategory-2.txt
│ │ ├── ...
│ ├── modifier-category-2
│ │ ├── ...
│ ├── ...
...
```

All modifiers live in the `modifiers` folder, organized into sub-categories that in turn belong to a parent category. Each subfolder of `modifiers` represents a category, and the subfolder's name is the category's name. Each sub-category is a `txt` file named after the sub-category, containing one modifier per line.

The following is an example of what the sub-category `3D` from the category `digital art` could look like:

```
artstation
renderman
octane render
3d render
high quality 3d render
```

Note that each line represents a SINGLE modifier, and that there is nothing else in the file, just modifiers separated by lines.
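
The layout above can be read into a nested dictionary with a few lines of Python. This is a minimal sketch, assuming one folder per category, one `.txt` file per sub-category, and one modifier per line. `load_modifiers` is a hypothetical helper, and the demo builds a throwaway folder instead of touching the real `modifiers` directory.

```python
import tempfile
from pathlib import Path


def load_modifiers(root):
    """Map category -> sub-category -> list of modifiers.

    Assumes the layout described above: one folder per category under
    `root`, one `.txt` file per sub-category, one modifier per line.
    """
    tree = {}
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue
        subcategories = {}
        for txt_file in sorted(category_dir.glob("*.txt")):
            lines = txt_file.read_text(encoding="utf-8").splitlines()
            # Drop blank lines and surrounding whitespace.
            subcategories[txt_file.stem] = [ln.strip() for ln in lines if ln.strip()]
        tree[category_dir.name] = subcategories
    return tree


# Demo on a throwaway folder that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    category = Path(tmp) / "digital art"
    category.mkdir()
    (category / "3D.txt").write_text(
        "artstation\noctane render\n\n3d render\n", encoding="utf-8"
    )
    modifiers = load_modifiers(tmp)

print(modifiers)
```

To read the repository's own data, you would call `load_modifiers("modifiers")` from the repository root.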

### Presets

Presets are sets of modifiers that work well together and usually share similarities. Organizing modifiers into presets can come in handy for speeding up the creation of prompts. For example, if we know that `greg rutkowski` creates amazing 3D art, we will probably find ourselves combining it all the time with modifiers such as `unreal engine`, `3D`, `artstation` and even with other similar artists like `wlop`.

The following is a tree representation of how we have organized the presets in this project:

```
├── README.md
├── presets
│ ├── preset-author
│ │ ├── preset-title-1.txt
│ │ ├── preset-title-2.txt
│ │ ├── ...
│ ├── preset-author-2
│ │ ├── ...
│ ├── ...
...
```

All presets live in subfolders of `presets`. Each subfolder is named after the author who created the presets inside it. Within a subfolder, each preset is a separate `txt` file containing one modifier per line.

The following is an example of what the preset `glossy tubes` by the author `krea` could look like:

```
glossy translucent glass with abstract tubular shapes
psychedelic texture
colors range between pastel blue and pastel pink
highly intricate
hyper detailed render
caspar david friedrich
ArtStation HD
```

We found that using all these modifiers combined works particularly well.
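
One simple way to use a preset is to append its modifiers to a subject. The sketch below joins them with commas, which is a common prompt convention and not something this repository prescribes. `build_prompt` is a hypothetical helper, and the subject is made up for illustration.

```python
def build_prompt(subject, preset_text):
    """Append every modifier of a preset file to a subject, comma-separated."""
    modifiers = [line.strip() for line in preset_text.splitlines() if line.strip()]
    return ", ".join([subject, *modifiers])


# Contents of a preset file, one modifier per line
# (shortened from the example above).
preset = """psychedelic texture
highly intricate
hyper detailed render
"""
prompt = build_prompt("a lighthouse at dusk", preset)
print(prompt)
```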

# Contributing

[We](https://github.com/krea-ai/open-prompts/fork) [love](https://github.com/krea-ai/open-prompts/fork) [PRs](https://github.com/krea-ai/open-prompts/fork)! If you want to add your own parsing scripts, modifiers to the in-repository dataset—or anything really—simply [fork the repository](https://github.com/krea-ai/open-prompts/fork) and propose changes. We will review them swiftly.

# Create your own CLIP Search Engine with _Open Prompts_

In our [clip-search](https://github.com/krea-ai/clip-search) repository you will find everything you need to create a semantic search engine with CLIP.

# Get in touch

- Follow and DM us on Twitter: [@krea_ai](https://twitter.com/krea_ai)
- Join [our Discord community](https://discord.gg/3mkFbvPYut)
- Email either `v` or `d` (`v` at `krea` dot `ai`; `d` at `krea` dot `ai` respectively)

# Contributors

#### [@blademort](https://twitter.com/blademort)

**art**: `art movements`, `art styles`, and `descriptive terms`

**general**: `design tools and communities`, and `genres`

