{"id":13788200,"url":"https://github.com/rbbrdckybk/ai-art-generator","last_synced_at":"2025-04-10T06:50:23.892Z","repository":{"id":37785708,"uuid":"440603375","full_name":"rbbrdckybk/ai-art-generator","owner":"rbbrdckybk","description":"For automating the creation of large batches of AI-generated artwork locally.","archived":false,"fork":false,"pushed_at":"2023-03-30T18:42:46.000Z","size":13708,"stargazers_count":634,"open_issues_count":8,"forks_count":127,"subscribers_count":13,"default_branch":"main","last_synced_at":"2024-11-18T02:37:00.873Z","etag":null,"topics":["clip-guided-diffusion","deep-learning","generative-art","image-generation","machine-learning","stable-diffusion","vqgan-clip"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rbbrdckybk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-12-21T17:45:32.000Z","updated_at":"2024-10-07T19:41:32.000Z","dependencies_parsed_at":"2024-01-07T03:51:56.289Z","dependency_job_id":"57efbf12-e85b-4487-af24-6f6bbb696296","html_url":"https://github.com/rbbrdckybk/ai-art-generator","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rbbrdckybk%2Fai-art-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rbbrdckybk%2Fai-art-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rbbrdckybk%2Fai-art-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/rbbrdckybk%2Fai-art-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rbbrdckybk","download_url":"https://codeload.github.com/rbbrdckybk/ai-art-generator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248173852,"owners_count":21059595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clip-guided-diffusion","deep-learning","generative-art","image-generation","machine-learning","stable-diffusion","vqgan-clip"],"created_at":"2024-08-03T21:00:38.952Z","updated_at":"2025-04-10T06:50:23.868Z","avatar_url":"https://github.com/rbbrdckybk.png","language":"Python","funding_links":[],"categories":["Training","Python"],"sub_categories":["Task Chaining"],"readme":"# 2022-09-28 Update:\nJust a note that I've launched [Dream Factory](https://github.com/rbbrdckybk/dream-factory), a significant upgrade to this. It's got an (optional) GUI, true simultaneous multi-GPU support, an integrated gallery with full EXIF metadata support, and many other new [features](https://github.com/rbbrdckybk/dream-factory#features). \n\nI dropped VQGAN and Disco Diffusion support to focus on Stable Diffusion, so if you want VQGAN and/or Disco Diffusion you should stick with this for now. Otherwise I encourage everyone to migrate to Dream Factory! I'll continue to patch bug fixes on this repo but I likely won't be adding new features going forward.\n\n# AI Art Generator\nFor automating the creation of large batches of AI-generated artwork locally. 
Put your GPU(s) to work cranking out AI-generated artwork 24/7 with the ability to automate large prompt queues combining user-selected subjects, styles/artists, and more! More info on which models are available after the sample pics.  \nSome example images that I've created via this process (these are cherry-picked and sharpened):  \n\u003cimg src=\"/samples/sample01.jpg\" width=\"367\" height=\"220\" alt=\"sample image 1\" title=\"sample image 1\"\u003e\n\u003cimg src=\"/samples/sample02.jpg\" width=\"220\" height=\"220\" alt=\"sample image 2\" title=\"sample image 2\"\u003e\n\u003cimg src=\"/samples/sample03.jpg\" width=\"220\" height=\"220\" alt=\"sample image 3\" title=\"sample image 3\"\u003e\n\u003cimg src=\"/samples/sample04.jpg\" width=\"220\" height=\"220\" alt=\"sample image 4\" title=\"sample image 4\"\u003e\n\u003cimg src=\"/samples/sample05.jpg\" width=\"220\" height=\"220\" alt=\"sample image 5\" title=\"sample image 5\"\u003e\n\u003cimg src=\"/samples/sample06.jpg\" width=\"367\" height=\"220\" alt=\"sample image 6\" title=\"sample image 6\"\u003e  \nNote that I did not create or train the models used in this project, nor was I involved in the original coding. I've simply modified the original colab versions so they'll run locally and added some support for automation.\nModels currently supported, with links to their original implementations:\n * [Stable Diffusion](https://github.com/CompVis/stable-diffusion)\n * CLIP-guided Diffusion (via [Disco Diffusion](https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb) adapted to run locally)\n * [VQGAN+CLIP](https://colab.research.google.com/github/justinjohn0306/VQGAN-CLIP/blob/main/VQGAN%2BCLIP(Updated).ipynb)\n\n# Requirements\n\nYou'll need an Nvidia GPU, preferably with a decent amount of VRAM. 
12GB of VRAM is sufficient for 512x512 output images depending on model and settings, and 8GB should be enough for 384x384 (8GB should be considered a reasonable minimum!). To generate 1024x1024 images, you'll need ~24GB of VRAM or more. Generating small images and then upscaling via [ESRGAN](https://github.com/xinntao/Real-ESRGAN) or some other package provides very good results as well.\n\nIt should be possible to run on an AMD GPU, but you'll need to be on Linux to install the ROCm version of Pytorch. I don't have an AMD GPU to throw into a Linux machine so I haven't tested this myself.\n\n# Setup\n\nThese instructions were tested on a Windows 10 desktop with an Nvidia 3080 Ti GPU (12GB VRAM), and also on an Ubuntu Server 20.04.3 system with an old Nvidia Tesla M40 GPU (24GB VRAM).\n\n**[1]** Install [Anaconda](https://www.anaconda.com/products/individual), open the root terminal, and create a new environment (and activate it):\n```\nconda create --name ai-art python=3.9\nconda activate ai-art\n```\n\n**[2]** Install Pytorch:\n```\nconda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch\n```\nNote that you can customize your Pytorch installation by using [the online tool located here](https://pytorch.org/get-started/locally/).\n\n**[3]** Install other required Python packages:\n```\nconda install -c anaconda git urllib3\npip install transformers keyboard pillow ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops torch_optimizer\n```\n\n**[4]** Clone this repository and switch to its directory:\n```\ngit clone https://github.com/rbbrdckybk/ai-art-generator\ncd ai-art-generator\n```\nNote that Linux users may need single quotes around the URL in the clone command.\n\n**[5]** Clone additional required repositories:\n```\ngit clone https://github.com/openai/CLIP\ngit clone https://github.com/CompVis/taming-transformers\n```\n\n**[6]** Download the default VQGAN pre-trained model checkpoint files:\n```\nmkdir 
checkpoints\ncurl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - \"https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml\u0026dl=1\"\ncurl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - \"https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt\u0026dl=1\"\n```\nNote that Linux users should replace the double quotes in the curl commands with single quotes.\n\n**[7]** (Optional) Download additional pre-trained models:  \nAdditional models are not necessary, but provide you with more options. [Here is a good list of available pre-trained models](https://github.com/CompVis/taming-transformers#overview-of-pretrained-models).  \nFor example, if you also wanted the FFHQ model (trained on faces): \n```\ncurl -L -o checkpoints/ffhq.yaml -C - \"https://app.koofr.net/content/links/0fc005bf-3dca-4079-9d40-cdf38d42cd7a/files/get/2021-04-23T18-19-01-project.yaml?path=%2F2021-04-23T18-19-01_ffhq_transformer%2Fconfigs%2F2021-04-23T18-19-01-project.yaml\u0026force\"\ncurl -L -o checkpoints/ffhq.ckpt -C - \"https://app.koofr.net/content/links/0fc005bf-3dca-4079-9d40-cdf38d42cd7a/files/get/last.ckpt?path=%2F2021-04-23T18-19-01_ffhq_transformer%2Fcheckpoints%2Flast.ckpt\"\n```\n\n**[8]** (Optional) Test VQGAN+CLIP:  \n```\npython vqgan.py -s 128 128 -i 200 -p \"a red apple\" -o output/output.png\n```\nYou should see output.png created in the output directory, which should loosely resemble an apple.\n\n**[9]** Install packages for CLIP-guided diffusion (if you're only interested in VQGAN+CLIP, you can skip everything from here to the end): \n```\npip install ipywidgets omegaconf torch-fidelity einops wandb opencv-python matplotlib lpips datetime timm\nconda install pandas\n```\n\n**[10]** Clone repositories for CLIP-guided diffusion:\n```\ngit clone https://github.com/crowsonkb/guided-diffusion\ngit clone https://github.com/assafshocher/ResizeRight\ngit clone 
https://github.com/CompVis/latent-diffusion\n```\n\n**[11]** Download models needed for CLIP-guided diffusion:\n```\nmkdir content\\models\ncurl -L -o content/models/256x256_diffusion_uncond.pt -C - \"https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt\"\ncurl -L -o content/models/512x512_diffusion_uncond_finetune_008100.pt -C - \"http://batbot.tv/ai/models/guided-diffusion/512x512_diffusion_uncond_finetune_008100.pt\"\ncurl -L -o content/models/secondary_model_imagenet_2.pth -C - \"https://ipfs.pollinations.ai/ipfs/bafybeibaawhhk7fhyhvmm7x24zwwkeuocuizbqbcg5nqx64jq42j75rdiy/secondary_model_imagenet_2.pth\"\nmkdir content\\models\\superres\ncurl -L -o content/models/superres/project.yaml -C - \"https://heibox.uni-heidelberg.de/f/31a76b13ea27482981b4/?dl=1\"\ncurl -L -o content/models/superres/last.ckpt -C - \"https://heibox.uni-heidelberg.de/f/578df07c8fc04ffbadf3/?dl=1\"\n```\nNote that Linux users should again replace the double quotes in the curl commands with single quotes, and replace the **mkdir** backslashes with forward slashes.\n\n**[12]** (Optional) Test CLIP-guided diffusion:  \n```\npython diffusion.py -s 128 128 -i 200 -p \"a red apple\" -o output.png\n```\nYou should see output.png created in the output directory, which should loosely resemble an apple.\n\n**[13]** Clone Stable Diffusion repository (if you're not interested in SD, you can skip everything from here to the end):\n```\ngit clone https://github.com/rbbrdckybk/stable-diffusion\n```\n\n**[14]** Install additional dependencies required by Stable Diffusion:\n```\npip install diffusers\n```\n\n**[15]** Download the Stable Diffusion pre-trained checkpoint file:\n```\nmkdir stable-diffusion\\models\\ldm\\stable-diffusion-v1\ncurl -L -o stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt -C - \"https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt\"\n```\n**If the curl command doesn't download the checkpoint, it's 
gated behind a login.** You'll need to register [here](https://huggingface.co/CompVis) (only requires email and name) and then you can download the checkpoint file [here](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt).  \nAfter downloading, you'll need to place the .ckpt file in the directory created above and name it **model.ckpt**.  \n\n**[16]** (Optional) Test Stable Diffusion:  \nThe easiest way to test SD is to create a simple prompt file with **!PROCESS = stablediff** and a single subject. See *example-prompts.txt* and the next section for more information. Assuming you create a simple prompt file called *test.txt* first, you can test by running:\n```\npython make_art.py test.txt\n```\nImages should be saved to the **output** directory if successful (organized into subdirectories named for the date and prompt file).\n\n**[17]** Set up ESRGAN/GFPGAN (if you're not planning to upscale images, you can skip this and everything else):\n```\ngit clone https://github.com/xinntao/Real-ESRGAN\npip install basicsr facexlib gfpgan\ncd Real-ESRGAN\ncurl -L -o experiments/pretrained_models/RealESRGAN_x4plus.pth -C - \"https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth\"\npython setup.py develop\ncd ..\n```\n  \nYou're done!\n  \nIf you're getting errors other than insufficient GPU VRAM while running and haven't updated your installation in a while, try updating some of the more important packages, for example:\n```\npip install transformers -U\n```\n\n# Usage\n\nEssentially, you just need to create a text file containing the subjects and styles you want to use to generate images. If you have 5 subjects and 20 styles in your prompt file, then a total of 100 output images will be created (20 style images for each subject).\n\nTake a look at **example-prompts.txt** to see how prompt files should look. You can ignore everything except the [subjects] and [styles] areas for now. 
Lines beginning with a '#' are comments and will be ignored, and lines beginning with a '!' are settings directives and are explained in the next section. For now, just modify the example subjects and styles with whatever you'd like to use.\n\nAfter you've populated **example-prompts.txt** to your liking, you can simply run:\n```\npython make_art.py example-prompts.txt\n```\nDepending on your hardware and settings, each image will take anywhere from a few seconds to a few hours (on older hardware) to create. If you can run Stable Diffusion, I strongly recommend it for the best results - both in speed and image quality.\n\nOutput images are created in the **output/[current date]-[prompt file name]/** directory by default. The output directory will contain a JPG file for each image named for the subject \u0026 style used to create it. So for example, if you have \"a monkey on a motorcycle\" as one of your subjects, and \"by Picasso\" as a style, the output image will be created as output/[current date]-[prompt file name]/a-monkey-on-a-motorcycle-by-picasso.jpg (filenames will vary a bit depending on the process used).\n\nYou can press **CTRL+SHIFT+P** any time to pause execution (the pause will take effect when the current image is finished rendering). Press **CTRL+SHIFT+P** again to unpause. This is useful if you're running this on your primary computer and need to use your GPU for something else for a while. You can also press **CTRL+SHIFT+R** to reload the prompt file if you've changed it (the current work queue will be discarded, and a new one will be built from the contents of your prompt file). **Note that keyboard input only works on Windows.**\n\nThe settings used to create each image are saved as metadata in each output JPG file by default. You can read the metadata info back by using any EXIF utility, or by simply right-clicking the image file in Windows Explorer and selecting \"properties\", then clicking the \"details\" pane. 
The \"comments\" field holds the command used to create the image.\n\n# Advanced Usage\n\nDirectives can be included in your prompt file to modify settings for all prompts that follow it. These settings directives are specified by putting them on their own line inside of the [subjects] area of the prompt file, in the following format:  \n\n**![setting to change] = [new value]**  \n\nFor **[setting to change]**, valid directives are:  \n * PROCESS\n * CUDA_DEVICE\n * WIDTH\n * HEIGHT\n * ITERATIONS (vqgan/diffusion only)\n * CUTS (vqgan/diffusion only)\n * INPUT_IMAGE\n * SEED\n * LEARNING_RATE (vqgan only)\n * TRANSFORMER (vqgan only)\n * OPTIMISER (vqgan only)\n * CLIP_MODEL (vqgan only)\n * D_VITB16, D_VITB32, D_RN101, D_RN50, D_RN50x4, D_RN50x16 (diffusion only)\n * STEPS (stablediff only)\n * CHANNELS (stablediff only)\n * SAMPLES (stablediff only)\n * STRENGTH (stablediff only)\n * SD_LOW_MEMORY (stablediff only)\n * USE_UPSCALE (stablediff only)\n * UPSCALE_AMOUNT (stablediff only)\n * UPSCALE_FACE_ENH (stablediff only)\n * UPSCALE_KEEP_ORG (stablediff only)\n * REPEAT\n\nSome examples: \n```\n!PROCESS = vqgan\n```\nThis will set the current AI image-generation process. Valid options are **vqgan** for VQGAN+CLIP, **diffusion** for CLIP-guided diffusion (Disco Diffusion), or **stablediff** for Stable Diffusion.\n```\n!CUDA_DEVICE = 0\n```\nThis will force GPU 0 to be used (the default). Useful if you have multiple GPUs - you can run multiple instances, each with its own prompt file specifying a unique GPU ID.\n```\n!WIDTH = 384\n!HEIGHT = 384\n```\nThis will set the output image size to 384x384. A larger output size requires more GPU VRAM. Note that for Stable Diffusion these values should be multiples of 64.\n```\n!TRANSFORMER = ffhq\n```\nThis will tell VQGAN to use the FFHQ transformer (somewhat better at faces), instead of the default (vqgan_imagenet_f16_16384). 
You can follow step 7 in the setup instructions above to get the ffhq transformer, along with a link to several others.\n\nWhatever you specify here MUST exist in the checkpoints directory as a .ckpt and .yaml file.\n```\n!INPUT_IMAGE = samples/face-input.jpg\n```\nThis will use samples/face-input.jpg (or whatever image you specify) as the starting image, instead of the default random noise. Input images must be the same aspect ratio as your output images for good results. Note that when used with Stable Diffusion the output image size will be the same as your input image (your height/width settings will be ignored).\n```\n!SEED = 42\n```\nThis will use 42 as the input seed value, instead of a random number (the default). Useful for reproducibility - when all other parameters are identical, using the same seed value should produce an identical image across multiple runs. Set to nothing or -1 to reset to using a random value.\n```\n!INPUT_IMAGE = \n```\nSetting any of these values to nothing will return it to its default. So in this example, no starting image will be used.\n```\n!STEPS = 50\n```\nSets the number of steps (similar to iterations) when using Stable Diffusion to 50 (the default). Higher values take more time and may improve image quality. Values over 100 rarely produce noticeable differences compared to lower values.\n```\n!SCALE = 7.5\n```\nSets the guidance scale when using Stable Diffusion to 7.5 (the default). Higher values (to a point, beyond ~25 results may be strange) will cause the output to more closely adhere to your prompt.\n```\n!SAMPLES = 1\n```\nSets the number of times to sample when using Stable Diffusion to 1 (the default). Values over 1 will cause multiple output images to be created for each prompt at a slight time savings per image. There is no cost in GPU VRAM required for incrementing this.\n```\n!STRENGTH = 0.75\n```\nSets the influence of the starting image to 0.75 (the default). 
Only relevant when using Stable Diffusion with an input image. Valid values are between 0-1, with 1 corresponding to complete destruction of the input image, and 0 corresponding to leaving the starting image completely intact. Values between 0.25 and 0.75 tend to give interesting results.\n```\n!SD_LOW_MEMORY = no\n```\nUse a forked repo with much lower GPU memory requirements when using Stable Diffusion (yes/no)? Setting this to **yes** will switch over to using a memory-optimized version of SD that will allow you to create higher resolution images with far less GPU memory (512x512 images should only require around 4GB of VRAM). The trade-off is that inference is **much** slower compared to the default official repo. For comparison: on a RTX 3060, a 512x512 image at default settings takes around 12 seconds to create; with *!SD_LOW_MEMORY = yes*, the same image takes over a minute. Recommend keeping this off unless you have under 8GB GPU VRAM, or want to experiment with creating larger images before upscaling.\n```\n!USE_UPSCALE = no\n```\nAutomatically upscale images created with Stable Diffusion (yes/no)? Uses ESRGAN/GFPGAN (see additional settings below).\n```\n!UPSCALE_AMOUNT = 2\n```\nHow much to scale when *!USE_UPSCALE = yes*. Default is 2.0x; higher values require more VRAM and time.\n```\n!UPSCALE_FACE_ENH = no\n```\nWhether or not to use GFPGAN (vs default ESRGAN) when upscaling. GFPGAN provides the best results with faces, but may provide slightly worse results if used on non-face subjects.\n```\n!UPSCALE_KEEP_ORG = no\n```\nKeep the original unmodified image when upscaling (yes/no)? If set to no (the default), the original image will be deleted. If set to yes, the original image will be saved in an **/original** subdirectory of the image output folder.\n```\n!REPEAT = no\n```\nWhen all jobs in the prompt file are finished, restart back at the top of the file (yes/no)? 
Default is no, which will simply terminate execution when all jobs are complete.\n\nTODO: finish settings examples \u0026 add usage tips/examples, document random_art.py\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frbbrdckybk%2Fai-art-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frbbrdckybk%2Fai-art-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frbbrdckybk%2Fai-art-generator/lists"}