{"id":13456173,"url":"https://github.com/sharonzhou/long_stable_diffusion","last_synced_at":"2026-05-01T01:06:29.793Z","repository":{"id":58734455,"uuid":"532661125","full_name":"sharonzhou/long_stable_diffusion","owner":"sharonzhou","description":"Long-form text-to-images generation, using a pipeline of deep generative models (GPT-3 and Stable Diffusion)","archived":false,"fork":false,"pushed_at":"2022-10-30T04:16:56.000Z","size":61,"stargazers_count":687,"open_issues_count":2,"forks_count":53,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-03-24T05:48:43.514Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sharonzhou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-09-04T21:20:28.000Z","updated_at":"2025-02-27T22:39:57.000Z","dependencies_parsed_at":"2022-09-07T14:22:54.040Z","dependency_job_id":null,"html_url":"https://github.com/sharonzhou/long_stable_diffusion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sharonzhou%2Flong_stable_diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sharonzhou%2Flong_stable_diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sharonzhou%2Flong_stable_diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sharonzhou%2Flong_stable_diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sharonzhou","download_url":"https:
//codeload.github.com/sharonzhou/long_stable_diffusion/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245243321,"owners_count":20583607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T08:01:17.255Z","updated_at":"2026-05-01T01:06:28.014Z","avatar_url":"https://github.com/sharonzhou.png","language":"Python","funding_links":[],"categories":["Python","NLP"],"sub_categories":[],"readme":"## Long Stable Diffusion: Long-form text to images\ne.g. story -\u003e Stable Diffusion -\u003e illustrations\n\nRight now, Stable Diffusion can only take in a short prompt. What if you want to illustrate a full story? Cue Long Stable Diffusion, a pipeline of generative models to do just that with just a bash script!\n\n### Come at me with an example?\nYep! We just published [Never Hire a Herd of Goats to Mow your Lawn](https://storiesby.ai/p/never-hire-a-herd-of-goats-to-mow), an AI-generated story illustrated by this repo.\n\n![Goat illustrations](https://user-images.githubusercontent.com/2941408/188747682-a751e2be-554e-4d05-ac08-a557d04b221a.png)\n\n### Steps\n1. Start with long-form text that you want accompanying images for, e.g. a story to illustrate.\n2. Ask GPT-3 for several illustration ideas for beginning, middle, end, via the OpenAI API.\n3. \"Translate\" the ideas from English to \"prompt-English\", e.g. add suffixes like `trending on art station` for better results.\n4. The \"prompt-English\" prompts are put through Stable Diffusion to generate the images.\n5. 
All the images and prompts are dumped into a `.docx`, for easy copy-pasting.\n\n### Purpose\nI made this to automate myself, i.e. to prompt AI for illustrations to accompany AI-generated stories, for the [Stories by AI](https://storiesby.ai/) podcast. Come check us out! And please suggest ways to improve; comments and pull requests are always welcome :)\n\nThis was also just a weekend hackathon project to reward myself for doing a lot of work the past couple of months, and because I felt guilty about not using my wonderful and beautiful Titan RTXs to their full potential.\n\n## Run\nThis bash script runs what you need. It assumes 2 GPUs with 24GB memory each. See the Multi-processing Multi-GPU Note below to change this assumption for your compute needs. I had too much fun with multiprocessing and making it faster.\n\n`bash run.sh -f three_little_pigs`\n![threelittlepigs](https://user-images.githubusercontent.com/2941408/188760072-9765b085-1763-466e-8944-d4b9ecbb755b.png)\n\nTo run your own text, replace `three_little_pigs` with the name of your new `.txt` file, placed in the `texts/` folder.\n\n`bash run.sh -f \u003cname_of_txtfile_in_texts_dir\u003e`\n\n#### What you need before you run it like that\n- Install the requirements\n- Make sure you set your OpenAI API key, e.g. 
in a terminal: `export OPENAI_TOKEN=\u003cyour_token\u003e`\n- Make sure you have run `huggingface-cli login` with a valid token\n- Make sure you have access to https://huggingface.co/CompVis/stable-diffusion-v1-4\n- To use the \"extracts\" method, you need to install nltk and run `nltk.download('punkt')` in a Python shell\n- Then, put your favorite story or article in a `.txt` file in the `texts/` folder\n\n#### Method Selection\nCurrently two methods for generating the image prompts from text are supported.\n- \"sections\": Inputs the entire text to GPT-3 and asks it for image prompts for the start, middle, and end of the text.\n- \"extracts\": Splits the text from the `.txt` file into smaller chronological bits of text, and then generates an image prompt for each bit of text.\n\nAdditional methods yet to be implemented are the following:\n- \"summary\": Generates a summary from the `.txt` file, then prompts GPT-3 to generate image prompts from the summary.\n- \"summary+extracts\": A combination of the \"summary\" and \"extracts\" methods, where both the summary and the extract are fed into GPT-3 to generate image prompts.\n\n#### Output Selection\nCurrently one type of output is supported:\n- \"docx\": A Word file with the images and prompts.\n\nAdditional output formats yet to be implemented are:\n- \"txt\": Just a text file with the image prompts (does not run Stable Diffusion).\n- \"images\": Just image PNG files with their title being the prompt.\n- \"html\": A self-contained HTML page with the original text and suggested images\n- \"markdown\": A Markdown file with the original text and image embeds\n- \"latex\": A LaTeX file with the original text and '\u003cfigure\u003e' components for the images\n- \"pdf\": A self-contained PDF document with the original text and images, compiled from LaTeX\n\n\n### Files and folders\n- `run_two_gpus.sh`: This is the main entry script into the program to parallelize across GPUs easily.\n- `run.py`: Where most of the magic happens: getting 
image prompts from GPT-3, making images from those prompts (using stable diffusion, multithreading), saving them, and dumping the images and prompts to a docx file. This is what `run_two_gpus.sh` calls.\n- `stable_diffusion.py`: Just runs stable diffusion if you want to use it by itself (I do). `run.py` calls it.\n- `dump_docx.py`: Just dumps image prompts and images into a single docx for a particular text. Again, it's useful if you want to use it by itself on the saved images and prompts. I do, because I'm actually overwriting the file when multiprocessing and sometimes will just use this as a postprocessing step. Yes, you can join those and change that but I don't really care, since sometimes my GPUs misbehave and I'll need to rerun it anyways.\n\n- `texts/`: Folder to put your texts in, as `.txt` files.\n- `image_prompts/`: Image prompts generated by GPT-3 based on your text.\n- `images/`: Images generated by Stable Diffusion based on GPT-3's image prompts.\n- `docx/`: Microsoft Word document for a text with images and their prompts all in one.\n\n- `clean_lexica.py`: Preprocessing step for Stable Diffusion prompts from Lexica: cleans up the prompts and puts them into a single file.\n- `effective_prompts_fs.txt`: Effective \"prompt-English\" to use for few-shot translation from English GPT-3 prompts to prompt-English (1884 tokens).\n\n#### Multi-processing Multi-GPU Note\nMulti-processing is optimized for 2 Titan RTXs, with 24GB of memory each. Changing the number of GPUs to parallelize on is a simple edit in `run_two_gpus.sh`: just copy the first line and change `CUDA_VISIBLE_DEVICES` to the appropriate GPU id.\n\nThe number of processes per GPU can be passed in through `run_two_gpus.sh` as `-n \u003cnum_processes_per_gpu\u003e` for each run. This is an int used in `run.py`. 
I've found that my GPUs can handle 3, but are happier with 2.\n\n### Complete\n- [x] Pipeline of asking GPT-3 for image prompts\n- [x] Image prompts to stable diffusion\n- [x] Multiprocessing to max out a single GPU\n- [x] GPU multiprocessing stable diffusion\n- [x] Docx dump of images and image prompts\n- [x] Translation layer between English prompt and \"prompt-English\" (lexica)\n- [x] Flesh out readme\n- [x] Open source\n\n### Todo\n- [ ] Walkthrough video of code\n- [ ] Replace stable_diffusion.py with txt2img.py from CompVis stable-diffusion repo\n- [ ] Support for configuring image generation (based on txt2img.py)\n- [ ] Support for different content types (fiction/blog post/essay/news article)\n- [ ] 'summary+extracts' method\n- [ ] output to txt\n- [ ] output to markdown\n- [ ] output to html\n- [ ] output to latex\n- [ ] output to pdf\n- [ ] refactor from a sequence of scripts to a Python library\n\n### Future\n- [ ] Translation from English to \"prompt-English\" can be improved with a finetuned model trained on several million data samples (instead of 36)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsharonzhou%2Flong_stable_diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsharonzhou%2Flong_stable_diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsharonzhou%2Flong_stable_diffusion/lists"}