{"id":20699631,"url":"https://github.com/hyeonsangjeon/aisketcher","last_synced_at":"2025-04-22T22:19:36.944Z","repository":{"id":167172412,"uuid":"642722592","full_name":"hyeonsangjeon/AIsketcher","owner":"hyeonsangjeon","description":"Text-to-image generation using Huggingface stable diffusion ControlNet conditioning and AWS Translate's prompt translation function","archived":false,"fork":false,"pushed_at":"2023-08-25T01:35:22.000Z","size":992,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-22T22:19:36.000Z","etag":null,"topics":["aws","boto3","canny-edge-detection","controlnet","controlnetmodel","huggingface","huggingface-transformers","opencv-python","pndmscheduler","stable-diffusion","stablediffusioncontrolnetpipeline","translate"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hyeonsangjeon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-19T07:48:07.000Z","updated_at":"2024-01-16T00:08:57.000Z","dependencies_parsed_at":"2024-11-17T00:31:22.213Z","dependency_job_id":"1762f9c5-ab64-4eda-bb1d-99488a6bb094","html_url":"https://github.com/hyeonsangjeon/AIsketcher","commit_stats":null,"previous_names":["hyeonsangjeon/aisketcher"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyeonsangjeon%2FAIsketcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyeonsangjeon%2FAIsketcher/ta
gs","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyeonsangjeon%2FAIsketcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyeonsangjeon%2FAIsketcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hyeonsangjeon","download_url":"https://codeload.github.com/hyeonsangjeon/AIsketcher/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250331863,"owners_count":21413113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","boto3","canny-edge-detection","controlnet","controlnetmodel","huggingface","huggingface-transformers","opencv-python","pndmscheduler","stable-diffusion","stablediffusioncontrolnetpipeline","translate"],"created_at":"2024-11-17T00:31:09.740Z","updated_at":"2025-04-22T22:19:36.932Z","avatar_url":"https://github.com/hyeonsangjeon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square)](https://raw.githubusercontent.com/hyeonsangjeon/youtube-dl-nas/master/LICENSE)\n[![Downloads](https://static.pepy.tech/badge/AIsketcher)](https://pepy.tech/project/AIsketcher)\n[![PyPI version](https://badge.fury.io/py/AIsketcher.svg)](https://pypi.org/project/AIsketcher/)\n\n# AIsketcher\n\n- Stable Diffusion model: Lykon/DreamShaper [1]\n- Text-to-image generation with ControlNet conditioning: Canny edge detection [2][3]\n- Prompt translator (Korean to English): Amazon Translate 
[4]\n\nText-to-image generation using Hugging Face Stable Diffusion with ControlNet conditioning and prompt translation through Amazon Translate.\n\n![screenshot1](https://github.com/hyeonsangjeon/AIsketcher/blob/main/pic/yahunjeon.png?raw=true)\n![screenshot2](https://github.com/hyeonsangjeon/AIsketcher/blob/main/pic/seowonjeon.png?raw=true)\n\n## Project Description\nThe main function takes two inputs, an image and a text prompt, and passes both to a multi-modal generation pipeline.\nThis project uses Stable Diffusion with prompts written in English. Users who mainly work in another language can find it hard to express the details they want in English, so the prompt can be written in the user's own language and is machine-translated to English with Amazon Translate before being fed into the model.\n\nPrerequisite: Load the ControlNetModel and the Stable Diffusion model into a StableDiffusionControlNetPipeline and set its scheduler to PNDMScheduler.\n```python\nimport torch\nfrom diffusers import ControlNetModel, PNDMScheduler, StableDiffusionControlNetPipeline\n\n# Canny-conditioned ControlNet weights and the base Stable Diffusion checkpoint\ncontrolnet_model = \"lllyasviel/sd-controlnet-canny\"\nsd_model = \"Lykon/DreamShaper\"\n\ncontrolnet = ControlNetModel.from_pretrained(\n    controlnet_model,\n    torch_dtype=torch.float16\n)\n\npipe = StableDiffusionControlNetPipeline.from_pretrained(\n    sd_model,\n    controlnet=controlnet,\n    torch_dtype=torch.float16\n)\n\n# Swap in the PNDM scheduler, reusing the pipeline's existing scheduler config\npipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)\n# Offload idle submodels to the CPU to reduce GPU memory use (requires accelerate)\npipe.enable_model_cpu_offload()\n```\n\n## Function Workflow\n\n1. Resize the input image to 800x800.\n2. Extract the edges, which are the key features of the input image, using the Canny function.\n3. If a trans_info dictionary (the Amazon Translate settings) is passed, translate the prompt to English.\n4. Feed the translated prompt and the extracted edge image into the StableDiffusionControlNetPipeline to generate a new image.\n5. 
Resize the output image back to the original size of the input image and display it.\n\nThis workflow allows for the generation of new images based on input images and prompts, with the option of translating the prompts to English for non-English input sentences.\n\n### Usage\n\n```bash\npip install AIsketcher\n```\n\n\n#### case1. English Prompt \n\n```python\nimport AIsketcher\nfrom PIL import Image\nimport numpy as np\nfile_name = 'hello.jpg'\n\ninput_text = 'Cute, (hungry), plump, sitting at a table by the beach, warm feeling, beautiful shining eyes, seascape'\n\nnum_steps = 50\nguidance_scale = 17\nseed =6764547109648557242 \nlow = 140\nhigh = 160\n\nimage, canny_image, out_image = AIsketcher.img2img(file_name,  input_text,  num_steps, guidance_scale, seed, low, high, pipe)\nImage.fromarray(np.concatenate([image.resize(out_image.size), out_image], axis=1))\n```\n\n#### case2. Korean Prompt without IAM AccessRole\n\n```python\nimport AIsketcher\nfrom PIL import Image\nimport numpy as np\nfile_name = 'hello.jpg'\ninput_text = '귀여운, (배가고픈), 포동포동한, 해변가 식탁에 앉은, 따뜻한 느낌, 아름답고 빛나는 눈, 바다풍경'\n\ntrans_info = {\n            'region_name' : 'us-east-1', #user region\n            'aws_access_key_id' : '{{YOUR_ACCESS_KEY}}',\n            'aws_secret_access_key' : '{{YOUR_SECRET_KEY}}',\n            'SourceLanguageCode' : 'ko',\n            'TargetLanguageCode' : 'en',\n            'iam_access' : False\n        }\n\nnum_steps = 50\nguidance_scale = 17\nseed =6764547109648557242 \nlow = 140\nhigh = 160\n\nimage, canny_image, out_image = AIsketcher.img2img(file_name,  input_text,  num_steps, guidance_scale, seed, low, high, pipe, trans_info)\n```\n\n#### case3. 
Korean Prompt with IAM AccessRole between SageMaker and Translate\n```python\nimport AIsketcher\nfrom PIL import Image\nimport numpy as np\nfile_name = 'hello.jpg'\ninput_text = '귀여운, (배가고픈), 포동포동한, 해변가 식탁에 앉은, 따뜻한 느낌, 아름답고 빛나는 눈, 바다풍경'\n\ntrans_info = {\n            'region_name' : 'us-east-1', #user region\n            'SourceLanguageCode' : 'ko',\n            'TargetLanguageCode' : 'en',\n            'iam_access' : True\n        }\n\nnum_steps = 50\nguidance_scale = 17\nseed =6764547109648557242 \nlow = 140\nhigh = 160\n\nimage, canny_image, out_image = AIsketcher.img2img(file_name,  input_text,  num_steps, guidance_scale, seed, low, high, pipe, trans_info)\n```\n\n\n### Default Parameters Used\ndefault_prompt\n```text\n(8k, best quality, masterpiece:1.2), (realistic, photo-realistic:1.37), ultra-detailed,\n```\nnegative_prompt\n```text\nNSFW, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))\n```\n\n| Variables      | Description                                                                                                     |\n|----------------|-----------------------------------------------------------------------------------------------------------------|\n| num_steps      | Number of steps to run the diffusion process for                                                                |  \n| guidance_scale | Creativity value adjustment, a parameter that controls how much the image generation process follows the text prompt | \n| seed           | a number used to initialize the generation in the stable diffusion model                                        |\n| low            | Canny Edge Detection 
lower hysteresis threshold                                                                 |\n| high           | Canny Edge Detection upper hysteresis threshold                                                                 |\n| pipe           | StableDiffusionControlNetPipeline loaded with the ControlNet model and PNDMScheduler (see the prerequisite step) |\n| trans_info     | Optional Amazon Translate settings: region, credentials or IAM access flag, and source and target language codes |\n\n\n\n### References\n- `[1]`. Lykon/DreamShaper, Stable Diffusion model, https://huggingface.co/Lykon/DreamShaper\n- `[2]`. Text-to-Image Generation with ControlNet Conditioning, https://huggingface.co/docs/diffusers/v0.14.0/en/api/pipelines/stable_diffusion/controlnet\n- `[3]`. ControlNet - Canny Version, https://huggingface.co/lllyasviel/sd-controlnet-canny\n- `[4]`. Amazon Translate, https://aws.amazon.com/ko/translate/\n- `[5]`. Amazon Translate, supported language codes, https://docs.aws.amazon.com/translate/latest/dg/what-is-languages.html","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyeonsangjeon%2Faisketcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyeonsangjeon%2Faisketcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyeonsangjeon%2Faisketcher/lists"}