https://github.com/cj-mills/cjm-diffusers-utils
Some utility functions I frequently use with 🤗 diffusers.
- Host: GitHub
- URL: https://github.com/cj-mills/cjm-diffusers-utils
- Owner: cj-mills
- License: mit
- Created: 2023-01-24T00:47:47.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-10T02:57:25.000Z (over 2 years ago)
- Last Synced: 2025-02-13T15:49:32.138Z (8 months ago)
- Language: Jupyter Notebook
- Homepage: https://cj-mills.github.io/cjm-diffusers-utils/
- Size: 9.21 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

cjm-diffusers-utils
================

## Install

``` sh
pip install cjm_diffusers_utils
```

## How to use

``` python
import torch
from cjm_pytorch_utils.core import get_torch_device
device = get_torch_device()
dtype = torch.float16 if device == 'cuda' else torch.float32
device, dtype
```

    ('cuda', torch.float16)

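
The comparison `device == 'cuda'` relies on `get_torch_device()` returning the bare string `'cuda'`, as it does in the output above. Below is a minimal sketch of a more tolerant check, assuming the device might also arrive as an index-qualified string such as `'cuda:0'` or as a `torch.device` object. `dtype_name_for` is a hypothetical helper and is not part of either package.

```python
def dtype_name_for(device) -> str:
    """Return 'float16' for CUDA devices and 'float32' for anything else.

    Hypothetical helper: `device` may be a string ('cuda', 'cuda:0', 'cpu')
    or any object whose str() has that form, such as a torch.device.
    """
    # Keep only the device family, dropping an optional ':<index>' suffix.
    family = str(device).split(":", 1)[0]
    return "float16" if family == "cuda" else "float32"


print(dtype_name_for("cuda"))    # float16
print(dtype_name_for("cuda:0"))  # float16
print(dtype_name_for("cpu"))     # float32
```

With PyTorch installed, `dtype = getattr(torch, dtype_name_for(device))` would give the same `torch.float16` on a CUDA device as the snippet above.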
### pil_to_latent
``` python
from cjm_diffusers_utils.core import pil_to_latent
from PIL import Image
from diffusers import AutoencoderKL
```

``` python
model_name = "stabilityai/stable-diffusion-2-1"
vae = AutoencoderKL.from_pretrained(model_name, subfolder="vae").to(device=device, dtype=dtype)
```

``` python
img_path = '../images/cat.jpg'
src_img = Image.open(img_path).convert('RGB')
print(f"Source Image Size: {src_img.size}")

img_latents = pil_to_latent(src_img, vae)
print(f"Latent Dimensions: {img_latents.shape}")
```

    Source Image Size: (768, 512)
    Latent Dimensions: torch.Size([1, 4, 64, 96])

### latent_to_pil

``` python
from cjm_diffusers_utils.core import latent_to_pil
```

``` python
decoded_img = latent_to_pil(img_latents, vae)
print(f"Decoded Image Size: {decoded_img.size}")
```

    Decoded Image Size: (768, 512)

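
The sizes printed for `pil_to_latent` and `latent_to_pil` are consistent with the Stable Diffusion VAE downsampling each spatial dimension by a factor of 8 and producing 4 latent channels. Here is a small sketch of that arithmetic, using hypothetical helper names (`latent_shape`, `decoded_size`) that are not part of `cjm_diffusers_utils`. Note that PIL reports sizes as (width, height), while the latent tensor is laid out as (batch, channels, height, width).

```python
VAE_SCALE = 8        # spatial downsampling factor of the Stable Diffusion VAE
LATENT_CHANNELS = 4  # number of channels in the latent tensor


def latent_shape(image_size, batch=1):
    """Map a PIL-style (width, height) size to a (B, C, H, W) latent shape."""
    width, height = image_size
    return (batch, LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)


def decoded_size(shape):
    """Map a (B, C, H, W) latent shape back to a PIL-style (width, height)."""
    _, _, height, width = shape
    return (width * VAE_SCALE, height * VAE_SCALE)


print(latent_shape((768, 512)))       # (1, 4, 64, 96)
print(decoded_size((1, 4, 64, 96)))   # (768, 512)
```

The round trip only reproduces the original size when both dimensions are multiples of 8, as they are for the 768 x 512 example image.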
### text_to_emb
``` python
from cjm_diffusers_utils.core import text_to_emb
from transformers import CLIPTextModel, CLIPTokenizer
```

``` python
# Load the tokenizer for the specified model
tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer")
# Load the text encoder for the specified model
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder").to(device=device, dtype=dtype)
```

``` python
prompt = "A cat sitting on the floor."
text_emb = text_to_emb(prompt, tokenizer, text_encoder)
text_emb.shape
```

    torch.Size([2, 77, 1024])

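
This README does not spell out the three dimensions of the embedding. The 77 matches the CLIP tokenizer's maximum sequence length, and 1024 is the hidden size of the Stable Diffusion 2.1 text encoder. The leading 2 most plausibly holds one embedding for an empty negative prompt and one for the prompt itself, which is the usual layout for classifier-free guidance, but that reading is an assumption. Under it, a denoising loop would get one noise prediction per embedding and blend them roughly as sketched here. `guided_prediction` is a hypothetical name, and plain lists stand in for tensors.

```python
def guided_prediction(uncond, cond, guidance_scale=7.5):
    """Blend unconditional and prompt-conditioned noise predictions.

    Classifier-free guidance: start from the unconditional prediction and
    move toward the conditional one by `guidance_scale`.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values standing in for the two per-embedding noise predictions.
uncond_pred = [0.0, 1.0]
cond_pred = [1.0, 1.0]
print(guided_prediction(uncond_pred, cond_pred, guidance_scale=2.0))  # [2.0, 1.0]
```

A scale of 1.0 returns the conditional prediction unchanged, and larger values push the result further from the unconditional one.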
### prepare_noise_scheduler
``` python
from cjm_diffusers_utils.core import prepare_noise_scheduler
from diffusers import DEISMultistepScheduler
```

``` python
noise_scheduler = DEISMultistepScheduler.from_pretrained(model_name, subfolder='scheduler')
print(f"Number of timesteps: {len(noise_scheduler.timesteps)}")
print(noise_scheduler.timesteps[:10])

noise_scheduler = prepare_noise_scheduler(noise_scheduler, 70, 1.0)
print(f"Number of timesteps: {len(noise_scheduler.timesteps)}")
print(noise_scheduler.timesteps[:10])
```

    Number of timesteps: 1000
    tensor([999., 998., 997., 996., 995., 994., 993., 992., 991., 990.])
    Number of timesteps: 70
    tensor([999, 985, 970, 956, 942, 928, 913, 899, 885, 871])

### prepare_depth_mask

``` python
from cjm_diffusers_utils.core import prepare_depth_mask
```

``` python
depth_map_path = '../images/depth-cat.png'
depth_map = Image.open(depth_map_path)
print(f"Depth map size: {depth_map.size}")

depth_mask = prepare_depth_mask(depth_map).to(device=device, dtype=dtype)
depth_mask.shape, depth_mask.min(), depth_mask.max()
```

    Depth map size: (768, 512)
    (torch.Size([1, 1, 64, 96]),
     tensor(-1., device='cuda:0', dtype=torch.float16),
     tensor(1., device='cuda:0', dtype=torch.float16))
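
The output suggests two things happen to the depth map: it is resized to the latent resolution (768 x 512 pixels becomes a 64 x 96 tensor), and its values are rescaled so that the minimum is -1 and the maximum is 1. Below is a pure-Python sketch of those two steps, assuming min-max normalization and the same factor-of-8 downscale as the VAE. The helper names are hypothetical, and the resizing method the library actually uses is not shown in this README.

```python
def mask_size(image_size, scale=8):
    """Map a PIL-style (width, height) size to a (height, width) mask size."""
    width, height = image_size
    return (height // scale, width // scale)


def normalize_to_unit_range(values):
    """Min-max rescale a flat list of depth values to the range [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant depth map maps to all zeros (no depth contrast).
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


print(mask_size((768, 512)))  # (64, 96)

scaled = normalize_to_unit_range([0, 64, 255])
print(min(scaled), max(scaled))  # -1.0 1.0
```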