https://github.com/olivv-cs/ComfyUI-FunPack
A set of nodes for fun!
- Host: GitHub
- URL: https://github.com/olivv-cs/ComfyUI-FunPack
- Owner: olivv-cs
- License: gpl-3.0
- Created: 2025-06-10T13:03:36.000Z (8 days ago)
- Default Branch: main
- Last Pushed: 2025-06-10T14:09:43.000Z (8 days ago)
- Last Synced: 2025-06-10T14:26:45.705Z (8 days ago)
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-comfyui - **ComfyUI-FunPack**
README
# ComfyUI-FunPack
A set of custom nodes designed for experiments with video diffusion models.
EXPERIMENTAL, and I mean it. Constantly updating, changing, adding, and removing, just for the sake of making something work.
You have been warned.

**FunPack CLIP Loader**

Update: this node now serves as... I guess, a prompt enhancer? It processes the user input, adds an enhanced prompt, then tokenizes everything with regular CLIP.
Inputs:
- clip_model_name - your CLIP-L model that you usually use with Hunyuan/FramePack (e.g. clip-vit-large-patch14);
- text_encoder_model_name - your instruct model (or any other LLM?). Expects a .safetensors file; ignored if "encoder_from_pretrained" is enabled;
- llm_vision_model_name - the llava-llama-3 model you usually use with Hunyuan/FramePack (e.g. llava-llama-3-8b-v1_1). It's also possible to load any other LLM, with or without vision capabilities (I guess);
- type - select "hunyuan_video"; kept for compatibility;
- encoder_pretrained_path - a Hugging Face path providing the config and tokenizer for your encoder model (and the weights as well, if encoder_from_pretrained is enabled);
- vision_pretrained_path - a Hugging Face path for the LLM+vision model (only used if vision_from_pretrained is enabled);
- encoder_from_pretrained - if enabled, loads encoder model weights from encoder_pretrained_path as well, ignoring local "text_encoder_model_name";
- vision_from_pretrained - if enabled, loads LLM+vision model weights from vision_pretrained_path, ignoring local "llm_vision_model_name";
- load_te - if enabled, loads your custom text encoder model. If disabled, uses only the vision model (e.g. llava-llama-3-8b-v1_1);
- system_prompt - the system prompt the instruct model is going to use;
- top_p, top_k, temperature - sampling parameters for generating the "assistant prompt" (a rough sketch of that generation step follows below). The reply gets woven in here:

```python
def tokenize(self, text):
    assistant_reply = self.generate(text)
    messages = [
        {"role": "system", "content": self.system_prompt},
        {"role": "user", "content": text},
        {"role": "assistant", "content": assistant_reply}
    ]
    return tokenizer.apply_chat_template(messages, add_generation_prompt=False, return_tensors="pt").to("cuda")
```

Technically speaking, it's possible to load just about any model as the text encoder.
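For reference, here is a minimal, hypothetical sketch of what the `self.generate(text)` step driven by top_p, top_k and temperature could look like with Hugging Face transformers. The model ID, max_new_tokens value and function shape are placeholders, not the node's actual implementation.

```python
# Hypothetical sketch: generating the "assistant prompt" with sampling parameters.
# The model ID below is a placeholder; the node loads whatever you point it at.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

def generate(text, system_prompt, top_p=0.9, top_k=50, temperature=0.7):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    output_ids = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        top_p=top_p,
        top_k=top_k,
        temperature=temperature,
    )
    # Return only the newly generated tokens (the assistant reply).
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
```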
Outputs:
Just CLIP. Pass it through your nodes like you would with a regular DualCLIPLoader.

**FunPack img2latent Interpolation**

Inputs:
- images - a batch of images e.g. from Load Image, Load Video (Upload), Load Video (Path) et cetera.
- frame_count - same as the frame count in HunyuanVideoEmptyLatent or similar. Actually, works better with WAN.

Outputs:
- img_batch_for_encode - the interpolated image batch; resize it if needed, run it through VAE Encode, and pass the result as latent_image to your sampler (a rough sketch of the interpolation follows below);
- img_for_start_images - takes exactly the last image from the batch. You can pass it to your CLIP Vision and encoders as start_image. Or end_image. Who am I to tell you.
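If you're wondering what "interpolating a batch to frame_count frames" can mean in practice, here is a minimal, hypothetical sketch using simple linear blending between neighbouring frames on a ComfyUI-style (N, H, W, C) image tensor. The actual node may use a different scheme; the function name and details here are illustrative only.

```python
# Hypothetical sketch: stretch/compress an image batch to a target frame count
# by linearly blending between neighbouring frames.
import torch

def interpolate_batch(images: torch.Tensor, frame_count: int) -> torch.Tensor:
    """images: (N, H, W, C) float tensor in [0, 1], like a ComfyUI IMAGE batch."""
    n = images.shape[0]
    if n == 1:
        # A single image just gets repeated.
        return images.expand(frame_count, -1, -1, -1).clone()
    # Map each output frame to a fractional position within the input batch.
    positions = torch.linspace(0, n - 1, frame_count)
    lo = positions.floor().long()
    hi = positions.ceil().long()
    t = (positions - lo.float()).view(-1, 1, 1, 1)
    return images[lo] * (1 - t) + images[hi] * t

# img_batch_for_encode would then go through VAE Encode and into your sampler
# as latent_image; img_for_start_images is simply the last frame, images[-1:].
```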