https://github.com/olivv-cs/ComfyUI-FunPack
A set of nodes for fun!
- Host: GitHub
- URL: https://github.com/olivv-cs/ComfyUI-FunPack
- Owner: olivv-cs
- License: gpl-3.0
- Created: 2025-06-10T13:03:36.000Z (8 days ago)
- Default Branch: main
- Last Pushed: 2025-06-10T14:09:43.000Z (8 days ago)
- Last Synced: 2025-06-10T14:26:45.705Z (8 days ago)
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-comfyui - **ComfyUI-FunPack**
README
# ComfyUI-FunPack
A set of custom nodes designed for experiments with video diffusion models.
EXPERIMENTAL, and I mean it. Constantly updating, changing, adding, and removing, just for the sake of making something work.
You have been warned.

**FunPack CLIP Loader**

Update: this node now serves as... I guess, a prompt enhancer? It processes the user input, adds an enhanced prompt, then tokenizes everything with regular CLIP.
Inputs:
- clip_model_name - your CLIP-L model that you usually use with Hunyuan/FramePack (e.g. clip-vit-large-patch14);
- text_encoder_model_name - your instruct model (or any other LLM?). Expects a .safetensors file; ignored if "encoder_from_pretrained" is enabled;
- llm_vision_model_name - the llava-llama-3 model you usually use with Hunyuan/FramePack (e.g. llava-llama-3-8b-v1_1). It's also possible to load any other LLM, with or without vision capabilities (I guess);
- type - select "hunyuan_video"; kept for compatibility;
- encoder_pretrained_path - a Hugging Face path providing the config and tokenizer for your encoder model (and the weights as well, if encoder_from_pretrained is enabled);
- vision_pretrained_path - a Hugging Face path for the LLM+vision model (only used if vision_from_pretrained is enabled);
- encoder_from_pretrained - if enabled, loads encoder model weights from encoder_pretrained_path as well, ignoring local "text_encoder_model_name";
- vision_from_pretrained - if enabled, loads LLM+vision model weights from vision_pretrained_path, ignoring local "llm_vision_model_name";
- load_te - if enabled, loads your custom text encoder model. If disabled, uses only the vision model (e.g. llava-llama-3-8b-v1_1);
- system_prompt - the system prompt the instruct model is going to use;
- top_p, top_k, temperature - sampling parameters for generating the "assistant prompt" (a rough sketch of that generation step follows below). The reply gets woven in here:

```python
def tokenize(self, text):
    assistant_reply = self.generate(text)
    messages = [
        {"role": "system", "content": self.system_prompt},
        {"role": "user", "content": text},
        {"role": "assistant", "content": assistant_reply}
    ]
    return tokenizer.apply_chat_template(messages, add_generation_prompt=False, return_tensors="pt").to("cuda")
```

Technically speaking, it's possible to load just about any model as the text encoder.
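For reference, here is a minimal, hypothetical sketch of what the `self.generate(text)` step driven by top_p, top_k and temperature could look like with Hugging Face transformers. The model ID, max_new_tokens value and function shape are placeholders, not the node's actual implementation.

```python
# Hypothetical sketch: generating the "assistant prompt" with sampling parameters.
# The model ID below is a placeholder; the node loads whatever you point it at.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

def generate(text, system_prompt, top_p=0.9, top_k=50, temperature=0.7):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    output_ids = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        top_p=top_p,
        top_k=top_k,
        temperature=temperature,
    )
    # Return only the newly generated tokens (the assistant reply).
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
```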
Outputs:
Just CLIP. Pass it through your nodes like you would with a regular DualCLIPLoader.

**FunPack img2latent Interpolation**

Inputs:
- images - a batch of images e.g. from Load Image, Load Video (Upload), Load Video (Path) et cetera.
- frame_count - same as the frame count in HunyuanVideoEmptyLatent or similar. Actually, works better with WAN.

Outputs:
- img_batch_for_encode - the interpolated image batch; resize it if needed, run it through VAE Encode, and pass the result as latent_image to your sampler (a rough sketch of the interpolation follows below);
- img_for_start_images - takes exactly the last image from the batch. You can pass it to your CLIP Vision and encoders as start_image. Or end_image. Who am I to tell you.
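If you're wondering what "interpolating a batch to frame_count frames" can mean in practice, here is a minimal, hypothetical sketch using simple linear blending between neighbouring frames on a ComfyUI-style (N, H, W, C) image tensor. The actual node may use a different scheme; the function name and details here are illustrative only.

```python
# Hypothetical sketch: stretch/compress an image batch to a target frame count
# by linearly blending between neighbouring frames.
import torch

def interpolate_batch(images: torch.Tensor, frame_count: int) -> torch.Tensor:
    """images: (N, H, W, C) float tensor in [0, 1], like a ComfyUI IMAGE batch."""
    n = images.shape[0]
    if n == 1:
        # A single image just gets repeated.
        return images.expand(frame_count, -1, -1, -1).clone()
    # Map each output frame to a fractional position within the input batch.
    positions = torch.linspace(0, n - 1, frame_count)
    lo = positions.floor().long()
    hi = positions.ceil().long()
    t = (positions - lo.float()).view(-1, 1, 1, 1)
    return images[lo] * (1 - t) + images[hi] * t

# img_batch_for_encode would then go through VAE Encode and into your sampler
# as latent_image; img_for_start_images is simply the last frame, images[-1:].
```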