# ComfyUI ExLlama Nodes
A simple local text generator for [ComfyUI](https://github.com/comfyanonymous/ComfyUI) using [ExLlamaV2](https://github.com/turboderp/exllamav2).

## Installation
Clone the repository to `custom_nodes` and install the requirements:
```
git clone https://github.com/Zuellni/ComfyUI-ExLlama-Nodes custom_nodes/ComfyUI-ExLlamaV2-Nodes
pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements.txt
```

On Windows, use the prebuilt wheels for [ExLlamaV2](https://github.com/turboderp/exllamav2/releases/latest) and [FlashAttention](https://github.com/bdashore3/flash-attention/releases/latest):
```
pip install exllamav2-X.X.X+cuXXX.torch2.X.X-cp3XX-cp3XX-win_amd64.whl
pip install flash_attn-X.X.X+cuXXX.torch2.X.X-cp3XX-cp3XX-win_amd64.whl
```

## Usage
Only EXL2, 4-bit GPTQ and FP16 models are supported. You can find them on [Hugging Face](https://huggingface.co).

To use a model with the nodes, clone its repository with `git` or manually download all of its files, and place them in a folder under `models/llm`.
For example, if you want to download the 4-bit [Llama-3.1-8B-Instruct](https://huggingface.co/turboderp/Llama-3.1-8B-Instruct-exl2), use the following command:
```
git lfs install
git clone https://huggingface.co/turboderp/Llama-3.1-8B-Instruct-exl2 -b 4.0bpw models/llm/Llama-3.1-8B-Instruct-exl2-4.0bpw
```
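
If you'd rather not use `git`, the same branch can be fetched with the `huggingface_hub` Python package. A minimal sketch, assuming the package is installed (`pip install huggingface_hub`); it is not a dependency of these nodes:
```
from huggingface_hub import snapshot_download

# Fetch the 4-bit quantization branch into ComfyUI's model directory.
# `revision` selects the branch, equivalent to `git clone -b 4.0bpw`.
snapshot_download(
    repo_id="turboderp/Llama-3.1-8B-Instruct-exl2",
    revision="4.0bpw",
    local_dir="models/llm/Llama-3.1-8B-Instruct-exl2-4.0bpw",
)
```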

> [!TIP]
> You can add your own `llm` path to the [extra_model_paths.yaml](https://github.com/comfyanonymous/ComfyUI/blob/master/extra_model_paths.yaml.example) file and put the models there instead.
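
For instance, an entry along these lines maps an external folder to the `llm` model type; the `comfyui_extras` key and the paths below are illustrative, not part of the stock file:
```
comfyui_extras:
  base_path: /path/to/storage
  llm: models/llm
```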

## Nodes

### ExLlama Nodes

| Node | Parameter | Description |
| :--- | :--- | :--- |
| **Loader** | | Loads models from the `llm` directory. |
| | *cache_bits* | A lower value reduces VRAM usage, but also affects generation speed and quality. |
| | *fast_tensors* | Enabling reduces RAM usage and speeds up model loading. |
| | *flash_attention* | Enabling reduces VRAM usage. Not supported on cards with compute capability lower than 8.0. |
| | *max_seq_len* | Max context length; a higher value means higher VRAM usage. 0 defaults to the model config. |
| **Formatter** | | Formats messages using the model's chat template. |
| | *add_assistant_role* | Appends the assistant role to the formatted output. |
| **Tokenizer** | | Tokenizes input text using the model's tokenizer. |
| | *add_bos_token* | Prepends the input with a `bos` token if enabled. |
| | *encode_special_tokens* | Encodes special tokens such as `bos` and `eos` if enabled, otherwise treats them as normal strings. |
| **Settings** | | Optional sampler settings node. Refer to SillyTavern for parameters. |
| **Generator** | | Generates text based on the given input. |
| | *unload* | Unloads the model after each generation to reduce VRAM usage. |
| | *stop_conditions* | A list of strings to stop generation on, e.g. `"\n"` to stop on a newline. Leave empty to stop only on `eos`. |
| | *max_tokens* | Max new tokens to generate. 0 uses all available context. |

### Text Nodes

| Node | Description |
| :--- | :--- |
| **Convert** | Strips punctuation and whitespace, and changes the case of the input. |
| **Message** | A message for the Formatter node. Can be chained to create a conversation. |
| **Preview** | Displays generated text in the UI. |
| **Replace** | Replaces variable names in curly brackets, e.g. `{a}`, with their values. |
| **String** | A string. That's it. |

## Workflow
An example workflow is embedded in the image below and can be opened in ComfyUI.

![workflow](https://github.com/user-attachments/assets/359c0340-fe0e-4e69-a1b4-259c6ff5a142)