# chat

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/adamelliotfields/chat?devcontainer_path=.devcontainer/devcontainer.json&machine=basicLinux32gb)

> [!IMPORTANT]
> No longer maintained. :cry: When I first made this, there was no UI for WebLLM. The official app at [chat.webllm.ai](https://chat.webllm.ai) is now the best UI for WebLLM and is actively maintained. Use that or one of Xenova's WebGPU [spaces](https://huggingface.co/collections/Xenova/transformersjs-demos-64f9c4f49c099d93dbc611df) instead! :llama:

React chat UI for [Web LLM](https://webllm.mlc.ai) on GitHub Pages. Built with Tailwind and Jotai. Inspired by [Perplexity Labs](https://labs.perplexity.ai).

https://github.com/adamelliotfields/chat/assets/7433025/07565763-606b-4de3-aa2d-8d5a26c83941

## Introduction

[Web LLM](https://github.com/mlc-ai/web-llm) is a project under the [MLC](https://mlc.ai) (machine learning compilation) organization. It allows you to run large language models in the browser using WebGPU and WebAssembly. Check out the [example](https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat) and read the [introduction](https://mlc.ai/chapter_introduction/index.html) to learn more.
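
Since inference happens entirely in the browser, the page needs a working WebGPU adapter. A minimal feature check (a sketch, not code from this app) could look like:

```ts
// Sketch: detect WebGPU support before trying to load a model.
// `navigator.gpu` only exists in WebGPU-capable browsers (e.g., recent Chrome/Edge);
// the `@webgpu/types` package provides the typings used here.
async function hasWebGPU(): Promise<boolean> {
  if (!('gpu' in navigator)) return false
  const adapter = await navigator.gpu.requestAdapter()
  return adapter !== null
}
```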

In addition to [`@mlc-ai/web-llm`](https://www.npmjs.com/package/@mlc-ai/web-llm), the app uses TypeScript, React, Jotai, and Tailwind. It's built with Vite and SWC.
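
With Jotai, shared state lives in small atoms that React components subscribe to; a simplified sketch (not the app's actual store) might look like:

```ts
// Simplified sketch of Jotai-style chat state, not the app's actual atoms.
import { atom } from 'jotai'

// the conversation rendered by the React components
export const messagesAtom = atom<{ role: 'user' | 'assistant'; content: string }[]>([])

// toggled while a response is streaming in
export const generatingAtom = atom(false)
```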

## Usage

```sh
# localhost:5173
npm install
npm start
```

## Known issues

I'm currently using Windows/Edge stable on a Lenovo laptop with an RTX 2080 (6GB).

Using the demo app at [webllm.mlc.ai](https://webllm.mlc.ai), I did not have to enable any flags to get the `q4f32` quantized models to work (`f16` requires a flag). Go to [webgpureport.org](https://webgpureport.org) to inspect your system's WebGPU capabilities.
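
You can also query the adapter from the dev tools console; the `shader-f16` feature is what the `f16` model variants typically need (a quick sketch, assuming a WebGPU-capable browser):

```ts
// Quick console check of the adapter's capabilities (sketch only).
const adapter = await navigator.gpu.requestAdapter()
if (adapter) {
  console.log('shader-f16:', adapter.features.has('shader-f16'))
  console.log('maxStorageBufferBindingSize:', adapter.limits.maxStorageBufferBindingSize)
}
```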

### Fetch errors

For whatever reason, I have to be behind a VPN to fetch the models from Hugging Face on Windows. 🤷‍♂️

### Cannot find global function

Usually a cache issue.

You can delete an individual cache:

```js
await caches.delete('webllm/wasm')
```

Or all caches:

```js
await caches.keys().then(keys => Promise.all(keys.map(key => caches.delete(key))))
```

## Reference

There is only one class you need to know to get started: [`ChatModule`](https://github.com/mlc-ai/web-llm/blob/main/src/chat_module.ts)

```ts
import { ChatModule } from '@mlc-ai/web-llm'

const chat = new ChatModule()

// callback that fires on progress updates during initialization (e.g., fetching chunks)
type ProgressReport = { progress: number; text: string; timeElapsed: number }
type InitCallback = (report: ProgressReport) => void
const onProgress: InitCallback = ({ text }) => console.log(text)
chat.setInitProgressCallback(onProgress)

// load/reload with new model
// customize `temperature`, `repetition_penalty`, `top_p`, etc. in `options`
// set system message in `options.conv_config.system`
// defaults are in conversation.ts and the model's mlc-chat-config.json
import type { ChatOptions } from '@mlc-ai/web-llm'
import config from './src/config'
const id = 'TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k'
const options: ChatOptions = { temperature: 0.9, conv_config: { system: 'You are a helpful assistant.' } }
await chat.reload(id, options, config)

// generate response from prompt
// callback fired on each generation step
// returns the complete response string when resolved
type GenerateCallback = (step: number, message: string) => void
const onGenerate: GenerateCallback = (_, message) => console.log(message)
const response = await chat.generate('What would you like to talk about?', onGenerate)

// get last response (sync)
const message: string = chat.getMessage()

// interrupt generation if in progress (sync)
// resolves the Promise returned by `generate`
chat.interruptGenerate()

// check if generation has stopped (sync)
// shorthand for `chat.getPipeline().stopped()`
const isStopped: boolean = chat.stopped()

// reset chat, optionally keep stats (defaults to false)
const keepStats = true
await chat.resetChat(keepStats)

// get stats
// shorthand for `await chat.getPipeline().getRuntimeStatsText()`
const statsText: string = await chat.runtimeStatsText()

// unload model from memory
await chat.unload()

// get GPU vendor
const vendor: string = await chat.getGPUVendor()

// get max storage buffer binding size
// used to determine the `low_resource_required` flag
const bufferBindingSize: number = await chat.getMaxStorageBufferBindingSize()

// getPipeline is private (useful for debugging in dev tools)
const pipeline = chat.getPipeline()
```
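
Put together, a minimal load-and-generate flow (a sketch using the same model ID and config import as above, not the app's actual code) looks like:

```ts
// Sketch: end-to-end init, generate, and cleanup with ChatModule.
import { ChatModule } from '@mlc-ai/web-llm'
import config from './src/config'

async function main() {
  const chat = new ChatModule()
  chat.setInitProgressCallback(({ text }) => console.log(text))

  // downloads the model on first run, then loads it from the cache
  await chat.reload('TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k', { temperature: 0.7 }, config)

  // the callback receives the partial message on each generation step
  const reply = await chat.generate('What would you like to talk about?', (_step, message) => console.log(message))
  console.log('final:', reply)

  console.log(await chat.runtimeStatsText())
  await chat.unload()
}

main()
```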

## Cache management

The library uses the browser's [`CacheStorage`](https://developer.mozilla.org/en-US/docs/Web/API/CacheStorage) API to store models and their configs.

There is an exported helper function to check if a model is in the cache.

```ts
import { hasModelInCache } from '@mlc-ai/web-llm'
import config from './config'
const inCache = await hasModelInCache('Phi2-q4f32_1', config) // throws if the model ID is not in the config
```
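
One use for it is choosing a status message before calling `reload` (a sketch, assuming the same `config` and model IDs as the examples above):

```ts
// Sketch: show a different status message depending on whether the model is cached.
import { ChatModule, hasModelInCache } from '@mlc-ai/web-llm'
import config from './config'

const id = 'TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k'
const cached = await hasModelInCache(id, config)
console.log(cached ? 'Loading model from cache…' : 'Downloading model (this may take a while)…')

const chat = new ChatModule()
await chat.reload(id, undefined, config)
```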

## VRAM requirements

See [utils/vram_requirements](https://github.com/mlc-ai/web-llm/tree/main/utils/vram_requirements) in the Web LLM repo.

## TODO

- [ ] Dark mode
- [ ] Settings menu (temperature, system message, etc.)
- [ ] Inference on web worker
- [ ] Offline/PWA
- [ ] Cache management
- [ ] Image upload for multimodal like [LLaVA](https://llava-vl.github.io)
- [ ] Tailwind class sorting by Biome 🤞