# 🤖 My Little GPT

My Little GPT is a simple AI assistant that can use open-source models running locally on your computer, the [Anthropic API](https://docs.anthropic.com/en/docs/intro-to-claude), or the [OpenAI API](https://platform.openai.com/docs/overview).

## Table of contents

- [Background](#-background)
- [Getting started](#-getting-started)
  - [Available models](#available-models)
- [Local installation](#-local-installation)
  - [MacOS (Apple Silicon)](#macos-apple-silicon)
  - [Linux, MacOS, and Windows](#linux-macos-and-windows)
- [Self-host](#️-self-host)
  - [ngrok](#ngrok)
- [The code](#-the-code)

Check out the hosted version of My Little GPT [here](https://mylittlegpt.app), deployed directly from the `hosted` branch of this repository.

### Previews

#### Screenshot (mobile)


*(image: my-little-gpt-screenshot)*

#### Screen recording

https://github.com/arrowban/my-little-gpt/assets/155796497/5b61cdd8-74cf-4901-888a-9e020f953182

## 🧐 Background

I used to use [ChatGPT](https://chatgpt.com) through its premium subscription. Eventually, I decided I didn't want to keep spending $20 a month on it, but I still valued consistent access to GPT-4 for work and whatnot, so I looked into self-hosting.

There are some great open-source repos for this, but I was looking for a codebase that is only as complex as I need it to be, with a chat app that has the mobile UX I want. I ultimately decided to make my own end-to-end stack – my own little GPT.

I've open-sourced the codebase and written an installation guide to make self-hosting as easy as possible for anyone else who's interested. I use My Little GPT to talk to open-source models locally on my computer for free, or pay only for my usage of hosted model APIs.

Right now, My Little GPT supports a minimal, straightforward chat experience. Over time, I may continue to add new features to the app.

## 📖 Getting started

The links below will go to the [hosted version of My Little GPT](https://mylittlegpt.app), but the same paths will work for your [local version](#-local-installation).

1. [Create an account](https://mylittlegpt.app/create-account) (`/create-account`) or [log in](https://mylittlegpt.app/login) (`/login`)
2. Go to [settings](https://mylittlegpt.app/settings) (`/settings`) and enter at least one of the following:
   - **Local Base URL:** The base URL of any OpenAI-compatible API server you want to use (a quick way to verify the server is reachable is shown after this list)
     - If [running locally](#-local-installation), set this value to:
       - MacOS (Apple Silicon), or any other non-Docker setup: `http://localhost:8000/v1`
       - CPU (Docker): `http://llama-cpp-cpu:8000/v1`
       - NVIDIA GPU (Docker): `http://llama-cpp-cuda:8000/v1`
   - **Anthropic API Key:** An [API key for the Anthropic API](https://console.anthropic.com/settings/keys)
     - Required to use any Anthropic models
   - **OpenAI API Key:** An [API key for the OpenAI API](https://platform.openai.com/api-keys)
     - Required to use any OpenAI models
3. Go to the [chat page](https://mylittlegpt.app/chat) (`/chat`), select a model from the model picker in the top navbar, and send your first message!
   - If chatting with a local model, **every first message** sent after starting the inference server or switching local models **may take as long as a few minutes** to process while the model loads into memory
   - A chat title is automatically generated by the model you are messaging (title quality may vary)
   - Feel free to change themes using the theme picker in the sidebar (themes provided by [DaisyUI](https://daisyui.com/docs/themes))
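
A quick way to verify a local inference server is reachable (a sketch assuming the default `http://localhost:8000/v1` base URL from step 2):

```bash
# llama-cpp-python serves an OpenAI-compatible API, so listing models
# should return the configured model aliases
curl -s http://localhost:8000/v1/models
```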

### Available models

#### API providers

It is straightforward to add support for any provider available in the [Vercel AI SDK](https://sdk.vercel.ai/providers/ai-sdk-providers). Right now, My Little GPT supports the following:

- **Anthropic:** `claude-3-5-sonnet-20240620`, `claude-3-opus-20240229`, `claude-3-sonnet-20240229`, `claude-3-haiku-20240307`
- **OpenAI:** `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`

#### Local

The inference server can use any models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp), and comes configured with the following models:

- [Meta Llama 3.1 8B Instruct](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF)
  - `llama-3.1-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `llama-3.1`: Quantized (`q8_0`) to be less than 9GB in size
- [Mistral 7B Instruct v0.3](https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF)
  - `mistral-7b-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `mistral-7b`: Quantized (`q8_0`) to be less than 8GB in size
- [Qwen2 7B Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GGUF)
  - `qwen2-7b-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `qwen2-7b`: Quantized (`q8_0`) to be less than 9GB in size

Edit the configuration file at `apps/llama-cpp/config.json` to add other models. Reference the [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support) for more info.
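
As a rough sketch (the repo, filename, and alias below are hypothetical placeholders, not part of this repository), adding a model could look like downloading a GGUF file and appending an entry to the `models` array:

```bash
# Hypothetical example: download a GGUF model file with the Hugging Face CLI
huggingface-cli download SOME_ORG/Some-Model-GGUF some-model.Q4_K_M.gguf \
  --local-dir ./models

# Then add an entry like this to the "models" array in apps/llama-cpp/config.json
# (field names follow the llama-cpp-python multi-model config; values are examples):
#   {
#     "model": "./models/some-model.Q4_K_M.gguf",
#     "model_alias": "some-model",
#     "n_ctx": 4096
#   }
```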

## 💻 Local installation

### Hardware requirements

If you are just using hosted model APIs, any computer that supports Docker will probably work.

#### Local inference server

If you have at least 8GB of memory (RAM for CPU inference, or VRAM for GPU inference), you should be able to run models whose quantized versions are under 8GB in size. See [here](#local) for more information on the models that come pre-configured.

**For higher inference speeds, an NVIDIA GPU or an Apple Silicon Mac (M1/M2/M3) is recommended.**

### MacOS (Apple Silicon)

#### Install requirements

- [Docker Desktop](https://www.docker.com/products/docker-desktop)
  - Installs [Docker](https://www.docker.com) and [Docker Compose](https://docs.docker.com/compose) along with a nice GUI
- [Node.js](https://nodejs.org)

You can install the requirements yourself from the links above, or follow these instructions:

- Install [Homebrew](https://brew.sh), then run the following in your terminal:

  ```bash
  brew install node@20
  ```

- [Download and install Docker Desktop](https://www.docker.com/products/docker-desktop)
- [Install Miniforge](https://github.com/conda-forge/miniforge?tab=readme-ov-file#unix-like-platforms-mac-os--linux)
  - Make sure your terminal shows that you are using the `base` environment, and that `python --version` prints a Python version >= 3.10 (a quick check is shown below)
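
For example (assuming Miniforge's default shell integration):

```bash
conda activate base
python --version  # should print Python 3.10 or newer
```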

#### Clone repository

```bash
git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt
```

#### Install and build dependencies

First, set up environment variables for the chat app using `apps/web/.env.example`:

```bash
cp apps/web/.env.example apps/web/.env.local
```

Then install and build dependencies:

```bash
npm install
npm run build
```

#### Start the chat app, backend, and inference server

- Open the Docker Desktop app to start the Docker daemon
- Run the following in your terminal from the root of the repository:

```bash
npm run start
```

After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See [the getting started section above](#-getting-started) for instructions on how to use My Little GPT.

### Linux, MacOS, and Windows

All other platforms are supported via Docker.

#### Install requirements

- [Docker Desktop](https://www.docker.com/products/docker-desktop)
  - Installs [Docker](https://www.docker.com) and [Docker Compose](https://docs.docker.com/compose/) along with a nice GUI
- (NVIDIA GPUs only) [CUDA](https://docs.nvidia.com/cuda) (12.5 supported out-of-the-box)
- (NVIDIA GPUs only) [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

#### Clone repository

```bash
git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt
```

#### Start the chat app, backend, and inference server

- Open the Docker Desktop app to start the Docker daemon
- Run the following in your terminal from the root of the repository:

```bash
# Without a local inference server
docker compose up

# CPU
docker compose --profile cpu up

# NVIDIA GPU (CUDA)
docker compose --profile cuda up
```

After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See [the getting started section above](#-getting-started) for instructions on how to use My Little GPT.

## ☁️ Self-host

### Requirements

By default, the API endpoint for creating a user on the backend is public. Make sure to secure your self-hosted endpoints, or use the [PocketBase admin dashboard](https://pocketbase.io/docs) to update the create rule for the `users` collection to `Admin only` (`null`).
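
To illustrate why: while the create rule is public, anyone who can reach your instance can register an account through PocketBase's records API. A sketch (endpoint shape per the PocketBase docs; host, port, and credentials here are examples):

```bash
# An unauthenticated request like this succeeds while the create rule is public
curl -X POST http://localhost:8080/api/collections/users/records \
  -H "Content-Type: application/json" \
  -d '{"email": "attacker@example.com", "password": "changeme123", "passwordConfirm": "changeme123"}'
```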

### ngrok

[ngrok](https://ngrok.com) provides one free stable domain (at the time of writing), so you can use your own local instance of My Little GPT from anywhere, as long as you leave it running on your computer at home (I use it on my phone most of the time).

#### Set up ngrok

1. [Create an ngrok account](https://dashboard.ngrok.com/signup)
2. Take note of your [Authtoken](https://dashboard.ngrok.com/get-started/your-authtoken)
3. Create your one free domain in the ["Domains" tab under "Cloud Edge"](https://dashboard.ngrok.com/cloud-edge/domains)
4. Set up the ngrok configuration `ngrok.yml`:

```bash
# From the root of the repository
cp apps/ngrok/template.yml ngrok.yml
```

5. In `ngrok.yml`, replace `MY_AUTHTOKEN` with your Authtoken, and `MY_DOMAIN` (in two places) with the domain that ngrok generated for you
6. (Optional, but [highly recommended](#requirements)) Edit the `web` tunnel in `ngrok.yml` following the [ngrok docs for securing endpoints using basic auth](https://ngrok.com/docs/http/basic-auth/?cty=agent-config#example-usage)
   - This could look like adding the following to `ngrok.yml` under the `web` tunnel:
     ```yml
     basic_auth:
       - MY_USERNAME:MY_PASSWORD # Use something more secure than this
     ```
   - You would then be able to use your private instance like normal, passing basic auth credentials via the URL, like this: `https://MY_USERNAME:MY_PASSWORD@MY_DOMAIN` (see the `curl` check after this list)
   - This prevents bad actors from abusing your [public endpoints](#requirements)
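
A quick way to confirm the tunnel and basic auth behave as expected (placeholders as in step 5; with this setup you would expect `200` with credentials and `401` without):

```bash
# Print only the HTTP status code for each request
curl -s -o /dev/null -w "%{http_code}\n" https://MY_USERNAME:MY_PASSWORD@MY_DOMAIN
curl -s -o /dev/null -w "%{http_code}\n" https://MY_DOMAIN
```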

#### Start ngrok tunnels

##### MacOS (Apple Silicon)

```bash
npm run ngrok:web
```

##### Linux, MacOS, and Windows

```bash
# Without a local inference server
docker compose --profile ngrok-web up

# CPU
docker compose --profile cpu --profile ngrok-web up

# NVIDIA GPU (CUDA)
docker compose --profile cuda --profile ngrok-web up
```

After starting the web tunnel, you will be able to access your private instance of My Little GPT by visiting the domain generated for you by ngrok.

## 🧑‍💻 The code

Contributions are welcome!

WARNING: The development workflow has only been tested on MacOS (Apple Silicon), sorry!

### Apps and Packages

- `@my-little-gpt/llama-cpp`: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file) inference server
- `@my-little-gpt/ngrok`: Helper scripts for starting [ngrok](https://ngrok.com) tunnels via Docker
- `@my-little-gpt/pocketbase`: Backend for the chat app, made with [Pocketbase](https://pocketbase.io)
- `@my-little-gpt/web`: Chat app, made with [SvelteKit](https://kit.svelte.dev)
- `@my-little-gpt/eslint-config`: [ESLint](https://eslint.org) config
- `@my-little-gpt/typescript-config`: [TypeScript](https://www.typescriptlang.org) config

### Development

WARNING: `dev` script only tested on MacOS (Apple Silicon).

Install dependencies with `npm install`, then use the `npm run dev` command to start the following (a quick check for all three is shown after this list):

- `llama-cpp` server at `http://localhost:8000`
- `pocketbase` server at `http://localhost:8080`
- `web` server at `http://localhost:5173`
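
A minimal sketch for confirming all three dev servers are up (ports as listed above; PocketBase's `/api/health` endpoint comes from the PocketBase docs):

```bash
curl -s http://localhost:8000/v1/models >/dev/null && echo "llama-cpp up"
curl -s http://localhost:8080/api/health >/dev/null && echo "pocketbase up"
curl -s http://localhost:5173 >/dev/null && echo "web up"
```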

Edits to the codebase will trigger a "hot reload" of the web app.

#### Other scripts

WARNING: `build`, `start`, `ngrok:web`, and `ngrok:llama-cpp` scripts only tested on MacOS (Apple Silicon).

- `format`: Format the codebase using [Prettier](https://prettier.io)
- `lint`: Run a lint check with [ESLint](https://eslint.org)
- `check`: Run a type check with [TypeScript](https://www.typescriptlang.org)
- `build`: Build and setup all apps and packages
- `build:force`: The `build` script, without using the build cache
- `start`: Start My Little GPT in production mode
- `ngrok:web`: Start My Little GPT with an ngrok tunnel to the chat app (localhost:3000)
- `ngrok:llama-cpp`: Start ngrok tunnel to the llama-cpp server (localhost:8000)