Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/icppWorld/icgpt
on-chain LLMs for the Internet Computer
internet-computer llama2 llm
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/icppWorld/icgpt
- Owner: icppWorld
- License: mit
- Created: 2023-09-12T09:27:36.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-30T01:46:53.000Z (8 days ago)
- Last Synced: 2024-10-30T03:59:57.444Z (8 days ago)
- Topics: internet-computer, llama2, llm
- Language: JavaScript
- Homepage: https://icgpt.icpp.world
- Size: 3.13 MB
- Stars: 12
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
- awesome-internet-computer - ICGPT - dApp with React front-end & C/C++ back-ends running LLMs fully on chain. [Try it here](https://icgpt.icpp.world/). (Decentralized AI / Solana)
README
# ICGPT
[Try it out](https://icgpt.icpp.world)!
---
The full application consists of 3 GitHub repositories:
1. [icgpt](https://github.com/icppWorld/icgpt) (This repo)
2. [icpp_llm](https://github.com/icppWorld/icpp_llm)
3. [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister)

# Setup
## Nodejs
Make sure you have Node.js installed on your system.
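As a quick sanity check (a minimal sketch, assuming Node.js and npm are already on your PATH), verify the versions before proceeding:
```bash
# Verify that Node.js and npm are available
node --version
npm --version
```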
## Conda
[Download MiniConda](https://docs.conda.io/en/latest/miniconda.html#linux-installers) and then install it:
```bash
bash Miniconda3-xxxxx.sh
```

Create a conda environment with Python 3.11:
```bash
conda create --name icgpt python=3.11
conda activate icgpt
```

## git
Clone dependency repos:
```bash
git clone https://github.com/icppWorld/icpp_llm
# FOLLOW THE SETUP INSTRUCTIONS OF THE icpp_llm/llama2_c README !!!

git clone https://github.com/onicai/llama_cpp_canister
# FOLLOW THE SETUP INSTRUCTIONS OF THE llama_cpp_canister README !!!
```

Clone the icgpt repo:
```bash
git clone [email protected]:icppWorld/icgpt.git
cd icgpt
```

## Update requirements-dev.txt
We install the Python requirements from the icpp_llm & llama_cpp_canister repos.
Make sure that requirements-dev.txt points to the correct locations; an install step is sketched below.
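Once the paths are correct, a minimal sketch of installing the Python dev requirements into the conda environment (assuming the usual pip workflow; adjust to your setup):
```bash
# Install the Python dev requirements into the icgpt conda environment
conda activate icgpt
pip install -r requirements-dev.txt
```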
### pre-commit
Create this pre-commit script in the file `.git/hooks/pre-commit`:
```bash
#!/bin/bash

# Apply all static auto-formatting & perform the static checks
export PATH="$HOME/miniconda3/envs/icgpt/bin:$PATH"
/usr/bin/make all-static
```

and make the script executable:
```bash
chmod +x .git/hooks/pre-commit
```

## toolchain & dependencies
Install the toolchain:
- The dfx release version is specified in `dfx.json`
```bash
conda activate icgpt
make install-all-ubuntu # for Ubuntu.
make install-all-mac # for Mac.
# see Makefile to replicate for other systems

# ~/bin must be on path
source ~/.profile

# Verify all tools are available
dfx --version

# verify all other items are working
conda activate icgpt
make all-static-check
```

# Development
## The backend LLM canisters
ICGPT includes LLM backend canisters from [icpp_llm](https://github.com/icppWorld/icpp_llm) & [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister).
### Setup for icpp_llm
- Clone [icpp_llm](https://github.com/icppWorld/icpp_llm) as a sibling to this repo
- Follow the instructions of [llama2_c](https://github.com/icppWorld/icpp_llm/tree/main/llama2_c) to:
- Build the wasm
- Get the model checkpoints

The following files are used by the ICGPT deployment steps:
```
# See: dfx.json
../icpp_llm/llama2_c/src/llama2.did
../icpp_llm/llama2_c/build/llama2.wasm

# See: Makefile
../icpp_llm/llama2_c/scripts/upload.py
```

The following models will be uploaded as ICGPT backend canisters:
```
../icpp_llm/llama2_c/stories260K/stories260K.bin
../icpp_llm/llama2_c/stories260K/tok512.bin

../icpp_llm/llama2_c/tokenizers/tok4096.bin
../icpp_llm/llama2_c/models/stories15Mtok4096.bin

# Charles: 42M with tok4096 (Not yet public)
../charles/models/out-09/model.bin
../charles/models/out-09/tok4096.bin
```

### Setup for llama_cpp_canister
- Clone [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister):
- Follow the instructions of [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister) to:
- Build the wasm
- Download the GGUF model from Huggingface

The following files are used by the ICGPT deployment steps:
```
# See: dfx.json
../../../onicai/repos/llama_cpp_canister/src/llama_cpp.did
../../../onicai/repos/llama_cpp_canister/build/llama_cpp.wasm

# See: Makefile
../../../onicai/repos/llama_cpp_canister/scripts/upload.py
```

The following models will be uploaded as ICGPT backend canisters:
```
../../../onicai/repos/llama_cpp_canister/models/Qwen/Qwen2.5-0.5B-Instruct-GGUF/qwen2.5-0.5b-instruct-q8_0.gguf
```

## Deploy ICGPT to local network
Once the files of the backend LLMs are in place, as described in the previous step, you can deploy everything with:
```bash
# Start the local network
dfx start --clean

# In another terminal, deploy the canisters
# IMPORTANT: dfx deploy ... updates .env for local canisters
# .env is used by the frontend webpack.config.js !!!

# Deploy the wasms & upload models & prime the canisters
dfx deploy llama2_260K
make upload-260K-local

dfx deploy llama2_15M
make upload-15M-local

dfx deploy llama2_42M
make upload-charles-42M-local
# make upload-42M-local

dfx deploy llama2_110M
make upload-110M-local

# llama.cpp qwen2.5 0.5b q8 (676 Mb)
dfx deploy llama_cpp_qwen25_05b_q8 -m [upgrade/reinstall] # upgrade preserves model in stable memory
dfx canister update-settings llama_cpp_qwen25_05b_q8 --wasm-memory-limit 4GiB
dfx canister status llama_cpp_qwen25_05b_q8
dfx canister call llama_cpp_qwen25_05b_q8 set_max_tokens '(record { max_tokens_query = 10 : nat64; max_tokens_update = 10 : nat64 })'
# if (re)installed:
make upload-llama-cpp-qwen25-05b-q8-local # Not needed after an upgrade, only after initial or reinstall
# else (After `dfx deploy -m upgrade`):
dfx canister call llama_cpp_qwen25_05b_q8 load_model '(record { args = vec {"--model"; "model.gguf"; } })'

dfx deploy internet_identity # REQUIRED: it installs II
dfx deploy canister_frontend # REQUIRED: redeploy each time backend candid interface is modified.
# it creates src/declarations used by webpack.config.js

# Note: you can stop the local network with
dfx stop
```

After the deployment steps described above, the full application is now deployed to the local network, including the front-end canister, the LLM back-end canisters, and the internet_identity canister.
However, you cannot run the front-end served from the local IC network, due to CORS restrictions.
Just run it locally as described in the `Front-end Development` section below.
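As a quick sanity check after the local deployment (a sketch, reusing the `scripts/ready.sh` helper described in the Verify section below):
```bash
# Check that the locally deployed canisters are up and responding
scripts/ready.sh --network local
```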
## Test Qwen2.5 0.5B Q8_0 backend with dfx
It is handy to be able to verify the Qwen2.5 backend canister with dfx:
- Chat with the LLM:
Details on how to use the Qwen models with llama.cpp:
https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html

```bash
# Start a new chat - this resets the prompt-cache for this conversation
dfx canister call llama_cpp_qwen25_05b_q8 new_chat '(record { args = vec {"--prompt-cache"; "my_cache/prompt.cache"} })'

# Repeat this call until the prompt_remaining is empty. KEEP SENDING THE ORIGINAL PROMPT
# Example of a longer prompt
dfx canister call llama_cpp_qwen25_05b_q8 run_update '(record { args = vec {"--prompt-cache"; "my_cache/prompt.cache"; "--prompt-cache-all"; "-sp"; "-p"; "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\ngive me a short introduction to LLMs.<|im_end|>\n<|im_start|>assistant\n"; "-n"; "512" } })'

# Example of a very short prompt
dfx canister call llama_cpp_qwen25_05b_q8 run_update '(record { args = vec {"--prompt-cache"; "my_cache/prompt.cache"; "--prompt-cache-all"; "-sp"; "-p"; "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\n"; "-n"; "512" } })'
...

# Once prompt_remaining is empty, repeat this call, with an empty prompt, until `generated_eog=true`:
dfx canister call llama_cpp_qwen25_05b_q8 run_update '(record { args = vec {"--prompt-cache"; "my_cache/prompt.cache"; "--prompt-cache-all"; "-sp"; "-p"; ""; "-n"; "512" } })'
...
# Once generated_eog = true, the LLM is done generating
# this is the output after several update calls and it has reached eog:
(
variant {
Ok = record {
output = " level of complexity than the original text.<|im_end|>";
conversation = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\ngive me a short introduction to LLMs.<|im_end|>\n<|im_start|>assistant\nLLMs are large language models, or generative models, that can generate text based on a given input. These models are trained on a large corpus of text and are able to generate text that is similar to the input. They can be used for a wide range of applications, such as language translation, question answering, and text generation for various tasks. LLMs are often referred to as \"artificial general intelligence\" because they can generate text that is not only similar to the input but also has a higher level of complexity than the original text.<|im_end|>";
error = "";
status_code = 200 : nat16;
prompt_remaining = "";
generated_eog = true;
}
},
)
```

For more details & options, see the llama_cpp_canister repo.
## Front-end Development
The front-end is a React application with a webpack-based build pipeline. Webpack builds with sourcemaps, so you can use the following front-end development workflow:
- Deploy the full application to the local network, as described in the previous step
- Do not open the front-end deployed to the local network; instead, run the front-end with the npm development server:

```bash
# from root directory
conda activate icgpt
# start the npm development server, with hot reloading
npm run start

# to rebuild from scratch
npm run build
```

- When you log in, just create a new II, and once login has completed, you will see the start screen shown at the top of this README.
- Open the browser devtools for debugging
- Make changes to the front-end code in your favorite editor, and when you save it, everything will auto-rebuild and auto-reload
### Update to latest Internet Identity
We use `latest` for all `@dfinity/...` packages in package.json, so to update to the latest version just run:
```
npm update
```

### Styling with Dracula UI
All front-end color styling is done using the open source Dracula UI:
- [github](https://github.com/dracula/dracula-ui)
- [user guide](https://ui.draculatheme.com/)

# Deployment to IC
Step 0: When deploying for the first time:
- Delete **canister_ids.json** (as sketched below), because when you forked or cloned the GitHub repo [icgpt](https://github.com/icppWorld/icgpt), it contained the canisters used by our deployment at https://icgpt.icpp.world/
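A minimal sketch of that first-time cleanup, run from the root of your fork or clone:
```bash
# Remove the canister ids of the original icgpt deployment,
# so dfx creates fresh canisters under your own identity
rm canister_ids.json
```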
Step 1: Build the backend wasm files
- Clone [icpp_llm](https://github.com/icppWorld/icpp_llm/) and follow the instructions in [llama2_c](https://github.com/icppWorld/icpp_llm/tree/main/llama2_c) to build the wasm for each backend canister.
Step 2: Deploy the backend canisters
- Note that **dfx.json** points to the wasm files built during Step 1
```bash
# Deploy & upload models
dfx deploy --ic llama2_260K -m reinstall
make upload-260K-ic

dfx deploy --ic llama2_15M -m reinstall
make upload-15M-ic

dfx deploy --ic llama2_42M -m reinstall
make upload-charles-42M-ic
# make upload-42M-ic

dfx deploy --ic llama2_110M -m reinstall
make upload-110M-ic

# qwen2.5 0.5b q8 (676 Mb)
dfx deploy --ic llama_cpp_qwen25_05b_q8 -m [upgrade/reinstall] # upgrade preserves model in stable memory
dfx canister --ic update-settings llama_cpp_qwen25_05b_q8 --wasm-memory-limit 4GiB
dfx canister --ic status llama_cpp_qwen25_05b_q8
dfx canister --ic call llama_cpp_qwen25_05b_q8 set_max_tokens '(record { max_tokens_query = 10 : nat64; max_tokens_update = 10 : nat64 })'
# To be able to upload the model, I change the
# [compute allocation](https://internetcomputer.org/docs/current/developer-docs/smart-contracts/maintain/settings#compute-allocation)
dfx canister update-settings --ic llama_cpp_qwen25_05b_q8 --compute-allocation 1 # (costs a rental fee)
dfx canister status --ic llama_cpp_qwen25_05b_q8
#
# After `dfx deploy -m reinstall`:
make upload-llama-cpp-qwen25-05b-q8-ic # Not needed after an upgrade, only after initial or reinstall
#
# After `dfx deploy -m upgrade`:
dfx canister --ic call llama_cpp_qwen25_05b_q8 load_model '(record { args = vec {"--model"; "model.gguf"; } })'

#--------------------------------------------------------------------------
# IMPORTANT: ic-py might throw a timeout => patch it here:
# Ubuntu:
# /home//miniconda3/envs//lib/python3.11/site-packages/httpx/_config.py
# Mac:
# /Users//miniconda3/envs//lib/python3.11/site-packages/httpx/_config.py
# DEFAULT_TIMEOUT_CONFIG = Timeout(timeout=5.0)
DEFAULT_TIMEOUT_CONFIG = Timeout(timeout=99999999.0)
# And perhaps here:
# Ubuntu:
# /home//miniconda3/envs//lib/python3.11/site-packages/httpcore/_backends/sync.py #L28-L29
# Mac:
# /Users//miniconda3/envs//lib/python3.11/site-packages/httpcore/_backends/sync.py #L28-L29
#
class SyncStream(NetworkStream):
    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def read(self, max_bytes: int, timeout: typing.Optional[float] = None) -> bytes:
        exc_map: ExceptionMapping = {socket.timeout: ReadTimeout, OSError: ReadError}
        with map_exceptions(exc_map):
            # PATCH AB
            timeout = 999999999
            # ENDPATCH
            self._sock.settimeout(timeout)
            return self._sock.recv(max_bytes)
# ------------------------------------------------------------------------
```
Note: Downloading the log file
You can download the `main.log` file from the canister with the command:
```
# For example, this is for the qwen2.5 q8_0 canister running on the IC in ICGPT
make download-log-llama-cpp-qwen25-05b-q8-ic
```

Step 3: Deploy the frontend
- Now that the backend is in place, the frontend can be deployed
```bash
# from root directory
conda activate icgpt
dfx identity use
# This deploys just the frontend!
dfx deploy --ic canister_frontend
```

## Verify
```bash
scripts/ready.sh --network [local/ic]
```

## Check cycle balance
```bash
scripts/balance.sh --network [local/ic]
```

## Top up cycles
```bash
# Edit the value of TOPPED_OFF_BALANCE_T in the script.
scripts/top-off.sh --network [local/ic]
```

# Appendix A - NOTES
## process.env.CANISTER_ID_
In the generated declarations and in our own front-end code, the canister IDs are defined with `process.env.CANISTER_ID_`.
The way that these environment variables are created is:
- The command `dfx deploy` maintains a section in the file `.env` where it stores the canister id for every deployed canister.
- The `npm run build` and `npm run start` commands use `webpack.config.js`, where the `webpack.EnvironmentPlugin` is used to define the values; a quick way to inspect this is sketched below.
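As a sketch of how to inspect this mechanism (the exact variable names depend on the canisters in your `dfx.json`), you can look at the `.env` file that `dfx deploy` maintains:
```bash
# Show the canister ids that webpack.EnvironmentPlugin will expose
# to the front-end via process.env.CANISTER_ID_...
grep CANISTER_ID .env
```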
## Internet Identity
icgpt uses Internet Identity for authentication.
When deploying locally, the internet_identity canister will be installed automatically during the `make dfx-deploy-local` or `dfx deploy --network local` command. It uses the instructions provided in `dfx.json`.
When deploying to the IC, it will NOT be deployed.
For details, see this [forum post](https://forum.dfinity.org/t/problem-insalling-internet-identity-in-local-setup/20417/18).
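For a quick check that the local Internet Identity canister is in place (a sketch, assuming the canister name `internet_identity` from `dfx.json`):
```bash
# Only meaningful on the local network; on the IC the production
# Internet Identity service is used instead
dfx canister id internet_identity
dfx canister status internet_identity
```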