https://github.com/linode/ai-quickstart-anythingllm
One-click automated AI inference and RAG stack (vLLM + AnythingLLM + pgvector) on Akamai Cloud (Linode) RTX 4000 ada GPU instance.
https://github.com/linode/ai-quickstart-anythingllm
Last synced: 21 days ago
JSON representation
One-click automated AI inference and RAG stack (vLLM + AnythingLLM + pgvector) on Akamai Cloud (Linode) RTX 4000 ada GPU instance.
- Host: GitHub
- URL: https://github.com/linode/ai-quickstart-anythingllm
- Owner: linode
- License: apache-2.0
- Created: 2025-12-06T11:43:29.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-12-15T07:37:59.000Z (6 months ago)
- Last Synced: 2026-02-11T01:36:15.783Z (4 months ago)
- Language: Shell
- Size: 121 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Akamai Cloud - AI Quickstart : RAG Stack with AnythingLLM
Automated deployment script to run your private, self-hosted AI workspace on Akamai Cloud GPU instances. This stack combines vLLM for high-performance LLM inference with AnythingLLM - a full-stack application for building private AI assistants with RAG (Retrieval Augmented Generation), document chat, and agent capabilities.
-----------------------------------------
## 🚀 Quick Start
Just run this single command:
```bash
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/deploy.sh | bash
```
That's it! The script will download required files and guide you through the interactive deployment process.
## ✨ Features
- **Fully Automated Deployment**: Handles instance creation with real-time progress tracking
- **Ready to use AI Stack**: vLLM for GPU-accelerated inference + AnythingLLM for enterprise AI workspace
- **RAG & Document Chat**: Upload documents and chat with your data using AnythingLLM's built-in vector database
- **AI Agents**: Build custom AI agents with tools and workflows
- **Cross-Platform Support**: Works on macOS, Linux, and Windows (Git Bash/WSL)
-----------------------------------------
## 🏗️ What Gets Deployed

### Linode GPU Instance with
- Ubuntu 24.04 LTS with NVIDIA drivers
- Docker & NVIDIA Container Toolkit
- Systemd service for automatic startup on reboot
### Docker Containers
| | Service | Description |
|:--:|:--|:--|
|
| **Caddy** | Reverse proxy with automatic HTTPS (port 80/443) |
|
| **vLLM** | High-throughput LLM inference engine with OpenAI-compatible API (port 8000, internal) |
|
| **Text Embeddings Inference** | GPU-accelerated embedding service using BAAI/bge-m3 model (port 8001, internal) |
|
| **pgvector** | PostgreSQL with vector similarity search extension for RAG storage (port 5432, internal) |
|
| **AnythingLLM** | Full-stack AI application with RAG, document chat, agents, and multi-user support (port 3001, internal) |
### Models
#### LLM: openai/gpt-oss-20b
[gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) is OpenAI's first fully open-source LLM, released under Apache 2.0 license. Key characteristics:
- **20B parameters**: Fits on a single RTX 4000 Ada GPU (20GB VRAM)
- **High benchmark scores**: Competitive with larger models on reasoning and instruction-following tasks
- **High throughput**: Optimized for fast token generation with vLLM inference engine
#### Embedding: BAAI/bge-m3
[BGE-M3](https://huggingface.co/BAAI/bge-m3) is a state-of-the-art multilingual embedding model. Key strengths:
- **Multilingual**: Supports 100+ languages with strong cross-lingual retrieval
- **Multi-functionality**: Supports dense, sparse (lexical), and multi-vector retrieval in one model
- **Top performance**: Ranked #1 on MTEB multilingual leaderboard at release
### What is AnythingLLM?
[AnythingLLM](https://anythingllm.com/) is an open-source, full-stack application that turns any document, resource, or content into context for any LLM. Key features include:
- **Document Intelligence**: Upload PDFs, Word docs, websites, and more - chat with your data instantly
- **Vector Database**: Uses pgvector (PostgreSQL) for scalable, production-grade vector storage
- **Multi-user Workspaces**: Create isolated workspaces for different projects or teams
- **Privacy-First**: All data stays on your infrastructure - nothing leaves your server
### RAG Pipeline Components
This deployment includes a complete RAG (Retrieval Augmented Generation) pipeline:
- **Text Embeddings Inference**: Hugging Face's [TEI](https://github.com/huggingface/text-embeddings-inference) service running the [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) multilingual embedding model
- **pgvector**: PostgreSQL extension for efficient vector similarity search, enabling fast document retrieval at scale
-----------------------------------------
## 📋 Requirements
### Akamai Cloud Account
- Active Linode account with GPU access enabled
### Local System Requirements
- **Required**: bash, curl, ssh, jq
- **Note**: jq will be auto-installed if missing
-----------------------------------------
## 🚦 Getting Started
### 1. Option A: Single Command Execution
No installation required - just run:
```bash
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/deploy.sh | bash
```
### 1. Option B: Download and Run
Download the script and run locally:
```bash
curl -fsSLO https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/deploy.sh
bash deploy.sh
```
### 1. Option C: Clone Repository
If you prefer to inspect or customize the scripts:
```bash
git clone https://github.com/linode/ai-quickstart-anythingllm
cd ai-quickstart-anythingllm
./deploy.sh
```
> [!NOTE]
> if you like to add more services check out docker compose template file
> ```
> vi /template/docker-compose.yml
> ```
>
### 2. Follow Interactive Prompts
The script will ask you to:
- Choose a region (e.g., us-east, eu-west)
- Select GPU instance type
- Provide instance label
- Select or generate SSH keys
- Confirm deployment
### 3. Wait for Deployment
The script automatically:
- Creates GPU instance in your Linode account
- Monitors cloud-init installation progress
- Waits for AnythingLLM health check
- Waits for vLLM model loading
### 4. Access Your Services
Once complete, you'll see:
```
🎉 Setup Complete!
✅ Your AI LLM instance is now running!
🌐 Access URLs:
AnythingLLM: https://.ip.linodeusercontent.com
🔐 Access Credentials:
SSH: ssh -i /path/to/your/key root@
```
### Configuration files in GPU Instance
```
# Install script called by cloud-init service
/opt/ai-quickstart-anythingllm/install.sh
# docker compose file called by systemctl at startup
/opt/ai-quickstart-anythingllm/docker-compose.yml
# Caddy reverse proxy configuration
/opt/ai-quickstart-anythingllm/Caddyfile
# service definition
/etc/systemd/system/ai-quickstart-anythingllm.service
```
-----------------------------------------
## 🗑️ Delete Instance
To delete a deployed instance:
```bash
# Remote execution
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/delete.sh | bash -s --
# Or download script and run
curl -fsSLO https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/delete.sh
bash delete.sh
```
The script will show instance details and ask for confirmation before deletion.
-----------------------------------------
## 📁 Project Structure
```
ai-quickstart-anythingllm/
├── deploy.sh # Main deployment script
├── delete.sh # Instance deletion script
└── template/
├── cloud-init.yaml # Cloud-init configuration
├── docker-compose.yml # Docker Compose configuration
├── Caddyfile # Caddy reverse proxy configuration
└── install.sh # Post-boot installation script
```
-----------------------------------------
## 🔒 Security
**⚠️ IMPORTANT**: By default, ports 80 and 443 are exposed to the internet
### Immediate Security Steps
1. **Configure Cloud Firewall** (Recommended)
- Create Linode Cloud Firewall
- Restrict access to ports 80/443 by source IP
- Allow SSH (port 22) from trusted IPs only
2. **SSH Security**
- SSH key authentication required
- Root password provided for emergency console access only
-----------------------------------------
## 🛠️ Useful Commands
```bash
# SSH into your instance
ssh -i /path/to/your/key root@
# Check container status
docker ps -a
# Check Docker containers log
cd /opt/ai-quickstart-anythingllm && docker compose logs -f
# Check systemd service status
systemctl status ai-quickstart-anythingllm.service
# View systemd service logs
journalctl -u ai-quickstart-anythingllm.service -n 100
# Check cloud-init logs
tail -f /var/log/cloud-init-output.log -n 100
# Restart all services
systemctl restart ai-quickstart-anythingllm.service
# Check NVIDIA GPU status
nvidia-smi
# Check vLLM loaded models
curl http://localhost:8000/v1/models
# Check AnythingLLM health
curl http://localhost:3001/api/ping
# Check container logs
docker logs vllm
docker logs anythingllm
docker logs embedding
docker logs pgvector
# Check embedding service health
curl http://localhost:8001/health
# Check pgvector status
docker exec pgvector pg_isready -U anythingllm
```
## 🤝 Contributing
Issues and pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
## 📄 License
This project is licensed under the Apache License 2.0.