https://github.com/lreimer/k8s-native-java-ai

Demos for Java-based AI on Kubernetes from Development to Deployment

envoy kubernetes langchain4j llm quarkus weaviate


Host: GitHub
URL: https://github.com/lreimer/k8s-native-java-ai
Owner: lreimer
License: mit
Created: 2024-12-02T10:31:48.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-03-31T15:40:20.000Z (8 months ago)
Last Synced: 2025-04-06T22:51:34.286Z (7 months ago)
Topics: envoy, kubernetes, langchain4j, llm, quarkus, weaviate
Language: Java
Homepage:
Size: 514 KB
Stars: 2
Watchers: 1
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE


# Java-based AI on Kubernetes

Demo repository for Java-based AI on Kubernetes from Development to Deployment.

## Setup

```bash
# create the Kubernetes cluster in GCP with GPU support
make create-gke-cluster

# bootstrap AI platform components and services using Flux2
make bootstrap-flux2

# required to configure Config Connector with the Google Cloud project ID
kubectl annotate namespace default cnrm.cloud.google.com/project-id="cloud-native-experience-lab"
```

## External Secrets using Google Cloud Secret Manager

```bash
# credentials to access certain GCP infrastructure components are stored externally
# make sure that the Google Cloud Secret Manager API is enabled in your project
make create-gke-es-sa

# if required change and apply the ClusterSecretStore CRD
# kubectl apply -f infrastructure/platform/external-secrets/secret-store.yaml

# this is how to create secrets in Secret Manager
gcloud secrets create external-secrets-sa --data-file=external-secrets-sa.json --replication-policy=automatic

# to demonstrate the secrets synchronization
kubectl apply -f infrastructure/platform/external-secrets/sa-secret.yaml
kubectl get secret gcp-sa-credentials -o jsonpath='{.data.external-secrets-sa\.json}' | base64 -d
```
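
Kubernetes stores the values under a Secret's `data` field Base64-encoded, which is why the last command pipes the output through `base64 -d`. As a minimal sketch, the same decoding step in plain Java could look like this. The `SecretDecoder` class and the sample value are hypothetical and not part of this repository.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only, not part of this repository.
final class SecretDecoder {

    // Decodes one Base64-encoded value as found under .data of a Kubernetes Secret.
    // trim() drops a trailing newline that shell output often carries.
    static String decode(String base64Value) {
        byte[] raw = Base64.getDecoder().decode(base64Value.trim());
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample value; pass the output of the kubectl jsonpath query instead.
        String sample = args.length > 0 ? args[0] : "eyJ0eXBlIjoic2VydmljZV9hY2NvdW50In0=";
        System.out.println(decode(sample));
    }
}
```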

## Building a chat service with Quarkus and OpenAI

```bash
# use the Quarkus starter to create a service skeleton
# select desired build system and dependencies
open https://code.quarkus.io

# for local development use the following commands
cd openai-chat-service
export QUARKUS_LANGCHAIN4J_OPENAI_API_KEY=$OPENAI_API_KEY
./gradlew quarkusDev

# interact with the service locally
http get localhost:8080/api/ask q=="Was macht QAware?"
http get localhost:8080/api/ask q=="What does QAware do?"
http get localhost:8080/api/ask q=="Was macht Microsoft?"
http get localhost:8080/api/ask q=="What is the sum of 40 and 2?"
http get localhost:8080/api/ask q=="What does QAware do? Send email to mlr@qaware.de with subject Information and response as message."

# this deployment is managed by Flux2
kubectl apply -k infrastructure/services/openai-chat-service/
kubectl get all
```
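
The HTTPie calls above can also be issued from Java. The following is a minimal sketch using only the JDK's `java.net.http` client. It assumes the service answers `GET /api/ask` with the question in the `q` query parameter, as in the commands above, and returns a body that can be printed as text. The `AskClient` class is hypothetical and not part of this repository.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for illustration only, not part of this repository.
final class AskClient {

    // Builds the /api/ask URI with the question as the "q" query parameter.
    static URI buildAskUri(String baseUrl, String question) {
        // URLEncoder emits form encoding ("+" for spaces); switch to %20 for a plain query string.
        String encoded = URLEncoder.encode(question, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(baseUrl + "/api/ask?q=" + encoded);
    }

    // Sends the question and returns the raw response body.
    static String ask(HttpClient client, String baseUrl, String question)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildAskUri(baseUrl, question)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumes the service is running locally via ./gradlew quarkusDev.
        System.out.println(ask(HttpClient.newHttpClient(), "http://localhost:8080", "What does QAware do?"));
    }
}
```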

## Building an OpenAI Proxy using Envoy

Access to the OpenAI API is provided through a cluster-internal, Envoy-based proxy.

```bash
# for the proxy to work, you have to manually create a Kubernetes secret
# that contains an OPENAI_API_KEY environment variable
kubectl create secret generic openai-api-key --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY

# to test the proxy, issue the following curl command
# exchange localhost with the actual LoadBalancer IP
curl http://localhost:10000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```
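
The same request can be sent from a Java service inside the cluster. The sketch below builds the chat completion payload by hand and posts it with the JDK HTTP client. It sets no `Authorization` header, on the assumption (suggested by the curl call above) that the proxy injects the API key from the Kubernetes secret. `ProxyChatClient` and its helper methods are hypothetical names, and real code would likely use a JSON library instead of manual escaping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client for illustration only, not part of this repository.
final class ProxyChatClient {

    // Minimal JSON string escaping; a JSON library is the safer choice in real code.
    static String escapeJson(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '"') sb.append("\\\"");
            else if (c == '\\') sb.append("\\\\");
            else if (c == '\n') sb.append("\\n");
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    // Builds the same payload as the curl example: one user message, temperature 0.7.
    static String chatBody(String model, String prompt) {
        return "{\"model\":\"" + escapeJson(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + escapeJson(prompt) + "\"}],"
                + "\"temperature\":0.7}";
    }

    // No Authorization header: assumes the Envoy proxy adds the OpenAI API key.
    static HttpRequest chatRequest(String proxyBaseUrl, String model, String prompt) {
        return HttpRequest.newBuilder(URI.create(proxyBaseUrl + "/v1/chat/completions"))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, prompt)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Replace localhost with the actual LoadBalancer IP of the proxy.
        HttpRequest request = chatRequest("http://localhost:10000", "gpt-4o-mini", "Say this is a test!");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```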

## Building a chat service with Quarkus and Ollama

```bash
# this is nearly identical to the Quarkus and OpenAI instructions above
# the only difference is the dependency, use
# 'io.quarkiverse.langchain4j:quarkus-langchain4j-ollama:0.22.0'
# instead of
# 'io.quarkiverse.langchain4j:quarkus-langchain4j-openai:0.22.0'

# for local development use the following commands
ollama serve
ollama run llama3.1

cd ollama-chat-service
./gradlew quarkusDev

# interact with the service locally
http get localhost:8080/api/ask q=="Was macht die QAware GmbH?"

# this deployment is managed by Flux2
kubectl apply -k infrastructure/services/openai-chat-service/
kubectl get all
```

## Building an AI Agent using Google ADK (Java)

```bash
cd hello-time-agent

mvn compile exec:java -Dexec.mainClass="com.example.agent.AgentCliRunner"

mvn compile exec:java \
  -Dexec.mainClass="com.google.adk.web.AdkWebServer" \
  -Dexec.args="--adk.agents.source-dir=target --server.port=8080"
```

## Deploying custom LLMs using Ollama Operator

```bash
# model deployment using CLI
kollama deploy llama3.1
kollama expose llama3.1 --service-name=ollama-model-llama31-lb --service-type=LoadBalancer

# model deployment via CRD
kubectl apply -f infrastructure/models/phi3.yaml
kollama expose phi3 --service-type LoadBalancer

# to start a chat with ollama
# exchange localhost with the actual LoadBalancer IP
OLLAMA_HOST=localhost:11434 ollama run phi3
OLLAMA_HOST=localhost:11434 ollama run llama3.1

# call the native Ollama chat API (/api/chat) or, alternatively,
# its OpenAI-compatible endpoint:
# curl http://localhost:11434/v1/chat/completions
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [
      {
        "role": "user",
        "content": "Say this is a test!"
      }
    ]
  }'
```
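
Note that Ollama's `/api/chat` endpoint streams its answer as newline-delimited JSON chunks unless the request sets `"stream": false`. The following sketch posts the same message from Java and prints each chunk as it arrives. `OllamaChatClient` is a hypothetical name, and the prompt escaping only covers backslashes and double quotes, so it is a simplification rather than a complete JSON encoder.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for illustration only, not part of this repository.
final class OllamaChatClient {

    // Builds the /api/chat payload; "stream" selects chunked (true) or single-object (false) output.
    static String chatBody(String model, String prompt, boolean stream) {
        // Simplified escaping: handles backslashes and double quotes only.
        String safePrompt = prompt.replace("\\", "\\\\").replace("\"", "\\\"");
        return String.format(
                "{\"model\":\"%s\",\"messages\":[{\"role\":\"user\",\"content\":\"%s\"}],\"stream\":%b}",
                model, safePrompt, stream);
    }

    public static void main(String[] args) throws Exception {
        // Replace localhost with the actual LoadBalancer IP of the exposed model.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody("llama3.1", "Say this is a test!", true)))
                .build();

        // With streaming enabled, every line is one JSON chunk of the answer.
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .forEach(System.out::println);
    }
}
```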

## Deploying LangChain4j Easy RAG Chatbot

```bash
# see https://docs.quarkiverse.io/quarkus-langchain4j/dev/easy-rag.html
# see https://github.com/quarkiverse/quarkus-langchain4j/tree/main/samples/chatbot-easy-rag
```

## Deploying Airbyte

```bash
# make sure to create the Airbyte service account
make create-gke-airbyte-sa

# bind the SA to your personal GCP user (optional)
gcloud iam service-accounts add-iam-policy-binding airbyte@$GCP_PROJECT.iam.gserviceaccount.com --member="user:mario-leander.reimer@qaware.de" --role="roles/iam.serviceAccountUser"

# now create the Airbyte secret with the relevant GCP credentials.
kubectl create secret generic airbyte-config-secrets -n airbyte --from-file=airbyte.json
```

## Planning Kubernetes Cluster Topology

```bash
# useful information for setting up a K8s cluster topology with GPUs

# see https://cloud.google.com/compute/docs/gpus?hl=de
# see https://cloud.google.com/compute/all-pricing?hl=de
# see https://cloud.google.com/compute/gpus-pricing?hl=de#other-gpu-models

# see https://cloud.google.com/compute/docs/gpus/create-gpu-vm-general-purpose?hl=de
# see https://cloud.google.com/compute/docs/disks/local-ssd?hl=de
```

## Maintainer

M.-Leander Reimer (@lreimer)

## License

This software is provided under the MIT open source license. See the `LICENSE`
file for details.
