Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tzolov/ai-observability-demo
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/tzolov/ai-observability-demo
- Owner: tzolov
- Created: 2024-06-09T06:37:43.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-25T06:57:35.000Z (3 months ago)
- Last Synced: 2024-08-25T07:55:08.371Z (3 months ago)
- Language: Java
- Size: 1.42 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Read Me First
## RAG Example
A ChatClient call configured with Question-Answering (aka RAG) and ChatMemory advisors:
```java
var response = chatClient.prompt()
        .user("How does Carina work?")
        // Retrieval-augmented generation: look up similar documents in the vector store
        .advisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
        // Maintain and reuse the conversation history
        .advisors(new PromptChatMemoryAdvisor(chatMemory))
        .call()
        .chatResponse();
```
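For context, here is a minimal sketch of the wiring such a call assumes. The bean setup is illustrative, not copied from this repo; the `ChatClient.Builder` and the PGvector-backed `VectorStore` come from Spring AI auto-configuration:
```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Illustrative wiring sketch, not taken from this repo.
@Configuration
class ChatClientConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        // Builder is auto-configured by the Spring AI OpenAI starter
        return builder.build();
    }

    @Bean
    ChatMemory chatMemory() {
        // Simple in-memory conversation history used by PromptChatMemoryAdvisor
        return new InMemoryChatMemory();
    }
}
```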
The traces for the call above look like this:
![ChatClient with RAG and ChatMemory](/doc/rag_with_memory.png "ChatClient with RAG and ChatMemory")
On call, the `chat-client` invokes:
* the `Before` `question-answer-advisor`, which uses the `pg-vector-store` to retrieve similar documents and the `open-ai-embedding-model` (itself backed by the OpenAiApi REST client) to encode the input user question;
* the `Before` `prompt-chat-memory-advisor` to retrieve the conversation history and store the user message;
* then the `openai-chat-model` (gpt-4o) to perform the chat-completion request, which in turn delegates to the inner REST client (OpenAiApi);
* finally, the `After` QA and memory advisors, before returning the response.

## Function Calling Examples
Call the `paymentStatus` function for 3 different transactions, but enforce non-parallel mode, so for each transaction the LLM has to return a separate tool-call message:
```java
String response = chatClient.prompt()
        .options(OpenAiChatOptions.builder()
                // Force sequential tool calls: one tool-call message per transaction
                .withParallelToolCalls(false).build())
        .user("What is the status of my payment transactions 001, 002 and 003?")
        .functions("paymentStatus")
        .call()
        .content();
```
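The demo's actual `paymentStatus` implementation isn't shown here; a hypothetical registration could look like this (the record and field names are made up for illustration):
```java
import java.util.function.Function;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Description;

// Hypothetical function registration -- names and return values are illustrative.
@Configuration
class PaymentToolsConfig {

    record TransactionRequest(String transactionId) {}
    record TransactionStatus(String status) {}

    @Bean
    @Description("Get the status of a payment transaction")
    Function<TransactionRequest, TransactionStatus> paymentStatus() {
        // A real implementation would look the transaction up; this one returns a canned value
        return request -> new TransactionStatus("COMPLETED");
    }
}
```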
The call above produces traces like:
![Function Calling non-parallel](/doc/funciton_calling_sequential.png "Function Calling non-parallel")
The diagram shows that three consecutive tool calls are performed before the final result is returned.
If parallel calling is enabled, the diagram looks like this:
![Function Calling parallel](/doc/function_calling_parallel.png "Function Calling parallel")
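Switching back to parallel mode is just a matter of flipping the option; this sketch mirrors the call above:
```java
String response = chatClient.prompt()
        .options(OpenAiChatOptions.builder()
                // Allow the model to emit all tool calls in a single message
                .withParallelToolCalls(true).build())
        .user("What is the status of my payment transactions 001, 002 and 003?")
        .functions("paymentStatus")
        .call()
        .content();
```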
## Random links
* https://docs.micrometer.io/micrometer/reference/observation/introduction.html
* https://programmingtechie.com/2023/09/09/spring-boot3-observability-grafana-stack/
* https://www.baeldung.com/spring-boot-3-observability
* https://medium.com/@ahmadalammar/comprehensive-observability-in-spring-boot-using-opentelemetry-prometheus-grafana-tempo-and-067196eee539
* https://medium.com/@ahmadalammar/comprehensive-observability-in-spring-boot-using-opentelemetry-prometheus-grafana-tempo-and-4d50c2f2b711
* https://medium.com/@ahmadalammar/comprehensive-observability-in-spring-boot-using-opentelemetry-prometheus-grafana-tempo-and-e08842ae96be

## Observability
- For streaming, make sure to enable context propagation:
`spring.reactor.context-propagation=auto`
- For percentile histograms, enable them for the target metric:
`management.metrics.distribution.percentiles-histogram.<metric-name>=true`
For example, for AI model calls:
`management.metrics.distribution.percentiles-histogram.gen_ai.client.operation=true`
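Putting these together, an illustrative `application.properties` for this demo could look like:
```properties
# Propagate the observation context across reactive (streaming) calls
spring.reactor.context-propagation=auto
# Publish percentile histograms for the AI client operation metric
management.metrics.distribution.percentiles-histogram.gen_ai.client.operation=true
```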