https://github.com/smellslikeml/dolla_llama

Never forget the resource that helps to close that sales call! Power a real-time speech-to-text agent with retrieval augmented generation based on webscraped customer use-cases.

Host: GitHub
URL: https://github.com/smellslikeml/dolla_llama
Owner: smellslikeml
Created: 2023-10-24T16:02:22.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-01-23T06:17:31.000Z (over 2 years ago)
Last Synced: 2026-01-12T18:57:45.205Z (5 months ago)
Topics: agent, copilot, llama, llms, rag, stt
Language: Python
Homepage: https://twitter.com/smellslikeml/status/1699471420076745104
Size: 956 KB
Stars: 13
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

# Dolla Llama: Real-Time Co-Pilot for Closing the Deal

Implements speech-to-text (STT) and retrieval-augmented generation (RAG) to assist live sales calls.

## 🌟 Features:
- STT with whisper.cpp, plus llama.cpp to serve your LLM
- Custom embeddings for your text corpus using SentenceTransformers
- Indexing documents and embeddings with Elasticsearch

## Table of Contents
1. [Getting Started](#getting-started)
2. [Creating Custom Embeddings](#creating-custom-embeddings)
3. [Indexing with Elasticsearch](#indexing-with-elasticsearch)
4. [Interface with Gradio](#interface-with-gradio)
5. [Next Steps](#next-steps)

## Getting Started
This demo assumes you have:

- [docker](https://docs.docker.com/engine/install/) and [docker-compose](https://docs.docker.com/compose/install/) installed
- Familiarity with [RAG](https://stackoverflow.blog/2023/10/18/retrieval-augmented-generation-keeping-llms-relevant-and-current/) and its applications

### Setup
Convert your Llama model to `gguf` format by following the [llama.cpp](https://github.com/ggerganov/llama.cpp) conversion instructions.
Then save the converted model in a local directory named `models/`.
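
Before launching, it can help to confirm that a converted model is actually in place. The snippet below is a minimal sketch, not part of this repo: `find_gguf_models` is a hypothetical helper, and it assumes the converted file keeps the `.gguf` extension and sits directly inside `models/`.

```python
from pathlib import Path


def find_gguf_models(models_dir="models"):
    """Return the sorted .gguf files found directly inside models_dir."""
    root = Path(models_dir)
    if not root.is_dir():
        # A missing directory simply means no model has been saved yet.
        return []
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".gguf"
    )


if __name__ == "__main__":
    found = find_gguf_models()
    if found:
        for path in found:
            print(f"found model: {path}")
    else:
        print("no .gguf file in models/; convert a model with llama.cpp first")
```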

Launch with:
```
docker-compose up
```

Then navigate to `http://localhost:8090` in your browser.

## Creating Custom Embeddings

By fine-tuning with SentenceTransformers, we can generate text embeddings locally for matching with documents in our Elasticsearch index.

The [scraper/main.py](scraper/main.py) script scrapes a list of sites to index. You can update the links in `scraper/config.json`.
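
To illustrate what "matching" means here, the sketch below ranks documents by cosine similarity between a query vector and each document vector. It is an illustration under assumptions, not the code in this repo: the three-dimensional vectors and the document names are made up, and in practice the vectors would come from a SentenceTransformers model's `encode` method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return up to top_k (doc_id, score) pairs, highest similarity first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones have hundreds of dimensions.
    docs = {
        "pricing": [1.0, 0.0, 0.0],
        "security": [0.0, 1.0, 0.0],
        "onboarding": [0.7, 0.7, 0.0],
    }
    query = [0.9, 0.1, 0.0]
    for doc_id, score in rank_documents(query, docs):
        print(f"{doc_id}: {score:.3f}")
```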

## Indexing with Elasticsearch

Using Elasticsearch, we can index and tag documents for filtering and customization of the relevance scoring.

The [scraper/main.py](scraper/main.py) script also handles this after scraping.
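
As a rough picture of tag filtering combined with custom relevance scoring, the sketch below builds an index mapping and a `script_score` search body as plain dictionaries. Everything specific in it is an assumption rather than this repo's schema: the field names (`url`, `text`, `tags`, `embedding`), the 384-dimension default, and the helper names are hypothetical. The resulting dictionaries are the kind of bodies you would hand to an Elasticsearch client when creating an index and running a search.

```python
def build_index_mapping(dims=384):
    """Mapping with a keyword tag field for filtering and a dense_vector for embeddings."""
    return {
        "mappings": {
            "properties": {
                "url": {"type": "keyword"},
                "text": {"type": "text"},
                "tags": {"type": "keyword"},
                # dims must match the output size of the embedding model in use.
                "embedding": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_search_body(query_vector, tags=None, size=3):
    """Score documents by cosine similarity, optionally restricted to the given tags."""
    if tags:
        base_query = {"bool": {"filter": [{"terms": {"tags": list(tags)}}]}}
    else:
        base_query = {"match_all": {}}
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": base_query,
                "script": {
                    # Adding 1.0 keeps the score non-negative, as Elasticsearch requires.
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }
```

Keeping the filter inside the inner `bool` query means tagged-out documents are never scored, so the similarity script only runs on candidates that match the tags.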

## Interface with Gradio

With Gradio, you press a button to begin and read suggestions in the chatbox.

[app/app.py](app/app.py) contains the logic to run Whisper for speech-to-text, query the Elasticsearch index, and launch the front end.
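
The step that ties these pieces together is turning the latest transcript and the retrieved documents into a prompt for the LLM. The sketch below shows one possible way to do that; it is not taken from `app/app.py`. The `build_prompt` helper, the prompt wording, and the `url` and `text` field names are assumptions, while the `_source` key follows the shape of hits in an Elasticsearch search response.

```python
def build_prompt(transcript, hits, max_chars=500):
    """Combine retrieved documents and the latest transcript into a single LLM prompt."""
    context_lines = []
    for i, hit in enumerate(hits, start=1):
        source = hit["_source"]
        # Truncate each document so the prompt stays within the model's context window.
        snippet = source["text"][:max_chars]
        context_lines.append(f"[{i}] {source['url']}: {snippet}")
    context = "\n".join(context_lines) if context_lines else "(no matching documents)"
    return (
        "You are assisting a sales representative during a live call.\n"
        f"Relevant customer use cases:\n{context}\n\n"
        f"Latest transcript:\n{transcript.strip()}\n\n"
        "Suggest one short talking point:"
    )


if __name__ == "__main__":
    # Hypothetical hit, shaped like an entry in an Elasticsearch search response.
    example_hits = [
        {"_source": {"url": "https://example.com/case-study", "text": "Cut costs by half."}}
    ]
    print(build_prompt("  We are worried about cost.  ", example_hits))
```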

## Next Steps

* Fine-tune an LLM for your use case
* Add additional indices for query/retrieval
* Try a container orchestrator like k8s for robust distributed deployments
