# Vision Core AI

A demo Python script that interacts with a llama.cpp server using the Whisper API, a microphone, and a webcam.
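At its core, the demo grabs a webcam frame, turns your speech into text, and sends both to the llama.cpp server for a multimodal answer. Below is a rough sketch of that request, assuming the server's `/completion` endpoint on the default port 8080 and the `image_data`/`[img-N]` convention used by llama.cpp's llava-enabled server; the demo's actual payload may differ.

```python
import base64
import json
import urllib.request

# Hypothetical illustration of the core request; not the demo's actual code.
SERVER_URL = "http://localhost:8080/completion"  # llama.cpp server default port

def ask_llava(image_path: str, question: str) -> str:
    # The llava-enabled server accepts base64 images via image_data;
    # the prompt references each image by id with an [img-N] tag.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "prompt": f"USER: [img-10]{question}\nASSISTANT:",
        "image_data": [{"data": image_b64, "id": 10}],
        "n_predict": 128,
    }
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

print(ask_llava("frame.jpg", "What do you see?"))
```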

## Step 1: Install llama.cpp and package dependencies on your machine

Clone the llama.cpp repository from GitHub:
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
### On macOS:
Build with make:
```bash
make
```
Or, if you prefer cmake (the build step needs a configured build directory first):
```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```
Note that cmake places the resulting binaries under `build/bin`.

### macOS requirements
You need to install these dependencies on your machine: ffmpeg and portaudio.

```bash
brew install ffmpeg portaudio
```

Also be sure to grant the terminal camera and microphone permissions under Security & Privacy > Privacy.
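If you want to confirm those permissions before launching the demo, a quick check like the following will trigger the macOS prompts on first run. This is a sketch assuming `opencv-python` and `pyaudio`, which a webcam/microphone script of this kind typically depends on.

```python
import cv2       # opencv-python
import pyaudio   # needs portaudio installed via brew

# Opening the webcam triggers the macOS camera permission prompt.
cap = cv2.VideoCapture(0)
print("webcam ok" if cap.isOpened() else "webcam NOT accessible")
cap.release()

# Enumerating input devices verifies portaudio is working.
pa = pyaudio.PyAudio()
inputs = [
    pa.get_device_info_by_index(i)["name"]
    for i in range(pa.get_device_count())
    if pa.get_device_info_by_index(i)["maxInputChannels"] > 0
]
print("microphones:", inputs or "none found")
pa.terminate()
```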

## Step 2: Download the Model!
1. Download these two files from Hugging Face - [mys/ggml_bakllava-1](https://huggingface.co/mys/ggml_bakllava-1/tree/main):
* ggml-model-q4_k.gguf (or any other quantized model) - only one is required!
* mmproj-model-f16.gguf

2. Copy the paths of those 2 files.
3. Run this in the llama.cpp repository (replace YOUR_PATH with the paths to the files you downloaded):

#### macOS
```
./server -m YOUR_PATH/ggml-model-q4_k.gguf --mmproj YOUR_PATH/mmproj-model-f16.gguf -ngl 1
```
#### Windows
```
server.exe -m YOUR_PATH\ggml-model-q4_k.gguf --mmproj YOUR_PATH\mmproj-model-f16.gguf -ngl 1
```
4. The llama.cpp server is now up and running! You can verify that it responds with the quick check below.

⚠️ NOTE: Keep the server running in the background.
5. With the server running, let's run the script that uses the webcam and microphone.
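Before moving on, you can confirm the server is reachable with a short request. This is a sketch assuming the default port 8080; adjust the URL if you passed `--port`.

```python
import json
import urllib.request

# Quick health check: ask the llama.cpp server for a tiny completion.
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps({"prompt": "Say hi.", "n_predict": 8}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```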

## Step 3: Running the Demo
Open a new terminal window and clone the demo app:
```
git clone https://github.com/herrera-luis/vision-core-ai.git
cd vision-core-ai
```

### Install python dependencies

```bash
pip install -r requirements.txt
```

### Run the main script

```bash
python main.py
```
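The project's tags suggest transcription goes through the Whisper API. If that means OpenAI's hosted endpoint (an assumption, not confirmed by the docs), you will likely need `OPENAI_API_KEY` exported before launching. For reference, a minimal transcription call with the `openai` package looks like this:

```python
# Minimal Whisper transcription sketch -- assumes the demo talks to
# OpenAI's hosted Whisper API and that OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
with open("recording.wav", "rb") as audio:  # any short speech clip
    result = client.audio.transcriptions.create(model="whisper-1", file=audio)
print(result.text)
```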

## How to interact with the app

While the application is running, press `i` or `c` once to start recording, then press the same key again to stop it (a rough sketch of this toggle follows the list below):

* `i` will use your webcam
* `c` will use chat
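For context, here is a hypothetical sketch of how such a press-once/press-again toggle can be wired with OpenCV's key handling. The audio recording and the Whisper/server calls are omitted, and none of this is the demo's actual code.

```python
import cv2

cap = cv2.VideoCapture(0)
recording = False

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("vision-core-ai", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("i"):
        recording = not recording  # press once to start, again to stop
        if not recording:
            # On stop: keep the last frame to send along with the prompt.
            cv2.imwrite("frame.jpg", frame)
            print("captured frame.jpg; would now transcribe audio and query the server")
    elif key == ord("q"):  # hypothetical quit key, not part of the demo
        break

cap.release()
cv2.destroyAllWindows()
```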

## Related projects

* [realtime-bakllava](https://github.com/Fuzzy-Search/realtime-bakllava)
* [llama.cpp](https://github.com/ggerganov/llama.cpp)