https://github.com/herrera-luis/vision-core-ai
A demo Python script that interacts with a llama.cpp server using the Whisper API, a microphone, and a webcam.
- Host: GitHub
- URL: https://github.com/herrera-luis/vision-core-ai
- Owner: herrera-luis
- Created: 2023-11-05T18:06:26.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-06T12:25:25.000Z (almost 2 years ago)
- Last Synced: 2025-04-24T05:09:01.699Z (6 months ago)
- Topics: bakllava, llamacpp, llava, whisper-ai
- Language: Python
- Homepage:
- Size: 12.7 KB
- Stars: 46
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Vision Core AI
A demo Python script that interacts with a llama.cpp server using the Whisper API, a microphone, and a webcam.
## Step 1: Install llama.cpp and package dependencies on your machine
Clone the llama.cpp repository from GitHub:
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
### On macOS:
Build with make:
```
make
```
Or, if you prefer cmake, configure first and then build:
```
cmake -B build
cmake --build build --config Release
```
### macOS requirements
You need to install these dependencies on your machine: ffmpeg and portaudio.
```bash
brew install ffmpeg portaudio
```
Also be sure to grant your terminal microphone and camera access under Security & Privacy > Privacy.
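
Before going further, you can sanity-check that Python can reach both devices. A minimal sketch, assuming the opencv-python and pyaudio packages (pyaudio is the usual Python binding for the portaudio library installed above; neither package name is confirmed here against this repo's requirements.txt):
```python
# device_check.py - verify webcam and microphone are accessible from Python
import cv2      # assumed: opencv-python
import pyaudio  # assumed: pyaudio (binds the portaudio installed via brew)

# Webcam: index 0 is the default camera on most machines
cap = cv2.VideoCapture(0)
ok, _frame = cap.read()
print("webcam:", "OK" if ok else "FAILED (check camera permission)")
cap.release()

# Microphone: ask portaudio for the default input device
pa = pyaudio.PyAudio()
try:
    info = pa.get_default_input_device_info()
    print("microphone:", info["name"])
except IOError:
    print("microphone: FAILED (check microphone permission)")
finally:
    pa.terminate()
```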
## Step 2: Download the Model!
1. Download these 2 files from Hugging Face - [mys/ggml_bakllava-1](https://huggingface.co/mys/ggml_bakllava-1/tree/main):
* ggml-model-q4_k.gguf (or any other quantized model) - only one is required!
* mmproj-model-f16.gguf
2. Copy the paths of those 2 files.
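
If you prefer to script the download, here is a minimal sketch using the huggingface_hub package (an assumption; it is not one of this repo's stated dependencies):
```python
# download_models.py - fetch the two GGUF files instead of clicking through the browser
from huggingface_hub import hf_hub_download  # assumed extra dependency: huggingface_hub

repo = "mys/ggml_bakllava-1"
model_path = hf_hub_download(repo_id=repo, filename="ggml-model-q4_k.gguf")
mmproj_path = hf_hub_download(repo_id=repo, filename="mmproj-model-f16.gguf")
print(model_path)   # use as YOUR_PATH/ggml-model-q4_k.gguf in the next step
print(mmproj_path)  # use as YOUR_PATH/mmproj-model-f16.gguf in the next step
```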
3. Run this in the llama.cpp repository (replace YOUR_PATH with the paths to the files you downloaded):
#### macOS
```
./server -m YOUR_PATH/ggml-model-q4_k.gguf --mmproj YOUR_PATH/mmproj-model-f16.gguf -ngl 1
```
#### Windows
```
server.exe -m REPLACE_WITH_YOUR_PATH\ggml-model-q4_k.gguf --mmproj REPLACE_WITH_YOUR_PATH\mmproj-model-f16.gguf -ngl 1
```
4. The llama server is now up and running!
⚠️ NOTE: Keep the server running in the background.
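
To verify the server is answering before wiring up the demo, you can send it one image by hand. A minimal sketch, assuming the requests package, the server's default address http://localhost:8080, and the /completion JSON fields (prompt, image_data, n_predict) as they existed in the llama.cpp server around this project's era; newer server versions may differ:
```python
# server_check.py - one-shot image + prompt against the llama.cpp server
import base64
import requests  # assumed extra dependency: requests

with open("test.jpg", "rb") as f:  # any local test image (hypothetical filename)
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8080/completion",  # llama.cpp server default port
    json={
        # "[img-10]" marks where image id 10 is inserted into the prompt
        "prompt": "USER: [img-10] Describe the image.\nASSISTANT:",
        "image_data": [{"data": img_b64, "id": 10}],
        "n_predict": 64,
    },
)
print(resp.json()["content"])
```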
5. Let's run the script to use the webcam and microphone.
## Step 3: Running the Demo
Open a new terminal window and clone the demo app:
```
git clone https://github.com/herrera-luis/vision-core-ai.git
cd vision-core-ai
```
### Install Python dependencies
```bash
pip install -r requirements.txt
```
### Run the main script
```bash
python main.py
```
## How to interact with the app
While the application is running, press `i` or `c` once to start recording, and press the same key a second time to stop it:
* `i` will use your webcam
* `c` will use chat
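
The keys work as toggles: one press starts a recording, the same key pressed again stops it. A simplified sketch of that pattern using OpenCV's key polling (illustrative only, not the actual main.py logic):
```python
# toggle_sketch.py - press-once-to-start, press-again-to-stop key handling
import cv2  # assumed: opencv-python

cap = cv2.VideoCapture(0)
recording = False
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("vision-core-ai", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("i"):       # the same key toggles recording on and off
        recording = not recording
        print("recording started" if recording else "recording stopped")
    elif key == ord("q"):     # hypothetical quit key, for this sketch only
        break
cap.release()
cv2.destroyAllWindows()
```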
## Related projects:
* [realtime-bakllava](https://github.com/Fuzzy-Search/realtime-bakllava)
* [llama.cpp](https://github.com/ggerganov/llama.cpp)