https://github.com/skitsanos/streamlit-llama-vision
This application is a Streamlit-based web tool designed to interact with the Llama Vision model
- Host: GitHub
- URL: https://github.com/skitsanos/streamlit-llama-vision
- Owner: skitsanos
- Created: 2025-01-18T09:17:40.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-18T09:18:50.000Z (9 months ago)
- Last Synced: 2025-03-29T00:51:10.482Z (7 months ago)
- Topics: chat-application, chatbot, llama, llama3, llama3-vision, streamlit
- Language: Python
- Homepage: https://gedankrayze.com/
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
### Llama Vision Debugger
This is a Streamlit-based web application designed to interact with the **Llama Vision model**, a multimodal AI model capable of processing both text and images. The application allows users to upload or paste an image, ask questions or provide prompts related to the image, and receive responses from the Llama Vision model.
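The core exchange — base64-encoding the image and sending it alongside a text prompt — can be sketched as follows. This is a minimal illustration assuming the Ollama Python client, which accepts base64-encoded image data in the `images` field of a chat message; `build_vision_message` is a name invented here for illustration, not part of the repository:

```python
import base64

# Hypothetical helper (not from the repository): package a prompt and raw
# image bytes into the message shape the Ollama chat API expects.
def build_vision_message(prompt: str, image_bytes: bytes) -> dict:
    return {
        "role": "user",
        "content": prompt,
        "images": [base64.b64encode(image_bytes).decode("utf-8")],
    }

message = build_vision_message("Describe this image.", b"\x89PNG...")

# The message would then be sent with something like:
#   import ollama
#   response = ollama.chat(model="llama3.2-vision", messages=[message])
#   print(response["message"]["content"])
```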
#### Key Features:
- **Image Input**: Upload an image file (JPG, JPEG, PNG) or paste an image directly from your clipboard.
- **Image Display**: View the uploaded or pasted image in the application.
- **Chat Interface**: Enter text prompts to ask questions or provide instructions about the image.
- **Multimodal AI**: Send the image and prompt to the Llama Vision model (`llama3.2-vision`) for processing and receive detailed responses.
- **Error Handling**: Includes error messages for invalid inputs or processing failures.

#### How It Works:
1. Provide an image by uploading a file or pasting from the clipboard.
2. Enter a text prompt related to the image.
3. The application processes the image and sends it along with the prompt to the Llama Vision model.
4. The model's response is displayed in the application.

#### Example Use Cases:
- Describe the contents of an image.
- Identify objects or features in an image.
- Answer questions about the image (e.g., "What is the color of the car?" or "How many people are in the picture?").
- Debug and test the capabilities of the Llama Vision model.

#### Technical Details:
- Built with **Streamlit** for the web interface.
- Uses **Pillow (PIL)** for image handling and **base64** for encoding image data.
- Integrates with the **Ollama** library to interact with the Llama Vision model.
- Supports image pasting via the `streamlit_paste_button` component.

#### Requirements:
- Python 3.x
- Libraries: `streamlit`, `Pillow`, `ollama`, `streamlit_paste_button`

#### How to Run:
1. Clone the repository.
2. Install the required libraries:
```bash
pip install streamlit Pillow ollama streamlit_paste_button
```
3. Run the Streamlit application:
```bash
streamlit run app.py
```
4. Open the provided URL in your browser to use the application.

#### Notes:
- Ensure the Llama Vision model (`llama3.2-vision`) is available via the Ollama library (e.g. run `ollama pull llama3.2-vision` first).
- An image must be provided before sending a prompt.

---
This tool is ideal for exploring the capabilities of multimodal AI models and debugging their performance with image and text inputs.
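For reference, the JPG/JPEG/PNG restriction on image input mentioned under Key Features can be enforced with a small Pillow check. This is an illustrative sketch assuming only Pillow is installed; `is_supported_image` is a hypothetical name, not a function from the repository:

```python
from io import BytesIO
from PIL import Image

# Hypothetical helper (not from the repository): verify that raw bytes
# decode to one of the formats the app accepts (JPG/JPEG/PNG).
def is_supported_image(data: bytes) -> bool:
    try:
        with Image.open(BytesIO(data)) as img:
            return img.format in {"JPEG", "PNG"}
    except Exception:
        return False
```

Pillow reports both `.jpg` and `.jpeg` files as the single format `JPEG`, so one membership check covers all three allowed extensions.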