{"id":14964795,"url":"https://github.com/lxe/llavavision","last_synced_at":"2025-04-05T09:10:28.159Z","repository":{"id":205711196,"uuid":"714847075","full_name":"lxe/llavavision","owner":"lxe","description":"A simple \"Be My Eyes\" web app with a llama.cpp/llava backend","archived":false,"fork":false,"pushed_at":"2023-11-28T07:49:04.000Z","size":28551,"stargazers_count":489,"open_issues_count":2,"forks_count":32,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-29T08:09:53.696Z","etag":null,"topics":["ai","artificial-intelligence","computer-vision","llama","llamacpp","llm","local-llm","machine-learning","multimodal","webapp"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lxe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-06T00:48:45.000Z","updated_at":"2025-03-16T06:41:43.000Z","dependencies_parsed_at":"2023-11-28T08:48:35.981Z","dependency_job_id":null,"html_url":"https://github.com/lxe/llavavision","commit_stats":{"total_commits":5,"total_committers":2,"mean_commits":2.5,"dds":"0.19999999999999996","last_synced_commit":"623b00b02d7482aebdf00e747a6e45fb441d1990"},"previous_names":["lxe/llavavision"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxe%2Fllavavision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxe%2Fllavavision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxe%2Fllavavision/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxe%2Fllavavision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lxe","download_url":"https://codeload.github.com/lxe/llavavision/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247312085,"owners_count":20918344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artificial-intelligence","computer-vision","llama","llamacpp","llm","local-llm","machine-learning","multimodal","webapp"],"created_at":"2024-09-24T13:33:47.614Z","updated_at":"2025-04-05T09:10:28.121Z","avatar_url":"https://github.com/lxe.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLaVaVision\n\n![Screenshot](screenshot.gif)\n\nA simple \"Be My Eyes\" web app with a llama.cpp/llava backend created in about an hour using ChatGPT, Copilot, and some minor help from me, [@lxe](https://twitter.com/lxe). 
It describes what it sees using the [SkunkworksAI BakLLaVA-1](https://huggingface.co/SkunkworksAI/BakLLaVA-1) model via [llama.cpp](https://github.com/ggerganov/llama.cpp) and narrates the text using the [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API).\n\nInspired by [Fuzzy-Search/realtime-bakllava](https://github.com/Fuzzy-Search/realtime-bakllava).\n\n## Getting Started\n\nYou will need a machine with about 5 GB of RAM/VRAM for the q4_k version.\n\n### Set up the llama.cpp server\n\n(Optional) Install the CUDA toolkit:\n\n```shell\nsudo apt install nvidia-cuda-toolkit\n```\n\nBuild llama.cpp (build instructions for various platforms at [llama.cpp build](https://github.com/ggerganov/llama.cpp#build)):\n\n```shell\ngit clone https://github.com/ggerganov/llama.cpp\ncd llama.cpp\nmkdir build\ncd build\ncmake .. -DLLAMA_CUBLAS=ON # Remove the flag if CUDA is unavailable\ncmake --build . --config Release\n```\n\nDownload the models from [ggml_bakllava-1](https://huggingface.co/mys/ggml_bakllava-1/tree/main):\n\n```shell\nwget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/mmproj-model-f16.gguf\nwget https://huggingface.co/mys/ggml_bakllava-1/resolve/main/ggml-model-q4_k.gguf # Choose another quant if preferred\n```\n\nStart the server (server options detailed [here](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md)):\n\n```shell\n./bin/server -m ggml-model-q4_k.gguf --mmproj mmproj-model-f16.gguf -ngl 35 -ts 100,0 # For GPU-only, single GPU\n# ./bin/server -m ggml-model-q4_k.gguf --mmproj mmproj-model-f16.gguf # For CPU\n```\n\n### Launch LLaVaVision\n\nClone and set up the environment:\n\n```shell\ngit clone https://github.com/lxe/llavavision\ncd llavavision\npython3 -m venv venv\n. ./venv/bin/activate\npip install -r requirements.txt\n```\n\nCreate dummy certificates and start the server. 
HTTPS is required for mobile video functionality:\n\n```shell\nopenssl req -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out cert.pem -keyout key.pem\nflask run --host=0.0.0.0 --key key.pem --cert cert.pem --debug\n```\n\nAccess https://your-machine-ip:5000 from your mobile device. Optionally, start a local tunnel with ngrok or localtunnel:\n\n```shell\nnpx localtunnel --local-https --allow-invalid-cert --port 5000\n```\n\n## Acknowledgements and Inspiration\n\n- [Fuzzy-Search/realtime-bakllava](https://github.com/Fuzzy-Search/realtime-bakllava)\n- [Multimodal LLama.cpp](https://github.com/ggerganov/llama.cpp/issues/3332)\n- [llava-vl.github.io](https://llava-vl.github.io/)\n- [SkunkworksAI/BakLLaVA-1](https://huggingface.co/SkunkworksAI/BakLLaVA-1)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flxe%2Fllavavision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flxe%2Fllavavision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flxe%2Fllavavision/lists"}