{"id":18000275,"url":"https://github.com/psychip/machina","last_synced_at":"2025-05-15T11:06:09.967Z","repository":{"id":257822842,"uuid":"869199614","full_name":"PsyChip/machina","owner":"PsyChip","description":"OpenCV+YOLO+LLAVA powered video surveillance system","archived":false,"fork":false,"pushed_at":"2025-02-23T20:33:04.000Z","size":324989,"stargazers_count":755,"open_issues_count":1,"forks_count":35,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-14T19:57:10.278Z","etag":null,"topics":["camera","llava","ollama-api","opencv","python","rtsp","yolo"],"latest_commit_sha":null,"homepage":"https://psychip.net","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PsyChip.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-07T22:16:17.000Z","updated_at":"2025-04-13T20:57:39.000Z","dependencies_parsed_at":"2025-03-10T05:28:10.651Z","dependency_job_id":"7c94639d-a389-4cc2-b8f0-cb596cfde724","html_url":"https://github.com/PsyChip/machina","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"89deccde268ad5b448135ef706a7745e0320baa7"},"previous_names":["psychip/machina"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2Fmachina","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2Fmachina/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2Fmachina/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2Fmachina/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PsyChip","download_url":"https://codeload.github.com/PsyChip/machina/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254328385,"owners_count":22052632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["camera","llava","ollama-api","opencv","python","rtsp","yolo"],"created_at":"2024-10-29T23:10:54.986Z","updated_at":"2025-05-15T11:06:09.919Z","avatar_url":"https://github.com/PsyChip.png","language":"Python","funding_links":["https://ko-fi.com/psychip"],"categories":[],"sub_categories":[],"readme":"## MACHINA\r\nCCTV viewer with realtime object tagger [WIP]\r\n\r\n![partial screenshot](demo.png)\r\n\r\n### Uses\r\n- [LLAVA](https://llava-vl.github.io)\r\n- [YOLO 11](https://github.com/ultralytics/ultralytics)\r\n- [OpenCV](https://opencv.org)\r\n- [FAISS](https://github.com/facebookresearch/faiss)\r\n- [BLIP](https://github.com/salesforce/BLIP)\r\n- [CLIP](https://huggingface.co/openai/clip-vit-large-patch14)\r\n\r\n### V2 update\r\nIt can now generate realtime scene captions by using CLIP and BLIP together:\r\nBLIP generates captions from the CCTV stream every 30 frames, and CLIP matches\r\nthe pre-generated text every 10 frames.\r\n\r\nTested on an RTX 3060: 600ms avg for captioning and 47ms for caption matching.\r\n\r\n-----------------\r\n\r\n### How it works\r\nSimply put, it connects to a high-resolution RTSP stream in a separate thread,\r\nqueues the frames into memory as it is 
and resamples it for processing.\r\n\r\nYOLO takes this frame and the application assigns each detection a specific id based on its coordinates,\r\nsize and timestamp, then tries to match the same object on every iteration.\r\n\r\nAnother thread runs in the background, iterates over that object array continuously and\r\nmakes LLM requests to the Ollama server for object tagging.\r\n\r\n### Object matching\r\nIt calculates the center of every detection box, pinpoints it on screen and allows a 16px\r\ntolerance in all directions. The script tries to find the closest object as a fallback and\r\ncreates a new object in memory as a last resort.\r\nYou can observe persistent objects in the ```/elements``` folder.\r\n\r\n### Test Environment\r\nEvery input frame is resampled to 640x480 for processing; got avg 20ms inference time\r\nwith the YOLO 11 small model (yolo11s.pt) on a GeForce GTX 1060, which is an almost 7-year-old\r\ngraphics card. Other models are available in the \"models\" directory.\r\n\r\nThe stream is delayed by 1-2 seconds roughly every 10 minutes due to network conditions; the script also\r\nhas a frame skip mechanism that triggers after 3 seconds of detection idle.\r\n\r\n### Prerequisites\r\nMake sure you have all Visual C++ redistributables if you're running on Windows:\r\nhttps://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170\r\n\r\n### Installation\r\n- Install Python 3.12.x\r\n- Clone the repository\r\n- Install [ollama](https://ollama.com/) server\r\n- Pull the LLAVA model by running ```ollama run llava```\r\n- Install the dependencies by running ```pip install -r requirements.txt```\r\n- Remove the CPU version of PyTorch and install the CUDA version\r\n- Open ```app.py``` and set your RTSP stream address at line 18\r\n- Run the script ```py app.py```\r\n\r\n```sh\r\ngit clone https://github.com/PsyChip/machina\r\ncd machina\r\npip install -r requirements.txt\r\npip uninstall torch torchvision torchaudio\r\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\r\npy app.py\r\n```\r\n\r\n### 
Notes\r\n- It's okay to stick with the CPU version of FAISS if you're using a YOLO nano/small/medium model\r\n- Change the ```vsize``` value depending on your chosen YOLO model; you need to delete the index when\r\nchanging the vector size.\r\n- CUDA-enabled torch is an absolute necessity for real time inference\r\n- Pretrained YOLO models are not so accurate on low-res streams; it's highly recommended to train\r\nyour own model by using object images from your ```/elements``` folder\r\n\r\n### Usage\r\n- S : snapshot, actual image from input stream\r\n- C : caption scene, save to folder along with snapshot\r\n- R : start/stop recording. it records what you see.\r\n- Q : quit app\r\n- left mouse: select\r\n- middle mouse: zoom\r\n- right mouse: pan\r\n\r\n### Project direction\r\nThis is a living project, trying to create a *complete* headless security system by\r\ntaking advantage of open source object detection models in my spare time.\r\n\r\n### TODO\r\n- Additional UI Layer\r\n- RTS style object selection box and detailed information about selected object(s)\r\n- People crowd, car crash, police, ambulance, running human detection [request]\r\n- Webhook callbacks on new object/disappeared object/movement after long stay\r\n\r\nFeel free to contribute with code, ideas or even maybe a little bit of support\r\nvia ko-fi or bitcoin. I'll prioritize feature requests for every $10 donation.\r\n\r\n- [https://ko-fi.com/psychip](https://ko-fi.com/psychip)\r\n- BTC: ```bc1qlq067vldngs37l5a4yjc4wvhyt89wv3u68dsuv```\r\n\r\nCreated by PsyChip\r\n```root@psychip.net```\r\n\r\n.eof","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsychip%2Fmachina","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpsychip%2Fmachina","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsychip%2Fmachina/lists"}