{"id":13578552,"url":"https://github.com/floe/backscrub","last_synced_at":"2025-04-13T04:59:01.859Z","repository":{"id":37384820,"uuid":"253624294","full_name":"floe/backscrub","owner":"floe","description":"Virtual Video Device for Background Replacement with Deep Semantic Segmentation","archived":false,"fork":false,"pushed_at":"2023-01-04T18:48:22.000Z","size":15283,"stargazers_count":738,"open_issues_count":40,"forks_count":87,"subscribers_count":26,"default_branch":"main","last_synced_at":"2025-04-13T04:58:56.207Z","etag":null,"topics":["body-pix","bodypix","cpp","deep-learning","deeplab","deeplabv3","mediapipe","opencv","python","tensorflow","tflite","video"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/floe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-06T21:53:59.000Z","updated_at":"2025-04-09T12:44:29.000Z","dependencies_parsed_at":"2023-02-02T20:01:27.620Z","dependency_job_id":null,"html_url":"https://github.com/floe/backscrub","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/floe%2Fbackscrub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/floe%2Fbackscrub/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/floe%2Fbackscrub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/floe%2Fbackscrub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/floe","download_url":"https://codeload.github.com/floe/backscrub/
tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248665759,"owners_count":21142123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["body-pix","bodypix","cpp","deep-learning","deeplab","deeplabv3","mediapipe","opencv","python","tensorflow","tflite","video"],"created_at":"2024-08-01T15:01:31.748Z","updated_at":"2025-04-13T04:59:01.836Z","avatar_url":"https://github.com/floe.png","language":"C++","readme":"# BackScrub\n(or The Project Formerly Known As DeepBackSub)\n\n## Virtual Video Device for Background Replacement with Deep Semantic Segmentation\n\n![Screenshots with my stupid grinning face](backgrounds/screenshot.jpg)\n(Credits for the nice backgrounds to [Mary Sabell](https://dribbble.com/shots/4686178-Bauhaus-Poster) and [PhotoFunia](https://photofunia.com/effects/retro-wave))\n\n## Maintainers\n\n  * Phil Ashby ([@phlash](https://github.com/phlash))\n  * Benny Baumann ([@BenBE](https://github.com/BenBE))\n  * Florian Echtler ([@floe](https://github.com/floe))\n\n## License\n\nbackscrub is licensed under the Apache License 2.0. 
See LICENSE file for details.\n\n## Building\n\nInstall dependencies (`sudo apt install libopencv-dev build-essential v4l2loopback-dkms curl`).\n\nClone this repository with `git clone --recursive https://github.com/floe/backscrub.git`.\nTo speed up the checkout you can additionally pass `--depth=1` to `git clone`.\nThis is fine if you only want to download and build the code; for development, however, it is not recommended.\n\nUse `cmake` to build the project: create a subfolder (e.g. `build`), change to that folder and run: `cmake .. \u0026\u0026 make -j $(nproc || echo 4)`.\n\n**Deprecated**: Another option to build everything is to run `make` in the root directory of the repository. While this will download and build all dependencies, it comes with a few drawbacks, like missing support for XNNPACK. Also, this might break with newer versions of Tensorflow Lite, as upstream support for this option has been removed. Use at your own risk.\n\n## Usage\n\nFirst, load the v4l2loopback module (extra settings needed to make Chrome work):\n```\nsudo modprobe v4l2loopback devices=1 max_buffers=2 exclusive_caps=1 card_label=\"VirtualCam\" video_nr=10\n```\nThen, run backscrub (`-d -d` for full debug, `-c` for capture device, `-v` for virtual device, `-b` for wallpaper):\n```\n./backscrub -d -d -c /dev/video0 -v /dev/video10 -b ~/wallpapers/forest.jpg\n```\n\nSome cameras (e.g. the 
`Logitech Brio`) need to switch the video source to `MJPG` by passing `-f MJPG` for higher resolutions to become available.\n\nFor regular usage, set up a configuration file `/etc/modprobe.d/v4l2loopback.conf`:\n```\n# V4L loopback driver\noptions v4l2loopback max_buffers=2\noptions v4l2loopback exclusive_caps=1\noptions v4l2loopback video_nr=10\noptions v4l2loopback card_label=\"VirtualCam\"\n```\nTo auto-load the driver on startup, create `/etc/modules-load.d/v4l2loopback.conf` with the following content:\n```\nv4l2loopback\n```\n\n## Requirements\n\nTested with the following dependencies:\n\n  - Ubuntu 20.04, x86-64\n    - Linux kernel 5.6 (stock package)\n    - OpenCV 4.2.0 (stock package)\n    - V4L2-Loopback 0.12.5 (stock package)\n    - Tensorflow Lite 2.5.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.5.0/tensorflow/lite))\n  - Ubuntu 18.04.5, x86-64\n    - Linux kernel 4.15 (stock package)\n    - OpenCV 3.2.0 (stock package)\n    - V4L2-Loopback 0.10.0 (stock package)\n    - Tensorflow Lite 2.1.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.1.0/tensorflow/lite))\n\nTested with the following software:\n\n  - Firefox\n    - 90.0.2 (works)\n    - 84.0   (works)\n    - 76.0.1 (works)\n    - 74.0.1 (works)\n  - Skype\n    - 8.67.0.96 (works)\n    - 8.60.0.76 (works)\n    - 8.58.0.93 (works)\n  - guvcview\n    - 2.0.6 (works with parameter `-c read`)\n    - 2.0.5 (works with parameter `-c read`)\n  - Microsoft Teams\n    - 1.4.00.26453 (works)\n    - 1.3.00.30857 (works)\n    - 1.3.00.5153 (works)\n  - Chrome\n    - 87.0.4280.88 (works)\n    - 81.0.4044.138 (works)\n  - Zoom - yes, I'm a hypocrite, I tested it with Zoom after all :-)\n    - 5.4.54779.1115 (works)\n    - 5.0.403652.0509 (works)\n\n## Background\n\nIn these modern times where everyone is sitting at home and skype-ing/zoom-ing/webrtc-ing all the time, I was a bit annoyed about always showing my messy home office to the world. 
Skype has a \"blur background\" feature, but that starts to get boring after a while (and it's less private than I would personally like). Zoom has some background substitution thingy built-in, but I'm not touching that software with a bargepole (and that feature is not available on Linux anyway). So I decided to look into how to roll my own implementation without being dependent on any particular video conferencing software to support this.\n\nThis whole shebang involves three main steps with varying difficulty:\n  - find person in video (hard)\n  - replace background (easy)\n  - pipe data to virtual video device (medium)\n\n## Finding person in video\n\n### Attempt 0: Depth camera (Intel RealSense)\n\nI've been working a lot with depth cameras previously, also for background segmentation (see [SurfaceStreams](https://github.com/floe/surface-streams)), so I just grabbed a leftover RealSense camera from the lab and gave it a shot. However, the depth data in a cluttered office environment is quite noisy, and no matter how I tweaked the camera settings, it could not produce any depth data for my hair...? I looked like a medieval monk who had the top of his head chopped off, so ... next.\n\n### Attempt 1: OpenCV BackgroundSubtractor\n\nSee https://docs.opencv.org/3.4/d1/dc5/tutorial_background_subtraction.html for a tutorial.\nThis should work OK for mostly static backgrounds and small moving objects, but it does not work for a mostly static person in front of a static background. Next.\n\n### Attempt 2: OpenCV Face Detector\n\nSee https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html for a tutorial.\nWorks okay-ish, but obviously only detects the face, not the rest of the person. Also, it only roughly matches an ellipse, which looks rather weird in the end. Next.\n\n### Attempt 3: Deep learning!\n\nI've heard good things about this deep learning stuff, so let's try that. 
I first had to find my way through a pile of frameworks (Keras, Tensorflow, PyTorch, etc.), but after I found a ready-made model for semantic segmentation based on Tensorflow Lite ([DeepLab v3+](https://tfhub.dev/tensorflow/lite-model/deeplabv3/1/default/1)), I settled on that.\n\nI had a look at the corresponding [Python example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py), [C++ example](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/examples/label_image), and [Android example](https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/android), and based on those, I first cobbled together a [Python demo](https://github.com/floe/backscrub/blob/master/deepseg.py). That was running at about 2.5 FPS, which is really excruciatingly slow, so I built a [C++ version](https://github.com/floe/backscrub/blob/master/deepseg.cc) which manages 10 FPS without too much hand optimization. Good enough.\n\nI've also tested a TFLite-converted version of the [Body-Pix model](https://blog.tensorflow.org/2019/11/updated-bodypix-2.html), but the results haven't been much different from DeepLab for this use case.\n\nMore recently, Google has released a model specifically trained for [person segmentation that's used in Google Meet](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html). This has way better performance than DeepLab, both in terms of speed and accuracy, so it is now the default. It needs one custom op from the MediaPipe framework, but that was quite easy to integrate. 
Thanks to @jiangjianping for pointing this out in the [corresponding issue](https://github.com/floe/backscrub/issues/28).\n\n## Replace Background\n\nThis is basically one line of code with OpenCV: `bg.copyTo(raw,mask);`. Told you that's the easy part.\n\n## Virtual Video Device\n\nI'm using [v4l2loopback](https://github.com/umlaeute/v4l2loopback) to pipe the data from my userspace tool into any software that can open a V4L2 device. This isn't too hard because of the nice examples, but there are some catches, most notably color space. It took quite some trial and error to find a common pixel format that's accepted by Firefox, Skype, and guvcview, and that is [YUYV](https://www.linuxtv.org/downloads/v4l-dvb-apis-old/V4L2-PIX-FMT-YUYV.html). Nicely enough, my webcam can output YUYV directly as raw data, so that does save me some colorspace conversions.\n\n## End Result\n\nThe dataflow through the whole program is roughly as follows:\n\n  - init\n    - load background.png, convert to YUYV\n    - initialize TFLite, register custom op\n    - load Google Meet segmentation model\n    - set up V4L2 Loopback device (w,h,YUYV)\n  - loop\n    - grab raw YUYV image from camera\n    - extract portrait ROI in center\n      - downscale ROI to 144 x 256 (*)\n      - convert to RGB float32 (*)\n      - run Google Meet segmentation model\n      - convert result to binary mask using softmax\n      - denoise mask using erode/dilate\n    - upscale mask to raw image size\n    - copy background over raw image with mask (see above)\n    - `write()` data to virtual video device\n\n(*) these are the required input parameters for this model\n\n## Limitations/Extensions\n\nAs usual: pull requests welcome.\n\nSee [Issues](https://github.com/floe/backscrub/issues) and [Pull Requests](https://github.com/floe/backscrub/pulls) for currently discussed/in-progress extensions, and also check out the `experimental` branch.\n\n## Fixed\n\n  - The project name isn't catchy enough. 
Help me find a nice [backronym](https://en.wikipedia.org/wiki/Backronym).\n  - Resolution is currently hardcoded to 640x480 (lowest common denominator).\n  - Only works with Linux, because that's what I use.\n  - Needs a webcam that can produce raw YUYV data (but extending to the common YUV420 format should be trivial).\n  - Should probably do an erosion (+ dilation?) operation on the mask.\n  - Background image size needs to match camera resolution (see issue #1).\n  - CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS. Fixed via Google Meet segmentation model.\n  - Uses stock Deeplab v3+ network. Maybe re-training with only \"person\" and \"background\" classes could improve performance? Fixed via Google Meet segmentation model.\n\n## Other links\n\nFirefox preferred formats: https://searchfox.org/mozilla-central/source/third_party/libwebrtc/webrtc/modules/video_capture/linux/video_capture_linux.cc#142-159\n\n## Feeding obs-studio\n\n[We have been notified](https://github.com/floe/backscrub/issues/105) that some snap-packaged versions of `obs-studio` are unable to detect/use a virtual camera as provided by `backscrub`. Please check the details there for workarounds if this applies to you.\n","funding_links":[],"categories":["C++","\u003ca name=\"cpp\"\u003e\u003c/a\u003eC++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffloe%2Fbackscrub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffloe%2Fbackscrub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffloe%2Fbackscrub/lists"}