https://github.com/lukereichold/visual-speech-separation
Flask app to demo multimodal deep learning speech separation in videos via TensorFlow Serving
- Host: GitHub
- URL: https://github.com/lukereichold/visual-speech-separation
- Owner: lukereichold
- License: apache-2.0
- Created: 2020-01-19T21:55:19.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T03:28:33.000Z (over 3 years ago)
- Last Synced: 2025-10-11T07:52:59.858Z (6 months ago)
- Topics: 3d-convolutional-network, computer-vision, convolutional-neural-networks, deep-learning, flask, multimodal-deep-learning, multisensory, speech-separation, tensorflow-serving
- Language: Python
- Homepage:
- Size: 20.8 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# About Basis
Basis is a proof-of-concept web app that provides an interactive demonstration of separating on- and off-screen audio sources in a given video.
It leverages the [speech separation model by Andrew Owens et al.](http://andrewowens.com/multisensory/) and builds on their [open-source code and models](https://github.com/andrewowens/multisensory), licensed under the Apache License 2.0.
I built this as an opportunity to learn:
- The implementation details of a "legacy" TensorFlow 1.x model
- How to freeze, inspect, and serve a model with TensorFlow Serving
- How to perform real-time inference on video from a public web app
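The serving step above typically boils down to the web app POSTing preprocessed inputs to TensorFlow Serving's REST predict endpoint. The sketch below, which is an illustration rather than this repo's actual code, builds such a request body; the model name `multisensory` and the input tensor names `video` and `audio` are assumptions (the real names come from the frozen model's signature, e.g. via `saved_model_cli show`):

```python
import json

# Hypothetical TF-Serving endpoint; 8501 is TF-Serving's default REST port,
# but the model name "multisensory" is an assumption for illustration.
SERVING_URL = "http://localhost:8501/v1/models/multisensory:predict"

def build_predict_request(video_frames, audio_samples):
    """Package preprocessed inputs into TF-Serving's REST predict format.

    The input keys ("video", "audio") are placeholders; substitute the
    names exposed by the model's serving signature.
    """
    return json.dumps({
        "signature_name": "serving_default",
        "inputs": {
            "video": video_frames,   # e.g. nested lists of frame tensors
            "audio": audio_samples,  # e.g. a list of waveform samples
        },
    })

# A Flask view would then send this body and read back the separated audio,
# roughly:
#   resp = requests.post(SERVING_URL, data=body,
#                        headers={"Content-Type": "application/json"})
#   outputs = resp.json()["outputs"]
```

Keeping the heavy model behind TF-Serving this way lets the Flask app stay a thin preprocessing/postprocessing layer, which is the architecture the repo description implies.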