Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aflorithmic/viseme-to-video
Creates video from TTS output and viseme images.
- Host: GitHub
- URL: https://github.com/aflorithmic/viseme-to-video
- Owner: aflorithmic
- License: mit
- Created: 2022-05-31T17:33:00.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-06-18T23:30:06.000Z (over 2 years ago)
- Last Synced: 2024-06-28T08:38:15.266Z (5 months ago)
- Language: Python
- Size: 1.41 MB
- Stars: 11
- Watchers: 1
- Forks: 4
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
![viseme-to-video](https://user-images.githubusercontent.com/60350867/171479529-1d754e88-0934-45cd-a9ce-796e7aaa6534.png)
[![](https://circleci.com/gh/aflorithmic/viseme-to-video.svg?style=svg)](https://app.circleci.com/pipelines/github/aflorithmic/viseme-to-video?branch=main&filter=all) ![contributions-welcome](https://img.shields.io/badge/contributions-welcome-ff69b4) ![GitHub](https://img.shields.io/github/license/aflorithmic/viseme-to-video)
This Python module creates video from viseme images and TTS audio output. I created it to test the sync accuracy between synthesised audio and duration predictions extracted from FastSpeech2 hidden states.
https://user-images.githubusercontent.com/60350867/172639184-0696ffbc-ca98-49b5-9831-33c420b0a5d9.mp4
https://user-images.githubusercontent.com/60350867/172639478-d795896e-88d1-4581-84dc-3ad01e7dfd7e.mp4
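The clips above are driven by per-viseme durations. As a rough, minimal sketch of that general idea (an illustration only, not this repo's implementation; it assumes the third-party `moviepy` package and made-up file names), each viseme image is held on screen for its predicted duration and the resulting frame sequence is muxed with the TTS audio:

```python
# Illustrative sketch only -- not this repo's actual code.
# Assumes the third-party moviepy package (pip install moviepy); file names are made up.
from moviepy.editor import AudioFileClip, ImageSequenceClip

# Hypothetical (viseme, duration-in-milliseconds) timeline, e.g. parsed from a metadata JSON file
timeline = [("sil", 120), ("DD", 95), ("E", 180), ("sil", 150)]

# Pick an image per viseme and convert each duration to seconds
image_paths = [f"image/mouth1/{viseme}.jpeg" for viseme, _ in timeline]  # image names assumed
durations_s = [ms / 1000.0 for _, ms in timeline]

# Hold each image for its predicted duration, attach the TTS audio, and render
clip = ImageSequenceClip(image_paths, durations=durations_s)
clip = clip.set_audio(AudioFileClip("audio/24.wav"))
clip.write_videofile("output.mp4", fps=25)
```

The actual tool drives the same process from the JSON metadata, image and mapping files described in the sections below.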
## Running viseme-to-video
To use this module, first install the dependencies by running:
`pip install -r requirements.txt`
The tool can be run directly from the command line using the command: `python viseme_to_video.py`
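For example, to point the tool at a different image set and render without an audio track (both flags are described in the sections below; the path here is only an illustration): `python viseme_to_video.py --im_dir image/mouth1 --no_audio`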
## Repo contents
This repo contains the following resources:
### **image/**
Two image sets:
- **speaker1/** from the [Oculus developer doc 'Viseme reference'](https://developer.oculus.com/documentation/unity/audio-ovrlipsync-viseme-reference/)
- **mouth1/** adapted from the icSpeech guide ['Mouth positions for English pronunciation'](https://icspeech.com/mouth-positions.html)

A different viseme image directory can be specified on the command line using the flag `--im_dir`.
### **metadata/**
**24.json**: A viseme metadata JSON file we produced during FastSpeech2 inference by:
- extracting the phoneme sequence produced by the text normalisation frontend module
- mapping this to a sequence of visemes
- extracting hidden-state durations (as a number of frames) from FS2
- converting durations from frames to milliseconds (a frame-to-millisecond conversion is sketched below)
- writing this information (phoneme, viseme, duration, offset)

The tool will automatically generate video for all JSON metadata files stored in the `metadata/` folder.
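As a rough illustration of the conversion step, the sketch below turns FastSpeech2 per-phoneme frame counts into millisecond durations and running offsets, then writes records shaped like the ones listed above. The hop length (256 samples) and sampling rate (22050 Hz) are typical FastSpeech2/vocoder settings and are assumptions here, as are the phoneme/viseme labels and the exact JSON layout of `24.json`:

```python
import json

# Assumed vocoder frame parameters -- typical FastSpeech2 settings, not taken from this repo
HOP_LENGTH = 256     # samples per spectrogram frame
SAMPLE_RATE = 22050  # samples per second

def frames_to_ms(n_frames: int) -> float:
    """Convert a duration in spectrogram frames to milliseconds."""
    return n_frames * HOP_LENGTH / SAMPLE_RATE * 1000.0

# Hypothetical per-phoneme FS2 output: (phoneme, viseme, n_frames)
predictions = [("DH", "TH", 5), ("AH0", "aa", 7), ("B", "PP", 6)]

records, offset_ms = [], 0.0
for phoneme, viseme, n_frames in predictions:
    duration_ms = frames_to_ms(n_frames)
    records.append({"phoneme": phoneme, "viseme": viseme,
                    "duration": duration_ms, "offset": offset_ms})
    offset_ms += duration_ms

with open("metadata/example.json", "w") as f:  # file name assumed
    json.dump(records, f, indent=2)
```

With these assumed settings, five frames come out to roughly 58 ms.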
### **map/**
**viseme_map.json**: A JSON file containing mappings between the visemes in viseme metadata files and the image filenames. Mapping visemes was necessary since the viseme set we use to generate our metadata files contained upper/lower-case distinctions, which file naming doesn't support. (I.e. you can't have two files named 't.jpeg' and 'T.jpeg' stored in the same folder.)

A different mapping file can be specified on the command line using the flag `--map`.
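For illustration, a mapping of this kind might look something like the sketch below, applied when resolving image paths. The viseme labels and file names here are made up, not copied from `viseme_map.json`:

```python
import json
import os

# Hypothetical mapping: case-sensitive viseme labels -> case-safe image file names
viseme_map = {"t": "t_lower.jpeg", "T": "t_upper.jpeg", "aa": "aa.jpeg"}

with open("map/example_map.json", "w") as f:  # file name assumed
    json.dump(viseme_map, f, indent=2)

def image_for(viseme: str, im_dir: str = "image/mouth1") -> str:
    """Resolve the image file to show for a given viseme label via the mapping."""
    return os.path.join(im_dir, viseme_map[viseme])

print(image_for("T"))  # image/mouth1/t_upper.jpeg
```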
### **audio/**
**24.wav** - An audio sample generated from FastSpeech2 ([using kan-bayashi's ESPnet framework](https://github.com/espnet/espnet)). This sample uses a [Harvard sentence](https://harvardsentences.com/) as text input (list 3, sentence 5: 'The beauty of the view stunned the young boy').

Audio can be toggled on/off with the argument `--no_audio`.