{"id":20737454,"url":"https://github.com/candlewill/ossian","last_synced_at":"2025-09-09T01:43:10.484Z","repository":{"id":71816621,"uuid":"111872022","full_name":"candlewill/Ossian","owner":"candlewill","description":"Ossian: A simple language-independent Text-to-speech frontend","archived":false,"fork":false,"pushed_at":"2018-03-01T05:20:24.000Z","size":886,"stargazers_count":17,"open_issues_count":2,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-08-10T21:46:34.417Z","etag":null,"topics":["frontend","text-to-speech","tts"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/candlewill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-11-24T03:29:02.000Z","updated_at":"2023-10-27T07:25:56.000Z","dependencies_parsed_at":"2023-06-11T01:00:17.831Z","dependency_job_id":null,"html_url":"https://github.com/candlewill/Ossian","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/candlewill/Ossian","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FOssian","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FOssian/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FOssian/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FOssian/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/candlewill","download_url":"https://codeload.github.com/candlewill/Ossian/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FOssian/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274232023,"owners_count":25245856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["frontend","text-to-speech","tts"],"created_at":"2024-11-17T06:14:33.396Z","updated_at":"2025-09-09T01:43:10.467Z","avatar_url":"https://github.com/candlewill.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ossian + DNN demo\n\nOssian is a collection of Python code for building text-to-speech (TTS) systems, with an emphasis on easing research into building TTS systems with minimal expert supervision. Work on it started with funding from the [EU FP7 Project Simple4All](http://simple4all.org), and this repository contains a version which is considerable more up-to-date than that previously available. In particular, the original version of the toolkit relied on [HTS](http://hts.sp.nitech.ac.jp/) to perform acoustic modelling. 
Although it is still possible to use HTS, it now supports the use of neural nets trained with the [Merlin toolkit](https://github.com/CSTR-Edinburgh/merlin) as duration and acoustic models.  All\ncomments and feedback about ways to improve it are very welcome.\n\nSome documentation and notes in Chinese can be found here: [Chinese Ossian Doc](https://gist.github.com/candlewill/8141bbe9d6c4c6224be8d3b4c07723eb).\n\n## Dependencies\n\n**Perl 5** is required.\n\n**Python 2.7** is required.\n\nUse the ```pip``` package installer -- within a [Python ```virtualenv```](https://virtualenv.pypa.io/en/stable/) as necessary -- to get some necessary packages:\n\n```\npip install numpy\npip install scipy\npip install configobj\npip install scikit-learn\npip install regex\npip install lxml\npip install argparse\n```\n\nWe will use the Merlin toolkit to train neural networks, which introduces the following dependencies:\n\n```\npip install bandmat \npip install theano\npip install matplotlib\n```\n\nWe will use `sox` to process speech data:\n\n```\napt-get install sox\n```\n\n## Getting the tools\n\n\nClone the Ossian GitHub repository as follows:\n\n```\ngit clone https://github.com/candlewill/Ossian.git\n```\n\nThis will create a directory called ```./Ossian```; \nthe following discussion assumes that an environment\nvariable ```$OSSIAN``` is set to point to this directory.\n\n### Install from scratch\n\nOssian relies on the [Hidden Markov Model Toolkit (HTK)](http://htk.eng.cam.ac.uk) and [HMM-based Speech Synthesis System (HTS)](http://hts.sp.nitech.ac.jp/)\nfor alignment and (optionally) acoustic modelling -- here are some notes on obtaining and compiling the necessary tools. \nTo get a copy of the HTK source code it\nis necessary to register on the [HTK website](http://htk.eng.cam.ac.uk/register.shtml) to obtain a \nusername and password. 
It is assumed here that these have been obtained and that the environment\nvariables ```$HTK_USERNAME``` and ```$HTK_PASSWORD``` hold them.\n\n\nRunning the following script will download and install the necessary tools (including Merlin):\n\n```\n./scripts/setup_tools.sh $HTK_USERNAME $HTK_PASSWORD\n```\n\nThe script `./scripts/setup_tools.sh` does the following:\n\n* Clone the Merlin repo to `$OSSIAN/tools/merlin`, and reset its head to `8aed278`\n* Change into the `merlin/tools/WORLD/` folder, build it, then copy `analysis` and `synth` into `$OSSIAN/tools/bin/`:\n    ```shell\n    cd $OSSIAN/tools/merlin/tools/WORLD/\n    make -f makefile\n    make -f makefile analysis\n    make -f makefile synth\n    mkdir -p $OSSIAN/tools/bin/\n    cp $OSSIAN/tools/merlin/tools/WORLD/build/{analysis,synth} $OSSIAN/tools/bin/\n    ```\n* Download HTK, HDecode and HTS, and apply the HTS patch. Build HTK, and install it to the `$OSSIAN/tools/` folder.\n* Download hts-engine, and install it to `$OSSIAN/tools/`\n* Download SPTK, and install it to `$OSSIAN/tools/`\n* Install the `g2p-r1668-r3` and `corenlp-python` packages, but only if you have changed the values of `SEQUITUR` and `STANFORD` from 0 to 1.\n\nBecause all the tools are installed into the `$OSSIAN/tools/` directory, the `$OSSIAN/tools/bin` directory will contain all the binaries used by Ossian.\n\n\n### Install from pre-built\n\nIf you have already installed the tools mentioned above manually and don't want to install from scratch, you can create soft links to tell Ossian where they are installed.\n\n```shell\n# 1 Manually clone the merlin repo\n# 2 Download WORLD, HTK, HDecode, HTS, HTS-engine, SPTK, build and install.\n# 3 Copy all of the binaries into one folder. 
E.g., bin.\n\n# 4 Where is your merlin dir\nexport merlin_dir=/home/dl80/heyunchao/Programs/Ossian/tools/merlin\n# 5 Where is the bin directory including all the binaries\nexport bin_dir=/home/dl80/heyunchao/Programs/Ossian/tools/bin\n\n# 6 Create soft links in your Ossian/tools directory\ncd /home/dl80/heyunchao/Programs/MyOssian_Github/tools\nln -s $merlin_dir merlin\nln -s $bin_dir bin\n```\n\nWe provide a pre-built binary collection here: [Ossian_required_bin.tar](https://cnbj1.fds.api.xiaomi.com/tts/ExternalLink/Github/Ossian_required_bin.tar.gz). If you don't want to build from scratch, download it and move it to the `$bin_dir` directory. \n\n## Acquire some data\n\nOssian expects its training data to be in the directories:\n\n```\n ./corpus/\u003cOSSIAN_LANG\u003e/speakers/\u003cDATA_NAME\u003e/txt/*.txt\n ./corpus/\u003cOSSIAN_LANG\u003e/speakers/\u003cDATA_NAME\u003e/wav/*.wav\n```\n\nText and wave files should be numbered consistently with each other. ```\u003cOSSIAN_LANG\u003e``` and ```\u003cDATA_NAME\u003e``` are both arbitrary strings, but it is sensible to choose ones which make obvious sense. \n\nDownload and unpack this toy (Romanian) corpus for some guidance:\n\n```\ncd $OSSIAN\nwget https://www.dropbox.com/s/uaz1ue2dked8fan/romanian_toy_demo_corpus_for_ossian.tar?dl=0\ntar xvf romanian_toy_demo_corpus_for_ossian.tar\\?dl\\=0\n```\n\nThis will create the following directory structures:\n\n```\n./corpus/rm/speakers/rss_toy_demo/\n./corpus/rm/text_corpora/wikipedia_10K_words/\n```\n\nLet's start by building some voices on this tiny dataset. The results will sound bad, but if you can get it to speak, no matter how badly, the tools are working and you can retrain on more data of your own choosing. Below are instructions on how to train HTS-based and neural network based voices on this data. 
\n\nYou can download 1-hour sets of data in various languages that we prepared here: http://tundra.simple4all.org/ssw8data.html\n\n## DNN-based voice using a naive recipe\n\nOssian trains voices according to a given 'recipe' -- the recipe specifies a sequence of processes which are applied to an utterance to turn it from text into speech, and is given in a file called ```$OSSIAN/recipes/\u003cRECIPE\u003e.cfg``` (where ```\u003cRECIPE\u003e``` is the name of the specific recipe you are using). We will start with a recipe called ```naive_01_nn```. If you want to add components to the synthesiser, the best way to start will be to take the file for an existing recipe, copy it to a file with a new name and modify it.\n\nThe recipe ```naive_01_nn``` is a language independent recipe which naively uses letters as acoustic modelling units. It will work reasonably for languages with sensible orthographies (e.g. Romanian) and less well for e.g. English.\n\nOssian will put all files generated during training on the data ```\u003cDATA_NAME\u003e``` in language ```\u003cOSSIAN_LANG\u003e``` according to recipe ```\u003cRECIPE\u003e``` in a directory called:\n\n```\n $OSSIAN/train/\u003cOSSIAN_LANG\u003e/speakers/\u003cDATA_NAME\u003e/\u003cRECIPE\u003e/\n```\n\nWhen it has successfully trained a voice, the components needed at synthesis are copied to:\n\n```\n $OSSIAN/voices/\u003cOSSIAN_LANG\u003e/\u003cDATA_NAME\u003e/\u003cRECIPE\u003e/\n```\n\nAssuming that we want to start by training a voice from scratch, we should make sure that these locations do not already exist for our combination of data/language/recipe by removing them:\n\n```\nrm -r $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/ $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/\n```\n\nThen to train, do this:\n\n```\ncd $OSSIAN\npython ./scripts/train.py -s rss_toy_demo -l rm naive_01_nn\n```\n\nAs various messages printed during training will inform you, training of the neural networks themselves which will be used for duration 
and acoustic modelling is not directly supported within Ossian. The data and configs needed to train networks for duration and acoustic model are prepared by the above command line, but the Merlin toolkit needs to be called separately to actually train the models. The NNs it produces then need to be converted back to a suitable format for Ossian. This is a little messy, but better integration between Ossian and Merlin is an ongoing area of development. \n\nHere's how to do this -- these same instructions will have been printed when you called ```./scripts/train.py``` above. First, train the duration model:\n\n```\ncd $OSSIAN\nexport THEANO_FLAGS=\"\"; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg\n```\n\nFor this toy data, training on CPU like this will be quick. Alternatively, to use GPU for training, do:\n\n```\n./scripts/util/submit.sh ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg\n```\n\nIf training went OK, then you can export the trained model to a better format for Ossian. The basic problem is that the NN-TTS tools store the model as a Python pickle file -- if this is made on a GPU machine, it can only be used on a GPU machine. 
The script below converts the model to a more flexible format understood by Ossian -- call it with the same config file you used for training and the name of a directory where the new format should be put:\n\n```\npython ./scripts/util/store_merlin_model.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/duration_predictor/config.cfg $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/processors/duration_predictor\n```\n\nWhen training the duration model, there will be many warnings saying ```WARNING: no silence found!``` -- these are not a problem and can be ignored.\n\nSimilarly for the acoustic model:\n\n```\ncd $OSSIAN\nexport THEANO_FLAGS=\"\"; python ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg\n```\n\nOr:\n\n```\n./scripts/util/submit.sh ./tools/merlin/src/run_merlin.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg\n```\n\nThen:\n\n```\npython ./scripts/util/store_merlin_model.py $OSSIAN/train/rm/speakers/rss_toy_demo/naive_01_nn/processors/acoustic_predictor/config.cfg $OSSIAN/voices/rm/rss_toy_demo/naive_01_nn/processors/acoustic_predictor\n```\n\n\n\nIf training went OK, you can synthesise speech. 
There is an example Romanian sentence in ```$OSSIAN/test/txt/romanian.txt``` -- we will synthesise a wave file for it in ```$OSSIAN/test/wav/romanian_toy_naive.wav``` like this:\n\n```\nmkdir -p $OSSIAN/test/wav/\n\npython ./scripts/speak.py -l rm -s rss_toy_demo -o ./test/wav/romanian_toy_naive.wav naive_01_nn ./test/txt/romanian.txt\n```\n\nYou can find the audio for this sentence [here](https://www.dropbox.com/s/xm9d7j7125y6j13/romanian_test_sentence_reference.wav?dl=0) for comparison (it was not used in training).\n\nThe configuration files used for duration and acoustic model training will work as-is for the toy data set, but when you move to other data sets, you will want to experiment with editing them to get better performance.\nIn particular, you will want to increase `training_epochs` to train voices on larger amounts of data; this could be set to e.g. 30 for the acoustic model and e.g. 100 for the duration model.\nYou will also want to experiment with `learning_rate`, `batch_size`, and network architecture (`hidden_layer_size`, `hidden_layer_type`). Currently, Ossian only supports feed-forward networks.\n\n## Synthesis\n\nThe command to synthesise a new wave file from input text is:\n\n```shell\npython ./scripts/speak.py -l $OSSIAN_LANG -s $DATA_NAME -o ./test/wav/${OSSIAN_LANG}_${DATA_NAME}_test.wav $RECIPE ./test/txt/test.txt\n```\n\nHere `./test/wav/${OSSIAN_LANG}_${DATA_NAME}_test.wav` is the output wave file, `$RECIPE` is the recipe name, and `./test/txt/test.txt` is the input text. 
\n\nThe complete usage of `speak.py` is:\n\n```shell\nusage: speak.py [-h] -s SPEAKER -l LANG [-o OUTPUT] [-t STAGE] [-play] [-lab]\n                [-bin CUSTOM_BINDIR]\n                config [files [files ...]]\n\npositional arguments:\n  config              configuration to use: naive, semi-naive, gold, as\n                      defined in \u003cROOT\u003e/recipes/\u003cconfig\u003e -directory\n  files               text files to speak, reading from stdin by default\n\noptional arguments:\n  -h, --help          show this help message and exit\n  -s SPEAKER          the name of the speaker: \u003cROOT\u003e/corpus/\u003cLANG\u003e/\u003cSPEAKER\u003e\n  -l LANG             the language of the speaker: \u003cROOT\u003e/corpus/\u003cLANG\u003e\n  -o OUTPUT           output audio here\n  -t STAGE            defines the current usage stage (definitions of stages\n                      should by found in \u003cconfig\u003e/recipe.cfg\n  -play               play audio after synthesis\n  -lab                make label file as well as wave in output location\n  -bin CUSTOM_BINDIR\n```\n\nIf you want to export your trained model, pack up the following files:\n\n1. `voices/`\n2. `train/cn/speakers/king_cn_corpus/naive_01_nn.cn/questions_dur.hed.cont`\n3. `train/cn/speakers/king_cn_corpus/naive_01_nn.cn/questions_dur.hed`\n4. `train/cn/speakers/king_cn_corpus/naive_01_nn.cn/questions_dnn.hed.cont`\n\nOnce these are placed in the same locations under another Ossian installation, someone else can use your model to synthesise speech from text.\n\n## Pre-trained Model\n\nHere we provide a simple pre-trained model for Chinese TTS. As the model was trained on a small internal corpus intended only for testing, the quality of the synthesised voice is not very good. 
\n\nSimple Pre-trained Chinese Model: [Ossian_cn_pretrained_model.tar.gz](https://cnbj1.fds.api.xiaomi.com/tts/ExternalLink/Github/Ossian_cn_pretrained_model.tar.gz)\n\nSome samples generated from this model can be found here: [Ossian_Chinese_samples.zip](https://cnbj1.fds.api.xiaomi.com/tts/ExternalLink/Github/Ossian_Chinese_samples.zip)\n\n## Latest Merlin Repo\n\nIt is now possible to use the latest Merlin repo. However, when you export the model, some `files no exist` errors will occur. The files in question do exist after training, but not in the directory where the export script looks for them, so you can work around the errors by manually copying them to the right folder. You can use `find . -name '*.dat'` to find where they are.\n\nHere is an example:\n\n```shell\n# Acoustic model\ncp ./train/cn/speakers/toy_cn_corpus/naive_01_nn.cn/dnn_training_ACOUST/inter_module/norm_info__mgc_lf0_vuv_bap_187_MVN.dat /root/Ossian/train/cn/speakers/toy_cn_corpus/naive_01_nn.cn//cmp//norm_info_mgc_lf0_vuv_bap_187_MVN.dat\n\ncp ./train/cn/speakers/toy_cn_corpus/naive_01_nn.cn/dnn_training_ACOUST/inter_module/label_norm_HTS_3491.dat /root/Ossian/train/cn/speakers/toy_cn_corpus/naive_01_nn.cn//cmp//label_norm_HTS_3491.dat\n\ncp ./train/cn/speakers/toy_cn_corpus/naive_01_nn.cn/dnn_training_ACOUST/nnets_model/feed_forward_6_tanh.model /root/Ossian/train/cn/speakers/toy_cn_corpus/naive_01_nn.cn//dnn_training_ACOUST//nnets_model/DNN_TANH_TANH_TANH_TANH_TANH_TANH_LINEAR__mgc_lf0_vuv_bap_0_6_1024_1024_1024_1024_1024_1024_3491.187.train.243.0.002000.rnn.model\n\n# Duration model\ncp ./train/cn/speakers/toy_cn_corpus/naive_01_nn.cn/dnn_training_DUR/inter_module/norm_info__dur_5_MVN.dat /root/Ossian/train/cn/speakers/toy_cn_corpus/naive_01_nn.cn///norm_info_dur_5_MVN.dat\n\ncp ./train/cn/speakers/toy_cn_corpus/naive_01_nn.cn/dnn_training_DUR/inter_module/label_norm_HTS_3482.dat /root/Ossian/train/cn/speakers/toy_cn_corpus/naive_01_nn.cn/\n...\n```\n\n## Other recipes\n\nWe have used many other recipes with Ossian 
which will be documented here when cleaned up enough to be useful to others. These will give the ability to add more knowledge to the voices built, in the form of lexicons, letter-to-sound rules etc., and integrate existing trained components where they are available for the target language. Some of them can be found here:\n\n1. [Chinese Text-to-Speech recipe](./doc/recipe_usage/naive_01_nn.cn.md)\n\n## Announcement\n\nThis project is based on [CSTR-Edinburgh/Ossian](https://github.com/CSTR-Edinburgh/Ossian). All copyright belongs to the original project.\n\n[Yunchao He](https://weibo.com/heyunchao)\n\nyunchaohe@gmail.com\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcandlewill%2Fossian","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcandlewill%2Fossian","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcandlewill%2Fossian/lists"}