{"id":13936284,"url":"https://github.com/lukalabs/cakechat","last_synced_at":"2025-09-30T08:31:46.696Z","repository":{"id":37752375,"uuid":"119573866","full_name":"lukalabs/cakechat","owner":"lukalabs","description":"CakeChat: Emotional Generative Dialog System","archived":true,"fork":false,"pushed_at":"2020-05-25T12:05:56.000Z","size":602,"stargazers_count":1709,"open_issues_count":1,"forks_count":925,"subscribers_count":144,"default_branch":"master","last_synced_at":"2025-06-23T16:05:08.779Z","etag":null,"topics":["conversational-agents","conversational-ai","conversational-bots","deep-learning","dialog-systems","dialogue-agents","dialogue-systems","keras","nlp","seq2seq","seq2seq-chatbot","seq2seq-model","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lukalabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-30T18:02:58.000Z","updated_at":"2025-06-21T00:24:25.000Z","dependencies_parsed_at":"2022-06-26T07:31:09.286Z","dependency_job_id":null,"html_url":"https://github.com/lukalabs/cakechat","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/lukalabs/cakechat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukalabs%2Fcakechat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukalabs%2Fcakechat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukalabs%2Fcakechat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukalabs%2Fcakechat/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lukalabs","download_url":"https://codeload.github.com/lukalabs/cakechat/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukalabs%2Fcakechat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277652931,"owners_count":25854384,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-30T02:00:09.208Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversational-agents","conversational-ai","conversational-bots","deep-learning","dialog-systems","dialogue-agents","dialogue-systems","keras","nlp","seq2seq","seq2seq-chatbot","seq2seq-model","tensorflow"],"created_at":"2024-08-07T23:02:32.550Z","updated_at":"2025-09-30T08:31:46.125Z","avatar_url":"https://github.com/lukalabs.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"**Note on the top: the project is unmaintained.**\n\nTransformer-based dialog models work better and we recommend using them instead of RNN-based CakeChat. 
See, for example, https://github.com/microsoft/DialoGPT\n\n---\n\n## CakeChat: Emotional Generative Dialog System\n\nCakeChat is a backend for chatbots that are able to express emotions via conversations.\n\n![CakeChat representation](https://user-images.githubusercontent.com/2272790/57650691-3a8b9280-7580-11e9-9b60-ae3b28692c05.png)\n\nCakeChat is built on [Keras](https://keras.io/) and [TensorFlow](https://www.tensorflow.org).\n\nThe code is flexible and allows conditioning the model's responses on an arbitrary categorical variable.\nFor example, you can train your own persona-based neural conversational model[\u003csup\u003e\\[1\\]\u003c/sup\u003e](#f1)\nor create an emotional chatting machine[\u003csup\u003e\\[2\\]\u003c/sup\u003e](#f2).\n\n#### Main requirements\n\n* python 3.5.2\n* tensorflow 1.12.2\n* keras 2.2.4\n\n## Table of contents\n\n  1. [Network architecture and features](#network-architecture-and-features)\n  1. [Quick start](#quick-start)\n  1. [Setup for training and testing](#setup-for-training-and-testing)\n      1. [Docker](#docker)\n          1. [CPU-only setup](#cpu-only-setup)\n          1. [GPU-enabled setup](#gpu-enabled-setup)\n      1. [Manual setup](#manual-setup)\n  1. [Getting the pre-trained model](#getting-the-pre-trained-model)\n  1. [Training data](#training-data)\n  1. [Training the model](#training-the-model)\n      1. [Fine-tuning the pre-trained model on your data](#fine-tuning-the-pre-trained-model-on-your-data)\n      1. [Training the model from scratch](#training-the-model-from-scratch)\n      1. [Distributed train](#distributed-train)\n      1. [Validation metrics calculation](#validation-metrics-calculation)\n      1. [Testing the trained model](#testing-the-trained-model)\n  1. [Running CakeChat server](#running-cakechat-server)\n      1. [Local HTTP\\-server](#local-http-server)\n          1. [HTTP\\-server API description](#http-server-api-description)\n      1. [Gunicorn HTTP\\-server](#gunicorn-http-server)\n      1. 
[Telegram bot](#telegram-bot)\n  1. [Repository overview](#repository-overview)\n      1. [Important tools](#important-tools)\n      1. [Important configuration settings](#important-configuration-settings)\n  1. [Example use cases](#example-use-cases)\n  1. [References](#references)\n  1. [Credits \u0026 Support](#credits--support)\n  1. [License](#license)\n\n\n## Network architecture and features\n\n![Network architecture](https://user-images.githubusercontent.com/2272790/57819307-b7fc0200-773c-11e9-971b-4f73a72ef8ba.png)\n\n\nModel:\n\n  * Hierarchical Recurrent Encoder-Decoder (HRED) architecture for handling deep dialog context[\u003csup\u003e\\[3\\]\u003c/sup\u003e](#f3).\n  * Multilayer RNN with GRU cells. The first layer of the utterance-level encoder is always bidirectional.\n  By default, the CuDNNGRU implementation is used for ~25% acceleration during inference.\n  * The thought vector is fed into the decoder on each decoding step.\n  * The decoder can be conditioned on any categorical label, for example, an emotion label or a persona ID.\n\nWord embedding layer:\n\n  * May be initialized using a w2v model trained on your corpus.\n  * The embedding layer may be either fixed or fine-tuned along with the other weights of the network.\n\nDecoding:\n\n  * 4 different response generation algorithms: \"sampling\", \"beamsearch\", \"sampling-reranking\" and \"beamsearch-reranking\".\n  Reranking of the generated candidates is performed according to the log-likelihood or MMI criteria[\u003csup\u003e\\[4\\]\u003c/sup\u003e](#f4).\n  See [configuration settings description](#important-configuration-settings) for details.\n\nMetrics:\n\n  * Perplexity\n  * n-gram distinct metrics adjusted to the sample size[\u003csup\u003e\\[4\\]\u003c/sup\u003e](#f4).\n  * Lexical similarity between samples of the model and some fixed dataset.\n  Lexical similarity is the cosine distance between the TF-IDF vector of the responses generated by the model and that of the tokens\n  in the dataset.\n  * Ranking metrics: mean average 
precision and mean recall@k.[\u003csup\u003e\\[5\\]\u003c/sup\u003e](#f5)\n\n\n## Quick start\n\nIn case you are familiar with [Docker](https://docs.docker.com) here is the easiest way to run a pre-trained CakeChat\nmodel as a server. You may need to run the following commands with `sudo`.\n\nCPU version:\n\n```(bash)\ndocker pull lukalabs/cakechat:latest \u0026\u0026 \\\n\ndocker run --name cakechat-server -p 127.0.0.1:8080:8080 -it lukalabs/cakechat:latest bash -c \"python bin/cakechat_server.py\"\n```\n\nGPU version:\n\n```(bash)\ndocker pull lukalabs/cakechat-gpu:latest \u0026\u0026 \\\n\nnvidia-docker run --name cakechat-gpu-server -p 127.0.0.1:8080:8080 -it lukalabs/cakechat-gpu:latest bash -c \"CUDA_VISIBLE_DEVICES=0 python bin/cakechat_server.py\"\n```\n\nThat's it! Now test your CakeChat server by running the following command on your host machine:\n```(bash)\npython tools/test_api.py -f localhost -p 8080 -c \"hi!\" -c \"hi, how are you?\" -c \"good!\" -e \"joy\"\n```\n\nThe response dict may look like this:\n```\n{'response': \"I'm fine!\"}\n```\n\n\n## Setup for training and testing\n\n### Docker\n\nDocker is the easiest way to set up the environment and install all the dependencies for training and testing.\n\n\n#### CPU-only setup\n*Note:\nWe strongly recommend using GPU-enabled environment for training CakeChat model.\nInference can be made both on GPUs and CPUs.*\n\n1. Install [Docker](https://docs.docker.com/engine/installation/).\n\n2. Pull  a CPU-only docker image from dockerhub:\n\n```(bash)\ndocker pull lukalabs/cakechat:latest\n```\n\n3. Run a docker container in the CPU-only environment:\n```(bash)\ndocker run --name \u003cYOUR_CONTAINER_NAME\u003e -it lukalabs/cakechat:latest\n```\n\n#### GPU-enabled setup\n\n1. Install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) for the GPU support.\n\n2. Pull GPU-enabled docker image from dockerhub:\n\n```(bash)\ndocker pull lukalabs/cakechat-gpu:latest\n```\n\n3. 
Run a docker container in the GPU-enabled environment:\n\n```(bash)\nnvidia-docker run --name \u003cYOUR_CONTAINER_NAME\u003e -it lukalabs/cakechat-gpu:latest\n```\n\nThat's it! Now you can train your model and chat with it. See the corresponding section below for further instructions.\n\n### Manual setup\n\nIf you don't want to deal with docker, you can install all the requirements manually:\n\n```(bash)\npip install -r requirements.txt -r requirements-local.txt\n```\n\n**NB:**\n\nWe recommend installing the requirements inside a [virtualenv](https://virtualenv.pypa.io/en/stable/) to avoid\nmessing with your system packages.\n\n\n## Getting the pre-trained model\n\nYou can download our pre-trained model weights by running `python tools/fetch.py`.\n\nThe params of the pre-trained model are the following:\n\n  * context size **3** (\u003cspeaker_1_utterance\u003e, \u003cspeaker_2_utterance\u003e, \u003cspeaker_1_utterance\u003e)\n  * each encoded utterance contains **up to 30 tokens**\n  * the decoded utterance contains **up to 32 tokens**\n  * both encoder and decoder have **2 GRU layers** with **768 hidden units** each\n  * first layer of the encoder is bidirectional\n\n\n### Training data\n\nThe model was trained on a preprocessed Twitter corpus with ~50 million dialogs (11 GB of text data).\nTo clean up the corpus, we removed\n* URLs, retweets and citations;\n* mentions and hashtags that are not preceded by regular words or punctuation marks;\n* messages that contain more than 30 tokens.\n\nWe used our emotions classifier to label each utterance with one of the following 5 emotions: `\"neutral\", \"joy\",\n\"anger\", \"sadness\", \"fear\"`, and used these labels during training.\nTo mark up your own corpus with emotions you can use, for example, the [DeepMoji tool](https://github.com/bfelbo/DeepMoji).\n\nUnfortunately, due to Twitter's privacy policy, we are not allowed to provide our dataset.\nYou can train a dialog model on any text conversational dataset available to 
you. A great overview of existing\nconversational datasets can be found here: https://breakend.github.io/DialogDatasets/\n\nThe training data should be a txt file, where each line is a valid JSON object representing a list of dialog utterances.\nRefer to our [dummy train dataset](data/corpora_processed/train_processed_dialogs.txt) to see the necessary\nfile structure. Replace this dummy corpus with your data before training.\n\n\n## Training the model\n\nThere are two options:\n\n1. training from scratch\n1. fine-tuning the provided trained model\n\nThe first approach is less restrictive: you can use any training data you want and set any config params of the model.\nHowever, you should be aware that you'll need enough training data (at least ~50 MB), one or more GPUs and enough\npatience (days) to get good responses from the model.\n\nThe second approach is limited by the choice of config params of the pre-trained model – see `cakechat/config.py` for\nthe complete list. If the default params are suitable for your task, fine-tuning should be a good option.\n\n\n### Fine-tuning the pre-trained model on your data\n\n1. Fetch the pre-trained model from Amazon S3 by running `python tools/fetch.py`.\n\n1. Put your training text corpus into\n[`data/corpora_processed/train_processed_dialogs.txt`](data/corpora_processed/train_processed_dialogs.txt). Make sure that your\ndataset is large enough; otherwise your model risks overfitting the data and the results will be poor.\n\n1. Run `python tools/train.py`.\n    1. The script will look for the pre-trained model weights in `results/nn_models`; the full path is inferred from the\n    set of config params.\n    1. If you want to initialize the model weights from a custom file, you can specify the path to the file via the `-i`\n    argument, for example, `python tools/train.py -i results/nn_models/my_saved_weights/model.current`.\n    1. 
Don't forget to set the `CUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e` environment variable (with \u003cGPU_ID\u003e as in the output of the\n    **nvidia-smi** command) if you want to use a GPU. For example, `CUDA_VISIBLE_DEVICES=0 python tools/train.py` will run the\n    training process on GPU 0.\n    1. Use the `-s` parameter to train the model on a subset of the first N samples of your training data to speed up\n    preprocessing for debugging. For example, run `python tools/train.py -s 1000` to train on the first 1000 samples.\n\nWeights of the trained model are saved to `results/nn_models/`.\n\n### Training the model from scratch\n\n1. Put your training text corpus into\n[`data/corpora_processed/train_processed_dialogs.txt`](data/corpora_processed/train_processed_dialogs.txt).\n\n1. Set up training parameters in [`cakechat/config.py`](cakechat/config.py).\nSee [configuration settings description](#important-configuration-settings) for more details.\n\n1. Consider running `PYTHONHASHSEED=42 python tools/prepare_index_files.py` to build the index files with tokens and\nconditions from the training corpus. Make sure to set the `PYTHONHASHSEED` environment variable; otherwise you may get\ndifferent index files for different launches of the script.\n**Warning:** this script overwrites the original token and condition index files `data/tokens_index/t_idx_processed_dialogs.json` and\n`data/conditions_index/c_idx_processed_dialogs.json`.\nYou should only run this script if your corpus is large enough to contain all the words that you want your model\nto understand. Otherwise, consider fine-tuning the pre-trained model as described above. If you messed up the index\nfiles and want to get the default versions, delete your copies and run `python tools/fetch.py` again.\n\n1. 
Consider running `python tools/train_w2v.py` to build w2v embeddings from the training corpus.\n**Warning:** this script overwrites the original w2v weights that are stored in `data/w2v_models`.\nYou should only run this script if your corpus is large enough to contain all the words that you want your model\nto understand. Otherwise, consider fine-tuning the pre-trained model as described above. If you messed up the w2v\nfiles and want to get the default version, delete your copy and run `python tools/fetch.py` again.\n\n1. Run `python tools/train.py`.\n    1. Don't forget to set the `CUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e` environment variable (with \u003cGPU_ID\u003e\nas in the output of the **nvidia-smi** command) if you want to use a GPU. For example, `CUDA_VISIBLE_DEVICES=0 python tools/train.py`\nwill run the training process on GPU 0.\n    1. Use the `-s` parameter to train the model on a subset of the first N samples of your training data to speed up\npreprocessing for debugging. For example, run `python tools/train.py -s 1000` to train on the first 1000 samples.\n\n1. You can also set `IS_DEV=1` to enable the \"development mode\". It uses a reduced number of model parameters\n(decreased hidden layer dimensions, input and output sizes of token sequences, etc.) 
and performs verbose logging.\nRefer to the bottom lines of `cakechat/config.py` for the complete list of dev params.\n\nWeights of the trained model are saved to `results/nn_models/`.\n\n### Distributed train\n\nThe GPU-enabled docker container supports distributed training on multiple GPUs using [horovod](https://github.com/horovod/horovod).\n\nFor example, run `python tools/distributed_train.py -g 0 1` to start training on GPUs 0 and 1.\n\n\n### Validation metrics calculation\n\nDuring training, the following datasets are used for validation metrics calculation:\n\n* [`data/corpora_processed/val_processed_dialogs.txt`](data/corpora_processed/val_processed_dialogs.txt) (dummy example, replace with your data) – for the\ncontext-sensitive dataset\n* [`data/quality/context_free_validation_set.txt`](data/quality/context_free_validation_set.txt) – for the context-free\nvalidation dataset\n* [`data/quality/context_free_questions.txt`](data/quality/context_free_questions.txt) – is used for generating\nresponses for logging and computing distinct-metrics\n* [`data/quality/context_free_test_set.txt`](data/quality/context_free_test_set.txt) – is used for computing metrics of\nthe trained model, e.g. ranking metrics\n\nThe metrics are stored in `cakechat/results/tensorboard` and can be visualized using\n[Tensorboard](https://www.tensorflow.org/guide/summaries_and_tensorboard).\nIf you run a docker container from the provided CPU or GPU-enabled docker image, the tensorboard server should start\nautomatically and serve on `http://localhost:6006`. 
Open this link in your browser to see the training graphs.\n\nIf you installed the requirements manually, start tensorboard server first by running the following command from your\ncakechat root directory:\n\n```\nmkdir -p results/tensorboard \u0026\u0026 tensorboard --logdir=results/tensorboard 2\u003eresults/tensorboard/err.log \u0026\n```\n\nAfter that proceed to `http://localhost:6006`.\n\n\n### Testing the trained model\n\nYou can run the following tools to evaluate your trained model on\n[test data](data/corpora_processed/test_processed_dialogs.txt)(dummy example, replace with your data):\n\n* [`tools/quality/ranking_quality.py`](tools/quality/ranking_quality.py) –\ncomputes ranking metrics of a dialog model\n* [`tools/quality/prediction_distinctness.py`](tools/quality/prediction_distinctness.py) –\ncomputes distinct-metrics of a dialog model\n* [`tools/quality/condition_quality.py`](tools/quality/condition_quality.py) –\ncomputes metrics on different subsets of data according to the condition value\n* [`tools/generate_predictions.py`](tools/generate_predictions.py) –\nevaluates the model. Generates predictions of a dialog model on the set of given dialog contexts and then computes\nmetrics. 
Note that you should have a reverse-model in the `results/nn_models` directory if you want to use \"\\*-reranking\"\nprediction modes\n* [`tools/generate_predictions_for_condition.py`](tools/generate_predictions_for_condition.py) –\ngenerates predictions for a given condition value\n\n\n## Running CakeChat server\n\n### Local HTTP-server\n\nRun a server that processes HTTP-requests with given input messages and returns response messages from the model:\n\n```(bash)\npython bin/cakechat_server.py\n```\n\nSpecify the `CUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e` environment variable to run the server on a certain GPU.\n\nDon't forget to run `python tools/fetch.py` prior to starting the server if you want to use our pre-trained model.\n\nTo make sure everything works fine, test the model on the following conversation\n\n\u003e – Hi, Eddie, what's up?  \n\u003e – Not much, what about you?  \n\u003e – Fine, thanks. Are you going to the movies tomorrow?\n\nby running the command:\n\n```(bash)\npython tools/test_api.py -f 127.0.0.1 -p 8080 \\\n    -c \"Hi, Eddie, what's up?\" \\\n    -c \"Not much, what about you?\" \\\n    -c \"Fine, thanks. Are you going to the movies tomorrow?\"\n```\n\nYou should get a meaningful answer, for example:\n```\n{'response': \"Of course!\"}\n```\n\n#### HTTP-server API description\n\n##### /cakechat_api/v1/actions/get_response\nJSON parameters are:\n\n|Parameter|Type|Description|\n|---|---|---|\n|context|list of strings|List of previous messages from the dialogue history (at most 3 are used)|\n|emotion|string, one of enum|One of {'neutral', 'anger', 'joy', 'fear', 'sadness'}. An emotion to condition the response on. 
Optional param, if not specified, 'neutral' is used|\n\n##### Request\n```\nPOST /cakechat_api/v1/actions/get_response\ndata: {\n 'context': ['Hello', 'Hi!', 'How are you?'],\n 'emotion': 'joy'\n}\n```\n\n##### Response OK\n```\n200 OK\n{\n  'response': 'I\\'m fine!'\n}\n```\n\n### Gunicorn HTTP-server\n\nWe recommend using [Gunicorn](http://gunicorn.org/) for serving the API of your model at production scale.\n\n1. Install gunicorn: `pip install gunicorn`\n\n2. Run a server that processes HTTP-queries with input messages and returns response messages of the model:\n\n```(bash)\ncd bin \u0026\u0026 gunicorn cakechat_server:app -w 1 -b 127.0.0.1:8080 --timeout 2000\n```\n\n\n### Telegram bot\n\nYou can run your CakeChat model as a Telegram bot:\n\n1. [Create a telegram bot](https://core.telegram.org/bots#3-how-do-i-create-a-bot) to get bot's token.\n2. Run `python tools/telegram_bot.py --token \u003cYOUR_BOT_TOKEN\u003e` and chat with it on Telegram.\n\n\n## Repository overview\n\n* `cakechat/dialog_model/` – contains computational graph, training procedure and other model utilities\n* `cakechat/dialog_model/inference/` – algorithms for response generation\n* `cakechat/dialog_model/quality/` – code for metrics calculation and logging\n* `cakechat/utils/` – utilities for text processing, w2v training, etc.\n* `cakechat/api/` – functions to run http server: API configuration, error handling\n* `tools/` – scripts for training, testing and evaluating your model\n\n\n### Important tools\n\n* [`bin/cakechat_server.py`](bin/cakechat_server.py) – \nRuns an HTTP-server that returns response messages of the model given dialog contexts and an emotion.\nSee [run section](#gunicorn-http-server) for details.\n* [`tools/train.py`](tools/train.py) – \nTrains the model on your data. 
You can use the `--reverse` option to train a reverse-model used in \"\\*-reranking\" response\ngeneration algorithms for more accurate predictions.\n* [`tools/prepare_index_files.py`](tools/prepare_index_files.py) – \nPrepares index for the most commonly used tokens and conditions. Use this script before training the model from scratch\non your own data.\n* [`tools/quality/ranking_quality.py`](tools/quality/ranking_quality.py) – \nComputes ranking metrics of a dialog model.\n* [`tools/quality/prediction_distinctness.py`](tools/quality/prediction_distinctness.py) – \nComputes distinct-metrics of a dialog model.\n* [`tools/quality/condition_quality.py`](tools/quality/condition_quality.py) – \nComputes metrics on different subsets of data according to the condition value.\n* [`tools/generate_predictions.py`](tools/generate_predictions.py) – \nEvaluates the model. Generates predictions of a dialog model on the set of given dialog contexts and then computes\nmetrics. Note that you should have a reverse-model in the `results/nn_models` directory if you want to use \"\\*-reranking\"\nprediction modes.\n* [`tools/generate_predictions_for_condition.py`](tools/generate_predictions_for_condition.py) – \nGenerates predictions for a given condition value.\n* [`tools/test_api.py`](tools/test_api.py) – \nExample code to send requests to a running HTTP-server.\n* [`tools/fetch.py`](tools/fetch.py) –\nDownloads the pre-trained model and index files associated with it.\n* [`tools/telegram_bot.py`](tools/telegram_bot.py) – \nRuns Telegram bot on top of trained model.\n\n\n### Important configuration settings\n\nAll the configuration parameters for the network architecture, training, predicting and logging steps are defined in\n[`cakechat/config.py`](cakechat/config.py). 
Some inference parameters used in an HTTP-server are defined in\n[`cakechat/api/config.py`](cakechat/api/config.py).\n\n* Network architecture and size:\n    * `HIDDEN_LAYER_DIMENSION` is the main parameter that defines the number of hidden units in recurrent layers.\n    * `WORD_EMBEDDING_DIMENSION` and `CONDITION_EMBEDDING_DIMENSION` define the number of hidden units that each\n    token/condition is mapped into.\n    * The number of units of the output layer of the decoder is defined by the number of tokens in the dictionary in the\n    `tokens_index` directory.\n* Decoding algorithm:\n    * `PREDICTION_MODE_FOR_TESTS` defines how the responses of the model are generated. The options are the following:\n        - **sampling** – the response is sampled from the output distribution token by token.\n        For every token the temperature transform is performed prior to sampling.\n        You can control the temperature value by tuning the `DEFAULT_TEMPERATURE` parameter.\n        - **sampling-reranking** – multiple candidate responses are generated using the sampling procedure described above.\n        After that the candidates are ranked according to their MMI-score[\u003csup\u003e\\[4\\]\u003c/sup\u003e](#f4).\n        You can tune this mode by setting the `SAMPLES_NUM_FOR_RERANKING` and `MMI_REVERSE_MODEL_SCORE_WEIGHT` parameters.\n        - **beamsearch** – candidates are generated using the\n        [beam search algorithm](https://en.wikipedia.org/wiki/Beam_search).\n        The candidates are ordered according to their log-likelihood score computed by the beam search procedure.\n        - **beamsearch-reranking** – same as above, but the candidates are re-ordered after the generation in the same\n        way as in sampling-reranking mode.\n        \n    Note that there are other parameters that affect the response generation process.\n    See `REPETITION_PENALIZE_COEFFICIENT`, `NON_PENALIZABLE_TOKENS`, `MAX_PREDICTIONS_LENGTH`.\n\n\n## Example use cases\n\nBy providing additional 
condition labels within dataset entries, you can build the following models:\n* [A Persona-Based Neural Conversation Model][1] – a model that conditions responses on a persona ID to make\nthem lexically similar to the given persona's linguistic style.\n* [Emotional Chatting Machine][2]-like model – a model that conditions responses on different emotions to give them\nemotional styles (anger, sadness, joy, etc.).\n* [Topic Aware Neural Response Generation][6]-like model – a model that conditions responses on a certain\ntopic to keep the conversation on that topic.\n\nTo make use of these extra conditions, please refer to the section [Training the model](#training-the-model).\nJust set the \"condition\" field in the [training set](data/corpora_processed/train_processed_dialogs.txt) to one of the\nfollowing: **persona ID**, **emotion** or **topic** label, update the index files and start the training.\n\n\n## References\n\n* \u003ca name=\"f1\"/\u003e\u003csup\u003e\\[1\\]\u003c/sup\u003e [A Persona-Based Neural Conversation Model][1]\n* \u003ca name=\"f2\"/\u003e\u003csup\u003e\\[2\\]\u003c/sup\u003e [Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory][2]\n* \u003ca name=\"f3\"/\u003e\u003csup\u003e\\[3\\]\u003c/sup\u003e [A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion][3]\n* \u003ca name=\"f4\"/\u003e\u003csup\u003e\\[4\\]\u003c/sup\u003e [A Diversity-Promoting Objective Function for Neural Conversation Models][4]\n* \u003ca name=\"f5\"/\u003e\u003csup\u003e\\[5\\]\u003c/sup\u003e [Quantitative Evaluation of User Simulation Techniques for Spoken Dialogue Systems][5]\n* \u003ca name=\"f6\"/\u003e\u003csup\u003e\\[6\\]\u003c/sup\u003e [Topic Aware Neural Response Generation][6]\n\n[1]: https://arxiv.org/pdf/1603.06155.pdf\n[2]: https://arxiv.org/pdf/1704.01074.pdf\n[3]: https://arxiv.org/pdf/1507.02221.pdf\n[4]: https://arxiv.org/pdf/1510.03055.pdf\n[5]: 
http://mi.eng.cam.ac.uk/~sjy/papers/scgy05.pdf\n[6]: https://arxiv.org/pdf/1606.08340v2.pdf\n\n## Credits \u0026 Support\n**CakeChat** is developed and maintained by the [Replika team](https://replika.ai):\n\n[Nicolas Ivanov](https://github.com/nicolas-ivanov), [Michael Khalman](https://github.com/mihaha),\n[Nikita Smetanin](https://github.com/nikitos9000), [Artem Rodichev](https://github.com/rodart) and\n[Denis Fedorenko](https://github.com/sadreamer).\n\nDemo by [Oleg Akbarov](https://github.com/olegakbarov), [Alexander Kuznetsov](https://github.com/alexkuz) and\n[Vladimir Chernosvitov](http://chernosvitov.com/).\n\nAll issues and feature requests can be tracked here – [GitHub Issues](https://github.com/lukalabs/cakechat/issues).\n\n## License\n© 2019 Luka, Inc. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukalabs%2Fcakechat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flukalabs%2Fcakechat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukalabs%2Fcakechat/lists"}