{"id":13693224,"url":"https://github.com/firmai/mtss-gan","last_synced_at":"2025-05-02T21:31:45.885Z","repository":{"id":49355540,"uuid":"268634939","full_name":"firmai/mtss-gan","owner":"firmai","description":"MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)","archived":true,"fork":false,"pushed_at":"2020-09-29T18:06:05.000Z","size":3798,"stargazers_count":93,"open_issues_count":0,"forks_count":31,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-25T03:09:10.413Z","etag":null,"topics":["adverserial","finance","generative-adversarial-network","model-validation","multivariate-data","multivariate-timeseries","similarity-measures","simulation","stress-test","synthetic-data","synthetic-dataset-generation","time-series"],"latest_commit_sha":null,"homepage":"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3616557","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/firmai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-01T21:20:26.000Z","updated_at":"2024-12-21T18:38:05.000Z","dependencies_parsed_at":"2022-09-12T15:30:32.918Z","dependency_job_id":null,"html_url":"https://github.com/firmai/mtss-gan","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firmai%2Fmtss-gan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firmai%2Fmtss-gan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firmai%2Fmtss-gan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/firmai%2Fmtss-gan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/firmai","download_url":"https://codeload.github.com/firmai/mtss-gan/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252108843,"owners_count":21696147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adverserial","finance","generative-adversarial-network","model-validation","multivariate-data","multivariate-timeseries","similarity-measures","simulation","stress-test","synthetic-data","synthetic-dataset-generation","time-series"],"created_at":"2024-08-02T17:01:07.117Z","updated_at":"2025-05-02T21:31:45.017Z","avatar_url":"https://github.com/firmai.png","language":null,"funding_links":[],"categories":["Data-driven methods","Related Codebase"],"sub_categories":["Time Series","Time-Series"],"readme":"# MTSS-GAN: Multivariate Time Series Simulation Generative Adversarial Networks\n\nPlease experiment with the code in the Colab notebook linked below and share your feedback in the issues tab. I will use it to improve a future version of this model.\n\nThe model was developed in a Colaboratory [notebook](https://colab.research.google.com/drive/1UFa3p4TEhK1jAPSj0KMqLpJGsgSBww_b?usp=sharing). I have added a few code snippets here; if there is demand, I can build a package, so please let me know in the issues tab. For additional information, feel free to consult the [paper](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3616557). 
\n\nMTSS-GAN is a new generative adversarial network (GAN) developed to simulate diverse multivariate time series (MTS) data with finance applications in mind. The purpose of this synthesiser is two-fold: to generate data that accurately represents the original data, and to retain the flexibility to generate data with novel and unique relationships that could help with model testing and robustness checks. The method is inspired by stacked GANs, which were originally designed for image generation. Because stacked GANs have produced some of the best-quality images, MTSS-GAN is expected to be a strong contender in multivariate time series generation.\n\nDesign\n---------------------\n\n![](assets/network.png)\n\nSimilarity\n---------------------\n![](assets/plots.png)\n\nUtility\n---------------------\n![](assets/utility.png)\n\nCode\n---------------------\nGenerator:\n```python\ndef generator(inputs,\n              activation='sigmoid',\n              labels=None,\n              codes=None):\n    \"\"\"Build a generator from noise inputs and optional conditional codes.\"\"\"\n\n    if codes is not None:\n        # generator 0 of MTSS\n        inputs = [inputs, codes]\n        x = concatenate(inputs, axis=1)\n        # noise inputs + conditional codes\n    else:\n        # default input is just a noise dimension (z-code)\n        x = inputs\n\n    x = Dense(SHAPE[0]*SHAPE[1])(x)\n    x = Reshape((SHAPE[0], SHAPE[1]))(x)\n    x = GRU(72, return_sequences=False, return_state=False,unroll=True)(x)\n    x = Reshape((int(SHAPE[0]/2), 6))(x)\n    x = Conv1D(128, 4, 1, \"same\")(x)\n    x = BatchNormalization(momentum=0.8)(x) # adjusting and scaling the activations\n    x = ReLU()(x)\n    x = UpSampling1D()(x)\n    x = Conv1D(6, 4, 1, \"same\")(x)\n    x = BatchNormalization(momentum=0.8)(x)\n\n    if activation is not None:\n        x = Activation(activation)(x)\n\n    # generator output is the synthesized data x\n    return Model(inputs, x, name='gen1')\n```\n\nDiscriminator\n```python\n\ndef discriminator(inputs,\n          
        activation='sigmoid',\n                  num_labels=None,\n                  num_codes=None):\n\n    ints = int(SHAPE[0]/2)\n    x = inputs\n    x = GRU(SHAPE[1]*SHAPE[0] , return_sequences=False, return_state=False,unroll=True, activation=\"relu\")(x)\n    x = Reshape((ints, ints))(x)\n    x = Conv1D(16, 3,2, \"same\")(x)\n    x = LeakyReLU(alpha=0.2)(x)\n    x = Conv1D(32, 3, 2, \"same\")(x)\n    x = LeakyReLU(alpha=0.2)(x)\n    x = Conv1D(64, 3, 2, \"same\")(x)\n    x = LeakyReLU(alpha=0.2)(x)\n    x = Conv1D(128, 3, 1, \"same\")(x)\n    x = LeakyReLU(alpha=0.2)(x)\n\n    x = Flatten()(x)\n    # default output is probability that the time series array is real\n    outputs = Dense(1)(x)\n\n    if num_codes is not None:\n        # MTSS-GAN Q0 output\n        # z0_recon is reconstruction of z0 normal distribution\n        # eventually two loss functions from this output.\n        z0_recon =  Dense(num_codes)(x)\n        z0_recon = Activation('tanh', name='z0')(z0_recon)\n        outputs = [outputs, z0_recon]\n\n    return Model(inputs, outputs, name='discriminator')\n```\n\nEncoder\n```python\n\ndef build_encoder(inputs, num_labels=6, feature0_dim=6*24):\n\n    x, feature0 = inputs\n\n    y = GRU(SHAPE[0]*SHAPE[1], return_sequences=False, return_state=False,unroll=True)(x)\n    y = Flatten()(y)\n    feature0_output = Dense(feature0_dim, activation='relu')(y)\n    # Encoder0 or enc0: data to feature0 \n    enc0 = Model(inputs=x, outputs=feature0_output, name=\"encoder0\")\n    \n    # Encoder1 or enc1\n    y = Dense(num_labels)(feature0)\n    labels = Activation('softmax')(y)\n    # Encoder1 or enc1: feature0 to class labels \n    enc1 = Model(inputs=feature0, outputs=labels, name=\"encoder1\")\n\n    # return both enc0 and enc1\n    return enc0, enc1\n ```\n\nBuild\n\n```python\ndef build_and_train_models():\n    \"\"\"Load the dataset, build MTSS discriminator,\n    generator, and adversarial models.\n    Call the MTSS train routine.\n    \"\"\"\n\n    
dataX, _, _ = google_data_loading(seq_length)\n    dataX = np.stack(dataX)\n\n    train_n = int(len(dataX)*.70)\n    X = dataX[:,:,:-1]\n    y = dataX[:,-1,-1]\n    x_train, y_train = X[:train_n,:,:], y[:train_n]\n    x_test, y_test = X[train_n:,:,:], y[train_n:]\n\n    # number of labels\n    num_labels = len(np.unique(y_train))\n    # to one-hot vector\n    y_train = to_categorical(y_train)\n    y_test = to_categorical(y_test)\n\n    model_name = \"MTSS-GAN\"\n    # network parameters\n    batch_size = 64\n    train_steps = 10\n    #train_steps = 2000\n\n    lr = 2e-4\n    decay = 6e-8\n    z_dim = 50  # dimension of the noise input\n    z_shape = (z_dim, )\n    feature0_dim = SHAPE[0]*SHAPE[1]\n    feature0_shape = (feature0_dim, )\n    # [1] uses Adam, but discriminator converges easily with RMSprop\n    optimizer = RMSprop(lr=lr, decay=decay)\n\n    # build discriminator 0 and Q network 0 models\n    input_shape = (feature0_dim, )\n    inputs = Input(shape=input_shape, name='discriminator0_input')\n    dis0 = build_discriminator(inputs, z_dim=z_dim)\n    #Model(Dense(SHAPE[0]*SHAPE[1]), [f0_source, z0_recon], name='dis0')\n\n    # loss functions: 1) probability feature0 is real\n    # (adversarial0 loss)\n    # 2) MSE z0 recon loss (Q0 network loss or entropy0 loss)\n    # two losses are needed because dis0 has two outputs 
\n\n    loss = ['binary_crossentropy', 'mse']\n    loss_weights = [1.0, 1.0]\n    dis0.compile(loss=loss,\n                 loss_weights=loss_weights,\n                 optimizer=optimizer,\n                 metrics=['accuracy'])\n    dis0.summary() # feature0 discriminator, z0 estimator\n\n    # build discriminator 1 and Q network 1 models\n\n    input_shape = (x_train.shape[1], x_train.shape[2])\n    inputs = Input(shape=input_shape, name='discriminator1_input')\n    dis1 = discriminator(inputs, num_codes=z_dim)\n\n    # loss functions: 1) probability time series array is real (adversarial1 loss)\n    # 2) MSE z1 recon loss (Q1 network loss or entropy1 loss)\n    loss = ['binary_crossentropy', 'mse']\n    loss_weights = [1.0, 10.0]\n    dis1.compile(loss=loss,\n                 loss_weights=loss_weights,\n                 optimizer=optimizer,\n                 metrics=['accuracy'])\n    dis1.summary() # time series array discriminator, z1 estimator\n\n\n    # build generator models\n    label_shape = (num_labels, )\n    feature0 = Input(shape=feature0_shape, name='feature0_input')\n    labels = Input(shape=label_shape, name='labels')\n    z0 = Input(shape=z_shape, name=\"z0_input\")\n    z1 = Input(shape=z_shape, name=\"z1_input\")\n    latent_codes = (labels, z0, z1, feature0)\n    gen0, gen1 = build_generator(latent_codes)\n    # gen0: classes and noise (labels + z0) to feature0\n    gen0.summary() # (latent features generator)\n    # gen1: feature0 + z1 to time series array\n    gen1.summary() # (time series array generator)\n\n    # build encoder models\n    input_shape = SHAPE\n    inputs = Input(shape=input_shape, name='encoder_input')\n    enc0, enc1 = build_encoder((inputs, feature0), num_labels)\n    # Encoder0 or enc0: data to feature0\n    enc0.summary() # time series array to feature0 encoder\n    # Encoder1 or enc1: feature0 to class labels\n    enc1.summary() # feature0 to labels encoder (classifier)\n    encoder = Model(inputs, 
enc1(enc0(inputs)))\n    encoder.summary() # time series array to labels encoder (classifier)\n\n    data = (x_train, y_train), (x_test, y_test)\n    print(x_train.shape)\n    print(y_train.shape)\n\n    # this step trains enc0, enc1, and encoder\n    train_encoder(encoder, data, model_name=model_name)\n\n\n    # build adversarial0 model =\n    # generator0 + discriminator0 + encoder1\n    # encoder1 weights frozen\n    enc1.trainable = False\n    # discriminator0 weights frozen\n    dis0.trainable = False\n    gen0_inputs = [labels, z0]\n    gen0_outputs = gen0(gen0_inputs)\n    adv0_outputs = dis0(gen0_outputs) + [enc1(gen0_outputs)]\n    # labels + z0 to prob feature0 is real + z0 recon + labels recon\n    adv0 = Model(gen0_inputs, adv0_outputs, name=\"adv0\")\n    # loss functions: 1) prob feature0 is real (adversarial0 loss)\n    # 2) Q network 0 loss (entropy0 loss)\n    # 3) conditional0 loss (classifier error)\n    loss_weights = [1.0, 1.0, 1.0]\n    loss = ['binary_crossentropy',\n            'mse',\n            'categorical_crossentropy']\n    adv0.compile(loss=loss,\n                 loss_weights=loss_weights,\n                 optimizer=optimizer,\n                 metrics=['accuracy'])\n    adv0.summary()\n\n    # build adversarial1 model =\n    # generator1 + discriminator1 + encoder0\n    optimizer = RMSprop(lr=lr*0.5, decay=decay*0.5)\n    # encoder0 weights frozen\n    enc0.trainable = False\n    # discriminator1 weights frozen\n    dis1.trainable = False\n    gen1_inputs = [feature0, z1]\n    gen1_outputs = gen1(gen1_inputs)\n    print(gen1_inputs)\n    print(gen1_outputs)\n    adv1_outputs = dis1(gen1_outputs) + [enc0(gen1_outputs)]\n    # feature0 + z1 to prob time series array is\n    # real + z1 recon + feature0 recon\n    adv1 = Model(gen1_inputs, adv1_outputs, name=\"adv1\")\n    # loss functions: 1) prob time series array is real (adversarial1 loss)\n    # 2) Q network 1 loss (entropy1 loss)\n    # 3) conditional1 loss\n    loss 
= ['binary_crossentropy', 'mse', 'mse']\n    loss_weights = [1.0, 10.0, 1.0]\n    adv1.compile(loss=loss,\n                 loss_weights=loss_weights,\n                 optimizer=optimizer,\n                 metrics=['accuracy'])\n    adv1.summary()\n\n    # train discriminator and adversarial networks\n    models = (enc0, enc1, gen0, gen1, dis0, dis1, adv0, adv1)\n    params = (batch_size, train_steps, num_labels, z_dim, model_name)\n    gen0, gen1 = train(models, data, params)\n\n\n    return gen0, gen1\n```\n\n\nTraining\n```python\n\ndef train(models, data, params):\n\n    enc0, enc1, gen0, gen1, dis0, dis1, adv0, adv1 = models\n    # network parameters\n    batch_size, train_steps, num_labels, z_dim, model_name = params\n    # train dataset (the test split is not used here)\n    (x_train, y_train), (_, _) = data\n    # the generated time series array is saved every 500 steps\n    save_interval = 500\n\n    # label and noise codes for generator testing\n    z0 = np.random.normal(scale=0.5, size=[SHAPE[0], z_dim])\n    z1 = np.random.normal(scale=0.5, size=[SHAPE[0], z_dim])\n    noise_class = np.eye(num_labels)[np.arange(0, SHAPE[0]) % num_labels]\n    noise_params = [noise_class, z0, z1]\n    # number of elements in train dataset\n    train_size = x_train.shape[0]\n    print(model_name,\n          \"Labels for generated time series arrays: \",\n          np.argmax(noise_class, axis=1))\n\n    tv_plot = tv.train.PlotMetrics(columns=5, wait_num=5)\n    for i in range(train_steps):\n        # train the discriminator0 for 1 batch\n        # 1 batch of real (label=1.0) and fake feature0 (label=0.0)\n        # randomly pick real time series arrays from dataset\n        dicta = {}\n        rand_indexes = np.random.randint(0,\n                                         train_size,\n                                         size=batch_size)\n        real_samples = x_train[rand_indexes]\n        # real feature0 from encoder0 output\n        real_feature0 = 
enc0.predict(real_samples)\n        # generate random 50-dim z0 latent code\n        real_z0 = np.random.normal(scale=0.5,\n                                   size=[batch_size, z_dim])\n        # real labels from dataset\n        real_labels = y_train[rand_indexes]\n\n        # generate fake feature0 using generator0 from\n        # real labels and 50-dim z0 latent code\n        fake_z0 = np.random.normal(scale=0.5,\n                                   size=[batch_size, z_dim])\n        fake_feature0 = gen0.predict([real_labels, fake_z0])\n\n        # real + fake data\n        feature0 = np.concatenate((real_feature0, fake_feature0))\n        z0 = np.concatenate((fake_z0, fake_z0))\n\n        # label 1st half as real and 2nd half as fake\n        y = np.ones([2 * batch_size, 1])\n        y[batch_size:, :] = 0\n\n        # train discriminator0 to classify feature0 as\n        # real/fake and recover\n        # latent code (z0). real = from encoder0,\n        # fake = from generator0\n        # joint training using discriminator part of\n        # adversarial0 loss and entropy0 loss\n        metrics = dis0.train_on_batch(feature0, [y, z0])\n        # log the overall loss only\n        log = \"%d: [dis0_loss: %f]\" % (i, metrics[0])\n        dicta[\"dis0_loss\"] = metrics[0]\n\n        # train the discriminator1 for 1 batch\n        # 1 batch of real (label=1.0) and fake time series arrays (label=0.0)\n        # generate random 50-dim z1 latent code\n        fake_z1 = np.random.normal(scale=0.5, size=[batch_size, z_dim])\n        # generate fake time series arrays from real feature0 and fake z1\n        fake_samples = gen1.predict([real_feature0, fake_z1])\n\n        # real + fake data\n        x = np.concatenate((real_samples, fake_samples))\n        z1 = np.concatenate((fake_z1, fake_z1))\n\n        # train discriminator1 to classify time series arrays\n        # as real/fake and recover latent code (z1)\n        # joint training using 
discriminator part of adversarial1 loss\n        # and entropy1 loss\n        metrics = dis1.train_on_batch(x, [y, z1])\n        # log the overall loss only (use dis1.metrics_names)\n        log = \"%s [dis1_loss: %f]\" % (log, metrics[0])\n        dicta[\"dis1_loss\"] = metrics[0]\n\n        # adversarial training\n        # generate fake z0, labels\n        fake_z0 = np.random.normal(scale=0.5,\n                                   size=[batch_size, z_dim])\n        # input to generator0 is sampled from real labels and\n        # 50-dim z0 latent code\n        gen0_inputs = [real_labels, fake_z0]\n\n        # label fake feature0 as real so that the generator\n        # is trained to fool the frozen discriminator\n        y = np.ones([batch_size, 1])\n\n        # train generator0 (thru adversarial) by fooling\n        # the discriminator\n        # and matching the labels predicted by encoder1\n        # joint training: adversarial0, entropy0, conditional0\n        metrics = adv0.train_on_batch(gen0_inputs,\n                                      [y, fake_z0, real_labels])\n        fmt = \"%s [adv0_loss: %f, enc1_acc: %f]\"\n        dicta[\"adv0_loss\"] = metrics[0]\n        dicta[\"enc1_acc\"] = metrics[6]\n\n        # log the overall loss and classification accuracy\n        log = fmt % (log, metrics[0], metrics[6])\n\n        # input to generator1 is real feature0 and\n        # 50-dim z1 latent code\n        fake_z1 = np.random.normal(scale=0.5,\n                                   size=[batch_size, z_dim])\n\n        gen1_inputs = [real_feature0, fake_z1]\n\n        # train generator1 (thru adversarial) by fooling\n        # the discriminator and matching the feature0\n        # recovered by encoder0; joint training:\n        # adversarial1, entropy1, conditional1\n        metrics = adv1.train_on_batch(gen1_inputs,\n                                      [y, fake_z1, real_feature0])\n        # log the overall loss only\n        log = 
\"%s [adv1_loss: %f]\" % (log, metrics[0])\n        dicta[\"adv1_loss\"] = metrics[0]\n\n\n        print(log)\n        if (i + 1) % save_interval == 0:\n            generators = (gen0, gen1)\n            plot_ts(generators,\n                        noise_params=noise_params,\n                        show=False,\n                        step=(i + 1),\n                        model_name=model_name)\n            \n        tv_plot.update({'dis0_loss': dicta[\"dis0_loss\"], 'dis1_loss': dicta[\"dis1_loss\"], 'adv0_loss': dicta[\"adv0_loss\"], 'enc1_acc': dicta[\"enc1_acc\"], 'adv1_loss': dicta[\"adv1_loss\"]})\n        tv_plot.draw()\n\n    # save the modelis after training generator0 \u0026 1\n    # the trained generator can be reloaded for\n    # future data generation\n    gen0.save(model_name + \"-gen1.h5\")\n    gen1.save(model_name + \"-gen0.h5\")\n\n    return  gen0, gen1 \n    \n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirmai%2Fmtss-gan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffirmai%2Fmtss-gan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirmai%2Fmtss-gan/lists"}