# TensorFlow Model Deployment

A tutorial exploring multiple approaches to deploy / serve a trained TensorFlow (or Keras) model, or multiple models, in a production environment for prediction / inference.

The code samples provided here may have originally been developed against TensorFlow 1.2, 1.3, or 1.4. Unless explicitly noted, however, they should work with all versions >= 1.0.

Table of Contents
=================
1.  [Import the Model Graph from Meta File](#importGraph)
2.  [Create the Model Graph from Scratch](#createGraph)
3.  [Restore Multiple Models](#restoreMultiple)
4.  [Inspect a Model](#inspectModel)
5.  
[Freeze a Model before Serving it](#freeezeModel)
6.  [Convert a Keras model to a TensorFlow model](#convertKeras)
7.  [Deploy Multiple Frozen Models](#multiFreezed)
8.  [Serve a Model via Web Services](#webServices)

During training, TensorFlow generates the following three files for each checkpoint, although you can optionally choose not to create the meta file. You can ignore the file named `checkpoint`, as it is not used in the prediction process.

1. meta file: It holds the compressed protocol buffer (Protobuf) of the model graph and all the other associated metadata, such as collections and operations.
2. index file: It holds an immutable key-value table linking each serialized tensor name to the location of its data in the data file.
3. data file: It is a TensorBundle collection that saves the values of all variables, such as the weights.

<a name="importGraph"></a>
### Import the Model Graph from Meta File
One common approach is to restore the model graph from the meta file, and then restore the weights and other data from the data file (the index file is used as well). Here is a sample code snippet:

```python
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    saver = tf.train.import_meta_graph("/trained/model_ckpt.meta")
    saver.restore(sess, "/trained/model_ckpt")

    # Retrieve ops from the collection

    # Run sess to predict
```

A small trick here is where to place the following line of code (creating the saver) when you define the model graph for training. By default, only variables defined above this line will be saved into the meta file.
If you don't plan to retrain the model, you can place the code defining your train ops, such as the optimizer, loss, and accuracy, below this line so that your model file stays reasonably small.

```python
saver = tf.train.Saver()
```

You normally need to leave some hooks in the trained model so that you can easily feed the data for prediction. For example, you need to save logits and image_placeholder into a collection during training, and later retrieve them for prediction.

A concrete example can be found in the train() and predict() methods
[here](https://github.com/bshao001/DmsMsgRcg/blob/Sliding_Window_Version/misc/imgconvnets.py).

This applies when the graphs used for inference and training are the same or very similar. When the inference graph is very different from the training graph, this approach is not preferred, as it would require the graph built for training to accommodate both training and inference, making it unnecessarily large.

<a name="createGraph"></a>
### Create the Model Graph from Scratch
Another common approach is to create the model graph from scratch instead of restoring the graph from the meta file.
This is extremely useful when the graph for inference is considerably different from the graph for training. The TensorFlow NMT model (https://github.com/tensorflow/nmt) is one such case.

```python
import tensorflow as tf
# Replace this with your valid ModelCreator
import ModelCreator

with tf.Session() as sess:
    # Replace this line with your valid ModelCreator and its arguments
    model = ModelCreator(training=False)
    # Restore the model weights
    model.saver.restore(sess, "/trained/model_ckpt")
```

A concrete example can be found in the constructor (\_\_init\_\_ method)
[here](https://github.com/bshao001/ChatLearner/blob/master/chatbot/botpredictor.py).

<a name="restoreMultiple"></a>
### Restore Multiple Models
Sometimes you may need to load multiple trained models into a single TF session to work together on a task. For example, in a face recognition application, you may need one model to detect faces in a given image, then another model to recognize those faces. In a typical photo OCR application, you normally require three models working as a pipeline: model one detects the text areas (blocks) in a given image; model two segments characters from the text strings detected by the first model; and model three recognizes those characters.

Loading multiple models into a single session can be tricky if you don't handle it properly. Here are the steps to follow:

1. For each of the models, use a unique model_scope, and define all the variables within that scope when building the graph for training:

```python
with tf.variable_scope(model_scope):
    # Define variables here
```

2. 
At the time of restoring the models, do the following:

```python
tf.train.import_meta_graph(os.path.join(result_dir, result_file + ".meta"))
all_vars = tf.global_variables()
model_vars = [var for var in all_vars if var.name.startswith(model_scope)]
saver = tf.train.Saver(model_vars)
saver.restore(sess, os.path.join(result_dir, result_file))
```

Here, a TF session object (sess) is often passed into the method, as you don't want to create a separate session here. Also, don't be fooled by this frequently used statement:

```python
saver = tf.train.import_meta_graph("/trained/model_ckpt.meta")
```

When the right-hand side runs inside a TF session, the model graph is imported. It returns a saver, but you don't have to use it. In my experience, if this saver is used to restore the data (weights), it won't work for loading multiple models: it will complain about all kinds of conflicts.

A complete working example can be found in my [DmsMsgRcg](https://github.com/bshao001/DmsMsgRcg/tree/Sliding_Window_Version) project:
- Training: https://github.com/bshao001/DmsMsgRcg/blob/Sliding_Window_Version/misc/imgconvnets.py
- Predictor definition: https://github.com/bshao001/DmsMsgRcg/blob/Sliding_Window_Version/misc/cnnpredictor.py
- Final application: https://github.com/bshao001/DmsMsgRcg/blob/Sliding_Window_Version/mesgclsf/msgclassifier.py

<a name="inspectModel"></a>
### Inspect a Model
Very often, you need to check what is in the model files, including the operations and possibly the weights. Here are a few things you may want to do.

1. Check the operations (nodes), all variables, or the trainable variables in the graph; or even save everything, including the weights, into a text file so that you can read them.

```python
import tensorflow as tf

saver = tf.train.import_meta_graph("/trained/model_ckpt.meta")
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()

with tf.Session() as sess:
    saver.restore(sess, "/trained/model_ckpt")

    # Check all operations (nodes) in the graph:
    print("## All operations: ")
    for op in graph.get_operations():
        print(op.name)

    # OR check all variables in the graph:
    print("## All variables: ")
    for v in tf.global_variables():
        print(v.name)

    # OR check all trainable variables in the graph:
    print("## Trainable variables: ")
    for v in tf.trainable_variables():
        print(v.name)

    # OR save the whole graph and weights into a text file:
    log_dir = "/log_dir"
    out_file = "train.pbtxt"
    tf.train.write_graph(input_graph_def, logdir=log_dir, name=out_file, as_text=True)
```

2. Inspect all tensors and their weight values:

```python
from tensorflow.python import pywrap_tensorflow

model_file = "/trained/model_ckpt"
reader = pywrap_tensorflow.NewCheckpointReader(model_file)
var_to_shape_map = reader.get_variable_to_shape_map()

for key in sorted(var_to_shape_map):
    print("tensor_name: ", key)
    print(reader.get_tensor(key))
```

A complete working script is included in this repository (inspect_checkpoint.py).

<a name="freeezeModel"></a>
### Freeze a Model before Serving it
Sometimes a trained model file can be very large; half a gigabyte to several gigabytes is common. At inference time, you don't have to deal with this big file if you choose to freeze the model. This process can normally shrink the model file to 25% to 35% of its original size, making inference considerably faster.

Here are the 3 steps to achieve this:

1. 
Restore / load the trained model:

```python
import tensorflow as tf

saver = tf.train.import_meta_graph("/trained/model_ckpt.meta")
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()
sess = tf.Session()
saver.restore(sess, "/trained/model_ckpt")
```

2. Choose the output for the frozen model:

```python
output_node_names = []
output_node_names.append("prediction_node")  # Specify the real node name
output_graph_def = tf.graph_util.convert_variables_to_constants(
    sess,
    input_graph_def,
    output_node_names
)
```

Here, you may need to use the following code to check the output node name, as you learned in the section above:

```python
for op in graph.get_operations():
    print(op.name)
```

Keep in mind that when you request an operation as output, all the other operations it depends on will also be saved. Therefore, you only need to specify the final output operation of the inference graph for freezing purposes.

3. Serialize and write the output graph and trained weights to the file system:

```python
output_file = "model_file.pb"
with tf.gfile.GFile(output_file, "wb") as f:
    f.write(output_graph_def.SerializeToString())

sess.close()
```

A concrete working example, including how to use the frozen model for prediction, can be found
[here](https://github.com/bshao001/DmsMsgRcg/blob/master/misc/freezemodel.py).

<a name="convertKeras"></a>
### Convert a Keras model to a TensorFlow model
The procedure for converting a Keras model to a TensorFlow model (and freezing it) is very similar to the freezing procedure described above.

Here are the 3 steps to perform the model conversion:

1. 
Load the Keras model you want to convert:

```python
import tensorflow as tf
from keras.models import load_model
from keras import backend as K

K.set_learning_phase(0)
keras_model = load_model("/trained/model_and_weights.h5")
```

If you used any custom functions when building the Keras model, collect them in a Python dict and pass that dict as custom_objects when calling the load_model() method.

2. Specify the output for the model. Use tf.identity to rename the output nodes.

```python
# Define num_output. If you have multiple outputs, change this number accordingly
num_output = 1
# Base name for the renamed output nodes
name_output = "output_node"

output = [None] * num_output
out_node_names = [None] * num_output
for i in range(num_output):
    out_node_names[i] = name_output + str(i)
    output[i] = tf.identity(keras_model.outputs[i], name=out_node_names[i])

sess = K.get_session()
constant_graph = tf.graph_util.convert_variables_to_constants(
    sess,
    sess.graph.as_graph_def(),
    out_node_names  # All other operations these rely on will also be saved
)
```

3. Serialize and write the output graph and trained weights to the file system:

```python
output_file = "model_file.pb"
with tf.gfile.GFile(output_file, "wb") as f:
    f.write(constant_graph.SerializeToString())
```

A concrete working example, including how to use the converted model for prediction, can be found
[here](https://github.com/bshao001/DmsMsgRcg/blob/master/textdect/convertmodel.py). This example was based on tf.keras in TensorFlow 1.4.

<a name="multiFreezed"></a>
### Deploy Multiple Frozen Models
As explained above, there is often a need to deploy multiple models. With the help of the two sections above, you can freeze a trained model in TensorFlow, and convert a trained Keras model to a model in TensorFlow.
So if you can deploy multiple frozen models, you can actually deploy multiple models trained in TensorFlow or Keras (including tf.keras in TF 1.4).

1. Load the frozen or converted model:

```python
import tensorflow as tf

frozen_model = "/trained/model_ckpt.pb"
# Change this to use a specific prefix for all variables/tensors in this model. If the model already has
# a prefix from training time, pass name="" to the tf.import_graph_def() function.
model_scope = "model_prefix"

with tf.gfile.GFile(frozen_model, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

    # This model_scope adds a prefix to all the nodes in the graph
    tf.import_graph_def(graph_def, input_map=None, return_elements=None, name="{}/".format(model_scope))
```

You normally need to identify the input and output of each model and get the corresponding tensors. Then use session.run() for the prediction.

2. Create the graph and session, and pass them to each model.

In order to load multiple models, you need to create the graph and session outside of the process of loading each model. Something like this:

```python
with tf.Graph().as_default() as graph:
    # Invoke model 1 constructor
    # Invoke model 2 constructor
    # Invoke model 3 constructor if you need it

with tf.Session(graph=graph) as sess:
    # Run model 1 prediction
    # Run model 2 prediction
    # Run model 3 prediction if you need it
```

A concrete working example can be found in my [DmsMsgRcg](https://github.com/bshao001/DmsMsgRcg) project:
1. Model constructor and its prediction method for models converted from Keras: https://github.com/bshao001/DmsMsgRcg/blob/master/textdect/convertmodel.py
2. 
Model constructor and its prediction method for frozen models in TensorFlow: https://github.com/bshao001/DmsMsgRcg/blob/master/misc/freezemodel.py
3. Put everything together to load multiple models and run predictions: https://github.com/bshao001/DmsMsgRcg/blob/master/mesgclsf/msgclassifier.py

<a name="webServices"></a>
### Serve a Model via Web Services
Although this does not directly relate to the problem of how to serve a trained model in TensorFlow, it is a commonly encountered issue.

We train a machine learning model using Python and TensorFlow; however, we often need the model to provide services to other environments, such as a web application or a mobile application, or to different programming languages, such as Java or C#.

Both a REST API and a SOAP API can meet these needs. A REST API is relatively lightweight, but a SOAP API is not that complicated either. You can pick either based on your personal preference.

As there are many online tutorials covering the technical details of how REST and SOAP work, I only provide concrete working examples to illustrate the approaches.

- REST API

An example using the Flask framework in Python as the server, with Java and Tomcat as the client, can be found in my [ChatLearner project](https://github.com/bshao001/ChatLearner/tree/master/webui_alternative).

- SOAP API

An example using the Tornado web server in Python as the server, with Java and Tomcat as the client, can also be found in my [ChatLearner project](https://github.com/bshao001/ChatLearner/tree/master/webui).

# References:
1. http://cv-tricks.com/how-to/freeze-tensorflow-models/
2. 
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/inspect_checkpoint.py
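
To make the REST option above concrete, here is a minimal sketch of a Flask-based prediction endpoint. It is not the actual ChatLearner server code: the `predict()` helper is a hypothetical stand-in for a real frozen-model `session.run()` call, and the route name and payload shape are illustrative assumptions.

```python
# Minimal REST serving sketch (Flask). The predict() helper below is a
# hypothetical stand-in for a real frozen-model session.run() call;
# replace its body with your own model code.
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(text):
    # Stand-in for: sess.run(output_tensor, feed_dict={input_tensor: ...})
    return {"input": text, "label": "demo", "score": 1.0}


@app.route("/predict", methods=["POST"])
def serve_prediction():
    # Parse the JSON payload posted by the client (e.g. a Java/Tomcat web app)
    data = request.get_json(force=True)
    result = predict(data.get("text", ""))
    return jsonify(result)

# To serve: app.run(host="0.0.0.0", port=5000)
```

Any HTTP client can then POST a JSON body such as `{"text": "..."}` to `/predict` and parse the JSON reply, which keeps the TensorFlow dependency entirely on the server side.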