{"id":15538613,"url":"https://github.com/4paradigm/openembedding","last_synced_at":"2026-03-27T02:30:20.116Z","repository":{"id":49386720,"uuid":"383698887","full_name":"4paradigm/OpenEmbedding","owner":"4paradigm","description":"OpenEmbedding is an open source framework for Tensorflow distributed training acceleration.","archived":false,"fork":false,"pushed_at":"2023-04-13T06:56:51.000Z","size":1839,"stargazers_count":31,"open_issues_count":0,"forks_count":6,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-23T11:03:54.042Z","etag":null,"topics":["distributed-training","embedding-layers","model-parallel","parameter-server","tensorflow","tensorflow-training"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/4paradigm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-07T06:37:15.000Z","updated_at":"2024-10-10T02:43:02.000Z","dependencies_parsed_at":"2024-11-15T03:43:47.642Z","dependency_job_id":"28d3ce8a-5026-41be-a3c4-736528a7a6d4","html_url":"https://github.com/4paradigm/OpenEmbedding","commit_stats":{"total_commits":105,"total_committers":10,"mean_commits":10.5,"dds":0.5047619047619047,"last_synced_commit":"1e540f5c0e458ac51193f2008c07894100a71bdd"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4paradigm%2FOpenEmbedding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4paradigm%2FOpenEmbedding/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4paradigm%2FOpenEmbedding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4paradigm%2FOpenEmbedding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/4paradigm","download_url":"https://codeload.github.com/4paradigm/OpenEmbedding/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250458634,"owners_count":21433911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-training","embedding-layers","model-parallel","parameter-server","tensorflow","tensorflow-training"],"created_at":"2024-10-02T12:05:09.557Z","updated_at":"2026-03-27T02:30:20.084Z","avatar_url":"https://github.com/4paradigm.png","language":"C++","readme":"# OpenEmbedding\n\n[![build status](https://github.com/4paradigm/openembedding/actions/workflows/build.yml/badge.svg)](https://github.com/4paradigm/openembedding/actions/workflows/build.yml)\n[![docker pulls](https://img.shields.io/docker/pulls/4pdosc/openembedding.svg)](https://hub.docker.com/r/4pdosc/openembedding)\n[![python version](https://img.shields.io/pypi/pyversions/openembedding.svg?style=plastic)](https://badge.fury.io/py/openembedding)\n[![pypi package version](https://badge.fury.io/py/openembedding.svg)](https://badge.fury.io/py/openembedding)\n[![downloads](https://pepy.tech/badge/openembedding)](https://pepy.tech/project/openembedding)\n\nEnglish version | [中文版](README_cn.md)\n\n\n## About\n\n**OpenEmbedding is an open-source framework for TensorFlow 
distributed training acceleration.**\n\nNowadays, many machine learning and deep learning applications are built on parameter servers, which are used to efficiently store and update model weights. When a model has a large number of sparse features (e.g., Wide\u0026Deep and DeepFM for CTR prediction), the number of weights easily runs into billions to trillions. In such a case, traditional synchronization solutions (such as the Allreduce-based solution adopted by Horovod) are unable to achieve high performance because of the massive communication overhead introduced by the tremendous number of sparse features. To achieve efficiency for such sparse models, we developed OpenEmbedding, which enhances the parameter server specifically for sparse model training and inference.\n\n## Highlights\n\nEfficiency\n- We propose an efficient customized sparse format to handle sparse features, together with fine-grained optimizations such as cache-conscious algorithms, asynchronous cache reads and writes, and lightweight locks to maximize parallelism. OpenEmbedding is able to achieve a speedup of 3-8x compared with the Allreduce-based solution on a single machine equipped with 8 GPUs for sparse model training.\n\nEase-of-use\n- We have integrated OpenEmbedding into TensorFlow. Only three lines of code changes are required to use OpenEmbedding in TensorFlow for both training and inference.\n\nAdaptability\n- In addition to TensorFlow, it is straightforward to integrate OpenEmbedding into other popular frameworks. We have demonstrated the integration with DeepCTR and Horovod in the examples.\n\n## Benchmark\n\n![benchmark](documents/images/benchmark.png)\n\nModels that contain sparse features are difficult to speed up with the Allreduce-based framework Horovod alone. Using OpenEmbedding together with Horovod yields better acceleration: on a single machine with 8 GPUs, the speedup ratio is 3 to 8 times. 
Many models achieved 3 to 7 times the performance of Horovod.\n\n- [Benchmark](documents/en/benchmark.md)\n\n## Install \u0026 Quick Start\n\nYou can install and run OpenEmbedding by following the steps below. The examples show the whole process of training on [criteo](https://labs.criteo.com/2014/09/kaggle-contest-dataset-now-available-academic-use/) data with OpenEmbedding and predicting with TensorFlow Serving.\n\n### Docker\n\nNVIDIA Docker is required to use GPUs inside the image. The OpenEmbedding image can be obtained from [Docker Hub](https://hub.docker.com/r/4pdosc/openembedding/tags).\n\n```bash\n# The script \"criteo_deepctr_standalone.sh\" will train and export the model to the path \"tmp/criteo/1\".\n# It is okay to switch to:\n#    \"criteo_deepctr_horovod.sh\" (multi-GPU training with Horovod),\n#    \"criteo_deepctr_mirrored.sh\" (multi-GPU training with MirroredStrategy),\n#    \"criteo_deepctr_mpi.sh\" (multi-GPU training with MultiWorkerMirroredStrategy and MPI).\ndocker run --rm --gpus all -v /tmp/criteo:/openembedding/tmp/criteo \\\n    4pdosc/openembedding:latest examples/run/criteo_deepctr_standalone.sh \n\n# Start TensorFlow Serving to load the trained model.\ndocker run --name serving-example -td -p 8500:8500 -p 8501:8501 \\\n        -v /tmp/criteo:/models/criteo -e MODEL_NAME=criteo tensorflow/serving:latest\n# Wait for the model server to start.\nsleep 5\n\n# Send requests and get prediction results.\ndocker run --rm --network host 4pdosc/openembedding:latest examples/run/criteo_deepctr_restful.sh\n\n# Clean up the Docker container.\ndocker stop serving-example\ndocker rm serving-example\n```\n\n### Ubuntu\n\n```bash\n# Install the dependencies required by OpenEmbedding.\napt update \u0026\u0026 apt install -y gcc-7 g++-7 python3 libpython3-dev python3-pip\npip3 install --upgrade pip\npip3 install tensorflow==2.5.1\npip3 install openembedding\n\n# Install the dependencies required by examples.\napt install -y git cmake mpich curl \nHOROVOD_WITHOUT_MPI=1 pip3 install horovod\npip3 
install deepctr pandas scikit-learn mpi4py\n\n# Download the examples.\ngit clone https://github.com/4paradigm/OpenEmbedding.git\ncd OpenEmbedding\n\n# The script \"criteo_deepctr_standalone.sh\" will train and export the model to the path \"tmp/criteo/1\".\n# It is okay to switch to:\n#    \"criteo_deepctr_horovod.sh\" (multi-GPU training with Horovod),\n#    \"criteo_deepctr_mirrored.sh\" (multi-GPU training with MirroredStrategy),\n#    \"criteo_deepctr_mpi.sh\" (multi-GPU training with MultiWorkerMirroredStrategy and MPI).\nexamples/run/criteo_deepctr_standalone.sh \n\n# Start TensorFlow Serving to load the trained model.\ndocker run --name serving-example -td -p 8500:8500 -p 8501:8501 \\\n        -v `pwd`/tmp/criteo:/models/criteo -e MODEL_NAME=criteo tensorflow/serving:latest\n# Wait for the model server to start.\nsleep 5\n\n# Send requests and get prediction results.\nexamples/run/criteo_deepctr_restful.sh\n\n# Clean up the Docker container.\ndocker stop serving-example\ndocker rm serving-example\n```\n\n### CentOS\n\n```bash\n# Install the dependencies required by OpenEmbedding.\nyum install -y centos-release-scl\nyum install -y python3 python3-devel devtoolset-7\nscl enable devtoolset-7 bash\npip3 install --upgrade pip\npip3 install tensorflow==2.5.1\npip3 install openembedding\n\n# Install the dependencies required by examples.\nyum install -y git cmake mpich curl \nHOROVOD_WITHOUT_MPI=1 pip3 install horovod\npip3 install deepctr pandas scikit-learn mpi4py\n\n# Download the examples.\ngit clone https://github.com/4paradigm/OpenEmbedding.git\ncd OpenEmbedding\n\n# The script \"criteo_deepctr_standalone.sh\" will train and export the model to the path \"tmp/criteo/1\".\n# It is okay to switch to:\n#    \"criteo_deepctr_horovod.sh\" (multi-GPU training with Horovod),\n#    \"criteo_deepctr_mirrored.sh\" (multi-GPU training with MirroredStrategy),\n#    \"criteo_deepctr_mpi.sh\" (multi-GPU training with MultiWorkerMirroredStrategy and MPI).\nexamples/run/criteo_deepctr_standalone.sh 
\n\n# Start TensorFlow Serving to load the trained model.\ndocker run --name serving-example -td -p 8500:8500 -p 8501:8501 \\\n        -v `pwd`/tmp/criteo:/models/criteo -e MODEL_NAME=criteo tensorflow/serving:latest\n# Wait for the model server to start.\nsleep 5\n\n# Send requests and get prediction results.\nexamples/run/criteo_deepctr_restful.sh\n\n# Clean up the Docker container.\ndocker stop serving-example\ndocker rm serving-example\n```\n\n### Note\n\nThe installation usually requires g++ 7 or higher, or a compiler compatible with `tf.version.COMPILER_VERSION`. The compiler can be specified by the environment variables `CC` and `CXX`. Currently OpenEmbedding can only be installed on Linux.\n```bash\nCC=gcc CXX=g++ pip3 install openembedding\n```\n\nIf TensorFlow is updated, you need to reinstall OpenEmbedding.\n```bash\npip3 uninstall openembedding \u0026\u0026 pip3 install --no-cache-dir openembedding\n```\n\n## User Guide\n\nA sample program for common usage is as follows.\n\nCreate the `Model` and `Optimizer`.\n```python\nimport tensorflow as tf\nfrom deepctr.models import WDL\noptimizer = tf.keras.optimizers.Adam()\nmodel = WDL(feature_columns, feature_columns, task='binary')\n```\n\nTransform them into a distributed `Model` and a distributed `Optimizer`. The `Embedding` layers will be stored on the parameter server.\n```python\nimport horovod.tensorflow.keras as hvd\nimport openembedding.tensorflow as embed\nhvd.init()\n\noptimizer = embed.distributed_optimizer(optimizer)\noptimizer = hvd.DistributedOptimizer(optimizer)\n\nmodel = embed.distributed_model(model)\n```\nHere, `embed.distributed_optimizer` converts the TensorFlow optimizer into an optimizer that supports the parameter server, so that the parameters on the parameter server can be updated. The function `embed.distributed_model` replaces the `Embedding` layers in the model and overrides their methods to support saving and loading with parameter servers. 
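\n\nThe layer replacement can be sketched roughly as follows. This is only an illustration of the idea, not OpenEmbedding's actual implementation; `make_ps_layer` stands in for a hypothetical factory that builds a parameter-server-backed embedding layer.\n```python\nimport tensorflow as tf\n\ndef replace_embeddings(model, make_ps_layer):\n    # Rebuild the model, swapping every keras Embedding layer for a\n    # parameter-server-backed equivalent with the same configuration.\n    def clone(layer):\n        if isinstance(layer, tf.keras.layers.Embedding):\n            return make_ps_layer(**layer.get_config())\n        return layer.__class__.from_config(layer.get_config())\n    return tf.keras.models.clone_model(model, clone_function=clone)\n```\n\n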
The method `Embedding.call` pulls the parameters from the parameter server, and a backpropagation function is registered to push the gradients back to the parameter server.\n\nEnable data parallelism with Horovod.\n```python\nmodel.compile(optimizer, \"binary_crossentropy\", metrics=['AUC'],\n              experimental_run_tf_function=False)\ncallbacks = [ hvd.callbacks.BroadcastGlobalVariablesCallback(0),\n              hvd.callbacks.MetricAverageCallback() ]\nmodel.fit(dataset, epochs=10, verbose=2, callbacks=callbacks)\n```\n\nExport as a stand-alone SavedModel so that it can be loaded by TensorFlow Serving.\n```python\nif hvd.rank() == 0:\n    # Must specify include_optimizer=False explicitly\n    model.save_as_original_model('model_path', include_optimizer=False)\n```\n\nMore examples:\n- [Replace `Embedding` layer](examples/criteo_deepctr_hook.py)\n- [Transform network model](examples/criteo_deepctr_network.py)\n- [Custom subclass model](examples/criteo_lr_subclass.py)\n- [With MirroredStrategy](examples/criteo_deepctr_network_mirrored.py)\n- [With MultiWorkerMirroredStrategy and MPI](examples/criteo_deepctr_network_mirrored.py)\n\n## Build\n\n### Docker Build\n\n```bash\ndocker build -t 4pdosc/openembedding-base:0.1.0 -f docker/Dockerfile.base .\ndocker build -t 4pdosc/openembedding:0.0.0-build -f docker/Dockerfile.build .\n```\n\n### Native Build\n\nThe compiler needs to be compatible with `tf.version.COMPILER_VERSION` (\u003e= 7). Install all [prpc](https://github.com/4paradigm/prpc) dependencies to `tools` or `/usr/local`, then run `build.sh` to complete the compilation. 
The `build.sh` script will automatically install prpc (pico-core) and parameter-server (pico-ps) to the `tools` directory.\n\n```bash\ngit submodule update --init --checkout --recursive\npip3 install tensorflow\n./build.sh clean \u0026\u0026 ./build.sh build\npip3 install ./build/openembedding-*.tar.gz\n```\n\n## Features\n\nTensorFlow 2\n- `dtype`: `float32`, `float64`.\n- `tensorflow.keras.initializers`\n  - `RandomNormal`, `RandomUniform`, `Constant`, `Zeros`, `Ones`.\n  - The parameter `seed` is currently ignored.\n- `tensorflow.keras.optimizers`\n  - `Adadelta`, `Adagrad`, `Adam`, `Adamax`, `Ftrl`, `RMSprop`, `SGD`.\n  - `decay` and `LearningRateSchedule` are not supported.\n  - `Adam(amsgrad=True)` is not supported.\n  - `RMSprop(centered=True)` is not supported.\n  - The parameter server uses a sparse update method, which may cause different training results for optimizers with momentum.\n- `tensorflow.keras.layers.Embedding`\n  - Supports an array for known `input_dim` and a hash table for unknown `input_dim` (2**63 range).\n  - Can still be stored on workers and use the dense update method.\n  - `embeddings_regularizer` and `embeddings_constraint` should not be used.\n- `tensorflow.keras.Model`\n  - Can be converted to a distributed `Model`, automatically ignoring or converting incompatible settings (such as `embeddings_constraint`).\n  - Distributed `save`, `save_weights`, `load_weights` and `ModelCheckpoint`.\n  - The distributed `Model` can be saved as a stand-alone SavedModel, which can be loaded by TensorFlow Serving.\n  - Training multiple distributed `Model`s in one task is not supported.\n- Can collaborate with Horovod. 
Training with `MirroredStrategy` or `MultiWorkerMirroredStrategy` is experimental.\n\n## TODO\n\n- Improve performance\n- Support PyTorch training\n- Support `tf.feature_column.embedding_column`\n- Approximate `embedding_regularizer`, `LearningRateSchedule`, etc.\n- Improve the support for `Initializer` and `Optimizer`\n- Training multiple distributed `Model`s in one task\n- Support ONNX\n\n## Designs\n\n- [Training](documents/en/training.md)\n- [Serving](documents/en/serving.md)\n\n## Authors\n\n- Yiming Liu (liuyiming@4paradigm.com)\n- Yilin Wang (wangyilin@4paradigm.com)\n- Cheng Chen (chencheng@4paradigm.com)\n- Guangchuan Shi (shiguangchuan@4paradigm.com)\n- Zhao Zheng (zhengzhao@4paradigm.com)\n\n## Persistent Memory (PMem)\n\nCurrently, the interface for persistent memory is experimental.\nPMem-based OpenEmbedding provides a lightweight checkpointing scheme as well as performance comparable to its DRAM version. For long-running deep learning recommendation model training, PMem-based OpenEmbedding provides not only an efficient but also a reliable training process.\n- [PMem-based OpenEmbedding](documents/en/pmem.md)\n\n## Publications\n\n- [OpenEmbedding: A Distributed Parameter Server for Deep Learning Recommendation Models using Persistent Memory](documents/papers/openembedding_icde2023.pdf). Cheng Chen, Yilin Wang, Jun Yang, Yiming Liu, Mian Lu, Zhao Zheng, Bingsheng He, Weng-Fai Wong, Liang You, Penghao Sun, Yuping Zhao, Fenghua Hu, and Andy Rudoff. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4paradigm%2Fopenembedding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F4paradigm%2Fopenembedding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4paradigm%2Fopenembedding/lists"}