https://github.com/4paradigm/openembedding
OpenEmbedding is an open source framework for Tensorflow distributed training acceleration.
https://github.com/4paradigm/openembedding
distributed-training embedding-layers model-parallel parameter-server tensorflow tensorflow-training
- Host: GitHub
- URL: https://github.com/4paradigm/openembedding
- Owner: 4paradigm
- License: apache-2.0
- Created: 2021-07-07T06:37:15.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-04-13T06:56:51.000Z (over 1 year ago)
- Last Synced: 2024-10-09T12:33:31.264Z (2 months ago)
- Topics: distributed-training, embedding-layers, model-parallel, parameter-server, tensorflow, tensorflow-training
- Language: C++
- Homepage:
- Size: 1.75 MB
- Stars: 30
- Watchers: 7
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# OpenEmbedding
[![build status](https://github.com/4paradigm/openembedding/actions/workflows/build.yml/badge.svg)](https://github.com/4paradigm/openembedding/actions/workflows/build.yml)
[![docker pulls](https://img.shields.io/docker/pulls/4pdosc/openembedding.svg)](https://hub.docker.com/r/4pdosc/openembedding)
[![python version](https://img.shields.io/pypi/pyversions/openembedding.svg?style=plastic)](https://badge.fury.io/py/openembedding)
[![pypi package version](https://badge.fury.io/py/openembedding.svg)](https://badge.fury.io/py/openembedding)
[![downloads](https://pepy.tech/badge/openembedding)](https://pepy.tech/project/openembedding)

English version | [中文版](README_cn.md)
## About
**OpenEmbedding is an open-source framework for TensorFlow distributed training acceleration.**
Many machine learning and deep learning applications are built on parameter servers, which efficiently store and update model weights. When a model has a large number of sparse features (e.g., Wide&Deep and DeepFM for CTR prediction), the number of weights easily runs into the billions or trillions. In such cases, traditional synchronization solutions (such as the Allreduce-based solution adopted by Horovod) cannot achieve high performance because of the massive communication overhead introduced by the tremendous number of sparse features. To train such sparse models efficiently, we developed OpenEmbedding, which enhances the parameter server specifically for sparse model training and inference.
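To give a feel for the communication overhead mentioned above, here is a back-of-the-envelope sketch. It assumes the embedding gradient is synchronized as one dense tensor, and compares that with sending only the rows a batch actually touches. The function names, table size, and batch are illustrative assumptions, not OpenEmbedding measurements or API.

```python
from typing import Iterable


def dense_sync_bytes(vocab_size: int, dim: int, bytes_per_value: int = 4) -> int:
    """Gradient bytes per worker if the whole table is synchronized densely."""
    return vocab_size * dim * bytes_per_value


def sparse_push_bytes(batch_ids: Iterable[int], dim: int,
                      bytes_per_value: int = 4, bytes_per_id: int = 8) -> int:
    """Bytes per worker if only the rows touched by the batch are sent."""
    unique_rows = len(set(batch_ids))
    # Each touched row carries its gradient vector plus its id.
    return unique_rows * (dim * bytes_per_value + bytes_per_id)


# Hypothetical table: 100 million ids, 16-dimensional float32 embeddings.
vocab_size, dim = 100_000_000, 16
# Hypothetical batch: six lookups hitting four distinct ids.
batch_ids = [7, 42, 7, 99_999_999, 42, 1234]

dense = dense_sync_bytes(vocab_size, dim)
sparse = sparse_push_bytes(batch_ids, dim)
print(f"dense sync:  {dense:,} bytes per step")
print(f"sparse push: {sparse:,} bytes per step")
```

Real batches touch far more rows than this toy one, but the gap between table size and touched rows is what a sparse-aware parameter server exploits.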
## Highlights
Efficiency
- We propose an efficient customized sparse format to handle sparse features, together with fine-grained optimizations such as cache-conscious algorithms, asynchronous cache reads and writes, and lightweight locks to maximize parallelism. As a result, OpenEmbedding achieves a 3-8x speedup over the Allreduce-based solution for sparse model training on a single machine equipped with 8 GPUs.

Ease-of-use
- We have integrated OpenEmbedding into TensorFlow. Only three lines of code changes are required to use OpenEmbedding in TensorFlow for both training and inference.

Adaptability
- In addition to TensorFlow, it is straightforward to integrate OpenEmbedding into other popular frameworks. We have demonstrated the integration with DeepCTR and Horovod in the examples.

## Benchmark
![benchmark](documents/images/benchmark.png)
For models that contain sparse features, it is difficult to speed up training with the Allreduce-based framework Horovod alone. Using OpenEmbedding together with Horovod gives better acceleration. In the single-machine, 8-GPU setting, the speedup is 3 to 8 times, and many models reach 3 to 7 times the performance of Horovod.
- [Benchmark](documents/en/benchmark.md)
## Install & Quick Start
You can install and run OpenEmbedding with the following steps. The examples show the whole process of training on the [Criteo](https://labs.criteo.com/2014/09/kaggle-contest-dataset-now-available-academic-use/) dataset with OpenEmbedding and serving predictions with TensorFlow Serving.
### Docker
NVIDIA Docker is required to use GPUs in the container. The OpenEmbedding image can be obtained from [Docker Hub](https://hub.docker.com/r/4pdosc/openembedding/tags).
```bash
# The script "criteo_deepctr_standalone.sh" trains the model and exports it to the path "tmp/criteo/1".
# It is okay to switch to:
#   "criteo_deepctr_horovod.sh" (multi-GPU training with Horovod),
#   "criteo_deepctr_mirrored.sh" (multi-GPU training with MirroredStrategy),
#   "criteo_deepctr_mpi.sh" (multi-GPU training with MultiWorkerMirroredStrategy and MPI).
docker run --rm --gpus all -v /tmp/criteo:/openembedding/tmp/criteo \
    4pdosc/openembedding:latest examples/run/criteo_deepctr_standalone.sh

# Start TensorFlow Serving to load the trained model.
docker run --name serving-example -td -p 8500:8500 -p 8501:8501 \
    -v /tmp/criteo:/models/criteo -e MODEL_NAME=criteo tensorflow/serving:latest
# Wait for the model server to start.
sleep 5

# Send requests and get the prediction results.
docker run --rm --network host 4pdosc/openembedding:latest examples/run/criteo_deepctr_restful.sh

# Clean up the serving container.
docker stop serving-example
docker rm serving-example
```

### Ubuntu
```bash
# Install the dependencies required by OpenEmbedding.
apt update && apt install -y gcc-7 g++-7 python3 libpython3-dev python3-pip
pip3 install --upgrade pip
pip3 install tensorflow==2.5.1
pip3 install openembedding

# Install the dependencies required by the examples.
apt install -y git cmake mpich curl
HOROVOD_WITHOUT_MPI=1 pip3 install horovod
pip3 install deepctr pandas scikit-learn mpi4py

# Download the examples.
git clone https://github.com/4paradigm/OpenEmbedding.git
cd OpenEmbedding

# The script "criteo_deepctr_standalone.sh" trains the model and exports it to the path "tmp/criteo/1".
# It is okay to switch to:
#   "criteo_deepctr_horovod.sh" (multi-GPU training with Horovod),
#   "criteo_deepctr_mirrored.sh" (multi-GPU training with MirroredStrategy),
#   "criteo_deepctr_mpi.sh" (multi-GPU training with MultiWorkerMirroredStrategy and MPI).
examples/run/criteo_deepctr_standalone.sh

# Start TensorFlow Serving to load the trained model.
docker run --name serving-example -td -p 8500:8500 -p 8501:8501 \
    -v `pwd`/tmp/criteo:/models/criteo -e MODEL_NAME=criteo tensorflow/serving:latest
# Wait for the model server to start.
sleep 5

# Send requests and get the prediction results.
examples/run/criteo_deepctr_restful.sh

# Clean up the serving container.
docker stop serving-example
docker rm serving-example
```

### CentOS
```bash
# Install the dependencies required by OpenEmbedding.
yum install -y centos-release-scl
yum install -y python3 python3-devel devtoolset-7
scl enable devtoolset-7 bash
pip3 install --upgrade pip
pip3 install tensorflow==2.5.1
pip3 install openembedding

# Install the dependencies required by the examples.
yum install -y git cmake mpich curl
HOROVOD_WITHOUT_MPI=1 pip3 install horovod
pip3 install deepctr pandas scikit-learn mpi4py

# Download the examples.
git clone https://github.com/4paradigm/OpenEmbedding.git
cd OpenEmbedding

# The script "criteo_deepctr_standalone.sh" trains the model and exports it to the path "tmp/criteo/1".
# It is okay to switch to:
#   "criteo_deepctr_horovod.sh" (multi-GPU training with Horovod),
#   "criteo_deepctr_mirrored.sh" (multi-GPU training with MirroredStrategy),
#   "criteo_deepctr_mpi.sh" (multi-GPU training with MultiWorkerMirroredStrategy and MPI).
examples/run/criteo_deepctr_standalone.sh

# Start TensorFlow Serving to load the trained model.
docker run --name serving-example -td -p 8500:8500 -p 8501:8501 \
    -v `pwd`/tmp/criteo:/models/criteo -e MODEL_NAME=criteo tensorflow/serving:latest
# Wait for the model server to start.
sleep 5

# Send requests and get the prediction results.
examples/run/criteo_deepctr_restful.sh

# Clean up the serving container.
docker stop serving-example
docker rm serving-example
```

### Note
The installation usually requires g++ 7 or higher, or a compiler compatible with `tf.version.COMPILER_VERSION`. The compiler can be specified with the environment variables `CC` and `CXX`. Currently, OpenEmbedding can only be installed on Linux.
```bash
CC=gcc CXX=g++ pip3 install openembedding
```

If TensorFlow is updated, you need to reinstall OpenEmbedding.
```bash
pip3 uninstall openembedding && pip3 install --no-cache-dir openembedding
```

## User Guide
A sample program for common usage is as follows.
Create `Model` and `Optimizer`.
```python
import tensorflow as tf
from deepctr.models import WDL

# `feature_columns` is assumed to be a list of DeepCTR feature columns defined earlier.
optimizer = tf.keras.optimizers.Adam()
model = WDL(feature_columns, feature_columns, task='binary')
```

Transform them into a distributed `Model` and a distributed `Optimizer`. The `Embedding` layers will be stored on the parameter server.
```python
import horovod.tensorflow.keras as hvd
import openembedding.tensorflow as embed

hvd.init()

optimizer = embed.distributed_optimizer(optimizer)
optimizer = hvd.DistributedOptimizer(optimizer)

model = embed.distributed_model(model)
```
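As a rough mental model of what the transformed `Embedding` layer does on each step, the toy sketch below pulls only the rows a batch needs and pushes gradients back for just those rows. `ToyParameterServer` and its methods are hypothetical stand-ins that run in a single process with plain SGD. They are not part of the OpenEmbedding API.

```python
from collections import defaultdict


class ToyParameterServer:
    """In-process stand-in for a parameter server holding embedding rows."""

    def __init__(self, dim, learning_rate=0.1):
        self.dim = dim
        self.learning_rate = learning_rate
        # Rows are created lazily on first access and initialized to zeros.
        self.rows = defaultdict(lambda: [0.0] * dim)

    def pull(self, ids):
        """Forward pass: return copies of the rows for the requested ids."""
        return {i: list(self.rows[i]) for i in set(ids)}

    def push(self, grads):
        """Backward pass: apply plain SGD to the rows that received gradients."""
        for i, grad in grads.items():
            row = self.rows[i]
            for k in range(self.dim):
                row[k] -= self.learning_rate * grad[k]


server = ToyParameterServer(dim=2)
batch_ids = [3, 8, 3]

# Only the two distinct ids in the batch are fetched.
embeddings = server.pull(batch_ids)

# Made-up gradients, one vector per distinct id.
grads = {3: [1.0, -1.0], 8: [0.5, 0.5]}
server.push(grads)

print(sorted(server.rows))
print(server.rows[3])
```

Rows that never appear in a batch are never created or updated here, which is the sparse behavior the parameter server relies on.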
`embed.distributed_optimizer` converts the TensorFlow optimizer into an optimizer that supports the parameter server, so that the parameters stored there can be updated. `embed.distributed_model` replaces the `Embedding` layers in the model and overrides the methods needed to support saving and loading with parameter servers. The method `Embedding.call` pulls the parameters from the parameter server, and a backpropagation function is registered to push the gradients to the parameter server.

Data parallelism by Horovod.
```python
model.compile(optimizer, "binary_crossentropy", metrics=['AUC'],
              experimental_run_tf_function=False)
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0),
             hvd.callbacks.MetricAverageCallback()]
model.fit(dataset, epochs=10, verbose=2, callbacks=callbacks)
```

Export as a stand-alone SavedModel so that it can be loaded by TensorFlow Serving.
```python
if hvd.rank() == 0:
    # Must specify include_optimizer=False explicitly
    model.save_as_original_model('model_path', include_optimizer=False)
```

More examples are listed below.
- [Replace `Embedding` layer](examples/criteo_deepctr_hook.py)
- [Transform network model](examples/criteo_deepctr_network.py)
- [Custom subclass model](examples/criteo_lr_subclass.py)
- [With MirroredStrategy](examples/criteo_deepctr_network_mirrored.py)
- [With MultiWorkerMirroredStrategy and MPI](examples/criteo_deepctr_network_mirrored.py)

## Build
### Docker Build
```bash
docker build -t 4pdosc/openembedding-base:0.1.0 -f docker/Dockerfile.base .
docker build -t 4pdosc/openembedding:0.0.0-build -f docker/Dockerfile.build .
```

### Native Build
The compiler needs to be compatible with `tf.version.COMPILER_VERSION` (>= 7). Install all [prpc](https://github.com/4paradigm/prpc) dependencies to `tools` or `/usr/local`, then run `build.sh` to complete the compilation. `build.sh` automatically installs prpc (pico-core) and the parameter server (pico-ps) into the `tools` directory.
```bash
git submodule update --init --checkout --recursive
pip3 install tensorflow
./build.sh clean && ./build.sh build
pip3 install ./build/openembedding-*.tar.gz
```

## Features
TensorFlow 2
- `dtype`: `float32`, `float64`.
- `tensorflow.keras.initializers`
  - `RandomNormal`, `RandomUniform`, `Constant`, `Zeros`, `Ones`.
  - The parameter `seed` is currently ignored.
- `tensorflow.keras.optimizers`
  - `Adadelta`, `Adagrad`, `Adam`, `Adamax`, `Ftrl`, `RMSprop`, `SGD`.
  - `decay` and `LearningRateSchedule` are not supported.
  - `Adam(amsgrad=True)` is not supported.
  - `RMSprop(centered=True)` is not supported.
  - The parameter server uses a sparse update method, which may produce different training results for an `Optimizer` with momentum.
- `tensorflow.keras.layers.Embedding`
  - Supports an array for a known `input_dim` and a hash table for an unknown `input_dim` (2**63 range).
  - Can still be stored on workers and use the dense update method.
  - `embeddings_regularizer` and `embeddings_constraint` should not be used.
- `tensorflow.keras.Model`
  - Can be converted to a distributed `Model`, automatically ignoring or converting incompatible settings (such as `embeddings_constraint`).
  - Distributed `save`, `save_weights`, `load_weights` and `ModelCheckpoint`.
  - The distributed `Model` can be saved as a stand-alone SavedModel, which can be loaded by TensorFlow Serving.
  - Training multiple distributed `Model`s in one task is not supported.
- Can collaborate with Horovod. Training with `MirroredStrategy` or `MultiWorkerMirroredStrategy` is experimental.

## TODO
- Improve performance
- Support PyTorch training
- Support `tf.feature_column.embedding_column`
- Approximate `embeddings_regularizer`, `LearningRateSchedule`, etc.
- Improve the support for `Initializer` and `Optimizer`
- Training multiple distributed `Model`s in one task
- Support ONNX

## Designs
- [Training](documents/en/training.md)
- [Serving](documents/en/serving.md)

## Authors
- Yiming Liu
- Yilin Wang
- Cheng Chen
- Guangchuan Shi
- Zhao Zheng

## Persistent Memory (PMem)
Currently, the interface for persistent memory is experimental.
PMem-based OpenEmbedding provides a lightweight checkpointing scheme as well as performance comparable to its DRAM version. For long-running deep learning recommendation model training, PMem-based OpenEmbedding provides a training process that is both efficient and reliable.
- [PMem-based OpenEmbedding](documents/en/pmem.md)

## Publications
- [OpenEmbedding: A Distributed Parameter Server for Deep Learning Recommendation Models using Persistent Memory](documents/papers/openembedding_icde2023.pdf). Cheng Chen, Yilin Wang, Jun Yang, Yiming Liu, Mian Lu, Zhao Zheng, Bingsheng He, Weng-Fai Wong, Liang You, Penghao Sun, Yuping Zhao, Fenghua Hu, and Andy Rudoff. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023.