{"id":26947154,"url":"https://github.com/lsds/crossbow","last_synced_at":"2025-04-02T20:18:08.689Z","repository":{"id":45108669,"uuid":"163797653","full_name":"lsds/Crossbow","owner":"lsds","description":"Crossbow: A Multi-GPU Deep Learning System for Training with Small Batch Sizes","archived":false,"fork":false,"pushed_at":"2022-10-05T19:19:19.000Z","size":609,"stargazers_count":55,"open_issues_count":7,"forks_count":6,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-03-27T08:48:28.986Z","etag":null,"topics":["deep-learning","gpu-acceleration","machine-learning","training"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lsds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-02T05:27:24.000Z","updated_at":"2024-03-13T04:51:04.000Z","dependencies_parsed_at":"2023-01-19T07:15:21.938Z","dependency_job_id":null,"html_url":"https://github.com/lsds/Crossbow","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FCrossbow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FCrossbow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FCrossbow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FCrossbow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lsds","download_url":"https://codeload.github.com/lsds/Crossbow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github
.com","kind":"github","repositories_count":246884767,"owners_count":20849554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","gpu-acceleration","machine-learning","training"],"created_at":"2025-04-02T20:18:08.072Z","updated_at":"2025-04-02T20:18:08.678Z","avatar_url":"https://github.com/lsds.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crossbow: A Multi-GPU Deep Learning System for Training with Small Batch Sizes\n\n**Crossbow** is a multi-GPU system for training deep learning models that\n  allows users to freely choose their preferred batch size, however small,\n  while scaling to multiple GPUs.\n\n**Crossbow** utilises modern GPUs better than other systems by training multiple _model replicas_ on the same GPU. When the batch size is small enough to leave GPU resources unused, **Crossbow** trains a second model replica, then a third, and so on, for as long as training throughput keeps increasing.\n\nTo synchronise many model replicas, **Crossbow** uses _synchronous model averaging_ to adjust the trajectory of each individual replica based on the average of all replicas. With model averaging, the batch size does not increase linearly with the number of model replicas, as it would with synchronous SGD. This yields better statistical efficiency when scaling training to a larger number of GPUs, without cumbersome hyper-parameter tuning.\n\nSee our [VLDB 2019 paper](http://www.vldb.org/pvldb/vol12/p1399-koliousis.pdf) for more details.\n\nThe system supports a variety of training algorithms, including synchronous SGD. 
We are working to seamlessly port existing TensorFlow models to Crossbow.\n\n## Installing Crossbow\n\n### Prerequisites\n\n**Crossbow** has been primarily tested on Ubuntu Linux 16.04. It requires the following Linux packages:\n\n```shell\n$ sudo apt-get install build-essential git openjdk-8-jdk maven libboost-all-dev graphviz wget\n```\n\n**Crossbow** requires NVIDIA's [CUDA](https://developer.nvidia.com/cuda-toolkit) toolkit, the [cuDNN](https://developer.nvidia.com/cudnn) library and the [NCCL](https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html) library (currently using versions 8.0, 6.0, and 2.1.15, respectively). After successful installation, make sure that:\n\n* `CUDA_HOME` is set (the default location is `/usr/local/cuda`)\n* `NCCL_HOME` is set\n\nand that:\n\n* `PATH` includes `$CUDA_HOME/bin` and\n* `LD_LIBRARY_PATH` includes `$CUDA_HOME/lib64` and `$NCCL_HOME/lib`\n\n**Crossbow** also requires the [OpenBLAS](https://github.com/xianyi/OpenBLAS.git) and [libjpeg-turbo](https://github.com/libjpeg-turbo/libjpeg-turbo) libraries. After successful installation, make sure that:\n\n* `BLAS_HOME` is set (the default location is `/opt/OpenBLAS`)\n* `JPEG_HOME` is set\n\nand that:\n\n* `LD_LIBRARY_PATH` includes `$BLAS_HOME/lib` and `$JPEG_HOME/lib`\n\n### Configuring the OS\n\n**Crossbow** uses page-locked memory regions to speed up data transfers from CPU to GPU and vice versa. The amount of memory that **Crossbow** locks usually exceeds the default OS limit. 
Edit `/etc/security/limits.conf` and append the following lines to the end of the file:\n\n```\n*\thard\tmemlock\tunlimited\n*\tsoft\tmemlock\tunlimited\n```\n\nSave the changes and reboot the machine.\n\n### Building Crossbow\n\nAssuming all environment variables have been set, build Crossbow's Java and C/C++ libraries:\n\n```shell\n$ git clone https://github.com/lsds/Crossbow.git\n$ cd Crossbow\n$ export CROSSBOW_HOME=`pwd`\n$ ./scripts/build.sh\n```\n\n_**Note:** We will shortly add an installation script as well as a Docker image to simplify the installation process and avoid library conflicts._\n\n## Training one of our benchmark models\n\n### ResNet-50\n\n**Crossbow** serialises [ImageNet](http://www.image-net.org) images and their labels into a binary format similar to TensorFlow's TFRecord. Follow [TensorFlow's instructions](https://github.com/tensorflow/models/blob/master/research/inception/README.md#getting-started) to download the data set and convert it to TFRecord format. You will end up with 1,024 training and 128 validation record files in a directory of your choice (say, `/data/imagenet/tfrecords`). Then, run:\n\n```shell\n$ cd $CROSSBOW_HOME\n$ ./scripts/datasets/imagenet/prepare-imagenet.sh /data/imagenet/tfrecords /data/imagenet/crossbow\n```\n\nThe script will convert TensorFlow's record files to Crossbow's own binary format and store them in `/data/imagenet/crossbow`. You are now ready to train ResNet-50 with the ImageNet data set:\n\n```shell\n$ ./scripts/benchmarks/resnet-50.sh\n```\n\n### LeNet\n\nThe first script below downloads the [MNIST](http://yann.lecun.com/exdb/mnist/) data set and converts it to Crossbow's binary record format. Output files are written to `$CROSSBOW_HOME/data/mnist/b-001` and are tailored to a specific batch size (in this case, 1). 
The second script trains LeNet with the MNIST data set.\n\n```shell\n$ cd $CROSSBOW_HOME\n$ ./scripts/datasets/mnist/prepare-mnist.sh\n$ ./scripts/benchmarks/lenet.sh\n```\n\n### Others\n\n**Crossbow** supports the entire ResNet family of neural networks, VGG-16 (based on [this implementation](https://github.com/geifmany/cifar-vgg)), and the [convnet-benchmarks](https://github.com/soumith/convnet-benchmarks) suite of micro-benchmarks.\n\n_**Note:** We will shortly add a page describing how to configure Crossbow's system parameters._\n\n## Trying your first Crossbow program\n\n**Crossbow** represents a deep learning application as a data flow graph: nodes\n  represent operations, and edges represent the data (multi-dimensional arrays, also known\n  as _tensors_) that flow among them. The most notable operators are\n  inner-product, pooling and convolutional layers, and activation functions. Some of these operators have _learnable_ parameters (also multi-dimensional arrays) that form part of the model being trained. An inner-product operator, for example, has two learnable parameters, `weights` and `bias`:\n\n```java\nInnerProductConf conf = new InnerProductConf ();\n\n/* Let's assume that there are 10 possible output labels, as in MNIST */\nconf.setNumberOfOutputs (10);\n\n/* Initialise weights with values drawn from a Gaussian distribution (standard deviation 0.1),\n * and set all bias elements to the same constant value (1) */\nconf.setWeightInitialiser (new InitialiserConf ().setType (InitialiserType.GAUSSIAN).setStd(0.1F));\nconf.setBiasInitialiser   (new InitialiserConf ().setType (InitialiserType.CONSTANT).setValue(1F));\n\n/* Create an inner-product operator and wrap it in a graph node */\nOperator op = new Operator (\"InnerProduct\", new InnerProduct (conf));\nDataflowNode innerproduct = new DataflowNode (op);\n```\n\nConnect data flow nodes together to form a neural network. 
For example, we can connect the forward layers of a logistic regression model:\n\n```java\n/* Assumes softmax and loss are data flow nodes, created in the same way as innerproduct */\ninnerproduct.connectTo(softmax).connectTo(loss);\n```\n\nFinally, we can construct the model and train it for one epoch:\n\n```java\nSubGraph subgraph = new SubGraph (innerproduct);\nDataflow dataflow = new Dataflow (subgraph).setPhase(Phase.TRAIN);\nExecutionContext context = new ExecutionContext (new Dataflow [] { dataflow, null });\ncontext.init();\ncontext.train(1, TrainingUnit.EPOCHS);\n```\n\nThe full source code is available in [LogisticRegression.java](src/test/java/uk/ac/imperial/lsds/crossbow/LogisticRegression.java).\n\n## For more information\n\n* [LSDS Website](https://www.lsds.doc.ic.ac.uk)\n\n## Licence\n\n[Apache License 2.0](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsds%2Fcrossbow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flsds%2Fcrossbow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsds%2Fcrossbow/lists"}