Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/PaddlePaddle/Anakin

High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.
https://github.com/PaddlePaddle/Anakin

ai amd arm bitmain cambricon cross-platform high-performance inference-engine intel nvidia

Last synced: 3 months ago
JSON representation

High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.

Awesome Lists containing this project

README

        

# Anakin2.0

[![Build Status](https://travis-ci.org/PaddlePaddle/Anakin.svg?branch=developing)](https://travis-ci.org/PaddlePaddle/Anakin)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
[![Coverage Status](https://coveralls.io/repos/github/xklnono/Anakin/badge.svg)](https://coveralls.io/github/xklnono/Anakin)

Welcome to the Anakin GitHub.

Anakin is a cross-platform, high-performance inference engine, which is originally
developed by Baidu engineers and is a large-scale application of industrial products.

Please refer to our [release announcement](https://github.com/PaddlePaddle/Anakin/releases) to track the latest feature of Anakin.

## Features

- **Flexibility**

Anakin is a cross-platform, high-performance inference engine, supports a wide range of neural network architectures and different hardware platforms. It is easy to run Anakin on GPU / x86 / ARM platform.

Anakin has integrated with NVIDIA TensorRT and open source this part of integrated API to provide services, developers can call the API directly or modify it as needed, which will be more flexible for development requirements.

- **High performance**

In order to give full play to the performance of hardware, we optimized the
forward prediction at different levels.
- Automatic graph fusion. The goal of all performance optimizations under a
given algorithm is to make the ALU as busy as possible. Operator fusion
can effectively reduce memory access and keep the ALU busy.

- Memory reuse. Forward prediction is a one-way calculation. We reuse
the memory between the input and output of different operators, thus
reducing the overall memory overhead.

- Assembly level optimization. Saber is a underlying DNN library for Anakin, which
is deeply optimized at assembly level.

## NV GPU Benchmark
### Machine And Enviornment
> CPU: `Intel(R) Xeon(R) CPU 5117 @ 2.0GHz`
> GPU: `Tesla P4`
> cuda: `CUDA8`
> cuDNN: `v7`

* Time:warmup 10,running 1000 times to get average time
* Latency (`ms`) and Memory(MB) of different batch

> The counterpart of **`Anakin`** is the acknowledged high performance inference engine **`NVIDIA TensorRT 5`** , The models which TensorRT 5 doesn't support we use the custom plugins to support.

### VGG16

| Batch_Size | RT latency FP32(ms) | Anakin2 Latency FP32 (ms) |RT Memory (MB) | Anakin2 Memory (MB) |
|------------|---------------------|---------------------------|---------------|---------------------|
| 1 | 8.52532 | 8.2387 |1090.89 | 702 |
| 2 | 14.1209 | 13.8772 |1056.02 | 768.76 |
| 4 | 24.4529 | 24.3391 |1002.17 | 840.54 |
| 8 | 46.7956 | 46.3309 |1098.98 | 935.61 |

### Resnet50

| Batch_Size | RT latency FP32(ms) | Anakin2 Latency FP32 (ms) | RT Latency INT8 (ms) | Anakin2 Latency INT8 (ms) | RT Memory FP32(MB) | Anakin2 Memory FP32(MB) |
|------------|---------------------|---------------------------|----------------------|---------------------------|--------------------|-------------------------|
| 1 | 4.6447 | 3.0863 | 1.78892 | 1.61537 | 1134.88 | 311.25 |
| 2 | 6.69187 | 5.13995 | 2.71136 | 2.70022 | 1108.86 | 382 |
| 4 | 11.1943 | 9.20513 | 4.16771 | 4.77145 | 885.96 | 406.86 |
| 8 | 19.8769 | 17.1976 | 6.2798 | 8.68197 | 813.84 | 532.61 |

### Resnet101

| Batch_Size | RT latency (ms) | Anakin2 Latency (ms) | RT Latency INT8 (ms) | Anakin2 Latency INT8 (ms) | RT Memory (MB) | Anakin2 Memory (MB) |
|------------|-----------------|----------------------|----------------------|---------------------------|----------------|---------------------|
| 1 | 9.98695 | 5.44947 | 2.81031 | 2.74399 | 1159.16 | 500.5 |
| 2 | 17.3489 | 8.85699 | 4.8641 | 4.69473 | 1158.73 | 492 |
| 4 | 20.6198 | 16.8214 | 7.11608 | 8.45324 | 1021.68 | 541.08 |
| 8 | 31.9653 | 33.5015 | 11.2403 | 15.4336 | 914.49 | 611.54 |

## X86 CPU Benchmark
### Machine And Enviornment
> CPU: `Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz` with HT, for FP32 test
> CPU: `Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz` with HT, for INT8 test
> System: `CentOS 6.3` with `GCC 4.8.2`, for benchmark between Anakin and Intel Caffe

* All test enable `8 thread parallel`
* Time:warmup 10,running 200 times to get average time

> The counterpart of **`Anakin`** is **`Intel Cafe`(1.1.6)** with mklml.

| Net_Name | Batch_Size | Anakin2 Latency(2650v4) fp32 (ms) | caffe Latency(2650v4) fp32 (ms) | Anakin2 Latency int8(6271) (ms) |
|-------------|----|-------------------------------------|-----------------------------------|---------------------------------|
| resnet50 | 1 | 20.6201 | 24.1369 | 3.20866 |
| resnet50 | 2 | 39.2286 | 43.1096 | 5.44311 |
| resnet50 | 4 | 77.1392 | 81.8814 | 9.93424 |
| resnet50 | 8 | 152.941 | 158.321 | 19.5618 |
| vgg16 | 1 | 55.6132 | 70.532 | 15.3181 |
| vgg16 | 2 | 96.5034 | 131.451 | 22.5082 |
| vgg16 | 4 | 180.479 | 247.926 | 37.2974 |
| vgg16 | 8 | 346.619 | 485.44 | 67.6682 |
| mobilenetv1 | 1 | 3.98104 | 5.42775 | 0.926546 |
| mobilenetv1 | 2 | 7.27079 | 9.16058 | 1.35007 |
| mobilenetv1 | 4 | 14.4029 | 16.2505 | 2.37271 |
| mobilenetv1 | 8 | 29.1651 | 29.8381 | 3.75992 |
| vgg16_ssd | 1 | 125.948 | 143.412 | |
| vgg16_ssd | 2 | 247.242 | 266.22 | |
| vgg16_ssd | 4 | 488.377 | 510.978 | |
| vgg16_ssd | 8 | 972.762 | 995.407 | |
| mobilenetv2 | 1 | 3.78504 | 23.0066 | |
| mobilenetv2 | 2 | 7.24622 | 65.9301 | |
| mobilenetv2 | 4 | 13.7638 | 85.3893 | |
| mobilenetv2 | 8 | 28.4093 | 131.669 |

## ARM CPU Benchmark
### Machine And Enviornment
> CPU: `Kirin 980`
> CPU: `Snapdragon 652`
> CPU: `Snapdragon 855`
> CPU: `RK3399`

* Compile circumstance: Android ndk cross compile,gcc 4.9,enable neon
* Time:warmup 10,running 10 times to get average time
* Note: 1、shufflenetv2 int8 model add swish operator
> The counterpart of **`Anakin`** is **`ncnn`(20190320)**. This benchmark we test ARMv7 ARMv8 splitly

### ARMv8 TEST
* ABI: arm64-v8a
- Latency (`ms`) of `one batch`

| Kirin 980 | Anakin fp32 | | | Anakin int8 | | | NCNN fp32 | | | NCNN int8 | | |
|---------------|-------------|----------|----------|-------------|----------|----------|-----------|----------|----------|-----------|----------|----------|
| | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread |
| mobilenet_v1 | 34.172 | 19.369 | 12.723 | 37.588 | 20.692 | 13.280 | 45.420 | 24.220 | 16.730 | 50.560 | 27.820 | 20.010 |
| mobilenet_v2 | 30.489 | 17.784 | 12.327 | 29.581 | 17.208 | 15.307 | 30.390 | 17.310 | 12.900 | | | |
| mobilenet_ssd | 71.609 | 37.477 | 28.952 | | | | 88.220 | 70.070 | 66.430 | 103.700 | 85.160 | 85.320 |
| resnet50 | 255.748 | 137.842 | 104.628 | | | | 1299.480 | 695.830 | 498.010 | 243.360 | 131.100 | 89.800 |
| shufflenetv1 | 11.544 | 8.931 | 7.027 | | | | 12.810 | 9.390 | 8.030 | | | |
| shufflenetv2 | 11.687 | 7.899 | 5.321 | 20.402 | 11.529 | 9.061 | | | | | | |
| squeezenet | 28.580 | 16.638 | 14.435 | | | | | | | | | |
| googlenet | 93.917 | 52.742 | 40.301 | 130.875 | 72.522 | 54.204 |

---
---

| Snapdragon 855 | Anakin fp32 | | | Anakin int8 | | | NCNN fp32 | | | NCNN int8 | | |
|---------------|-------------|----------|---------|-------------|----------|----------|-----------|-----------|----------|-----------|----------|----------|
| | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread |
| mobilenet_v1 | 32.019 | 19.024 | 10.491 | 34.363 | 20.292 | 10.382 | 37.110 | 22.310 | 13.520 | 47.430 | 28.350 | 15.830 |
| mobilenet_v2 | 28.533 | 17.455 | 10.433 | 24.487 | 15.182 | 9.133 | 25.060 | 15.970 | 11.250 | | | |
| mobilenet_ssd | 66.454 | 41.397 | 23.639 | | | | 101.560 | 69.380 | 43.930 | 136.420 | 91.010 | 47.490 |
| resnet50 | 201.362 | 132.133 | 78.300 | | | | 1141.290 | 724.090 | 385.990 | 229.020 | 138.450 | 82.060 |
| shufflenetv1 | 10.153 | 7.101 | 5.327 | | | | 11.610 | 8.020 | 5.870 | | | |
| shufflenetv2 | 10.868 | 6.713 | 4.526 | 17.306 | 10.987 | 6.788 | | | | | | |
| squeezenet | 25.880 | 16.134 | 9.697 | | | | | | | | | |
| googlenet | 85.774 | 54.518 | 34.025 | 118.120 | 73.686 | 41.865 |

---
---

| Snapdragon 652 | Anakin fp32 | | | Anakin int8 | | | NCNN fp32 | | | NCNN int8 | | |
|---------------|-------------|----------|----------|-------------|----------|----------|-----------|-----------|----------|-----------|----------|----------|
| | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread |
| mobilenet_v1 | 109.994 | 54.937 | 33.174 | 83.887 | 43.639 | 24.665 | 123.320 | 122.670 | 65.100 | 128.800 | 154.370 | 125.570 |
| mobilenet_v2 | 80.712 | 46.314 | 30.874 | 69.340 | 43.590 | 31.864 | 89.920 | 90.900 | 55.320 | | | |
| mobilenet_ssd | 246.459 | 121.684 | 134.019 | | | | 248.190 | 138.170 | 142.350 | 247.020 | 145.080 | 211.000 |
| resnet50 | 673.285 | 346.287 | 378.065 | | | | 880.940 | 514.190 | | 533.760 | 313.630 | |
| shufflenetv1 | 34.948 | 26.635 | 21.571 | | | | 39.950 | 25.520 | 20.180 | | | |
| shufflenetv2 | 35.530 | 21.440 | 16.434 | 49.498 | 29.116 | 19.346 | | | | | | |
| squeezenet | 87.037 | 47.192 | 28.663 | | | | | | | | | |
| googlenet | 268.023 | 148.533 | 95.624 | 236.492 | 131.510 | 81.561 |

---
---

| RK3399 | Anakin fp32 | | | Anakin int8 | | | NCNN fp32 | | | NCNN int8 | | |
|---------------|-------------|----------|------|-------------|----------|------|-----------|----------|------|-----------|----------|------|
| | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread |
| mobilenet_v1 | 111.317 | 60.008 | | 87.201 | 45.693 | | 149.270 | 91.200 | | 142.790 | 86.140 | |
| mobilenet_v2 | 105.767 | 60.899 | | 79.065 | 53.914 | | 118.530 | 86.900 | | | | |
| mobilenet_ssd | 232.923 | 128.337 | | | | | 268.900 | 157.860 | | 256.560 | 149.730 | |
| resnet50 | 671.800 | 369.386 | | | | | 1029.300 | 571.230 | | 569.250 | 344.830 | |
| shufflenetv1 | 38.761 | 25.971 | | | | | | | | | | |
| shufflenetv2 | 36.220 | 22.095 | | 51.879 | 30.351 | | | | | | | |
| squeezenet | 98.489 | 54.863 | | | | | | | | | | |
| googlenet | 274.166 | 159.429 | | 235.085 | 133.044 |

### ARMv7 TEST
* ABI: armveabi-v7a with neon
- Latency (`ms`) of `one batch`

| Kirin 980 | Anakin fp32 | | | Anakin int8 | | | NCNN fp32 | | | NCNN int8 | | |
|---------------|-------------|----------|----------|-------------|----------|----------|-----------|----------|----------|-----------|----------|----------|
| | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread |
| mobilenet_v1 | 39.051 | 19.813 | 14.184 | 39.026 | 22.048 | 14.250 | 50.240 | 26.850 | 20.010 | 92.900 | 49.420 | 37.160 |
| mobilenet_v2 | 36.052 | 19.550 | 14.507 | 32.656 | 19.641 | 15.735 | 35.890 | 20.730 | 18.550 | | | |
| mobilenet_ssd | 83.474 | 44.530 | 33.116 | | | | 99.960 | 53.160 | 84.360 | 180.000 | 91.380 | 68.140 |
| resnet50 | 291.478 | 158.954 | 129.484 | | | | 1412.37 | 766.62 | 560.760 | 355.010 | 189.18 | 133.410 |
| shufflenetv1 | 11.909 | 9.761 | 7.441 | | | | 16.030 | 10.660 | 8.120 | | | |
| shufflenetv2 | 11.755 | 7.983 | 6.289 | 21.968 | 14.111 | 9.888 | | | | | | |
| squeezenet | 30.148 | 20.908 | 17.084 | | | | | | | | | |
| googlenet | 108.210 | 65.798 | 58.630 | 140.886 | 79.910 | 60.693 |

---
---

| Snapdragon 855 | Anakin fp32 | | | Anakin int8 | | | NCNN fp32 | | | NCNN int8 | | |
|---------------|-------------|----------|---------|-------------|----------|----------|-----------|-----------|----------|-----------|----------|----------|
| | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread |
| mobilenet_v1 | 34.015 | 20.064 | 11.410 | 42.222 | 21.532 | 11.746 | 41.150 | 24.870 | 18.420 | 79.180 | 48.470 | 24.530 |
| mobilenet_v2 | 30.742 | 18.507 | 11.354 | 24.628 | 15.133 | 9.079 | 30.060 | 19.220 | 15.520 | | | |
| mobilenet_ssd | 69.749 | 44.010 | 26.000 | | | | 85.030 | 62.770 | 48.940 | 154.600 | 138.700 | 82.140 |
| resnet50 | 218.581 | 146.509 | 92.899 | | | | 1380.340 | 996.410 | 540.660 | 324.720 | 261.920 | 126.270 |
| shufflenetv1 | 11.032 | 7.430 | 5.369 | | | | 13.390 | 9.270 | 6.360 | | | |
| shufflenetv2 | 11.372 | 7.120 | 4.728 | 19.393 | 12.278 | 7.719 | | | | | | |
| squeezenet | 27.860 | 17.538 | 10.729 | | | | | | | | | |
| googlenet | 100.719 | 69.509 | 49.021 | 127.982 | 83.369 | 50.275 |

---
---

| Snapdragon 652 | Anakin fp32 | | | Anakin int8 | | | NCNN fp32 | | | NCNN int8 | | |
|---------------|-------------|----------|----------|-------------|----------|----------|-----------|-----------|-----------|-----------|----------|----------|
| | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread | 1 thread | 2 thread | 4 thread |
| mobilenet_v1 | 121.982 | 63.004 | 37.325 | 86.672 | 45.728 | 26.354 | 130.740 | 140.850 | 81.810 | 184.630 | 192.730 | 144.740 |
| mobilenet_v2 | 89.113 | 50.609 | 35.291 | 72.679 | 45.888 | 33.887 | 94.520 | 101.380 | 65.570 | | | |
| mobilenet_ssd | 236.466 | 132.293 | 86.335 | | | | 270.630 | 295.520 | 174.280 | 350.640 | 286.420 | 243.850 |
| resnet50 | 751.528 | 405.433 | 255.699 | | | | 2762.890 | 1447.070 | 883.730 | 664.180 | 369.020 | |
| shufflenetv1 | 36.883 | 23.718 | 15.144 | | | | 53.660 | 33.450 | 23.330 | | | |
| shufflenetv2 | 36.933 | 26.353 | 20.507 | 53.243 | 31.083 | 21.550 | | | | | | |
| squeezenet | 92.748 | 51.936 | 33.027 | | | | | | | | | |
| googlenet | 296.092 | 179.542 | 125.509 | 242.505 | 140.083 | 89.646 |

---
---

| RK3399 | Anakin fp32 | | Anakin int8 | | NCNN fp32 | | NCNN int8 | |
|---------------|-------------|----------|-------------|----------|-----------|-----------|-----------|----------|
| | 1 thread | 2 thread | 1 thread | 2 thread | 1 thread | 2 thread | 1 thread | 2 thread |
| mobilenet_v1 | 116.981 | 65.033 | 87.768 | 47.617 | 155.830 | 98.520 | 201.800 | 116.440 |
| mobilenet_v2 | 118.229 | 70.567 | 83.790 | 55.413 | 126.530 | 90.930 | | |
| mobilenet_ssd | 237.196 | 134.508 | | | 292.130 | 183.650 | 361.570 | 200.370 |
| resnet50 | 725.582 | 413.995 | | | 2883.120 | 1632.800 | 702.660 | 404.970 |
| shufflenetv1 | 41.094 | 27.353 | | | | | | |
| shufflenetv2 | 37.660 | 23.489 | 53.558 | 32.122 | | | | |
| squeezenet | 104.519 | 59.402 | | | | | | |
| googlenet | 305.304 | 190.897 | 244.855 | 142.493 |

## Documentation

**All you need is in [Doc Index](docs/README.md)**

We also provide [English](docs/Manual/Tutorial_en.md) and [Chinese](docs/Manual/Tutorial_ch.md) tutorial documentation.

- User guide

You can get the working principle of the project, C++ interface description and code examples from [here](docs/Manual/Tutorial_ch.md). You can also learn about the model converter [here](docs/Manual/Converter_ch.md).

- Developer guide

You might want to know more details of Anakin and make it better. Please refer to [how to add custom devices](docs/Manual/addCustomDevice.md) and [how to add custom device operators](docs/Manual/addCustomOp.md).

- [How to Contribute](docs/Manual/Contribution_ch.md)

We appreciate your contributions!

## Ask Questions

You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Anakin/issues).

## Copyright and License
Anakin is provided under the [Apache-2.0 license](LICENSE).

## Acknowledgement
Anakin refers to the following projects:
- [Caffe](https://github.com/BVLC/caffe)
- [ComputeLibrary](https://github.com/ARM-software/ComputeLibrary)
- [gemmlowp](https://github.com/google/gemmlowp)
- [ncnn](https://github.com/Tencent/ncnn)