Blog posts

2018

Automatic Kernel Optimization for Deep Learning on All Hardware Platforms

9 minute read

Published: October 02, 2018

Optimizing the performance of deep neural networks across a diverse range of hardware platforms remains a hard problem for AI developers. In terms of system support, we face a many-to-many problem: deploying trained models from multiple frontends (e.g. TensorFlow, ONNX, MXNet) to multiple hardware platforms (e.g. CPUs, GPUs, accelerators). The most performance-critical part of this problem is obtaining high-performance kernel implementations for a growing set of model architectures and hardware platforms.

Optimizing Mobile Deep Learning on ARM GPU with TVM

13 minute read

Published: January 15, 2018

With the great success of deep learning, the demand for deploying deep neural networks to mobile devices is growing rapidly. As on desktop platforms, using the GPU in mobile devices can improve both inference speed and energy efficiency. However, most existing deep learning frameworks do not support mobile GPUs very well. The difficulty lies in the differences between mobile and desktop GPU architectures, which mean that optimizing for mobile GPUs requires special effort. This non-trivial extra work is the reason mobile GPUs are poorly supported in most deep learning frameworks.

Lianmin Zheng
