Practical deployment of deep learning solutions is mostly restricted to high-performance computing platforms equipped with GPUs or FPGAs, owing to their high computational and memory requirements. However, for businesses where computation on low-cost, off-the-shelf embedded platforms already available on the market is the only economically viable option, these deep learning solutions must be ported to such platforms without a considerable loss in the quality of calculations.
In this paper, we explore practical methodologies for porting Convolutional Neural Networks (CNNs) to constrained embedded platforms. We present practical techniques and tools that reduce the computational load and memory footprint of neural network architectures by trading off calculation accuracy. We also report results from porting and running several of these transformed neural network algorithms, used in the retail, automotive, and industrial automation industries, on selected off-the-shelf embedded platforms.
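The accuracy-for-footprint trade-off mentioned above can be illustrated with a minimal post-training weight quantization sketch. This is an assumption for illustration only, not the paper's actual toolchain: symmetric linear quantization of float32 weights to int8 cuts memory by 4x at the cost of a bounded rounding error.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric linear quantization of float32 weights to int8.

    The largest weight magnitude is mapped to 127; all other weights
    are rounded to the nearest multiple of the resulting scale.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights for inference."""
    return q.astype(np.float32) * scale

# Illustrative example: a small weight matrix shrinks 4x in memory,
# while the per-weight error stays within half a quantization step.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(w.nbytes, q.nbytes)  # 16384 vs. 4096 bytes
print(np.abs(w - w_hat).max() <= 0.5 * s)  # error bounded by half a step
```

Real deployments typically go further (per-channel scales, quantized activations, calibration on representative data), but the same principle applies: a controlled loss of numeric precision buys a large reduction in memory footprint and allows integer-only arithmetic on embedded targets.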