CNN Accelerator Performance Dependence on Loop Tiling and the Optimum Resource-Constrained Loop Tiling
This paper analyzes the dependence of convolutional neural network (CNN) accelerator performance on loop tiling. More specifically, based on the closed-form expression of the CNN accelerator performance, the dependence on the tile sizes is characterized by the derivative, the asymptote, and the switching point between the computation-limited condition and the communication-limited condition. The analysis provides useful insight into how to determine the tile sizes to achieve the required performance while avoiding an unnecessary increase in static random access memory (SRAM) size. The paper also deals with the optimum resource-constrained loop tiling for CNN accelerators. Given a constraint on either the on-chip buffer size or the multiply-accumulate (MAC) array size, tile sizes are optimized to maximize the performance.
The closed-form expressions of the optimum tile sizes provide useful insights into how to allocate the available hardware resources for maximum performance. In the performance evaluation, the proposed tile sizes achieve almost the maximum performance, which enables the optimization of tile sizes without relying on exhaustive search and speeds up design space exploration.
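To make the two conditions and the exhaustive-search baseline concrete, the following is a minimal Python sketch. It does not reproduce the paper's closed-form expressions, which are not given in this abstract. It assumes a simplified double-buffered model with stride 1, a Tm x Tn MAC array, and a fixed off-chip bandwidth in words per cycle, and it ignores output write-back traffic in the time estimate. All names (`layer_cycles`, `buffer_words`, `best_tiling`, `bw`) are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import ceil


def layer_cycles(M, N, R, C, K, Tm, Tn, Tr, Tc, bw):
    """Return (estimated cycles, compute_limited) for one conv layer.

    Assumed model (not the paper's): M/N output/input channels, R x C output
    map, K x K kernel, stride 1, tile sizes Tm, Tn, Tr, Tc, and bw words
    transferred per cycle from off-chip memory.
    """
    # A Tm x Tn MAC array needs one cycle per output position and kernel tap.
    compute = Tr * Tc * K * K
    # Words fetched per tile: input tile (with halo) plus weight tile.
    in_words = Tn * (Tr + K - 1) * (Tc + K - 1)
    w_words = Tm * Tn * K * K
    comm = (in_words + w_words) / bw
    tiles = ceil(M / Tm) * ceil(N / Tn) * ceil(R / Tr) * ceil(C / Tc)
    # With double buffering, each tile costs the slower of the two phases.
    return tiles * max(compute, comm), compute >= comm


def buffer_words(K, Tm, Tn, Tr, Tc):
    """On-chip words for double-buffered input, weight, and output tiles."""
    in_words = Tn * (Tr + K - 1) * (Tc + K - 1)
    w_words = Tm * Tn * K * K
    out_words = Tm * Tr * Tc
    return 2 * (in_words + w_words + out_words)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def best_tiling(M, N, R, C, K, max_macs, max_buffer_words, bw):
    """Exhaustive search over divisor tile sizes under both resource limits."""
    best = None
    for Tm, Tn, Tr, Tc in product(divisors(M), divisors(N),
                                  divisors(R), divisors(C)):
        if Tm * Tn > max_macs:
            continue  # MAC array constraint violated
        if buffer_words(K, Tm, Tn, Tr, Tc) > max_buffer_words:
            continue  # on-chip buffer constraint violated
        cycles, _ = layer_cycles(M, N, R, C, K, Tm, Tn, Tr, Tc, bw)
        if best is None or cycles < best[0]:
            best = (cycles, (Tm, Tn, Tr, Tc))
    return best


if __name__ == "__main__":
    # Same tiling, two bandwidths: the limiting condition switches.
    print(layer_cycles(64, 64, 16, 16, 3, 8, 8, 8, 8, bw=4))
    print(layer_cycles(64, 64, 16, 16, 3, 8, 8, 8, 8, bw=1))
    print(best_tiling(64, 64, 16, 16, 3, max_macs=64,
                      max_buffer_words=8192, bw=4))
```

In this toy model, lowering the bandwidth moves the same tiling from the computation-limited condition to the communication-limited one, which is the kind of switching point the analysis characterizes. The search loop is the brute-force baseline that closed-form optimum tile sizes are meant to replace.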