Research on OpenCL optimization for FPGA deep learning application

Authors: Shuo Zhang, Yanxia Wu, Chaoguang Men, Hongtao He, Kai Liang
Authors' affiliation: College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Published in: PLoS ONE 14(10)
Category: Research Article


In recent years, with the development of computer science, deep learning has come to be regarded as capable of solving inference and learning problems in high-dimensional spaces, and it has therefore received unprecedented attention from both academia and industry. Compared with CPUs and GPUs, FPGAs have attracted considerable attention as platforms for deep learning algorithms because of their high energy efficiency, short development cycle, and reconfigurability. However, research on OpenCL optimization of deep learning algorithms for FPGAs is still limited, and the OpenCL tools and models used on CPUs and GPUs cannot be applied directly to FPGAs. As a result, software programmers find it difficult to obtain good performance when implementing deep learning algorithms on FPGAs. To address this problem, this paper proposes an OpenCL computational model, based on an FPGA template architecture, for optimizing the time-consuming convolution layers of deep learning networks. A comparison between the program built on this computational model and the corresponding optimized program provided by Xilinx shows that the former achieves 8 to 40 times higher performance.
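
To illustrate why the convolution layer dominates run time, the sketch below writes a single layer as a direct loop nest in plain Python. It is not the authors' OpenCL computational model or Xilinx's example code; the function names conv2d and conv_macs, the stride-1/no-padding assumption, and all layer sizes are hypothetical. Every output pixel needs N × K × K multiply-accumulates (MACs), so a layer with M output maps of size R × C performs M × N × R × C × K × K MACs in total. This six-level loop nest is the kind of structure that OpenCL implementations on FPGAs generally accelerate by unrolling and pipelining loops.

```python
# Direct (naive) convolution loop nest for one CNN layer, stride 1, no padding.
# Function names and layer sizes are hypothetical, chosen only for illustration.

def conv2d(inp, weights, bias):
    """inp: [N][H][W], weights: [M][N][K][K], bias: [M] -> (out [M][R][C], mac_count)."""
    n_in = len(inp)
    h, w = len(inp[0]), len(inp[0][0])
    m_out = len(weights)
    k = len(weights[0][0])
    r_out, c_out = h - k + 1, w - k + 1
    out = [[[0.0] * c_out for _ in range(r_out)] for _ in range(m_out)]
    macs = 0
    for m in range(m_out):                  # output feature maps
        for r in range(r_out):              # output rows
            for c in range(c_out):          # output columns
                acc = bias[m]
                for n in range(n_in):       # input feature maps
                    for i in range(k):      # kernel rows
                        for j in range(k):  # kernel columns
                            acc += weights[m][n][i][j] * inp[n][r + i][c + j]
                            macs += 1
                out[m][r][c] = acc
    return out, macs


def conv_macs(m_out, n_in, r_out, c_out, k):
    """Closed-form multiply-accumulate count of the loop nest above."""
    return m_out * n_in * r_out * c_out * k * k


# Tiny example: 3 input maps of 8x8, 4 output maps, 3x3 kernels -> 6x6 outputs.
inp = [[[1.0] * 8 for _ in range(8)] for _ in range(3)]
weights = [[[[0.5] * 3 for _ in range(3)] for _ in range(3)] for _ in range(4)]
bias = [0.0] * 4
out, macs = conv2d(inp, weights, bias)
print(macs, conv_macs(4, 3, 6, 6, 3))  # 3888 3888
print(out[0][0][0])                    # 13.5
```

Applied to a hypothetical larger layer with 256 input maps, 384 output maps, 13 × 13 outputs and 3 × 3 kernels, conv_macs(384, 256, 13, 13, 3) gives 149,520,384 MACs for a single image, which suggests why convolution layers are the natural target for hardware optimization.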

Keywords:

Algorithms – Computer software – Convolution – Deep learning – Language – Memory – Neural networks – Optimization



Published in 2019, Issue 10.