Book Group Author: IEEE
Source:
DOI: 10.1109/ISCAS58744.2024.10558631
Book Series: IEEE International Symposium on Circuits and Systems
Published: 2024
Indexed: 2024-09-21
Document Type: Proceedings Paper
Conference
Meeting: IEEE International Symposium on Circuits and Systems (ISCAS)
Location: Singapore, SINGAPORE
Date: MAY 19-22, 2024
Sponsors: IEEE; IEEE Circuits & Syst Soc
Abstract
Transformer-based large language models have attracted much attention recently. Owing to their superior performance, they are expected to replace conventional deep learning methods in many application fields, including edge computing. However, transformer models involve even more computation and parameters than convolutional neural networks, which makes them challenging to deploy on resource-constrained edge devices. To tackle this problem, this paper proposes an efficient FPGA-based binary transformer accelerator. Within the proposed architecture, an energy-efficient matrix multiplication decomposition method is proposed to reduce the amount of computation. Moreover, an efficient binarized Softmax computation method is proposed to reduce the memory footprint during Softmax computation. The proposed architecture is implemented on a Xilinx Zynq UltraScale+ device, and implementation results show that the proposed matrix multiplication decomposition method can reduce up to 78% of the computation at runtime. The proposed transformer accelerator achieves improved throughput and energy efficiency compared with previous transformer accelerator designs.
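The abstract does not detail the proposed matrix multiplication decomposition, but binary transformers in general replace real-valued multiply-accumulate operations with XNOR and popcount logic. The sketch below shows only that standard binarized matrix multiplication identity, emulated in NumPy; it is an illustrative assumption, not the paper's specific decomposition, and the function names are hypothetical.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_popcount_matmul(A, B):
    """Emulate the binary matrix product A @ B.T for {-1, +1} matrices.

    With bits encoded as +1 -> 1 and -1 -> 0, each product a*b equals
    2*XNOR(a_bit, b_bit) - 1, so a length-K dot product becomes
    2*popcount(XNOR(row, col)) - K. On an FPGA this removes multipliers;
    here the popcount is emulated with small integer matrix products.
    """
    A01 = (A > 0).astype(np.int32)          # {-1, +1} -> {0, 1}
    B01 = (B > 0).astype(np.int32)
    K = A.shape[1]
    # Number of bit positions where the operands agree (the XNOR popcount)
    matches = A01 @ B01.T + (1 - A01) @ (1 - B01).T
    return 2 * matches - K                  # equals A @ B.T exactly

# Sanity check against the real-valued product (illustrative only)
rng = np.random.default_rng(0)
A = binarize(rng.standard_normal((4, 64)))
B = binarize(rng.standard_normal((8, 64)))
assert np.array_equal(xnor_popcount_matmul(A, B),
                      A.astype(np.int32) @ B.T.astype(np.int32))
```

Likewise, the binarized Softmax method is not described in the abstract. One widely used way to shrink the Softmax working set in an accelerator is an online (streaming) formulation that keeps only a running maximum and a running sum instead of buffering an entire row of exponentials. The sketch below shows that generic technique under that assumption; it is not the paper's method.

```python
import numpy as np

def online_softmax(scores):
    """Streaming softmax over a 1-D score vector.

    Keeps only a running maximum m and running sum s while scanning the
    inputs, rather than materialising all exponentials at once, which is
    a common way to reduce the Softmax memory footprint in hardware.
    """
    m = -np.inf   # running maximum (for numerical stability)
    s = 0.0       # running sum of exp(x - m)
    for x in scores:
        m_new = max(m, x)
        s = s * np.exp(m - m_new) + np.exp(x - m_new)
        m = m_new
    # Second pass produces the normalised outputs on the fly
    return np.array([np.exp(x - m) / s for x in scores])

scores = np.array([1.0, 2.0, 3.0, 0.5])
reference = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()
assert np.allclose(online_softmax(scores), reference)
```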