Book Group Author: IEEE
Source:
DOI: 10.1109/ISCAS58744.2024.10558631
Book Series: IEEE International Symposium on Circuits and Systems
Published: 2024
Indexed: 2024-09-21
Document Type: Proceedings Paper
Conference
Meeting: IEEE International Symposium on Circuits and Systems (ISCAS)
Location: Singapore, SINGAPORE
Date: MAY 19-22, 2024
Sponsors: IEEE; IEEE Circuits & Syst Soc
Abstract
Transformer-based large language models have attracted much attention recently. Owing to their superior performance, they are expected to replace conventional deep learning methods in many application fields, including edge computing. However, transformer models involve even more computation and parameters than convolutional neural networks, which makes them challenging to deploy on resource-constrained edge devices. To tackle this problem, this paper proposes an efficient FPGA-based binary transformer accelerator. Within the proposed architecture, an energy-efficient matrix multiplication decomposition method is proposed to reduce the amount of computation. Moreover, an efficient binarized Softmax computation method is proposed to reduce the memory footprint during Softmax computation. The proposed architecture is implemented on a Xilinx Zynq UltraScale+ device, and implementation results show that the proposed matrix multiplication decomposition method can reduce up to 78% of the computation at runtime. The proposed transformer accelerator achieves improved throughput and energy efficiency compared with previous transformer accelerator designs.
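The abstract does not detail the proposed matrix multiplication decomposition, but binary transformers in general replace real-valued multiply-accumulate operations with XNOR and popcount logic. The sketch below shows only that standard binarized matrix multiplication identity, emulated in NumPy; it is an illustrative assumption, not the paper's specific decomposition, and the function names are hypothetical.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_popcount_matmul(A, B):
    """Emulate the binary matrix product A @ B.T for {-1, +1} matrices.

    With bits encoded as +1 -> 1 and -1 -> 0, each product a*b equals
    2*XNOR(a_bit, b_bit) - 1, so a length-K dot product becomes
    2*popcount(XNOR(row, col)) - K. On an FPGA this removes multipliers;
    here the popcount is emulated with small integer matrix products.
    """
    A01 = (A > 0).astype(np.int32)          # {-1, +1} -> {0, 1}
    B01 = (B > 0).astype(np.int32)
    K = A.shape[1]
    # Number of bit positions where the operands agree (the XNOR popcount)
    matches = A01 @ B01.T + (1 - A01) @ (1 - B01).T
    return 2 * matches - K                  # equals A @ B.T exactly

# Sanity check against the real-valued product (illustrative only)
rng = np.random.default_rng(0)
A = binarize(rng.standard_normal((4, 64)))
B = binarize(rng.standard_normal((8, 64)))
assert np.array_equal(xnor_popcount_matmul(A, B),
                      A.astype(np.int32) @ B.T.astype(np.int32))
```

Likewise, the binarized Softmax method is not described in the abstract. One widely used way to shrink the Softmax working set in an accelerator is an online (streaming) formulation that keeps only a running maximum and a running sum instead of buffering an entire row of exponentials. The sketch below shows that generic technique under that assumption; it is not the paper's method.

```python
import numpy as np

def online_softmax(scores):
    """Streaming softmax over a 1-D score vector.

    Keeps only a running maximum m and running sum s while scanning the
    inputs, rather than materialising all exponentials at once, which is
    a common way to reduce the Softmax memory footprint in hardware.
    """
    m = -np.inf   # running maximum (for numerical stability)
    s = 0.0       # running sum of exp(x - m)
    for x in scores:
        m_new = max(m, x)
        s = s * np.exp(m - m_new) + np.exp(x - m_new)
        m = m_new
    # Second pass produces the normalised outputs on the fly
    return np.array([np.exp(x - m) / s for x in scores])

scores = np.array([1.0, 2.0, 3.0, 0.5])
reference = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()
assert np.allclose(online_softmax(scores), reference)
```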