Book Group Author: IEEE COMPUTER SOC
Source:
Pages: 179-185
DOI: 10.1109/EEET61723.2023.00015
Published: 2023
Indexed: 2024-10-18
Document Type: Proceedings Paper
Conference
Meeting: 6th International Conference on Electronics and Electrical Engineering Technology (EEET)
Location: Nanjing, PEOPLES R CHINA
Date: DEC 01-03, 2023
Sponsors: SE Univ; Beihang Univ; Beijing Cas Spark Inst Informat Technol; Joint Int Res Lab Informat Display & Visualizat; Tiangong Univ; Univ Sains Malaysia; APEX
Abstract
Recently, Transformer models have achieved better accuracy than traditional models in fields such as computer vision (CV) and natural language processing (NLP). However, compared with traditional convolutional neural networks (CNNs), Transformer models require a large number of softmax evaluations, whose expensive exponential and division operations consume substantial computing, storage, and power resources, which prevents Transformer networks from being deployed effectively in edge computing. To address this problem, this paper proposes an approximate softmax computation architecture. Compared with the classic base-e softmax function or the base-2 softmax functions proposed in recent years, the proposed design replaces the complex exponential and division operations with simpler shift and fixed-point addition operations, significantly reducing resource consumption while maintaining high computation accuracy. The proposed architecture can effectively accelerate the softmax computation of Transformer models in edge computing. Experimental results show that for 16-bit softmax computation, the proposed design achieves lower resource occupation and energy consumption. Vision Transformer (ViT) models are also used to verify the effectiveness of the proposed method in real Transformer models.
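As a rough illustration of the kind of arithmetic substitution the abstract describes, the sketch below computes a base-2 approximate softmax in fixed point using only shifts, additions, and comparisons. The Q.8 format, the linear 2^(-f) ≈ 1 - f/2 approximation for the fractional exponent, and the shift-based normalization are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch (assumptions, not the paper's design): base-2 approximate
# softmax using only shifts and fixed-point additions.

FRAC_BITS = 8               # assumed fixed-point fraction width (Q.8)
ONE = 1 << FRAC_BITS        # fixed-point 1.0

def pow2_neg_approx(d):
    """Approximate 2**(-d) for a non-negative Q.8 value d with shifts/adds:
    split d into integer part k and fraction f, then 2**(-d) ~ (1 - f/2) >> k."""
    k = d >> FRAC_BITS          # integer part of the exponent
    f = d & (ONE - 1)           # fractional part in [0, 1)
    return (ONE - (f >> 1)) >> k

def approx_softmax(scores):
    """Approximate softmax over Q.8 fixed-point scores.

    exp() is replaced by the base-2 shift approximation above, and the final
    division by the sum is replaced by a right shift by floor(log2(sum)),
    so the outputs are only roughly normalized (off by a factor below 2)."""
    m = max(scores)
    num = [pow2_neg_approx(m - s) for s in scores]   # ~ 2**(s - m), in Q.8
    total = sum(num)                                 # always >= ONE
    shift = total.bit_length() - 1                   # floor(log2(total))
    return [(n << FRAC_BITS) >> shift for n in num]  # Q.8 pseudo-probabilities

# Example: scores 3.0, 1.0, 0.0 in Q.8
print(approx_softmax([3 << FRAC_BITS, 1 << FRAC_BITS, 0]))
```

The point of the sketch is that both the exponential and the division disappear: 2**x is reduced to a shift plus one subtraction, and normalization is reduced to a shift by the position of the sum's leading one, at the cost of a bounded normalization error.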