Resource-Efficient FPGA Implementation of a Self-Attention Core for Edge AI

Janaki Rani M.; Mohana Divya S.; Kavya D.; Maniyammai B.

doi:10.36548/jei.2026.2.005

Open Access

https://doi.org/10.36548/jei.2026.2.005

Vol. 8, No. 2 (2026)

Published: 02 May, 2026

Pages: 156-170

Janaki Rani M.

Department of Electronics and Communication Engineering, Dr. M.G.R. Educational and Research Institute, Chennai, India

Mohana Divya S.

Department of Electronics and Communication Engineering, Dr. M.G.R. Educational and Research Institute, Chennai, India

Kavya D.

Department of Electronics and Communication Engineering, Dr. M.G.R. Educational and Research Institute, Chennai, India

Maniyammai B.

Department of Electronics and Communication Engineering, Dr. M.G.R. Educational and Research Institute, Chennai, India

How to Cite

M., Janaki Rani, Mohana Divya S., Kavya D., and Maniyammai B. 2026. “Resource-Efficient FPGA Implementation of a Self-Attention Core for Edge AI”. Journal of Electronics and Informatics 8 (2): 156-70. https://doi.org/10.36548/jei.2026.2.005.

Keywords

FPGA Accelerator

Self-Attention

Transformer Models

Edge Computing

Fixed-Point Design

Pipelined MAC

BRAM-Based Architecture

Low-Latency Processing

Energy-Efficient Computing

Abstract

Modern AI relies on transformer-based architectures, but their inherent computational complexity imposes considerable compute and memory requirements that make them difficult to deploy in resource-limited environments. A Field Programmable Gate Array (FPGA) hardware accelerator for the self-attention mechanism can use hardware resources efficiently and make transformers practical as part of an edge AI system. This research describes the implementation of such a design on the Xilinx Artix-7 FPGA platform using the Nexys DDR development board, and demonstrates that an FPGA-based self-attention accelerator can produce results in real time through a pipelined multiply-accumulate (MAC) architecture and finite state machine (FSM) based control structures. The accelerator achieved functional accuracy when compared against a Python golden model. It showed low hardware utilization and low power consumption while providing efficient computational performance. The architecture balances performance, resource utilization, and energy efficiency. The results indicate that FPGA-based domain-specific accelerators can bridge the computational gap presented by transformer self-attention mechanisms and are an appropriate method for real-time inference in edge AI systems.
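The comparison against a Python golden model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model. The 8 fractional bits, the 3x4 test matrices, the function names, and the choice to evaluate softmax in floating point are all assumptions made here for illustration; the abstract does not specify any of them. The sketch computes scaled dot-product self-attention twice, once in floating point and once with dot products performed as fixed-point multiply-accumulate loops, and reports the largest difference between the two.

```python
import math

FRAC_BITS = 8  # assumed fractional width; the abstract does not state a word length


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def mac_dot(a_fx, b_fx, frac_bits=FRAC_BITS):
    """Dot product as a multiply-accumulate loop over fixed-point operands."""
    acc = 0  # wide accumulator: each product carries 2 * frac_bits fractional bits
    for a, b in zip(a_fx, b_fx):
        acc += a * b
    return acc >> frac_bits  # arithmetic shift back to frac_bits fractional bits


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_float(Q, K, V):
    """Floating-point reference: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wj * vj for wj, vj in zip(w, col)) for col in zip(*V)])
    return out


def attention_fixed(Q, K, V, frac_bits=FRAC_BITS):
    """Same computation with fixed-point MAC dot products.

    Simplification (assumption): scaling and softmax stay in floating point.
    """
    one = 1 << frac_bits
    d = len(Q[0])
    Qf = [[to_fixed(x, frac_bits) for x in row] for row in Q]
    Kf = [[to_fixed(x, frac_bits) for x in row] for row in K]
    Vf = [[to_fixed(x, frac_bits) for x in row] for row in V]
    out = []
    for q in Qf:
        scores = [mac_dot(q, k, frac_bits) / one / math.sqrt(d) for k in Kf]
        w_fx = [to_fixed(w, frac_bits) for w in softmax(scores)]
        out.append([mac_dot(w_fx, col, frac_bits) / one for col in zip(*Vf)])
    return out


# Hypothetical test data: 3 tokens, embedding dimension 4.
Q = [[0.5, -0.25, 0.125, 0.75], [0.1, 0.2, -0.3, 0.4], [-0.5, 0.5, 0.25, -0.125]]
K = [[0.25, 0.5, -0.5, 0.125], [-0.2, 0.3, 0.6, -0.1], [0.75, -0.75, 0.25, 0.5]]
V = [[1.0, 0.0, -0.5, 0.25], [0.5, 0.5, 0.5, -0.5], [-0.25, 0.75, 0.125, 0.0]]

reference = attention_float(Q, K, V)
hardware_like = attention_fixed(Q, K, V)
max_err = max(
    abs(r - h)
    for ref_row, hw_row in zip(reference, hardware_like)
    for r, h in zip(ref_row, hw_row)
)
print("max abs error vs. float reference:", max_err)
```

In a verification flow of this kind, the fixed-point path would typically be replaced by values captured from simulation or from the board, and the error bound chosen to match the actual word length of the design.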

