Abstract
Modern AI relies on transformer-based architectures, whose inherent computational complexity imposes considerable compute and memory requirements and makes them difficult to deploy in resource-limited environments. Implementing the self-attention mechanism of a transformer as a hardware accelerator on a Field Programmable Gate Array (FPGA) makes efficient use of hardware resources and allows transformers to run as part of an edge AI system. This research describes the implementation of such a design on the Xilinx Artix-7 FPGA platform using the Nexys DDR development board. The design computes self-attention in real time through a pipelined multiply-accumulate (MAC) architecture and finite state machine (FSM) based control structures. The accelerator's outputs were functionally verified against a Python golden model. The accelerator showed low hardware utilization and low power consumption while delivering efficient computational performance, and the architecture balances performance, resource utilization, and energy efficiency. These results indicate that FPGA-based domain-specific accelerators can bridge the computational gap presented by transformer self-attention mechanisms and are well suited to real-time inference in edge AI systems.
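As an illustration of the computation being accelerated, the following is a minimal floating-point sketch of scaled dot-product self-attention written in the style of a software golden model. It is not the authors' golden model: the function names (`mac_dot`, `softmax`, `self_attention`), the use of floating-point arithmetic, and the matrix shapes are assumptions made for illustration, and the hardware design may use a different numeric format or a softmax approximation.

```python
import math

def mac_dot(a, b):
    """Dot product written as sequential multiply-accumulate steps,
    mirroring what a single MAC unit would compute (hypothetical helper)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y  # one multiply-accumulate per element pair
    return acc

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    q, k, v are lists of rows (sequence length x feature dimension)."""
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for q_row in q:
        # Score the query row against every key row.
        scores = [mac_dot(q_row, k_row) * scale for k_row in k]
        weights = softmax(scores)
        # Weighted sum of value rows, one output column at a time.
        out_row = [
            mac_dot(weights, [v_row[j] for v_row in v])
            for j in range(len(v[0]))
        ]
        out.append(out_row)
    return out
```

A reference model of this kind can be run on the same inputs as the hardware, with the two outputs compared element by element within a tolerance chosen for the hardware's numeric precision.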

Journal of Electronics and Informatics