Article

Recent Developments in Transformer Inference Deployment on FPGA Platforms: A Survey

Arjan Blankestijn, Uraz Odyurt, Amirreza Yousefzadeh

Abstract

With the rapid and continuous growth in the adoption of machine learning models based on the Transformer architecture, capable deployment is in high demand. In this context, capable deployment refers to operational performance aspects, e.g., throughput and latency, as well as efficiency aspects, e.g., energy consumption. For inference with such models, purpose-built hardware accelerators provide an attractive alternative to common deployment choices, such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Field Programmable Gate Array (FPGA) platforms are one such category of alternative accelerators, promising implementation flexibility, energy efficiency, improved latency, and suitability for on-site deployment. We investigate the most recent advances, trends, and design choices for Transformer inference on FPGA platforms. We perform a systematic literature review, extracting and examining the preferred techniques for implementation and optimisation. This study and the taxonomy of topics it provides can serve as a guide for researchers in academia and industry alike.

Metadata

Type: Article
Year: 2026
Journal: Journal of Systems Architecture
DOI: 10.1016/j.sysarc.2026.103841
DOI (arXiv): TBA

Licence

Artefacts shared as PDF are licensed under the Creative Commons Attribution (CC BY) 4.0 licence.