Optimizing Transformer Inference on FPGA: A Study on Hardware Acceleration using Vitis HLS

Sähkötekniikan korkeakoulu | Master's thesis
Date
2023-08-21
Major/Subject
Micro and Nanoelectronic Circuit Design
Mcode
ELEC3036
Degree programme
Master’s Programme in Electronics and Nanotechnology (TS2013)
Language
en
Pages
61
Abstract
In the last decade, advances in Natural Language Processing have reshaped human-computer interaction, driven largely by deep learning models such as the Transformer architecture. With its self-attention mechanism, the Transformer has outperformed traditional architectures on tasks ranging from machine translation to sentiment analysis. However, the computational demands of these models challenge their integration onto devices with limited resources. This thesis proposes an FPGA-based hardware accelerator tailored to the Transformer's encoder block, implemented using the Vitis High-Level Synthesis (HLS) framework. In this work, we systematically analyze the Transformer to pinpoint its computational bottlenecks. Built with the Vitis HLS framework, the accelerator emphasizes parallelism, resource efficiency, and optimized memory access, and applies HLS optimization directives to improve performance. A key contribution is the integration of the accelerator with the Xilinx ecosystem, simplifying its deployment on FPGA devices. We subject the proposed accelerator to rigorous testing, benchmarking its performance, resource utilization, and energy efficiency. The results underscore the accelerator's potential to bridge the computational gap in resource-limited settings and establish a baseline for future NLP hardware acceleration efforts.
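To illustrate the kind of HLS optimization the abstract refers to, the sketch below shows a tile of the attention score computation S = Q·K^T for one head, written in Vitis HLS C++ with pipelining, loop unrolling, and array partitioning. This is a minimal, hypothetical example, not code from the thesis; the constants SEQ_LEN and HEAD_DIM and the function name attention_scores are illustrative assumptions.

// Hypothetical sketch (not from the thesis): attention scores S = Q * K^T
// for one head, using the Vitis HLS directives the abstract mentions.
#include <cstddef>

constexpr std::size_t SEQ_LEN  = 64;  // assumed sequence length
constexpr std::size_t HEAD_DIM = 64;  // assumed per-head dimension

void attention_scores(const float Q[SEQ_LEN][HEAD_DIM],
                      const float K[SEQ_LEN][HEAD_DIM],
                      float S[SEQ_LEN][SEQ_LEN]) {
// Partition the inner dimension so a full dot product can be read each cycle.
#pragma HLS ARRAY_PARTITION variable=Q complete dim=2
#pragma HLS ARRAY_PARTITION variable=K complete dim=2

row_loop:
    for (std::size_t i = 0; i < SEQ_LEN; ++i) {
col_loop:
        for (std::size_t j = 0; j < SEQ_LEN; ++j) {
#pragma HLS PIPELINE II=1
            float acc = 0.0f;
dot_loop:
            for (std::size_t k = 0; k < HEAD_DIM; ++k) {
#pragma HLS UNROLL
                acc += Q[i][k] * K[j][k];  // K accessed row-wise, i.e. K^T
            }
            S[i][j] = acc;
        }
    }
}

With the inner dot-product loop fully unrolled and the outer loops pipelined at an initiation interval of 1, one score per cycle can be produced in steady state, provided the partitioned arrays supply enough read ports; the subsequent scaling by 1/sqrt(d_k) and the softmax would be handled in later stages.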
Supervisor
Andraud, Martin
Thesis advisor
Adam, Kazybek
Leslin, Jelin
Keywords
transformer, hardware accelerator, self-attention, high-level synthesis, natural language processing, deep learning