Neural Approximate Accelerator Architecture Optimization for DNN Inference on Lightweight FPGAs

Embedded Machine Learning (ML) is a rapidly growing field that comprises ML algorithms, hardware, and software capable of performing on-device sensor data analysis at extremely low power, thus enabling a variety of always-on and battery-powered applications and services. Running ML-based applications on embedded edge devices is attracting phenomenal research and business interest for many reasons, including accessibility, privacy, latency, cost, and security. Embedded ML is primarily represented by artificial intelligence (AI) at the edge (EdgeAI) and on tiny, ultra-resource-constrained devices, a.k.a. TinyML. TinyML demands energy efficiency and low latency while retaining acceptable accuracy, thus mandating optimization across the software and hardware stack.
GPUs form the default platform for DNN training workloads due to their highly parallel computation, enabled by a massive number of processing cores. However, GPUs are often not an optimal solution for DNN inference acceleration because of their high energy cost and lack of reconfigurability, especially for highly sparse models or customized architectures. Field Programmable Gate Arrays (FPGAs), on the other hand, can offer lower latency and higher energy efficiency than GPUs while providing a high degree of customization and, compared with ASIC solutions, faster time-to-market and a potentially longer useful life.
In the context of TinyML, NA³Os focuses on a neural approximate accelerator-architecture co-search specifically targeting lightweight FPGA devices. The project investigates design techniques to optimally and automatically map DNNs to resource-constrained FPGAs while exploiting principles of approximate computing. Our particular topics of investigation include:

  • Efficient mapping of DNN operations onto approximate hardware components (e.g., multipliers, adders, DSP blocks, BRAMs); a behavioral sketch of such an approximate operator is given after this list.
  • Techniques for fast and automated design space exploration of DNN mappings, defined by a set of approximate operators and a set of FPGA platform constraints (see the second sketch after this list).
  • Investigation of a hardware-aware neural architecture co-search methodology targeting FPGA-based DNN accelerators.
  • Evaluation of robustness vs. energy efficiency tradeoffs.
  • Finally, all developed methods shall be evaluated experimentally by providing a complete synthesis path and comparing the quality of the generated solutions against the state of the art.
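
To make the first topic more concrete, the following minimal sketch (not the project's actual tool flow) shows how a truncation-based approximate 8-bit multiplier can be modeled behaviorally in Python and plugged into a quantized dot product to estimate its numerical impact before mapping it to FPGA hardware. The function names and the choice of discarding low-order product bits are illustrative assumptions.

```python
# Behavioral model of a truncation-based approximate 8-bit multiplier
# and its use inside a quantized dot product (illustrative sketch only).
import numpy as np

def approx_mul_8x8(a: int, b: int, truncated_bits: int = 4) -> int:
    """Approximate signed 8x8-bit multiply: the lowest `truncated_bits`
    of the exact product are discarded, mimicking a multiplier whose
    low-order partial products are not implemented in hardware."""
    exact = int(a) * int(b)
    mask = ~((1 << truncated_bits) - 1)
    return exact & mask

def dot_product(weights, activations, mul):
    """Accumulate weight/activation products with a configurable
    (exact or approximate) multiplier function."""
    return sum(mul(w, x) for w, x in zip(weights, activations))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.integers(-128, 128, size=256)   # int8 weights
    x = rng.integers(-128, 128, size=256)   # int8 activations

    exact = dot_product(w, x, lambda a, b: int(a) * int(b))
    approx = dot_product(w, x, lambda a, b: approx_mul_8x8(a, b, 4))
    rel_err = abs(exact - approx) / max(1, abs(exact))
    print(f"exact={exact}  approx={approx}  relative error={rel_err:.4%}")
```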
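
Similarly, the design space exploration topic can be illustrated by a toy sketch that assigns one approximate multiplier variant per layer under an assumed LUT budget and accuracy-loss budget. All resource, energy, and error numbers below are hypothetical placeholders, and a real flow would replace the exhaustive enumeration with pruning or heuristic/evolutionary search.

```python
# Toy design space exploration: pick one multiplier variant per layer so
# that the estimated accuracy drop and total LUT usage stay within budget
# while minimizing estimated energy. All numbers are placeholders.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Variant:
    name: str
    luts: int          # LUTs per multiplier instance (assumed)
    energy: float      # relative energy per MAC (assumed)
    acc_drop: float    # estimated accuracy-drop contribution in % (assumed)

VARIANTS = [
    Variant("exact",  64, 1.00, 0.0),
    Variant("trunc4", 40, 0.70, 0.3),
    Variant("trunc8", 28, 0.45, 1.2),
]

LAYERS = ["conv1", "conv2", "conv3", "fc"]
LUT_BUDGET = 170       # assumed FPGA resource limit
MAX_ACC_DROP = 1.5     # assumed tolerable total accuracy loss in %

def explore():
    best, best_energy = None, float("inf")
    # Exhaustive enumeration is feasible for this toy space.
    for assignment in product(VARIANTS, repeat=len(LAYERS)):
        luts = sum(v.luts for v in assignment)
        drop = sum(v.acc_drop for v in assignment)
        energy = sum(v.energy for v in assignment)
        if luts <= LUT_BUDGET and drop <= MAX_ACC_DROP and energy < best_energy:
            best, best_energy = assignment, energy
    return best, best_energy

if __name__ == "__main__":
    assignment, energy = explore()
    if assignment is None:
        print("no feasible configuration under the given budgets")
    else:
        for layer, v in zip(LAYERS, assignment):
            print(f"{layer}: {v.name}")
        print(f"total relative energy: {energy:.2f}")
```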