Core Concepts
This paper presents an end-to-end open-source pipeline for deploying a few-shot learning platform for real-time object classification on FPGA SoCs, enabling rapid adaptation to new tasks with minimal resources.
Abstract
The paper tackles the challenges of implementing few-shot learning on embedded systems, specifically FPGA SoCs. Few-shot learning is a vital approach for adapting to diverse classification tasks when the costs of data acquisition or labeling are prohibitively high.
The key contributions include:
Development of an end-to-end open-source pipeline, PEFSL, for a few-shot learning platform for object classification on FPGA SoCs, built on the Tensil open-source framework.
Demonstration of the platform's potential by building a low-power, low-latency few-shot learning model, trained on the MiniImageNet dataset and deployed with a dataflow architecture. It achieves 54% accuracy on 32x32 images with a latency of 30 ms and a power consumption of 6.2 W on the PYNQ-Z1 board.
Exploration of the design space, including network depth, training and test image size, downsampling methods, and number of feature maps, to identify the optimal trade-off between accuracy and latency for the target embedded application.
Comparison of the hardware implementation with other FPGA-based DNN accelerators, demonstrating the efficiency of the proposed approach.
The open-source pipeline and the demonstrator aim to pave the way for exciting new applications in fields such as robotics, drones, and autonomous vehicles, where responsiveness, computational power, and energy are critical factors.
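To make the few-shot setting concrete, the sketch below classifies a query by comparing its feature vector with the mean feature vector of each class, computed from a handful of labeled examples (the "shots"). This summary does not say which classifier the paper uses; nearest-class-mean is a common choice and is assumed here. The function names and the 2-D feature values are hypothetical stand-ins for the outputs of a trained backbone network.

```python
import math

def class_means(support):
    """Average the feature vectors of each class.

    support: dict mapping a label to a list of feature vectors (lists of floats).
    """
    means = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return means

def classify(query, means):
    """Return the label whose class mean is closest to the query (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(query, means[label]))

# Hypothetical 2-shot, 2-class support set; real features would come from the backbone.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
means = class_means(support)
prediction = classify([0.9, 0.1], means)
print(prediction)
```

Adding a new class in this scheme only requires computing one more mean from a few examples, with no retraining of the backbone, which is why this kind of classifier suits devices with limited compute.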
Stats
The proposed system has a latency of 30 ms while consuming 6.2 W on the PYNQ-Z1 board.
For the 32x32 resolution, ResNet-9 models achieve higher accuracy than ResNet-12 models, despite having fewer layers and parameters.
When testing at 32x32 resolution, training on 32x32 images yields better accuracy than training on larger images (84x84 or 100x100).
Using convolutions with a stride of 2 in the network reduces the number of operations compared to using max pooling layers, without impacting the accuracy.
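The operation savings from strided convolutions can be checked with a rough multiply-accumulate (MAC) count. The sketch below assumes "same" padding, a 3x3 kernel, and 16 input and output channels on a 32x32 feature map; these values are illustrative and not taken from the paper. With max pooling, the convolution runs at stride 1 and the pooling layer halves the resolution afterward. With a stride of 2, the convolution itself produces the smaller output.

```python
import math

def conv_macs(height, width, c_in, c_out, kernel=3, stride=1):
    """MACs for one 2-D convolution, assuming 'same' padding."""
    out_h = math.ceil(height / stride)
    out_w = math.ceil(width / stride)
    return out_h * out_w * c_in * c_out * kernel * kernel

# Hypothetical layer shape, chosen only for illustration.
H, W, C_IN, C_OUT = 32, 32, 16, 16

# Option A: stride-1 convolution followed by 2x2 max pooling.
# The pooling layer adds comparisons but no MACs.
macs_with_pooling = conv_macs(H, W, C_IN, C_OUT, stride=1)

# Option B: stride-2 convolution, no pooling layer.
macs_with_stride = conv_macs(H, W, C_IN, C_OUT, stride=2)

print(macs_with_pooling, macs_with_stride, macs_with_pooling / macs_with_stride)
```

Under these assumptions, the strided layer needs a quarter of the MACs of the stride-1 layer it replaces. The saving over a whole network depends on how many of its layers downsample.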
Quotes
"One of the primary obstacles to the implementation of few-shot learning on embedded systems is the required computational power induced by the underlying cost of Deep Neural Networks (DNN) models."
"Careful design of low-complexity DNN adapted to embedded hardware targets is therefore a main concern."
"The challenges to be tackled toward such an implementation are the selection and adaptation of deployment frameworks, the identification and adaptation of an efficient training routine from the literature, and finally the design of a lightweight network that meets the constraints of embedded systems while also performing well for the defined task, few-shot learning on embedded FPGA SoC, with a real-time classification of a video stream."