Core Concepts
The author presents the Fast-Fruit-Detector (FFD) as a single-stage, post-processing-free object detector that achieves high accuracy and speed for UAV-based fruit harvesting tasks. FFD introduces novel components, such as the Latent Object Representation (LOR) module and a query assignment strategy, to improve detection efficiency.
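The summary notes that FFD represents objects as queries taken directly from the backbone output, without learned query embeddings. As a rough illustration of that idea only (the paper's actual code is not shown here; `predict_per_cell` and all weights below are hypothetical), each backbone output cell can be mapped straight to a score and a box, with a one-to-one cell-to-object assignment:

```python
# Illustrative sketch, NOT the paper's implementation: treat every backbone
# output cell as an object query and decode it directly with a linear head.
# All names (feature_map, w_cls, w_box) and shapes are assumptions.

def predict_per_cell(feature_map, w_cls, w_box):
    """Map each of the H*W backbone cells to an objectness score and a box.

    feature_map: H x W x C nested lists of feature vectors
    w_cls: C-dim weight vector -> objectness score
    w_box: 4 x C weight matrix -> box parameters
    """
    detections = []
    for y, row in enumerate(feature_map):
        for x, feat in enumerate(row):
            score = sum(f * w for f, w in zip(feat, w_cls))
            box = [sum(f * w for f, w in zip(feat, wr)) for wr in w_box]
            # One-to-one assignment: each cell predicts at most one object,
            # so a plain score threshold stands in for NMS post-processing.
            if score > 0.5:
                detections.append(((x, y), score, box))
    return detections
```

With a toy 1x2 feature map, only the high-scoring cell survives the threshold, which is the sense in which a query-per-cell design needs no separate suppression step.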
Abstract
The paper introduces the Fast-Fruit-Detector (FFD), designed for UAV-based fruit harvesting in vertical farming. FFD achieves 100 FPS at FP32 precision on low-powered devices, outperforming existing detectors with its innovative design. The LOR module and query assignment strategy contribute to FFD's high accuracy and speed, making it suitable for robotic applications.
In agricultural automation, autonomous aerial harvesting with UAVs could transform the industry by enabling continuous operation while reducing costs. A fully autonomous system requires sub-systems such as object detection, tracking, and grasping to run efficiently on low-powered devices. FFD addresses these challenges with a resource-efficient design that maintains high accuracy and speed.
The paper reviews related deep-learning detection methods, including Faster-RCNN, SSD, FCOS, DETR, and YOLO-v8. It highlights the limitations of existing detectors in detecting small objects efficiently and positions FFD as a novel approach to address these shortcomings.
FFD's design eliminates the need for multi-scale feature fusion and post-processing steps such as non-maximum suppression (NMS), leading to faster inference without compromising accuracy. By generating synthetic scenes and employing comprehensive data augmentation, FFD achieves high detection performance on challenging datasets.
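For context on what is being removed, below is a minimal sketch of the standard greedy NMS step that conventional detectors run after inference and that FFD's design avoids. This is the textbook algorithm, not code from the paper; variable names are ours:

```python
# Standard greedy non-maximum suppression (NMS): the post-processing step
# FFD eliminates. Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Keep boxes in descending score order, dropping any box whose IoU
    with an already-kept box exceeds thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

The sequential, data-dependent loop is why NMS is costly on low-powered devices: it resists batching and adds latency proportional to the number of candidate boxes, which a one-to-one query assignment sidesteps entirely.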
Stats
FFD achieves 100 FPS at FP32 precision on the NVIDIA Jetson Xavier NX.
Faster-RCNN training time: 5.10 s per iteration.
SSD inference @ FP32: 32 ms.
FCOS training time: 3.40 s per iteration.
DETR inference @ FP32: 25 ms.
YOLO-v8 inference @ FP32: 29 ms.
Average instance size in the Dh dataset: 13×13 pixels.
Average instance size in the Dt dataset: 20×20 pixels.
Synthetic scenes improve AP from 30.7 to 46.6 when combined with augmentation.
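The AP gain above is credited to synthetic scene generation, but the summary does not show the pipeline. As a purely hypothetical sketch of one common approach (copy-paste style compositing; the function, parameters, and seed below are all assumptions, not the paper's method), synthetic labels for small fruit instances could be produced like this:

```python
# Hypothetical copy-paste style synthetic-scene labeling sketch.
# NOT the paper's pipeline: make_synthetic_labels and its parameters
# are illustrative assumptions only.
import random

def make_synthetic_labels(bg_size, fruit_size, n_fruits, seed=0):
    """Place n_fruits fruit crops at random positions on a background
    of bg_size (width, height), returning bounding boxes for training.

    Small crops (e.g. 13x13, matching the Dh average instance size)
    yield the tiny targets that make this detection task hard.
    """
    rng = random.Random(seed)  # fixed seed for reproducible scenes
    bw, bh = bg_size
    fw, fh = fruit_size
    boxes = []
    for _ in range(n_fruits):
        x = rng.randint(0, bw - fw)
        y = rng.randint(0, bh - fh)
        boxes.append((x, y, x + fw, y + fh))
    return boxes
```

Pairing such generated scenes with the paper's broader augmentation suite is what the reported AP improvement (30.7 to 46.6) is attributed to.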
Quotes
"Developing a UAV-based fully autonomous harvesting system is not as straightforward as combining several algorithms." - IEEE Robotics and Automation Letters
"FFD represents objects as queries obtained directly from the backbone output without learning them." - IEEE Robotics and Automation Letters
"FFD outperforms various mainstream detectors in terms of training-testing efficiency and accuracy evaluation." - IEEE Robotics and Automation Letters