Abstract:
With the development of artificial intelligence and robotics, a growing number of automated devices are being designed and built to assist human production. This work explores a representative agricultural application, fruit picking, focusing on the detection and pose estimation of cherry tomatoes. Because cherry tomatoes are properly picked by cutting the branched stem of the fruit bunch, detecting that particular stem and estimating its pose is the main difficulty of the task. Two datasets were created for two proposed perception systems, both based on YOLO, a deep learning model for object detection, and a keypoint labeling method was proposed for annotating the pose of bunches across the various shapes of cherry tomatoes. The first proposed model, which combined detection of the stem and the bunch, did not meet the requirements of the task. The second model, which introduces keypoint detection and is inspired by YOLO-Pose, a model designed for human pose estimation, successfully tackled the problem. The keypoint generation process of the original YOLO-Pose was modified to restrict the region in which keypoints are generated, using information from the detection box. This modification improved both performance and robustness, reaching an AP@0.05 of 0.962 and an AP@0.5 of 0.826 on the test set and reducing the normalized keypoint distance error by 10.6% on average.
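
The abstract does not state how the detection box constrains keypoint generation or how the keypoint distance error is normalized. As an illustration only, and not the authors' implementation, the Python sketch below shows one plausible reading: raw keypoint outputs are squashed with a sigmoid and scaled by the box size so that every keypoint falls inside the box, and the error is the mean Euclidean distance divided by the box diagonal. The function names (`decode_keypoints_in_box`, `normalized_keypoint_error`), the sigmoid mapping, and the diagonal normalization are all assumptions introduced here.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode_keypoints_in_box(raw, box):
    """Map unbounded raw outputs (tx, ty) to points inside box.

    Assumption: the box (x1, y1, x2, y2) bounds the keypoints, so each
    raw value is squashed to [0, 1] and scaled by the box width/height.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [(x1 + sigmoid(tx) * w, y1 + sigmoid(ty) * h) for tx, ty in raw]


def normalized_keypoint_error(pred, truth, box):
    """Mean keypoint distance divided by the box diagonal (assumed normalizer)."""
    x1, y1, x2, y2 = box
    diag = math.hypot(x2 - x1, y2 - y1)
    dists = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(pred, truth)
    ]
    return sum(dists) / (len(dists) * diag)


if __name__ == "__main__":
    box = (10.0, 20.0, 110.0, 70.0)           # hypothetical bunch box
    raw = [(0.0, 0.0), (2.0, -1.0)]           # hypothetical network outputs
    truth = [(58.0, 47.0), (95.0, 35.0)]      # hypothetical annotations
    pred = decode_keypoints_in_box(raw, box)
    print(pred)
    print(normalized_keypoint_error(pred, truth, box))
```

Under this reading, a keypoint can never be predicted outside its bunch's detection box, which is one way a box-aware decoder could improve robustness; the actual mechanism used in this work may differ.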