Core Concepts
This study introduces a novel reinforcement learning (RL) approach to robotic palletization task planning, emphasizing iterative action masking learning to prune and manage the action space effectively, thereby improving learning efficiency and operational performance.
Summary
The paper addresses the problem of efficient task planning for robotic palletization systems in logistics scenarios. The key highlights are:
- Formulation of the online 3D Bin Packing Problem (BPP) as a Markov Decision Process (MDP) for reinforcement learning (RL).
- Introduction of a supervised learning-based pipeline to develop an action masking mechanism, which identifies and masks out infeasible actions during RL training to address the challenge of the large action space.
- Proposal of an iterative adaptation framework to refine the action masking model, further enhancing the RL policy's performance by mitigating the distribution shift issue.
- Extensive experiments in simulated environments demonstrating that the proposed method improves learning efficiency and operational performance over baseline approaches.
- Successful deployment of the learned task planner in a real-world robotic palletization system, showcasing its robustness and applicability.
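The action-masking mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the learned mask predictor (a supervised model in the paper) is replaced here by a given boolean feasibility mask, and all names are illustrative. Infeasible placements receive a logit of negative infinity so their sampling probability is exactly zero.

```python
import numpy as np

def masked_action_probs(logits, feasible_mask):
    """Convert raw policy logits to a probability distribution
    over placement actions, zeroing out infeasible ones.

    logits:        (A,) raw policy scores, one per discrete placement.
    feasible_mask: (A,) boolean array; True means the box can be
                   placed there (e.g. within pallet bounds and height limit).
    """
    masked = np.where(feasible_mask, logits, -np.inf)
    masked = masked - masked.max()        # numerical stability
    probs = np.exp(masked)                # exp(-inf) == 0 for masked actions
    return probs / probs.sum()

# Example: 4 candidate placements, the last two are infeasible.
logits = np.array([1.0, 2.0, 3.0, 0.5])
mask = np.array([True, True, False, False])
probs = masked_action_probs(logits, mask)
```

After masking, the policy only ever samples actions the mask model deems feasible, which is what shrinks the effective action space during RL training.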
Statistics
The pallet is specified to have dimensions of 25×25 inches, with a stipulation that the maximum height of stacked boxes cannot exceed 25 inches.
The simulation environment features 80 boxes of 5 distinct sizes, with dimensions (in inches) of 10 × 8 × 6, 9 × 6 × 4, 6 × 6 × 6, 6 × 4 × 4, 4 × 4 × 4.
During testing, each box placement is subjected to random translational noise on the xy plane, δt ∼ N(0, 0.05), and rotational noise around the z-axis, δr ∼ N(0, 5), measured in inches and degrees, respectively.
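The test-time noise model above can be reproduced as a short sketch. This interprets the second parameter of N(·, ·) as the standard deviation, which is an assumption; the paper may intend variance. Function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def perturb_placement(x, y, yaw_deg):
    """Apply the test-time placement noise described above:
    per-axis translational noise ~ N(0, 0.05) inches on the xy plane,
    rotational noise ~ N(0, 5) degrees about the z-axis.
    (Second parameter treated as the standard deviation -- an assumption.)
    """
    dx, dy = rng.normal(loc=0.0, scale=0.05, size=2)   # inches
    dyaw = rng.normal(loc=0.0, scale=5.0)              # degrees
    return x + dx, y + dy, yaw_deg + dyaw
```

Evaluating the learned planner under this perturbation probes its robustness to the imprecise placements a real robot produces.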
Quotes
"The development of robotic systems for palletization in logistics scenarios is of paramount importance, addressing critical efficiency and precision demands in supply chain management."
"Confronted with the substantial challenge of a vast action space, which is a significant impediment to efficiently apply out-of-the-shelf RL methods, our study introduces a novel method of utilizing supervised learning to iteratively prune and manage the action space effectively."
"Experimental results indicate that our proposed method outperforms other baselines and is more sampling-efficient."