Key Concepts
Quality Diversity optimization enhances fault diversity in policy testing, revealing informative and diverse faults in decision models.
Summary
This summary covers the application of Quality Diversity (QD) optimization to policy testing for reinforcement learning (RL). It discusses the importance of fault diversity, compares QD optimizers to dedicated testing frameworks, and evaluates their efficiency and effectiveness across three use cases: the Lunar Lander, Bipedal Walker, and Taxi environments. Results show that QD-based testing improves both fault detection and fault diversity compared to traditional methods.
Abstract:
- Reinforcement Learning is crucial for sequential decision-making tasks.
- Testing for functional fault detection builds trustworthiness in decision models.
- Quality Diversity optimization enhances fault diversity in policy testing.
Introduction:
- RL combined with neural networks solves complex sequential decision problems.
- Software validation practices include efficient testing for trust-building.
- Recent works focus on maximizing the number of faults found but do not account for their diversity.
Related Work:
- A variety of testing objectives have been studied, each with corresponding techniques.
- Existing methodologies address policy testing through distinct approaches.
Background:
- Sequential decision-making tasks are solved step-by-step by decision models.
- Reinforcement Learning learns optimal policies from rewards/penalties.
- Testing is black-box: no internal access to the policy or the simulator is assumed.
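The black-box setting above can be sketched as a test harness that only queries the policy and environment through their inputs and outputs. This is an illustrative sketch, not the paper's implementation; the toy environment, the policy, and the fault threshold are all assumptions for demonstration.

```python
# Illustrative black-box test harness: the policy and the environment are
# treated as opaque callables; only their inputs/outputs are observed.

def run_episode(env_reset, env_step, policy, max_steps=50):
    """Roll out a policy against a black-box environment; return final state and return."""
    state = env_reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                    # query the policy as a black box
        state, reward, done = env_step(state, action)
        total_reward += reward
        if done:
            break
    return state, total_reward

# Toy 1-D environment (hypothetical): walk right toward the goal position 10.
def toy_reset():
    return 0

def toy_step(state, action):
    nxt = state + action
    return nxt, (1.0 if nxt == 10 else -0.1), nxt == 10

def toy_policy(state):
    return 1  # always move one step right

final_state, ret = run_episode(toy_reset, toy_step, toy_policy)
# A functional fault is flagged when the episode return falls below a threshold.
is_fault = ret < 0.0
```

In a real setup the environment would be a simulator such as Lunar Lander and the policy a trained neural network, but the testing loop observes only the same three signals: state, reward, and termination.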
QD Optimization for Policy Testing:
- Challenges in adapting QD optimization to policy testing are identified and addressed.
- The behavior and quality of candidate test solutions are defined to enable effective evaluation.
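The core QD loop behind this adaptation can be sketched with MAP-Elites: an archive keeps the best solution per behavior-space cell, so the search spreads across behaviors instead of converging on one fault. All names, the toy behavior descriptor, and the quality function below are illustrative assumptions, not the paper's definitions.

```python
import random

def map_elites(evaluate, mutate, sample, iterations=200, seed=0):
    """Minimal MAP-Elites: evaluate(x) -> (behavior_cell, quality);
    the archive keeps the highest-quality solution found per cell."""
    rng = random.Random(seed)
    archive = {}  # behavior cell -> (quality, solution)
    for i in range(iterations):
        if archive and i >= 10:
            # select a random elite and perturb it
            x = mutate(rng.choice(list(archive.values()))[1], rng)
        else:
            x = sample(rng)  # bootstrap the archive with random solutions
        cell, quality = evaluate(x)
        if cell not in archive or quality > archive[cell][0]:
            archive[cell] = (quality, x)
    return archive

# Toy setup: a test input is a scalar; its behavior descriptor is a coarse
# bucket of its value, and its quality is |x| (standing in for how
# fault-revealing the resulting episode is).
def toy_sample(rng):
    return rng.uniform(-1, 1)

def toy_mutate(x, rng):
    return x + rng.gauss(0, 0.3)

def toy_evaluate(x):
    cell = int(x * 2)  # discretized behavior descriptor
    return cell, abs(x)

archive = map_elites(toy_evaluate, toy_mutate, toy_sample)
```

In the policy-testing adaptation, a solution would encode a test scenario (e.g., an initial environment state), the behavior descriptor would summarize the resulting episode, and quality would measure how close the episode comes to a failure.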
Experimental Evaluation:
- Research questions examine test efficiency, diversity improvement, and the impact of behavior-space choice.
- Experiments conducted on three standard RL environments: Lunar Lander, Bipedal Walker, Taxi.
- Metrics include test efficiency, behavior coverage, final state diversity.
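The two diversity-oriented metrics above can be sketched as follows. The formulas are plausible reconstructions from the metric names, not the paper's exact definitions: coverage as the fraction of behavior cells reached, and final-state diversity as the mean pairwise distance between fault final states.

```python
def behavior_coverage(filled_cells, total_cells):
    """Fraction of behavior-space cells reached by at least one test."""
    return len(set(filled_cells)) / total_cells

def final_state_diversity(final_states):
    """Mean pairwise Euclidean distance between the final states of detected faults."""
    n = len(final_states)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(final_states[i], final_states[j])) ** 0.5
            total += d
            pairs += 1
    return total / pairs

# 2 distinct cells out of 10 -> coverage 0.2
cov = behavior_coverage([(0, 1), (0, 1), (2, 3)], total_cells=10)
# distance between (0,0) and (3,4) is 5.0
div = final_state_diversity([(0.0, 0.0), (3.0, 4.0)])
```

Higher values on both metrics indicate that a testing technique exposes faults spread across more of the policy's behavior space rather than many near-duplicates of one failure.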
Results:
- QD-based testing improves fault detection and diversity compared to traditional methods.
- MAP-Elites consistently outperforms Random Testing, while Novelty Search is sensitive to the choice of behavior space.
Conclusion:
Quality Diversity optimization enhances fault diversity in policy testing by revealing diverse faults in decision models. Further research is needed to address the challenges posed by stochasticity in MDPs.
Quotations
"QD-based policy testing finds more diverse faults in the model under test than dedicated techniques."
"MAP-Elites systematically beats Random Testing in efficiency."