Key Concepts
Value-decomposition in reinforcement learning improves performance in high-dimensional discrete action spaces.
Summary
Abstract:
Discrete-action reinforcement learning struggles in high-dimensional spaces because the number of possible actions grows combinatorially with the action dimensions.
Value-decomposition from multi-agent RL addresses this challenge.
The REValueD algorithm mitigates overestimation bias and target variance, outperforming prior methods on challenging tasks.
Introduction:
Deep reinforcement learning combines deep learning and RL for complex decision-making.
Traditional value-based algorithms scale poorly in high-dimensional, discrete action spaces.
Factorisable MDPs (FMDPs) have an action space that factorises as A = A_1 × ... × A_N, where each A_i is a discrete sub-action space; the sketch below illustrates the resulting combinatorial growth.
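A quick sketch of why this matters, with hypothetical numbers: the joint action space grows exponentially in the number of sub-action dimensions, while a decomposed critic only needs one output per sub-action.

```python
# Minimal sketch (not from the paper): why flat Q-learning struggles in an FMDP.
# With N sub-action dimensions of n choices each, a flat action space has n**N
# joint actions, while a decomposed critic handles only N * n utility values.
N, n = 6, 10  # e.g. 6 actuators, each discretised into 10 bins (hypothetical numbers)

flat_actions = n ** N        # one Q-value output per joint action
decomposed_values = N * n    # one utility value per sub-action

print(f"flat Q-head outputs: {flat_actions:,}")    # 1,000,000
print(f"decomposed outputs:  {decomposed_values}")  # 60
```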
Methodology:
The DecQN algorithm learns utility values for each sub-action dimension independently and averages them into a single Q-value (see the sketches after this list).
DecQN reduces overestimation bias but increases target variance.
REValueD mitigates this variance with an ensemble of critics, while its regularisation loss limits the impact that exploratory sub-actions in one dimension have on the utilities learned for other dimensions.
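As a concrete reference for the decomposition, here is a minimal PyTorch sketch assuming DecQN's mean-of-utilities combination; the class and argument names (UtilityNet, n_dims, n_bins) are illustrative, not from the paper's code.

```python
import torch
import torch.nn as nn

class UtilityNet(nn.Module):
    """Decomposed critic: one utility value per sub-action in each dimension."""

    def __init__(self, obs_dim: int, n_dims: int, n_bins: int, hidden: int = 256):
        super().__init__()
        self.n_dims, self.n_bins = n_dims, n_bins
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_dims * n_bins),  # one output per sub-action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # -> (batch, n_dims, n_bins): utilities U_i(s, a_i) for every sub-action
        return self.trunk(obs).view(-1, self.n_dims, self.n_bins)

def q_value(utils: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    # Q(s, a) = mean_i U_i(s, a_i); `actions` is a (batch, n_dims) LongTensor
    chosen = utils.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (batch, n_dims)
    return chosen.mean(dim=-1)

def td_target(reward, done, next_utils, gamma=0.99):
    # The max over the joint action decomposes into per-dimension maxes,
    # which keeps the greedy step tractable in large action spaces.
    best = next_utils.max(dim=-1).values.mean(dim=-1)  # mean_i max_{a_i} U_i(s', a_i)
    return reward + gamma * (1.0 - done) * best
```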
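To make the REValueD ingredients concrete, the sketch below reuses the hypothetical UtilityNet critics from the previous snippet. The ensemble mean in the target follows the summary's description; the exploratory-dimension weighting in the loss is an illustrative stand-in for the paper's regularisation loss, not its exact form.

```python
import torch

def ensemble_target(reward, done, next_obs, target_critics, gamma=0.99):
    # Averaging target utilities over K critics shrinks target variance
    # (the variance of a mean of K estimates falls roughly as 1/K).
    next_utils = torch.stack([c(next_obs) for c in target_critics]).mean(dim=0)
    best = next_utils.max(dim=-1).values.mean(dim=-1)  # mean_i max_{a_i} U_i
    return reward + gamma * (1.0 - done) * best

def regularised_loss(critic, obs, actions, target, reg_coef=0.5):
    # Illustrative assumption, not the paper's exact loss: damp the update for
    # dimensions whose sub-action looks exploratory (non-greedy), so a stray
    # exploratory sub-action does not distort the other dimensions' utilities.
    utils = critic(obs)                                           # (B, N, n_bins)
    chosen = utils.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, N)
    greedy = utils.argmax(dim=-1)                                 # (B, N)
    exploratory = (actions != greedy).float()   # 1 where sub-action is non-greedy
    per_dim_err = (chosen - target.unsqueeze(-1).detach()).pow(2)
    weights = 1.0 - reg_coef * exploratory
    return (weights * per_dim_err).mean()
```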
Experiments:
REValueD outperforms DecQN and BDQ on discretised DeepMind Control Suite tasks.
Its advantage over these baselines grows as the number of sub-actions per dimension increases.
The regularisation loss further enhances performance on the most challenging tasks.
Conclusion:
Value-decomposition in REValueD improves performance in high-dimensional discrete action spaces.
The regularisation loss and the critic ensemble effectively address overestimation bias and target variance.
Statistics
Recent work has taken value-decomposition from MARL and applied it to single-agent FMDPs.
DecQN learns utility values for each sub-action space.
REValueD uses an ensemble of critics to reduce the target variance of DecQN's decomposed values.
Quotes
"REValueD는 고차원 이산 액션 공간에서 성능을 향상시킵니다."
"DecQN은 오대추정 편향을 줄이지만 타겟 분산을 증가시킵니다."
"REValueD의 정규화 손실은 탐사적 액션의 영향을 완화하는 데 도움이 됩니다."