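Key Concepts
The paper proposes a method for effectively generating reward functions and selecting features in inverse reinforcement learning, and demonstrates its effectiveness on multiple tasks.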
Statistics
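$P(\tau_i \mid \theta) = \dfrac{e^{\theta^{T} \phi(\tau_i)}}{Z(\theta)}$
$\log P(\tau_i \mid \theta) \propto \theta^{T} \phi(\tau_i)$
$\dim(\Phi) = d + \dfrac{d(d+1)}{2}$, where $d = \dim(s)$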
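The formulas above can be illustrated with a minimal sketch. It rests on several assumptions that the summary does not confirm: the candidate set is taken to be the linear terms plus all degree-2 monomials of the state (which matches the count d + d(d + 1)/2), ϕ(τ) is taken to be the sum of per-state features along a trajectory, and Z(θ) is approximated by summing over a finite set of sampled trajectories. The function names are hypothetical and chosen only for this example.

```python
import math
from itertools import combinations_with_replacement


def candidate_features(state):
    """Linear terms plus all degree-2 monomials s_i * s_j with i <= j (assumed candidate set)."""
    linear = list(state)
    quadratic = [a * b for a, b in combinations_with_replacement(state, 2)]
    return linear + quadratic


def trajectory_features(trajectory):
    """phi(tau): sum of per-state candidate features along the trajectory (an assumption)."""
    per_state = [candidate_features(s) for s in trajectory]
    return [sum(column) for column in zip(*per_state)]


def trajectory_probabilities(theta, trajectories):
    """P(tau_i | theta) = exp(theta^T phi(tau_i)) / Z(theta), with Z summed over the given trajectories."""
    scores = [
        sum(t * f for t, f in zip(theta, trajectory_features(tau)))
        for tau in trajectories
    ]
    shift = max(scores)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(score - shift) for score in scores]
    z = sum(weights)
    return [w / z for w in weights]


# Two toy trajectories over a 2-dimensional state (made-up values for illustration).
trajectories = [
    [(0.0, 1.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.5, 0.5)],
]
d = 2
theta = [0.1] * (d + d * (d + 1) // 2)  # one weight per candidate feature
probs = trajectory_probabilities(theta, trajectories)
```

With d = 2 the candidate vector has 2 + 3 = 5 entries, and the trajectory whose summed features give the larger score θᵀϕ(τ) receives the larger probability. Normalizing over a sampled set is a common practical stand-in for the partition function, not necessarily what the paper does.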
Quotes
"Feature selection is then performed for the candidates by leveraging the correlation between trajectory probabilities and feature expectations."
"Our method attains comparable performance levels using significantly fewer features."
"The proposed method achieves sufficient benchmark results in all tasks."