"Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity."
"Our algorithm further minimizes the query complexity through a novel randomized active learning procedure."
"We aim to design new RL algorithms that can learn from preference-based feedback and can be efficient in statistical complexity (i.e., regret), computational complexity, and query complexity."
"Despite achieving sublinear worst-case regret, these algorithms are computationally intractable even for simplified models such as tabular Markov Decision Processes (MDPs)."
"In this work, we aim to design new RL algorithms that can learn from preference-based feedback and can be efficient in statistical complexity (i.e., regret), computational complexity, and query complexity."
Source:
arxiv.org
Making RL with Preference-based Feedback Efficient via Randomization