Key Concepts
Proposing a generic early stopping method for direct policy search that significantly reduces computation time without the need for problem-specific knowledge.
Summary
The paper proposes a generalized early stopping method for direct policy search that saves computation time. It targets the long evaluation times common in optimization problems, especially in tasks such as direct policy search. The method is tested in several environments and compared with problem-specific stopping criteria, showing significant time savings with comparable performance.
Abstract:
- Lengthy evaluation times are common in optimization problems.
- Proposed early stopping method aims to save computation time.
- Tested in five direct policy search environments, saving up to 75% of the computation time.
Introduction:
- Evolutionary algorithms increasingly used in applications like games and robotics.
- Direct policy search algorithms require many evaluations, leading to long learning times.
- Surrogate models can be used to replace costly objective functions with faster alternatives.
Related Work:
- Many direct policy search tasks use problem-specific early stopping methods.
- Various approaches have been proposed for hyperparameter optimization.
- Early stopping based on the objective function alone may not always be applicable.
Generalized Early Stopping for Direct Policy Search (GESP):
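- GESP is designed for problems whose objective value can be approximated incrementally, i.e. observed at each time step while an evaluation is still running.
- Once GESP stops an evaluation, that evaluation cannot be resumed.
- An assumption on the quality of these intermediate approximations ensures that the best solutions are still correctly identified.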
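The stopping logic can be illustrated with a minimal Python sketch. It is not GESP's published criterion: the rule used here (after a hypothetical `grace_steps` period, stop a candidate as soon as its objective value at step t falls below the best fully evaluated solution's value at the same step) is an assumption chosen only to show two properties described above, namely that the rule reads nothing but the per-step objective value and that a stopped evaluation is never resumed. The names `evaluate_with_early_stop`, `search`, `Objective`, and `grace_steps` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Objective = Callable[[int], float]  # hypothetical: objective value observed after step t


def evaluate_with_early_stop(
    step_objective: Objective,
    max_steps: int,
    best_curve: Optional[List[float]],
    grace_steps: int = 0,
) -> Tuple[float, List[float], bool]:
    """Run one evaluation step by step; return (last value, observed curve, stopped_early)."""
    curve: List[float] = []
    for t in range(max_steps):
        value = step_objective(t)
        curve.append(value)
        # Assumed rule (not GESP's exact criterion): after the grace period, stop as
        # soon as this candidate trails the best completed evaluation at step t.
        if best_curve is not None and t >= grace_steps and value < best_curve[t]:
            return value, curve, True  # the evaluation ends here and is never resumed
    return curve[-1], curve, False


def search(
    candidates: List[Objective], max_steps: int, grace_steps: int = 0
) -> Tuple[float, int]:
    """Evaluate candidates in order; return (best final value, total steps simulated)."""
    best_value = float("-inf")
    best_curve: Optional[List[float]] = None
    steps_used = 0
    for objective in candidates:
        value, curve, stopped = evaluate_with_early_stop(
            objective, max_steps, best_curve, grace_steps
        )
        steps_used += len(curve)
        # Only fully evaluated candidates can become the new reference curve.
        if not stopped and value > best_value:
            best_value, best_curve = value, curve
    return best_value, steps_used


if __name__ == "__main__":
    toy_candidates: List[Objective] = [
        lambda t: float(t),  # steady progress
        lambda t: 0.5 * t,   # slower: stopped early
        lambda t: 2.0 * t,   # faster: becomes the new best
        lambda t: 0.0,       # "spinning on the spot": never improves
    ]
    print(search(toy_candidates, max_steps=10, grace_steps=1))  # → (18.0, 24)
```

With these four toy candidates the full budget would be 40 steps; the sketch simulates 24 and still returns the best candidate. A rule this aggressive is only safe under an approximation-quality assumption of the kind listed above: a candidate that starts slowly but would eventually overtake the current best is discarded.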
Experimentation:
- GESP is validated experimentally on several direct policy search tasks.
- Results show that GESP significantly reduces computation time.
- Compared with problem-specific stopping criteria, GESP is effective, achieving comparable performance.
Statistics
Often, when a solution is evaluated over a fixed time period, it becomes clear that the objective value will not increase with additional computation time (for example, when a two-wheeled robot continuously spins on the spot).
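The spinning-robot case can be made concrete with standard differential-drive kinematics. The sketch below assumes, for illustration only, that the objective is the robot's distance from its start point; `simulate_differential_drive` and its parameters are hypothetical. With equal and opposite wheel speeds the forward speed (v_left + v_right) / 2 is zero, so the robot only rotates and the distance objective stays at 0 however long the evaluation runs.

```python
import math
from typing import List


def simulate_differential_drive(
    v_left: float, v_right: float, wheel_base: float, dt: float, steps: int
) -> List[float]:
    """Return the distance from the start point after each time step (Euler integration)."""
    x = y = theta = 0.0
    distances: List[float] = []
    for _ in range(steps):
        v = (v_left + v_right) / 2.0             # forward speed of the robot's centre
        omega = (v_right - v_left) / wheel_base  # turning rate
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += omega * dt
        distances.append(math.hypot(x, y))
    return distances


# Hypothetical controllers: wheels in opposite directions vs. both wheels forward.
spinning = simulate_differential_drive(-1.0, 1.0, wheel_base=0.5, dt=0.1, steps=100)
driving = simulate_differential_drive(1.0, 1.0, wheel_base=0.5, dt=0.1, steps=100)
print(max(spinning), round(driving[-1], 6))
```

Here `spinning` stays at 0.0 for all 100 steps, while `driving` grows to about 10 (1 unit per second for 10 seconds). Only the objective trace is needed to see that further simulation time spent on the spinning controller is wasted.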
We test the introduced stopping criterion in five direct policy search environments drawn from games, robotics and classic control domains, and show that it can save up to 75% of the computation time.
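One way to read the 75% figure, assuming the cost of an evaluation is proportional to the number of simulated time steps: if N solutions each have a budget of T steps and solution i is stopped after t_i <= T steps, the fraction of computation saved is

```latex
\text{saved} = 1 - \frac{\sum_{i=1}^{N} t_i}{N\,T},
\qquad
\text{saved} = 0.75 \;\Longleftrightarrow\; \frac{1}{N}\sum_{i=1}^{N} t_i = \frac{T}{4}.
```

Under this assumption, the best-case saving of 75% corresponds to evaluations using, on average, a quarter of their full time budget.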
Quotes
"Lengthy evaluation times are common in many optimization problems such as direct policy search tasks."
"The proposed method only looks at the objective value at each time step and requires no problem specific knowledge."