Key Concepts
A novel automatic hyperparameter tuning method called Q-FOX is proposed; it uses the FOX optimizer to tune the hyperparameters of the Q-learning algorithm so that control tasks are solved effectively.
Abstract
The paper presents a novel hyperparameter tuning method called Q-FOX that combines the FOX optimizer and the Q-learning algorithm to automatically tune the hyperparameters of Q-learning.
The key highlights are:
Q-FOX uses the FOX optimizer, a nature-inspired optimization algorithm, to automatically tune the hyperparameters of the Q-learning algorithm: the learning rate or step size (α), the discount factor (γ), and the exploration rate (ε), which governs the exploration-exploitation trade-off.
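To make the role of the three tuned hyperparameters concrete, here is a minimal epsilon-greedy Q-learning update step. This is an illustrative sketch of standard Q-learning, not the paper's implementation; the function and environment interface are hypothetical.

```python
import random

def q_learning_step(Q, state, actions, env_step, alpha, gamma, epsilon):
    """One epsilon-greedy Q-learning update.

    alpha, gamma, and epsilon are exactly the hyperparameters Q-FOX tunes.
    Q is a dict mapping (state, action) -> value; env_step(state, action)
    returns (next_state, reward, done). Illustrative sketch only.
    """
    # epsilon controls the exploration-exploitation trade-off
    if random.random() < epsilon:
        action = random.choice(actions)          # explore
    else:
        action = max(actions, key=lambda a: Q[(state, a)])  # exploit

    next_state, reward, done = env_step(state, action)

    # Temporal-difference target: future value is discounted by gamma
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)

    # alpha (step size) controls how far Q moves toward the TD target
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return next_state, reward, done
```

With a fixed epsilon this is plain Q-learning; Q-FOX's contribution is searching over (α, γ, ε) with the FOX optimizer instead of fixing them by hand.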
A new multi-objective fitness function is proposed that prioritizes cumulative reward over mean squared error (MSE) and learning time, enabling Q-FOX to optimize the hyperparameters effectively.
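One common way to realize such a multi-objective fitness is a weighted scalarization in which the reward term dominates. The weights and exact form below are assumptions for illustration, not the paper's formulation:

```python
def fitness(cumulative_reward, mse, learning_time,
            w_reward=0.7, w_mse=0.2, w_time=0.1):
    """Illustrative scalarized fitness: higher reward is better, lower
    error and learning time are better, and the reward weight dominates.
    The weights here are assumed values, not taken from the paper."""
    return w_reward * cumulative_reward - w_mse * mse - w_time * learning_time
```

Because w_reward exceeds the other weights, candidate hyperparameter sets that earn more reward win ties against those that merely reduce error or run faster, matching the stated prioritization.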
Q-FOX is evaluated on two OpenAI Gym control tasks - Frozen Lake and Cart Pole. It outperforms other optimization methods like PSO, GA, Bee, and random search in terms of cumulative reward.
For the Frozen Lake task, Q-FOX achieved a cumulative reward of 0.95, while for the Cart Pole task, it achieved a cumulative reward of 32.08.
The results demonstrate that Q-FOX can effectively tune the hyperparameters of the Q-learning algorithm, leading to improved performance and efficiency in solving different control tasks.
However, the iterative nature of Q-FOX makes it time-consuming, limiting its direct application to real-world problems. It is recommended to use Q-FOX in a simulation environment to tune the hyperparameters before applying them to the real-world problem.
Statistics
The cumulative reward for the Frozen Lake task was 0.95.
The cumulative reward for the Cart Pole task was 32.08.
Quotes
"Q-FOX has played an essential role in HP tuning for RL algorithms to effectively solve different control tasks."
"Q-FOX exhibited a remarkable convergence speed in the tuning of HP."