Stein Soft Actor-Critic: An Energy-Based Reinforcement Learning Algorithm with Expressive Stochastic Policies
Stein Soft Actor-Critic (S2AC) learns expressive stochastic policies modeled as Stein Variational Gradient Descent (SVGD) samplers from Energy-Based Models (EBMs) over Q-values. This lets it maximize both the expected future reward and the expected future entropy while remaining computationally efficient.
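The following is a minimal sketch of the SVGD sampling step described above. It is not the S2AC implementation. It assumes the standard maximum-entropy form of an EBM over Q-values, π(a|s) ∝ exp(Q(s,a)/α), together with a hypothetical toy critic Q(s,a) = -||a - target||² for one fixed state, an RBF kernel with a fixed bandwidth `h`, and a fixed step size. The helper names `toy_q_grad`, `rbf_kernel` and `svgd_step` are illustrative only. Each action particle is pushed toward high-Q actions by the kernel-weighted score ∇ₐQ(s,a)/α. The kernel-gradient term pushes particles apart, so they do not all collapse onto the single best action. S2AC's entropy estimate and critic training are not shown.

```python
import numpy as np


def toy_q_grad(actions, target):
    # Hypothetical toy critic for a single fixed state: Q(s, a) = -||a - target||^2.
    # Returns dQ/da for every particle, shape (n, d).
    return -2.0 * (actions - target)


def rbf_kernel(x, h):
    # diff[j, i] = x_j - x_i, shape (n, n, d)
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    k = np.exp(-sq_dist / h)  # k[j, i] = k(x_j, x_i), symmetric
    # Gradient of k(x_j, x_i) with respect to x_j, shape (n, n, d)
    grad_k = -2.0 / h * diff * k[:, :, None]
    return k, grad_k


def svgd_step(x, target, alpha, h, step_size):
    n = x.shape[0]
    # Score of the assumed EBM policy pi(a|s) ∝ exp(Q(s, a) / alpha)
    score = toy_q_grad(x, target) / alpha
    k, grad_k = rbf_kernel(x, h)
    # phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i)]
    # First term: attraction toward high Q. Second term: repulsion between particles.
    phi = (k.T @ score + grad_k.sum(axis=0)) / n
    return x + step_size * phi
```

Usage: start 100 two-dimensional action particles from a standard normal and iterate `svgd_step`. The particles drift toward `target` while remaining spread out instead of collapsing to a point.

```python
rng = np.random.default_rng(0)
particles = rng.normal(size=(100, 2))
target = np.array([1.0, -1.0])
for _ in range(1000):
    particles = svgd_step(particles, target, alpha=1.0, h=1.0, step_size=0.05)
```

A fixed bandwidth keeps the sketch short. In practice, SVGD implementations often set `h` with a median heuristic computed from the current particle distances.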