Improving Dictionary Learning in Language Models with Gated Sparse Autoencoders
Gated Sparse Autoencoders (Gated SAEs) achieve a Pareto improvement over baseline Sparse Autoencoders (SAEs) on the trade-off between reconstruction fidelity and sparsity, by separating the task of detecting which features are active from the task of estimating their magnitudes.
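To make the detection/magnitude split concrete, here is a minimal PyTorch sketch of a gated encoder. It is an illustration under stated assumptions, not the paper's exact implementation: the class name `GatedSAE`, the dimensions `d_model` and `d_sae`, and the weight-sharing scheme (a single set of encoder directions `W_enc`, with per-feature rescaling `r_mag` and separate biases for the two paths) are our own choices for the sketch.

```python
import torch
import torch.nn as nn


class GatedSAE(nn.Module):
    """Illustrative gated SAE: a gate path detects active features,
    a magnitude path estimates their strengths (hypothetical names)."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        # Shared encoder directions, reused by both the gate and
        # magnitude paths (an assumed parameter-tying scheme).
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_gate = nn.Parameter(torch.zeros(d_sae))  # gate-path bias
        self.b_mag = nn.Parameter(torch.zeros(d_sae))   # magnitude-path bias
        self.r_mag = nn.Parameter(torch.zeros(d_sae))   # per-feature rescale
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_centered = x - self.b_dec
        # Gate path: decides WHICH features are active (binary mask).
        pi_gate = x_centered @ self.W_enc + self.b_gate
        f_gate = (pi_gate > 0).float()
        # Magnitude path: estimates HOW STRONG each feature fires.
        pi_mag = x_centered @ (self.W_enc * self.r_mag.exp()) + self.b_mag
        f_mag = torch.relu(pi_mag)
        # Feature activations combine detection with magnitude estimation.
        f = f_gate * f_mag
        # Decode back to the residual-stream dimension.
        return f @ self.W_dec + self.b_dec
```

Note that the hard gate `(pi_gate > 0)` passes no gradient to the gate parameters on its own; in practice a gated SAE is trained with a sparsity penalty on the gate pre-activations and an auxiliary reconstruction term so the gate path still receives a learning signal, details this sketch omits.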