Core Concepts
Kolmogorov-Arnold Networks (KANs) are a promising alternative to Multi-Layer Perceptrons (MLPs) for accurate and interpretable function approximation. Whereas MLPs apply fixed activation functions at their nodes, KANs place learnable univariate activation functions on their edges, and this single architectural change is what allows them to outperform MLPs in both accuracy and interpretability.
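To make the edge-versus-node distinction concrete, here is a minimal, self-contained sketch of a KAN-style layer. It is not the authors' pykan implementation: each edge carries its own learnable 1-D function, parameterized here with a fixed Gaussian basis and learnable coefficients as a simplified stand-in for the B-splines used in the paper.

```python
import torch
import torch.nn as nn


class KANLayer(nn.Module):
    """One KAN-style layer: a learnable 1-D function on every edge (input i -> output j)."""

    def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8):
        super().__init__()
        # Fixed Gaussian basis centers on [-1, 1]; only the per-edge
        # coefficients (out_dim x in_dim x num_basis) are learned.
        self.register_buffer("centers", torch.linspace(-1.0, 1.0, num_basis))
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))
        self.width = 2.0 / (num_basis - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> basis values: (batch, in_dim, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # Edge functions phi[b, j, i] = sum_k coef[j, i, k] * basis[b, i, k]
        phi = torch.einsum("bik,jik->bji", basis, self.coef)
        # Each output node simply sums its incoming edge functions -- no fixed nonlinearity.
        return phi.sum(dim=-1)


# A [2, 1, 1]-shaped stack (2 inputs, 1 hidden node, 1 output), the shape cited
# below for f(x, y) = exp(sin(pi*x) + y^2).
model = nn.Sequential(KANLayer(2, 1), KANLayer(1, 1))
x = torch.rand(16, 2) * 2.0 - 1.0   # inputs in [-1, 1]
print(model(x).shape)               # torch.Size([16, 1])
```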
Summary
The paper introduces Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-Layer Perceptrons (MLPs) for function approximation.
Key highlights:
- KANs place learnable activation functions on edges ("weights"), whereas MLPs use fixed activation functions on nodes ("neurons").
- This seemingly simple change allows KANs to outperform MLPs in terms of accuracy and interpretability.
- For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving tasks.
- Theoretically and empirically, KANs possess faster neural scaling laws than MLPs.
- For interpretability, KANs can be intuitively visualized and can easily interact with human users to (re)discover mathematical and physical laws.
- The paper demonstrates KANs' advantages over MLPs through extensive numerical experiments and two examples in mathematics and physics.
Statistics
A 2-layer width-10 KAN is 100 times more accurate than a 4-layer width-100 MLP (10^-7 vs 10^-5 MSE) and 100 times more parameter efficient (10^2 vs 10^4 parameters) for PDE solving.
For the function f(x, y) = exp(sin(πx) + y^2), a [2, 1, 1] KAN can represent it exactly, while much larger MLPs struggle.
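The exact representability follows from the function's compositional structure; as a sketch of why a [2, 1, 1] shape suffices:

```latex
% f already has the two-layer Kolmogorov-Arnold form Phi(phi_1(x) + phi_2(y)):
% two inner edge functions feed a single hidden node, whose sum passes
% through one outer edge function.
f(x, y) = \exp\!\bigl(\sin(\pi x) + y^{2}\bigr)
        = \Phi\bigl(\varphi_{1}(x) + \varphi_{2}(y)\bigr),
\qquad
\varphi_{1}(x) = \sin(\pi x), \quad
\varphi_{2}(y) = y^{2}, \quad
\Phi(z) = e^{z}.
```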
For the high-dimensional function f(x1, ..., x100) = exp(1/100 * sum(sin^2(πxi/2))), a [100, 1, 1] KAN scales as test RMSE ∝ N^-4 (where N is the number of parameters), while MLPs plateau quickly.
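The exponent of 4 is what the paper's approximation theory predicts for its default spline order; as a sketch, with order-k B-spline edge functions (cubic, k = 3, by default):

```latex
% Predicted neural scaling law for KANs with order-k spline edge functions:
\text{test RMSE} \;\propto\; N^{-(k+1)} \;=\; N^{-4}
\quad \text{for } k = 3,
\qquad N = \text{number of parameters}.
```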
Quotes
"KANs can not only learn features (thanks to their external similarity to MLPs), but can also optimize these learned features to great accuracy (thanks to their internal similarity to splines)."
"KANs are nothing more than combinations of splines and MLPs, leveraging their respective strengths and avoiding their respective weaknesses."