Core Concepts
Architectural designs, including network width, depth, and components, significantly impact continual learning performance. The proposed ArchCraft method crafts CL-friendly architectures that achieve state-of-the-art performance with fewer parameters.
Abstract
This paper investigates the impact of neural network architectural designs on continual learning (CL) performance. The authors systematically explore the effects of network width, depth, and components (skip connections, global pooling, and down-sampling) on both Task Incremental Learning (Task IL) and Class Incremental Learning (Class IL).
The key findings are:
Wider and shallower networks generally exhibit better CL performance than deeper, narrower ones.
The optimal configurations of network components differ between Task IL and Class IL. For Task IL, skip connections and max pooling without global average pooling (GAP) are preferred. For Class IL, skip connections and GAP are beneficial.
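The pooling distinction above can be made concrete with a toy numpy example. This is purely illustrative (the paper places these components inside full AlexNet/ResNet variants): max pooling halves spatial resolution while preserving spatial layout, whereas global average pooling (GAP) collapses an entire feature map to a single value.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on an (H, W) feature map."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def global_avg_pool(fmap):
    """Global average pooling: collapse an (H, W) map to one scalar."""
    return fmap.mean()

fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(fmap)   # shape (2, 2): spatial layout retained
gap = global_avg_pool(fmap)   # scalar: spatial layout discarded
```

The intuition suggested by the findings is that Task IL benefits from the spatially detailed, task-specific features that survive max pooling, while Class IL benefits from the compact, shareable representations GAP produces.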
The authors propose the ArchCraft method, which crafts CL-friendly architectures by exploring the search space of network width, depth, and component locations. ArchCraft recrafts AlexNet and ResNet into AlexAC and ResAC, respectively.
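A minimal sketch of what exploring such a width/depth search space might look like, under strong simplifying assumptions: parameter counts here are for a plain stack of 3x3 convolutions at constant width, and the selection rule simply prefers the widest, then shallowest, feasible configuration. The real ArchCraft search space and evaluation procedure are richer than this toy.

```python
def conv_stack_params(width, depth, in_channels=3, kernel=3):
    """Approximate parameter count of `depth` KxK conv layers at `width` channels."""
    params = kernel * kernel * in_channels * width           # first layer
    params += (depth - 1) * kernel * kernel * width * width  # remaining layers
    return params

budget = 1_000_000
candidates = [(w, d) for w in (32, 64, 128, 256) for d in (2, 4, 8, 16)]
feasible = [(w, d) for w, d in candidates if conv_stack_params(w, d) <= budget]
# Prefer the widest (then shallowest) feasible net, mirroring the finding
# that wider, shallower architectures tend to be more CL-friendly.
best = max(feasible, key=lambda wd: (wd[0], -wd[1]))
```

Under this budget, the search lands on a wide two-layer stack rather than a narrow sixteen-layer one, which is the shape of trade-off the paper reports as favorable for CL.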
Extensive experiments demonstrate that the ArchCraft-guided architectures achieve state-of-the-art CL performance while being significantly more parameter-efficient than the baseline architectures. For example, ResAC-A outperforms ResNet-18 by up to 8.19% in last accuracy and 8.02% in average incremental accuracy, with 23% fewer parameters.
The authors further analyze the stability and plasticity of the ArchCraft-guided architectures, showing that they exhibit less forgetting on previous tasks and higher accuracy on new tasks compared to the baselines. This is attributed to the ArchCraft architectures' ability to extract more shared features across incremental tasks, as evidenced by the higher similarity of their representations.
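One standard way to quantify representation similarity across tasks is linear CKA; the summary does not specify the paper's exact metric, so treat this as an illustrative stand-in. Linear CKA is invariant to orthogonal transforms of the features, so two representations that differ only by a rotation score 1.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices X (n, d1) and Y (n, d2)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
reps_task1 = rng.standard_normal((100, 64))        # features on one task
Q, _ = np.linalg.qr(rng.standard_normal((64, 64))) # random rotation
reps_task2 = reps_task1 @ Q                        # same features, rotated
score = linear_cka(reps_task1, reps_task2)         # ~1.0: fully shared features
```

Higher cross-task similarity under a metric like this is what the analysis points to: representations that stay largely shared across incremental tasks forget less while still fitting new data.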
In summary, this work highlights the critical role of network architecture design in continual learning and proposes the ArchCraft method as an effective approach to craft CL-friendly architectures that balance stability and plasticity.
Stats
Networks with wider and shallower architectures generally exhibit better CL performance.
Increasing the network width contributes more to CL performance than increasing the depth.
The optimal configurations of network components differ between Task IL and Class IL.
Quotes
"Wider and shallower networks may be more suitable for CL, explaining why ResNet-18 empirically shows better performance than ResNet-32 in Table 1."
"ArchCraft recrafts AlexNet/ResNet into AlexAC/ResAC to guide a well-designed network architecture for CL with fewer parameter sizes."
"ArchCraft results in better overall stability while maintaining plasticity."