Efficient Inference of Depthwise and Pointwise Convolutions on GPUs through Fused Convolutional Modules
Fused Convolutional Modules (FCMs) substantially alleviate the memory-access bottleneck of depthwise and pointwise convolutions, yielding low-latency, energy-efficient execution on GPUs.
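To make the idea of fusion concrete, the following pure-Python sketch contrasts a layer-by-layer depthwise-then-pointwise computation with a fused one. It is an illustration under stated assumptions, not the FCM GPU kernels themselves: it assumes a 3x3 depthwise filter with stride 1 and zero padding, and it assumes that fusion means consuming each depthwise result in the pointwise step right away instead of first writing the whole intermediate feature map out. The function names (`dw_pixel`, `unfused`, `fused`) are hypothetical, and the actual tiling and on-chip memory usage on a GPU may differ.

```python
# Illustrative sketch only: depthwise (DW) followed by pointwise (PW) convolution,
# computed layer by layer and in a fused form. Names and shapes are assumptions.

def dw_pixel(x, w_dw, c, i, j):
    """One output value of a 3x3 depthwise conv (stride 1, zero padding)."""
    H, W = len(x[0]), len(x[0][0])
    acc = 0.0
    for di in range(3):
        for dj in range(3):
            ii, jj = i + di - 1, j + dj - 1
            if 0 <= ii < H and 0 <= jj < W:  # skip zero-padded positions
                acc += x[c][ii][jj] * w_dw[c][di][dj]
    return acc

def unfused(x, w_dw, w_pw):
    """Layer by layer: the full C x H x W intermediate map is materialized."""
    C, H, W, K = len(x), len(x[0]), len(x[0][0]), len(w_pw)
    mid = [[[dw_pixel(x, w_dw, c, i, j) for j in range(W)]
            for i in range(H)] for c in range(C)]
    return [[[sum(w_pw[k][c] * mid[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)] for k in range(K)]

def fused(x, w_dw, w_pw):
    """Fused: only a C-element buffer per pixel holds the intermediate."""
    C, H, W, K = len(x), len(x[0]), len(x[0][0]), len(w_pw)
    out = [[[0.0] * W for _ in range(H)] for _ in range(K)]
    for i in range(H):
        for j in range(W):
            # DW results for this pixel, kept in a small local buffer
            buf = [dw_pixel(x, w_dw, c, i, j) for c in range(C)]
            for k in range(K):
                out[k][i][j] = sum(w_pw[k][c] * buf[c] for c in range(C))
    return out

# Tiny example: C=3 input channels, 4x4 feature map, K=2 output channels.
x = [[[float(c + i + j) for j in range(4)] for i in range(4)] for c in range(3)]
w_dw = [[[float(c + di - dj) for dj in range(3)] for di in range(3)] for c in range(3)]
w_pw = [[float(k + c + 1) for c in range(3)] for k in range(2)]

assert unfused(x, w_dw, w_pw) == fused(x, w_dw, w_pw)
```

Both versions produce the same output. The difference is that the fused version never stores the full intermediate map, which is the kind of memory traffic that fusion aims to avoid.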