Efficient Streaming Acceleration of Modern Convolutional Neural Networks on FPGAs with Smart Off-Chip Memory Management
The paper introduces a memory optimization methodology that systematically considers the allocation and utilization of both on-chip and off-chip memory within a layerwise-pipelined streaming architecture, enabling modern CNN models with large parameter counts and complex connections to be mapped efficiently to FPGA devices.
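To make the on-chip versus off-chip trade-off concrete, the following is a minimal sketch of one possible placement heuristic. It is not the paper's algorithm. The `Layer` fields, the `reads_per_frame` reuse count, the byte budget, and the greedy rule are all assumptions introduced for illustration only: layers whose weights are re-read most often are kept on chip first, and whatever does not fit is streamed from off-chip memory.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weight_bytes: int     # size of the layer's parameters (hypothetical figure)
    reads_per_frame: int  # assumed number of times the weights are re-read per input


def allocate(layers, on_chip_budget):
    """Greedy sketch: place the most frequently re-read weights on chip first."""
    placement = {}
    remaining = on_chip_budget
    # Each on-chip byte saves `reads_per_frame` off-chip reads, so sort by that.
    for layer in sorted(layers, key=lambda l: l.reads_per_frame, reverse=True):
        if layer.weight_bytes <= remaining:
            placement[layer.name] = "on_chip"
            remaining -= layer.weight_bytes
        else:
            placement[layer.name] = "off_chip"
    return placement


def off_chip_traffic(layers, placement):
    """Bytes fetched from off-chip memory per input under a given placement."""
    return sum(
        l.weight_bytes * l.reads_per_frame
        for l in layers
        if placement[l.name] == "off_chip"
    )
```

A greedy rule like this is easy to reason about but is not guaranteed to be optimal. A real design flow would also have to account for off-chip bandwidth limits, activation buffers, and skip connections, none of which are modelled here.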