Optimizing Deep Learning Computation on Inter-core Connected Intelligence Processors with T10
T10 is a deep learning compiler that efficiently utilizes the distributed on-chip memory and high inter-core communication bandwidth of emerging intelligence processors to scale deep learning computation.