Automatic optimization of GPU native instruction schedules through stochastic search can significantly enhance CUDA kernel performance.
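To make the idea concrete, here is a minimal sketch of stochastic search over an instruction schedule. It is only an illustration under stated assumptions, not the method referred to above: the six-instruction `PROGRAM`, its latencies, and the single-issue in-order cost model in `cycles` are all hypothetical stand-ins for real GPU native code and real hardware timing, and the search is plain random hill climbing over adjacent swaps.

```python
import random

# Hypothetical toy program (an assumption for illustration only):
# name -> (latency in cycles, names of the instructions it depends on).
PROGRAM = {
    "load_a": (4, []),
    "load_b": (4, []),
    "add":    (1, ["load_a", "load_b"]),
    "load_c": (4, []),
    "mul":    (2, ["add", "load_c"]),
    "store":  (1, ["mul"]),
}


def cycles(schedule):
    """Cycle count on a toy single-issue, in-order machine that stalls on operands."""
    ready = {}  # instruction -> cycle at which its result becomes available
    clock = 0   # earliest cycle at which the next instruction may issue
    for name in schedule:
        latency, deps = PROGRAM[name]
        issue = max([clock] + [ready[d] for d in deps])
        ready[name] = issue + latency
        clock = issue + 1
    return max(ready.values())


def is_valid(schedule):
    """True if every instruction appears after all of its dependencies."""
    seen = set()
    for name in schedule:
        if any(d not in seen for d in PROGRAM[name][1]):
            return False
        seen.add(name)
    return True


def stochastic_search(schedule, steps=2000, seed=0):
    """Random hill climbing: swap two neighbours, keep the swap if it is not slower."""
    rng = random.Random(seed)
    best = list(schedule)
    best_cost = cycles(best)
    for _ in range(steps):
        i = rng.randrange(len(best) - 1)
        candidate = list(best)
        candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
        if not is_valid(candidate):
            continue  # the swap would break a data dependency
        cost = cycles(candidate)
        if cost <= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


baseline = ["load_a", "load_b", "add", "load_c", "mul", "store"]
tuned, tuned_cost = stochastic_search(baseline)
print(cycles(baseline), "->", tuned_cost)
```

In this toy model the gain comes from hoisting `load_c` above `add`, so its latency overlaps with the other loads. A search over real kernels would need a measured or much more faithful cost function, and would likely need a way to escape local minima, which simple hill climbing lacks.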