Core Concepts
Feature differences among matrix accelerators can significantly change numerical results, so code must be tested carefully before being ported across GPUs.
Stats
NVIDIA's V100, A100, and H100, as well as AMD's MI250X and MI100, all handle subnormal numbers.
NVIDIA's V100 uses no extra precision bits, the A100 uses 1, and the H100 uses at least 2.
All GPU models use the RTN-TE (round-to-nearest, ties-to-even) rounding mode.
NVIDIA's V100 has an FMA unit width of 4, the A100 a width of 8, and the H100 a width of at least 16.
AMD's MI100 does not support FP64.
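The subnormal-handling difference above can be made concrete with a small model. The sketch below is hypothetical (it does not reproduce any vendor's actual hardware path): it contrasts an fp16 multiply unit that keeps subnormal results with one that flushes them to zero, which is one way identical code can yield different answers on different accelerators.

```python
import numpy as np

FP16_MIN_NORMAL = np.float16(2.0 ** -14)  # smallest normal float16 value

def fp16_multiply(x, y, supports_subnormals=True):
    """Toy model of a matrix unit's fp16 multiply (not vendor behavior).

    A unit that does not handle subnormals flushes any subnormal
    result to zero; a unit that does handle them keeps the result.
    """
    p = np.float16(x) * np.float16(y)
    if not supports_subnormals and 0.0 < abs(p) < FP16_MIN_NORMAL:
        return np.float16(0.0)  # flush-to-zero path
    return p

a = np.float16(2.0 ** -10)
b = np.float16(2.0 ** -10)
# The product 2**-20 lies in float16's subnormal range (below 2**-14).
kept = fp16_multiply(a, b, supports_subnormals=True)
flushed = fp16_multiply(a, b, supports_subnormals=False)
print(kept, flushed)  # the flushing unit returns 0.0; the other keeps 2**-20
```

A real cross-GPU test would compare device results directly, but even this toy shows why bitwise-identical outputs cannot be assumed when one accelerator flushes subnormals and another does not.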
Quotes
"Unfortunately, very few facts are publicly documented about some of their attributes that can affect answers computed on identical code."
"We demonstrate that the lack of information on how these features differ across two matrix accelerators can make it impossible to reliably port codes across GPUs containing these differing accelerators."