How to measure bandwidth from pinned host memory to device memory on an AWS A100 instance (p4d.24xlarge)?
Mar 04, 2024
I want to measure the bandwidth from pinned host memory to device memory on an NVIDIA A100. An AWS p4d.24xlarge instance has 8 NVIDIA A100 GPUs attached over PCIe 4.0 x16, so the theoretical peak should be about 31.5 GB/s per direction (16 GT/s per lane x 16 lanes with 128b/130b encoding, divided by 8 bits per byte). However, I only get about 13 GB/s for transfers from pinned host memory when running the code below from the NVIDIA developer blog.
code-samples/profile.cu at master · NVIDIA-developer-blog/code-samples · GitHub
Is there a problem with this code, or is there some other reason why the transfer rate cannot reach the theoretical 31.5 GB/s?
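For anyone who wants to reproduce the measurement without the full sample, the kind of test in question can be sketched as follows. This is a minimal sketch, not the code from the linked repository: the 256 MiB buffer size, the repetition count, and the names `CHECK` and `bandwidthGBs` are my own assumptions for illustration, and it times copies on whichever device is current (device 0 by default).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
#define CHECK(call)                                          \
  do {                                                       \
    cudaError_t err_ = (call);                               \
    if (err_ != cudaSuccess) {                               \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
              cudaGetErrorString(err_));                     \
      exit(EXIT_FAILURE);                                    \
    }                                                        \
  } while (0)

// Convert bytes moved and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
static double bandwidthGBs(size_t bytes, double ms) {
  return (double)bytes / 1e9 / (ms / 1e3);
}

int main() {
  const size_t bytes = 256ull << 20;  // 256 MiB per copy (assumed size)
  const int reps = 20;                // timed repetitions (assumed count)

  void *hPinned = nullptr;
  void *dBuf = nullptr;
  CHECK(cudaMallocHost(&hPinned, bytes));  // page-locked (pinned) host buffer
  CHECK(cudaMalloc(&dBuf, bytes));         // device buffer

  cudaEvent_t start, stop;
  CHECK(cudaEventCreate(&start));
  CHECK(cudaEventCreate(&stop));

  // Untimed warm-up copy so one-time setup cost is not measured.
  CHECK(cudaMemcpy(dBuf, hPinned, bytes, cudaMemcpyHostToDevice));

  // Time a batch of host-to-device copies with CUDA events.
  CHECK(cudaEventRecord(start, 0));
  for (int i = 0; i < reps; ++i) {
    CHECK(cudaMemcpyAsync(dBuf, hPinned, bytes, cudaMemcpyHostToDevice, 0));
  }
  CHECK(cudaEventRecord(stop, 0));
  CHECK(cudaEventSynchronize(stop));

  float ms = 0.0f;
  CHECK(cudaEventElapsedTime(&ms, start, stop));
  printf("Pinned host -> device: %.2f GB/s\n",
         bandwidthGBs(bytes * (size_t)reps, ms));

  CHECK(cudaEventDestroy(start));
  CHECK(cudaEventDestroy(stop));
  CHECK(cudaFree(dBuf));
  CHECK(cudaFreeHost(hPinned));
  return 0;
}
```

It should build with something like `nvcc -O2 h2d_bw.cu -o h2d_bw` (the file name is arbitrary). Averaging over several large copies keeps per-call launch overhead from dominating the result, which matters when comparing against the link's theoretical peak.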