
Nvlink

Mar 04, 2024

Hello,

Sorry for the long post, but I am very new to this field. I hope you can help. Big thanks in advance.

For better understanding, I suggest you look at the two zip files I have uploaded. I’m just wondering if the values I’m getting are normal.

I bought a new 3-slot NVLINK bridge and am using it with two RTX 3090s.

Since I am a “noob” when it comes to NVLINK, I will describe everything I have tried here.

Game part:

In RDR2, there was a 32% increase in FPS compared to a single GPU. However, the resolution was only 1600x900, because I had to send my 4K monitor in for service. GPU usage fluctuated between 70% and 85%. I’m wondering if this is normal.

Code part:

OS: Windows 11 Pro

Model: ResNet-152

RESULTS WITHOUT NVLINK:

Batch size 512, single GPU vs. 2 GPUs:

With a batch size of 512, the trial training finishes in 822 seconds on two 3090s without NVLINK, versus 1141 seconds on a single 3090. That is a speedup of around 40%. I tested with 512 because a batch size of 1024 would not start on a single GPU. I also see that VRAM usage is split evenly between the two cards, with a combined limit of 48 GB. I assume that a 21-degree difference in peak temperature between the two GPUs is not normal.
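The “around 40%” figure can be checked with a little arithmetic. The plain-Python sketch below turns the two wall-clock times into a speedup, the share of wall time saved, and the scaling efficiency relative to an ideal 2x with two GPUs. It assumes both runs process the same data for the same number of epochs, so only the wall time differs; the function names are hypothetical helpers, not from any library.

```python
def speedup_percent(baseline_s: float, new_s: float) -> float:
    """How much faster the new run is: (baseline / new - 1) * 100."""
    return (baseline_s / new_s - 1.0) * 100.0


def time_saved_percent(baseline_s: float, new_s: float) -> float:
    """Share of the baseline wall time that was saved: (1 - new / baseline) * 100."""
    return (1.0 - new_s / baseline_s) * 100.0


def scaling_efficiency_percent(single_gpu_s: float, multi_gpu_s: float, num_gpus: int) -> float:
    """Achieved speedup divided by the ideal speedup (num_gpus), as a percentage."""
    return single_gpu_s / (num_gpus * multi_gpu_s) * 100.0


# Batch size 512: 1141 s on one RTX 3090 vs. 822 s on two RTX 3090s without NVLINK.
single, dual = 1141.0, 822.0
print(f"speedup:            {speedup_percent(single, dual):.1f}%")
print(f"time saved:         {time_saved_percent(single, dual):.1f}%")
print(f"scaling efficiency: {scaling_efficiency_percent(single, dual, 2):.1f}%")
```

With these numbers the two-GPU run is about 38.8% faster, saves about 28.0% of the wall time, and reaches about 69.4% of the ideal two-GPU speedup. “Faster by X%” and “takes X% less time” are different measures, so it helps to state which one is meant when comparing with published benchmarks.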

Batch size 1024 vs. batch size 512, both on 2 GPUs:

In both cases, VRAM usage is split between the two cards, and there is a 21-22 degree difference in peak temperature. The interesting thing is that there is not much difference between batch sizes 1024 and 512 (around 12%).

RESULTS WITH NVLINK:

With a batch size of 1024, the VRAM of both GPUs is full; with 512, each is about half full. With a batch size of 512, the run with NVLINK and the run without it differ by almost 100 seconds.

With a batch size of 1024, the NVLINK run lags behind by about 50 seconds.

I enabled NVLINK from the NVIDIA Control Panel and installed the Studio driver. I don’t know how to boost performance further. Unfortunately, I can’t list all the values here one by one, so please open the zip files and compare the runs with and without NVLINK side by side. I’m wondering whether there is a problem.

I have some questions.

Does NVLINK have any other function besides pooling memory?

I saw a comment on a forum saying that “NVLINK pools GPU and RAM together.” How true is that? (If true, does it mean that the 64 GB of RAM and 48 GB of VRAM in my system can be combined?)

How many system resources does NVLINK use while it is active? I mean RAM, CPU cores, and power (watts).

Can I still use the GPUs independently while NVLINK is connected? For example, can I play games and stream on one GPU while continuing AI training on the other? If the answer is yes, as I expect, how do I select which GPU to use? Or will I have to disable NVLINK each time?
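For the training half of this question, one widely used way to pin a CUDA program to a single card is the `CUDA_VISIBLE_DEVICES` environment variable. Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes the index match the order `nvidia-smi` shows. Below is a minimal sketch, assuming the training is launched from Python as a separate process; `env_for_gpu`, `launch_on_gpu` and `train.py` are hypothetical names. These variables only affect CUDA programs such as a training script, not games, and the sketch has not been checked with SLI enabled in the Control Panel.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_for_gpu(gpu_index: int, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and expose only one GPU to CUDA programs started with it."""
    env = dict(os.environ if base_env is None else base_env)
    # Number devices in PCI bus order so the index matches nvidia-smi's listing.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # CUDA will only enumerate this device; inside the child process it appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(gpu_index: int, cmd: Sequence[str]) -> subprocess.Popen:
    """Start a command (e.g. a training script) that can only see the chosen GPU."""
    return subprocess.Popen(list(cmd), env=env_for_gpu(gpu_index))


# Hypothetical usage: train on GPU 1 and leave GPU 0 free for gaming and streaming.
# import sys
# proc = launch_on_gpu(1, [sys.executable, "train.py"])
# proc.wait()
```

From a Windows command prompt, the same effect can be had by running `set CUDA_VISIBLE_DEVICES=1` in that window before starting the training script.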

Also, I read the following claim in an article, and I don’t know whether it is true: “When you set the batch size to X, every GPU takes X samples, so the total batch becomes 2X.” But when I use a batch size of 512, the memory of a single GPU is completely full, while with a batch size of 1024 on 2 GPUs (a total batch of 2048 if the theory is correct) the two GPUs are only barely full. I couldn’t make sense of this. Also, I was able to run a batch size of 1024 before, without NVLINK.
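Which reading applies depends on the framework and how it is set up, which the post does not say. The sketch below contrasts two common data-parallel conventions; `batch_split` and its `mode` values are hypothetical names. Under the “split” convention, a batch size of 1024 on two GPUs puts 512 samples on each card, the same load that filled a single GPU at batch size 512. Under the “per_gpu” convention, each card would get 1024 samples, which did not even start on one GPU. The memory behavior described above therefore looks more like the “split” convention, though that is an inference rather than something confirmed by the logs.

```python
def batch_split(batch_size: int, num_gpus: int, mode: str) -> tuple:
    """Return (per_gpu_batch, global_batch) under two common data-parallel conventions.

    mode="split":   batch_size is the global batch and is divided across the GPUs
                    (e.g. PyTorch nn.DataParallel chunks each input batch along dim 0;
                    Keras model.fit under tf.distribute.MirroredStrategy treats
                    batch_size as the global batch).
    mode="per_gpu": every GPU loads batch_size samples, so the global batch grows
                    with the GPU count (e.g. PyTorch DistributedDataParallel with
                    one DataLoader per process).
    """
    if mode == "split":
        if batch_size % num_gpus:
            raise ValueError("this sketch assumes batch_size is divisible by num_gpus")
        return batch_size // num_gpus, batch_size
    if mode == "per_gpu":
        return batch_size, batch_size * num_gpus
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("split", "per_gpu"):
    per_gpu, total = batch_split(1024, 2, mode)
    print(f"{mode:8s} batch_size=1024 on 2 GPUs -> {per_gpu} per GPU, {total} in total")
```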

Even if it is true: with a batch size of 256, training took 881 seconds on 2 GPUs, while with 512 it took 1141 seconds on a single GPU. If the theory is correct, that is an increase of over 70%. However, I can’t get above 80-85%, while every benchmark I’ve seen shows over 90%.

There is one more result that is not included in the zip file.

Batch size 1024 with SLI enabled:
706 s at epoch 10
2108 s at epoch 30

Batch size 1024 with SLI disabled:
863 s at epoch 10
2120 s at epoch 30
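The epoch-10 and epoch-30 times for batch size 1024 can be split into average seconds per epoch. This assumes, although the post does not state it, that both numbers are cumulative elapsed times since the start of training; `epoch_rates` is a hypothetical helper.

```python
def epoch_rates(t_epoch10_s: float, t_epoch30_s: float) -> tuple:
    """Average seconds per epoch for epochs 1-10 and for epochs 11-30.

    Assumes both numbers are cumulative wall-clock times measured from the
    start of training.
    """
    first = t_epoch10_s / 10
    later = (t_epoch30_s - t_epoch10_s) / 20
    return first, later


runs = {"SLI on": (706.0, 2108.0), "SLI off": (863.0, 2120.0)}
for name, (t10, t30) in runs.items():
    first, later = epoch_rates(t10, t30)
    print(f"{name:8s} epochs 1-10: {first:.2f} s/epoch, epochs 11-30: {later:.2f} s/epoch")
```

Read this way, the SLI-on run averages 70.6 s per epoch over the first 10 epochs versus 86.3 s with SLI off, but over epochs 11-30 it averages 70.1 s versus 62.85 s. If the times are not cumulative, these averages do not apply.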

Also, I saw values for a 2080 Ti SLI system on the internet: it reached about 25 GB per second. I wonder why I only get 14.

I’m wondering why SLI becomes less efficient as training goes on. It clearly helps by epoch 10, but I would expect its advantage to grow as the training time gets longer, not shrink. I’m looking for the logic behind this and can’t find it. Please explain.

nvidia smi.zip (930.1 KB)
basic benchmarkıng.zip (7.9 MB)