Network Card issue

Gothmoth

Network Card issue - Mellanox ConnectX-3

hello,

i bought two 10GbE mellanox connectx-3 cards to connect two systems directly.
everything seemed to go fine.
windows installed the drivers and within 10 minutes i had set up a 10GbE network.

until a day later.
macrium reflect was making an automatic backup and informed me that the verify had ended with a hash error.
i ran the backup again.. same result: hash error.

i then did a file copy with verify (using total commander).
i copied ~3000 jpg and tif files to the second system.
total commander warned me almost immediately that some of the written files were different.
and indeed, when i checked the files, they showed artefacts or only half of the image was displayed; the bottom part was nothing but artefacts.

i was pretty shocked that the file transfers seemingly went fine when not verifying,
but around 5% of the files were corrupted after copying.
had i not used macrium reflect and verified the backup image, i would not have noticed it that early.

i did a bunch of tests to figure out what the culprit was (cards, cable, hard disk etc.).
i don't want to bore you with that.


in the end i noticed that the network card produces these errors when i put it in the lower PCIe slot of my asus crosshair 7 (the long one at the bottom, called 4_3). files sent from this system to the other system are likely to arrive corrupted, no matter whether the second system uses its internal intel 1GbE port via a switch or a direct connection to its mellanox 10GbE card.

when i put the network card into the second x16 slot, beneath the graphics card, the files are transferred without issues.

i have only one graphics card in this system and no other PCIe devices.


what can cause this?
has anyone had similar experiences?


 
Did you check the Windows event log when transferring in the bottom slot?
Can you boot a Linux live USB and check if you still get errors? That is the easiest way to rule out a driver issue in Windows.
 
i get some mellanox-related messages in the event log when starting windows.


https://i.imgur.com/uH6eHIK.jpg


https://i.imgur.com/OHgcETl.jpg


but no message regarding the file transfer errors.


the firmware is the latest.
i have updated both cards with the firmware from the mellanox website.


https://i.imgur.com/0z1LuRq.jpg



i tried both the latest recommended windows drivers from the mellanox website and the drivers that windows installs.


https://i.imgur.com/fnrfokT.jpg



still, one PCIe slot works fine and the other does not.
 
Try a Linux live USB stick, transfer some files and see if the same happens.


will try it tomorrow.. will first have to create one. :)


i guess i will run into problems installing the driver for the network card.

i am a linux noob....



what tool will copy files and verify them?
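
One way to answer this without extra tools is a short Python script, since Python ships with most Linux live images. The following is only a minimal sketch: the function names `sha256_of` and `copy_and_verify` are made up for this example, and it assumes the target is reachable as an ordinary path (for instance a mounted network share). It copies a directory tree and compares SHA-256 hashes of source and copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Copy every file under src_dir to dst_dir; return relative paths whose hashes differ."""
    mismatches = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(src_dir)
        dst = dst_dir / relative
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        # Re-read both files from disk and compare their digests.
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(relative)
    return mismatches
```

Calling `copy_and_verify(Path("photos"), Path("/mnt/otherpc/photos"))` (both paths are placeholders) returns an empty list when every copy matches. One caveat: the read-back of a freshly written file may be served from a local cache rather than from the remote disk, so hashing the files again on the receiving machine is the more reliable check.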
 
If you happen to have the bottom M.2 slot populated with an NVMe SSD, it could be that it's sharing bandwidth with the lower PCIe slot, and that might be causing the errors you are experiencing.

The other possible option is that the chipset that runs the bottom PCIe slot simply can't keep up with the speed that card can pull through, and that causes write errors. The middle PCIe slot is connected directly to the CPU and not to the chipset.
 

no second NVMe. only the one connected directly to the cpu.

i wonder whether a bottleneck should produce errors.
that would be a very bad PCIe design. slowing it down.. ok.


but it is PCIe 2.0 x4, which should be good for 4x 500 MB/s.
the 10GbE card maxes out at ~1200 MB/s.
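
The arithmetic behind those two numbers can be written out as a small Python sketch. It assumes the usual PCIe 2.0 figures of 5 GT/s per lane with 8b/10b encoding, and it ignores packet overhead on both PCIe and Ethernet, so real throughput is somewhat lower on both sides. The function names are made up for this example.

```python
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding (8 payload bits per 10 bits on the wire).
PCIE2_TRANSFERS_PER_S = 5_000_000_000


def pcie2_bandwidth_mb_s(lanes: int) -> float:
    """Raw PCIe 2.0 bandwidth in MB/s for the given lane count, before protocol overhead."""
    payload_bits_per_s = PCIE2_TRANSFERS_PER_S * 8 // 10 * lanes
    return payload_bits_per_s // 8 / 1_000_000


def ethernet_line_rate_mb_s(gbit_per_s: int) -> float:
    """Ethernet line rate in MB/s, before frame and protocol overhead."""
    return gbit_per_s * 1_000_000_000 // 8 / 1_000_000


slot = pcie2_bandwidth_mb_s(4)
nic = ethernet_line_rate_mb_s(10)
print(f"PCIe 2.0 x4: {slot:.0f} MB/s")     # PCIe 2.0 x4: 2000 MB/s
print(f"10GbE:       {nic:.0f} MB/s")      # 10GbE:       1250 MB/s
print(f"headroom:    {slot - nic:.0f} MB/s")  # headroom:    750 MB/s
```

On paper the slot has headroom over a single 10GbE port, and a slot that is too slow would be expected to limit throughput, not to corrupt data.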


as to linux.. i tried, but i get no connection to the other system.
i guess i have to install the mellanox linux driver onto the USB stick.
i have to look up how to do that.


ps:


there is one more thing i have to check.
when testing the card, i screwed it in when using the bottom slot, because it is supposed to stay there.
when using the second x16 slot, i just put it in but did not screw it down.
i did it that way each time i tested the different slots.


i just noticed that the card moves a tiny bit when i screw it in, because the slot bracket (sorry, i am german and wasn't sure what it's called in english) is bent a bit. so the end of the card moves 1-1.5mm up when screwed in.

it seems to sit fine in the PCIe slot, but maybe the connection to one or two of the pins is not 100%.

??? could this be the reason? should i not get more error messages in that case?


will do more testing....
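
One thing that could be checked from a Linux live system is whether the PCIe link of the card trained at its full width and speed. This is only a sketch: it reads the `current_link_width`, `max_link_width`, `current_link_speed` and `max_link_speed` files that Linux exposes under `/sys/bus/pci/devices`, and the function names are made up for this example. A marginal contact does not necessarily show up here, and a card sitting in an electrically narrower slot (for example an x8 card in an x4 slot) is listed too, so the useful comparison is the same slot before and after reseating the card.

```python
from pathlib import Path

LINK_ATTRIBUTES = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def read_link_status(device_dir: Path) -> dict[str, str]:
    """Read the PCIe link attributes of one device, skipping any that cannot be read."""
    status = {}
    for name in LINK_ATTRIBUTES:
        try:
            status[name] = (device_dir / name).read_text().strip()
        except OSError:
            continue
    return status


def links_below_maximum(
    sysfs_root: Path = Path("/sys/bus/pci/devices"),
) -> dict[str, dict[str, str]]:
    """Return devices whose negotiated link width or speed differs from the device maximum."""
    below = {}
    for device_dir in sorted(sysfs_root.iterdir()):
        status = read_link_status(device_dir)
        if len(status) < len(LINK_ATTRIBUTES):
            continue  # not a PCIe device, or attributes unavailable
        if (
            status["current_link_width"] != status["max_link_width"]
            or status["current_link_speed"] != status["max_link_speed"]
        ):
            below[device_dir.name] = status
    return below
```

Printing the result of `links_below_maximum()` lists each affected device by its PCI address together with the four values. Some devices lower their link speed on purpose to save power, so a lower speed alone is not proof of a fault.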
 
uH6eHIK.jpeg
This would make me verify that the configs are compatible - but I've never used mellanox gear, or any serious networking gear on Windows so take that with a grain of salt.
 
it was the same when copying under linux.


but i think the issue was the one from my ps above: the bent slot bracket lifts the end of the card 1-1.5mm when it is screwed in.

i put the card in without screwing it in and had no issues.
then i bent the card bracket a bit, so that the card does not move at all, and screwed it in.. no issues so far.


i will keep verifying transfers for a while to be sure, even though it's slower.


This would make me verify that the configs are compatible - but I've never used mellanox gear, or any serious networking gear on Windows so take that with a grain of salt.


that is some kind of extra feature these cards offer that i don't use anyway.
but i found no way to disable it.

it should cause no issues. it's just a fallback to an earlier version of that feature.
 
it has been working without an issue since then.

this was really a strange thing.
i have been building my own PCs since the mid 90s but can't remember having such an issue.
i may have had a GPU that did not sit correctly in the PCIe slot, but then it did not work at all.

the rear of the network card really only moved up a tiny bit when i screwed it in.
i only noticed it by coincidence.

but it must have been enough to compromise the connection.
 