  #1  
10-12-20, 10:59 AM
Gothmoth
Advanced Member
 
Join Date: Jun 2017
Location: Germany
Posts: 370
Network Card issue - Mellanox ConnectX-3

hello,

i bought two 10GbE mellanox connectx-3 cards to directly connect two systems.
everything seemed to go fine.
windows installed the drivers and within 10 minutes i had set up a 10GbE network.

until a day later.
macrium reflect was making an automatic backup and i was informed that the verify ended with a hash error.
i ran the backup again.. same result: hash error.

i then did a file copy with verify (using totalcommander).
i copied ~3000 jpg and tif files to the second system.
totalcommander almost immediately warned me that some of the written files differed from the originals.
and indeed, when i checked the files, some showed artefacts and some displayed only the top half correctly, with the bottom part of the image being nothing but artefacts.
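
A byte-by-byte comparison can show where a damaged copy starts to diverge from its original, which helps tell a truncated file from one with corrupted blocks in the middle. The following is a minimal sketch in Python, not something used in the thread; the function name `first_mismatch` and the 1 MiB chunk size are assumptions chosen for illustration.

```python
CHUNK = 1024 * 1024  # read 1 MiB at a time so large TIFFs need not fit in memory


def first_mismatch(path_a, path_b, chunk_size=CHUNK):
    """Return the byte offset where the two files first differ, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Walk the differing chunk to find the exact byte.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file simply ended early.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point without any difference.
                return None
            offset += len(a)
```

Calling `first_mismatch(original, copy)` on a source image and its transferred copy returns `None` for an intact copy and the first bad offset otherwise. An offset somewhere in the middle of the file would be consistent with an image that renders correctly at the top and breaks further down.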

i was pretty shocked that the file transfers seemingly went fine when not verifying,
but around 5% of the files were destroyed after copying.
had i not used macrium reflect and verified the backup image, i would not have noticed it that early.

i did a bunch of tests to figure out what the culprit is (cards, cable, hard disk etc.).
i don't want to bore you with that.


in the end i noticed that the network card produces these errors when i put it in the lower PCIe slot of my asus crosshair 7 (the long one at the bottom, called 4_3). files sent from this system to the other system are likely to arrive corrupted, no matter whether the second system receives them on its internal intel 1GbE port via a switch or on a direct connection to its mellanox 10GbE card.

when i put the network card into the second x16 slot, beneath the graphics card, the files are transferred without issues.

i have only one graphics card in this system and no other PCIe devices.


what can cause this?
has anyone had similar experiences?



__________________
2x TR 2950, 1x 3970x, 1x 3900x, 1x 5900x
2x Eizo CG 2730 1x Eizo CS 2740, 1x BenQ PD2700U
  #2  
10-12-20, 02:46 PM
FTLN
OC3D Elite
 
Join Date: Feb 2012
Location: France
Posts: 1,748
Did you check the Windows event log when transferring with the card in the bottom slot?
Can you boot a Linux live USB and check whether you still get errors? That is the easiest way to rule out a driver issue in Windows.
  #3  
10-12-20, 03:03 PM
Gothmoth
i get some mellanox related messages in the event log when starting windows.


https://i.imgur.com/uH6eHIK.jpg


https://i.imgur.com/OHgcETl.jpg


but no message regarding the filetransfer errors.


the firmware is the latest.
i have updated both cards with the firmware from the mellanox website.


https://i.imgur.com/0z1LuRq.jpg



i tried with the latest recommended windows drivers from the mellanox website and the drivers that windows installs.


https://i.imgur.com/fnrfokT.jpg



still, one PCIe slot works fine and the other does not.
  #4  
10-12-20, 05:01 PM
FTLN
Try a Linux live USB stick, transfer some files and see if the same happens.
  #5  
10-12-20, 05:12 PM
Gothmoth
Quote:
Originally Posted by FTLN
Try a Linux live USB stick, transfer some files and see if the same happens.

will try it tomorrow.. will first have to create one.


i guess i will run into problems installing the driver for the network card.

i am a linux noob....



what tool will copy files and verify them?
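
Short of a dedicated tool, a copy-and-verify pass can be scripted with Python, which is usually available on a Linux live system. The sketch below is only an illustration under assumptions: the helper names `sha256_of` and `copy_tree_verified` are made up here, and it uses nothing beyond the standard library's `hashlib`, `shutil` and `pathlib`.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_tree_verified(src_dir, dst_dir):
    """Copy every file under src_dir to dst_dir and return the relative
    paths of files whose source and destination hashes differ."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    mismatches = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        # Re-read both sides and compare digests.
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(rel.as_posix())
    return mismatches
```

One caveat: when the destination is a network share, reading the file back right after writing it may be answered from a local cache rather than from the remote disk. A stricter check would compute the destination hashes on the receiving machine and compare the two lists.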
  #6  
11-12-20, 11:20 AM
Dark NighT
OC3D Elite
 
Join Date: Jan 2011
Location: The Netherlands
Posts: 2,103
If you happen to have the bottom M.2 slot populated with an NVMe SSD, it could be sharing bandwidth with the lower PCIe slot, and that might be causing the errors you are experiencing.

The other possibility is that the chipset that runs the bottom PCIe slot simply can't keep up with the speed the card can pull through, and that causes write errors. The middle PCIe slot is connected directly to the CPU, not to the chipset.
__________________
Rig: Cpu: AMD R7 3700X. Mobo: Gigabyte X470 Aorus Gaming 7 wifi. Ram: Gskill Royal Silver 3200mhz cl16: Gpu: Msi RTX2070 Armor. Asus Xonar STX II.
  #7  
11-12-20, 11:36 AM
Gothmoth
Quote:
Originally Posted by Dark NighT
If you happen to have the bottom M.2 slot populated with an NVMe SSD, it could be sharing bandwidth with the lower PCIe slot, and that might be causing the errors you are experiencing.

The other possibility is that the chipset that runs the bottom PCIe slot simply can't keep up with the speed the card can pull through, and that causes write errors. The middle PCIe slot is connected directly to the CPU, not to the chipset.

no second NVMe. only the one connected directly to the cpu.

i wonder whether a bottleneck should produce errors at all.
slowing it down.. ok. but corrupting data would be a very bad PCIe design.


besides, the slot is PCIe 2.0 x4, which should be good for 4x 500 MB/s.
the 10GbE card maxes out at ~1200 MB/s.
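
The headroom estimate can be checked with a few lines of arithmetic. This sketch assumes the slot really negotiates a PCIe 2.0 x4 link (5 GT/s per lane with 8b/10b coding) and ignores packet and protocol overhead on both sides; the function names are illustrative only.

```python
def pcie2_payload_mb_per_s(lanes):
    """PCIe 2.0 bandwidth per direction in MB/s, before protocol overhead."""
    raw_bits_per_s = 5_000_000_000 * lanes       # 5 GT/s per lane
    data_bits_per_s = raw_bits_per_s * 8 // 10   # 8b/10b: 8 data bits per 10 line bits
    return data_bits_per_s // 8 // 1_000_000     # bits -> bytes -> MB


def ethernet_line_rate_mb_per_s(gbit_per_s):
    """Ethernet line rate in MB/s, before frame and protocol overhead."""
    return gbit_per_s * 1_000_000_000 // 8 // 1_000_000


slot = pcie2_payload_mb_per_s(4)        # 2000
nic = ethernet_line_rate_mb_per_s(10)   # 1250
print(f"PCIe 2.0 x4: {slot} MB/s, 10GbE: {nic} MB/s, headroom: {slot - nic} MB/s")
```

On paper the slot has roughly 750 MB/s to spare for a single 10GbE port, so a plain bandwidth shortage looks like an unlikely explanation, provided the link actually trains at x4.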


as for linux.. i tried, but i get no connection to the other system.
i guess i have to install the mellanox linux driver onto the USB stick.
i have to look into how to do that.


ps:


there is one more thing i have to check.
when testing the card in the bottom slot i screwed it in, because that is where it is supposed to stay.
when using the second x16 slot i just put it in but did not screw it down.
i did it that way each time i tested the different slots.


i just noticed that the card moves a tiny bit when i screw it in, because the slot shield (sorry, i am german and don't know what it's called in english) is bent a bit. so the end of the card moves 1-1.5mm up when screwed in.

it seems to sit fine in the PCIe slot, but maybe the connection to one or two of the pins is not 100%.

??? could this be the reason? should i not get more error messages in that case?


will do more testing....
  #8  
11-12-20, 01:23 PM
FTLN
http://manpages.ubuntu.com/manpages/....4freebsd.html


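That man page is for the FreeBSD driver, where the module is loaded with kldload mlx4en. On an Ubuntu live system, run these two commands in a terminal after boot:
sudo -i
modprobe mlx4_en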


You should see the card in the network settings.
  #9  
11-12-20, 02:10 PM
Gothmoth
Quote:
Originally Posted by FTLN
http://manpages.ubuntu.com/manpages/....4freebsd.html


That man page is for the FreeBSD driver, where the module is loaded with kldload mlx4en. On an Ubuntu live system, run these two commands in a terminal after boot:
sudo -i
modprobe mlx4_en


You should see the card in the network settings.

thanks
  #10  
11-12-20, 03:42 PM
looz
OC3D Elite
 
Join Date: Feb 2013
Location: Finland
Posts: 2,309
This would make me verify that the configs are compatible - but I've never used mellanox gear, or any serious networking gear on Windows so take that with a grain of salt.
__________________
i7 9900k - 16GB - 3080 XC3 Ultra - 660p 1TB + MX500 2TB - HE-4XX w/ Topping D30+A30