Desperation Time, Please Help!

Arcnor

New member
Before I start, I'd just like to say that I am never rolling my own PC again. Ever. This has been pure hell.

Okay, so for a few days after I finished my PC, everything seemed to be working okay, but then I started having a weird display error when quitting out of a Steam game back to the Windows desktop. My display would go one solid colour. Not a BSOD, just blank. I could only rectify the problem by doing a hard reset. I thought it was a Steam issue at first, and gave Valve some crap about it.

Then the same issue started happening somewhat randomly in a really old game (Fable: The Lost Chapters to be exact). It seemed to be a driver issue, maybe (specifically a TDR error, except my system would never recover).

Then it started happening when web browsing using Google Chrome, again randomly.

I also had instability (BSODs) while playing Tropico 4.

Now it's happened after I completely wiped my SSD, while I was in the process of re-installing Windows 7 (can't really be a software conflict if there's no software, right?). It occurred right at the moment I was updating Windows and it said it was installing some low-level Nvidia driver (not the driver from the website, just something that was included in the Windows update).

So I turfed everything even remotely resembling an overclock. Still got the error.

Took my rig to a local repair shop ("local" meaning "an hour's drive away"), and they said they could get the video card to crash consistently and recommended an RMA. I have RMA'd my GPU. EVGA are baffled -- said they didn't find anything wrong with the card. Second card had the same error, almost immediately. RMA'ing that too, just in case. Repair shop are equally baffled -- they said their tests pointed to a GPU defect. They suggested the problem might instead be the motherboard.

RMA'ing motherboard -- work in progress.

Tested RAM with Memtest86+ -- no errors, but I never ran it for 24 hours straight. Still, the fact that the error could be counted on to occur after quitting out of a game or a graphics benchmark like Unigine Valley makes me doubt that it's the RAM. Not exactly a "random" event, there.

Power supply voltage readings (at least through software) never vary or drop on any rail, and 650W should be plenty for my system.

Does anyone have any further ideas, or do I just have to go down the list and replace every component one at a time until I hit the problem, and hope that it's just not some weird case where my particular hardware combination doesn't want to play nicely together? I use the HDMI port on the GPU, if that makes a difference, although I hardly think it'd be a conflict between audio drivers like some people experience, mostly because the error occurred in my re-install process before any such drivers were present.

Please, please, please OC3D, this is urgent. Does anyone have any further suggestions for what I can do?
 
I would test it with one stick of RAM and manually set the RAM speed, timings and voltage; this is recommended anyway. It's possible the motherboard BIOS needs updating.

I really would not be surprised if it is the RAM defaulting to 1333 MHz and running out of spec because it's set to auto in the BIOS; this can cause all the issues you're experiencing.

If you can get a blue screen error code and post it, I can further diagnose the issue, but the fact that it happened during the Windows install definitely points to some sort of hardware error, and I'm 99% certain it's your RAM.
 
Can't really test much right now, given that my whole PC is in pieces and the motherboard itself is in a package on its way to Asus, but I always set my RAM voltages and timings manually, even when I reverted everything else to optimized defaults, and I never set them above their rated specs (1866 MHz, 10-11-10-30-2 @ 1.5V). Additionally, like I said, though I didn't run it for 24 hours (I did eight, probably not enough, but...), both sticks did pass Memtest86+ at those speeds and timings with no errors. I also never saw any evidence that the speeds were changing while in Windows. Could have happened when I wasn't looking, of course.
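
For anyone wondering what those timings mean in real time, here is a rough sketch that converts the rated cycle counts into nanoseconds. It assumes the usual CL-tRCD-tRP-tRAS ordering, with the trailing 2 being the command rate (that reading is an assumption, the thread doesn't spell it out), and the function name is made up for illustration.

```python
def cycles_to_ns(cycles, data_rate_mts):
    """Convert a timing given in clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the real clock in MHz
    is half the advertised data rate (1866 -> 933 MHz).
    """
    clock_mhz = data_rate_mts / 2
    return cycles * 1000 / clock_mhz


# Rated kit settings quoted in the post; the labels are the assumed
# conventional ordering, not something stated in the thread.
RATED_TIMINGS = {"CL": 10, "tRCD": 11, "tRP": 10, "tRAS": 30}

if __name__ == "__main__":
    for name, cycles in RATED_TIMINGS.items():
        at_1866 = cycles_to_ns(cycles, 1866)
        at_1333 = cycles_to_ns(cycles, 1333)
        print(f"{name:>4}: {cycles} cycles = {at_1866:.2f} ns @ 1866, "
              f"{at_1333:.2f} ns @ 1333")
```

The same cycle counts at a lower data rate give the modules more time per operation, so running below the rated speed is generally the more relaxed case.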

The BSODs were only in the one game, and though I don't have a dump file handy (erased SSD, remember), the code that was in Event Viewer said, if I remember, that ntoskrnl.exe was the problem, which of course told me absolutely nothing, since it seems anything could affect that. The blank screens were even more of a problem, since there was no error code in Event Viewer at that point, apart from the one caused by me doing a hard reset.
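
Once the machine is back together, the relevant System log entries can be pulled from the command line instead of scrolling through Event Viewer. The sketch below wraps the Windows `wevtutil` tool. The event IDs are the ones commonly associated with these symptoms (41 for an unexpected power loss or hard reset, 1001 for a recorded bugcheck, 4101 for a display driver timeout that recovered); they are not taken from this thread, so treat them as assumptions. `build_query` is a hypothetical helper name.

```python
import subprocess
import sys

# Assumed event IDs of interest in the System log (not from the thread).
EVENT_IDS = {
    41: "Kernel-Power: unexpected shutdown / hard reset",
    1001: "BugCheck: system rebooted after a blue screen",
    4101: "Display: driver stopped responding and recovered (TDR)",
}


def build_query(event_id, count=5):
    """Build a wevtutil command listing the newest matching System events."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest first
        "/f:text",       # human-readable output
    ]


# Only attempt to run the queries on Windows, where wevtutil exists.
if __name__ == "__main__" and sys.platform == "win32":
    for event_id, label in EVENT_IDS.items():
        print(f"--- {label} (Event ID {event_id}) ---")
        result = subprocess.run(build_query(event_id),
                                capture_output=True, text=True)
        print(result.stdout or "(no matching events)")
```

A blank-screen hang that never recovers may leave nothing but the Event ID 41 entry from the hard reset, which would match what is described above.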

Once the motherboard gets back, hopefully I'll know more (actually, hopefully Asus finds the problem and I can get back to life with a working machine). If this keeps occurring, I'll start swapping RAM sticks in and out and see what happens. It didn't seem like a RAM error, though, since like I said, they passed Memtest, and they were set manually in BIOS (which was updated, by the way), and the error absolutely occurred whenever I quit out of a game or Unigine Valley, which doesn't seem very random.

Also, just as an aside, I didn't have a problem with RAM performance in OCCT, if that helps.
 
If you ran a RAM test for 8 hours and there were no errors, I doubt it's the RAM, especially when the crashes are so frequent.
And I have never heard of RAM causing problems when it runs slower(!) than its rated speed. The only thing I can think of is that the voltage might be too low, but the JEDEC defaults should take care of that on auto.
 
When you get the motherboard back, update the BIOS and all drivers from the manufacturer's website.

Your ntoskrnl.exe error can be related to memory or the hard drive. Others with the same issue fixed it by replacing one or the other.

http://www.reviversoft.com/blog/2013/11/how-do-i-fix-a-ntoskrnl-exe-blue-screen-of-death/

This might give you a little better insight into the ntoskrnl.exe error.

Check that everything is connected securely and that the contact between the CPU and heatsink is good.
 
Motherboard BIOS at the time was the latest release. I think there may be one newer version now, and I'll update to it as soon as the motherboard comes back, but I don't think that was the problem.

Samsung SSD utility said everything was just fine. It's software, of course, and doesn't really run anything, but it was the only SSD health monitor I could find that applied.

Drivers were always up to date. As I said, the error happened once when I was trying to re-install Windows, before there were even any drivers present. I don't think that's the issue -- not driver conflicts, anyway.

CPU temps at my voltage settings never, ever went north of 65 degrees. GPU temps under hideous, torturous load (Unigine Valley with all settings cranked to max) after an hour never broke 70 degrees. Motherboard temps according to HWMonitor didn't even seem to hit 40.

I don't think it's temperature related.
 
EVGA may not necessarily have been thorough in the tests they performed on the card I sent in originally, but they sent a different card back, and the exact same error occurring in two separate cards seems unlikely. I've RMA'd again (they gave me another one without quibbling, nice of them), but I'm pretty sure the chance of this being a GPU error is remote.

What makes you think it's a PSU problem? Every voltage measurement I saw in Windows said everything was fine, even under load -- the only rail that ever twitched was the +12V, and it only changed from 12.288V to 12.192V. Pretty far within spec.
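
As a quick sanity check on those numbers, here is a minimal sketch that compares a reading against the ATX tolerance for the +12V rail. The ±5% figure comes from the ATX power supply design guide rather than from anything in this thread, and the function names are just for illustration. Software readings from motherboard sensors can also be off, so a multimeter on the connector is the more trustworthy measurement.

```python
# Assumed tolerance for the positive rails per the ATX design guide (+/-5%).
TOLERANCE = 0.05


def deviation_percent(nominal, measured):
    """Signed deviation of a reading from the nominal rail voltage, in percent."""
    return (measured - nominal) / nominal * 100


def rail_within_spec(nominal, measured, tolerance=TOLERANCE):
    """True if the reading falls inside nominal +/- tolerance."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return low <= measured <= high


if __name__ == "__main__":
    # The two +12V readings quoted in the post.
    for reading in (12.288, 12.192):
        status = "OK" if rail_within_spec(12.0, reading) else "out of spec"
        print(f"{reading:.3f} V: {deviation_percent(12.0, reading):+.1f}% ({status})")
```

Both readings sit inside the 11.4V to 12.6V window, though a rail that looks steady in software can still dip too briefly for the sensor to catch.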

I will say that according to Asus' PSU calculator, they reckon I ought to run a 700W PSU, but every other PSU calc I've tried said 650W was more than adequate (and I put worst case scenarios in, like 90% load and 30% capacitor degradation). Also I figure that a PSU problem would crop up at moments of maximum load, not on quitting back to Windows, right?

One thing, though, and I'm not sure if it's related: every once in a while, when booting into Windows, my keyboard and/or mouse wouldn't be recognized. A reboot (or two...) usually solved the issue, but is this an indication of anything related, or just Windows being dickish?
 
One thing, though, and I'm not sure if it's related: every once in a while, when booting into Windows, my keyboard and/or mouse wouldn't be recognized. A reboot (or two...) usually solved the issue, but is this an indication of anything related, or just Windows being dickish?

Sounds like it could be a motherboard issue, then. Your power supply is more than adequate in capacity. That said, I know Tropico 4 is not a very well optimised game, and I've seen it suddenly cause GPU usage to spike to 100% every so often; if your power supply is a bit faulty, the sudden increase in demand could cause issues.
 