So it looks like I *may* have a dying CPU (Phenom II 955)

Zoot

Dying CPU Maybe? (Phenom II 955)

So randomly yesterday I ended up getting this error while compiling a newer version of VLC in Debian on my secondary rig. It's referencing an error in moving crap from the L2 to the L1 cache, and the "Hardware Error" message is rather ominous.

Code:
mark @ ~ $
Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780044] [Hardware Error]: CPU:3    MC0_STATUS[-|CE|-|-|AddrV|CECC]: 0x9400400075000136

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780059] [Hardware Error]:     MC0_ADDR: 0x00000001f4e4a840

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780065] [Hardware Error]: Data Cache Error: during L1 linefill from L2.

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780073] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780085] [Hardware Error]: CPU:3    MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780091] [Hardware Error]: Instruction Cache Error: Copyback Parity/Victim error.

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780098] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780108] [Hardware Error]: CPU:3    MC2_STATUS[Over|CE|-|-|AddrV]: 0xd40000000000017a

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780115] [Hardware Error]:     MC2_ADDR: 0x000000000103a840

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780120] [Hardware Error]: Bus Unit Error: EV error during data copyback.

Message from syslogd@PhenomII-Rig at Dec 26 07:52:40 ...
 kernel:[28500.780127] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: EV
This is funny because it wasn't happening up until yesterday. I have the CPU undervolted, so I know the logical thing to do is remove that and see if it persists, but I'm too lazy to do that. :p
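
For anyone trying to read the bracketed flags in that log, here's a rough sketch of how they fall out of the raw MCx_STATUS values. The bit positions are my understanding of the standard x86 machine-check status layout plus the AMD ECC bits, and `decode_mc_status` is just a name I made up, so treat it as an illustration rather than what the kernel literally runs. The main takeaway is that "CE" means a corrected error, which would fit with Prime95 not noticing anything.

```python
# Sketch only: bit positions are assumed from the x86 machine-check
# MCi_STATUS layout (bits 46:45 being the AMD CECC/UECC pair).
OVER = 1 << 62   # an earlier error was overwritten
UC = 1 << 61     # uncorrected error (clear means corrected, "CE")
MISCV = 1 << 59  # MCx_MISC register holds valid data
ADDRV = 1 << 58  # MCx_ADDR register holds a valid address
PCC = 1 << 57    # processor context corrupt


def decode_mc_status(status: int) -> str:
    """Return the flag string in the same order the log prints it."""
    parts = [
        "Over" if status & OVER else "-",
        "UE" if status & UC else "CE",
        "MiscV" if status & MISCV else "-",
        "PCC" if status & PCC else "-",
        "AddrV" if status & ADDRV else "-",
    ]
    ecc = (status >> 45) & 0x3  # two ECC bits read together
    if ecc:
        parts.append("CECC" if ecc == 2 else "UECC")
    return "|".join(parts)


# The three status values from the log above.
for bank, value in enumerate(
    [0x9400400075000136, 0x9000000000000171, 0xD40000000000017A]
):
    print(f"MC{bank}_STATUS[{decode_mc_status(value)}]")
```

Running it on the three values from my log gives the same flag strings the kernel printed, and all three come out as CE.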

The strange thing is that the machine is perfectly stable in both Windows 7 and Linux. I've run Prime95 again (I ran it extensively years ago to confirm stability too) for nearly 12 hours in both OSes, and while Debian gives me the error, Prime95 itself doesn't report any errors. Windows, on the other hand, works fine with no errors from either Prime95 or Windows itself.

I've been doing some googling and it seems it's either a faulty CPU, a firmware/BIOS issue, or a kernel incompatibility. Some threads have mentioned BIOS upgrades solving the problem, and I am running a beta BIOS on this board, so that could be it. In the meantime, though, I've just disabled the daemon that's reporting the error. Out of sight, out of mind. ^_^

I'm curious, has anyone ever actually had the CPU die on them after a long time of working well? I've had things like motherboards and graphics cards die on me, but never a CPU.
 
I've now got uptime of over 3 full days running the Linux version of Prime95.

I think I can safely forget about this error now. :p
 