Frequent, but inconclusive, BSODs


wlw

Hello,

I'm struggling with frequent BSODs. I've analyzed the crash dumps, but they didn't point me anywhere beyond the fact that it's most probably a hardware issue.

I have done quite extensive hardware testing, including RAM, HDDs, GPU and CPU.

The thing is that I can run OCCT for hours and everything is fine, but the computer will crash while browsing the Internet or when I walk away from it (idle).

I'm thinking it might be the CPU or motherboard playing games with me, but there is nothing specific in those crashes, just some general csrss memory access issues.

I'm attaching some of the kernel crash dumps, hope the experts here can shed some light on this...

Minidump.zip



Bug check 0x124 (WHEA_UNCORRECTABLE_ERROR) indicates that a fatal hardware error has occurred.


-------------------------------------------------------------------------------
Record Id : 01cc0367a248bc04
Severity : Fatal (1)
Length : 928
Creator : Microsoft
Notify Type : Machine Check Exception
Timestamp : 4/25/2011 22:25:33
Flags : 0x00000000

===============================================================================
Section 0 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ fffffa8004c5a0a8
Section @ fffffa8004c5a180
Offset : 344
Length : 192
Flags : 0x00000001 Primary
Severity : Fatal

Proc. Type : x86/x64
Instr. Set : x64
Error Type : Cache error
Operation : Data Read
Flags : 0x00
Level : 1
CPU Version : 0x0000000000100f42
Processor ID : 0x0000000000000002

===============================================================================
Section 1 : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor @ fffffa8004c5a0f0
Section @ fffffa8004c5a240
Offset : 536
Length : 128
Flags : 0x00000000
Severity : Fatal

Local APIC Id : 0x0000000000000002
CPU Id : 42 0f 10 00 00 08 04 02 - 09 20 80 00 ff fb 8b 17
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0 @ fffffa8004c5a240

===============================================================================
Section 2 : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor @ fffffa8004c5a138
Section @ fffffa8004c5a2c0
Offset : 664
Length : 264
Flags : 0x00000000
Severity : Fatal

Error : DCACHEL1_DRD_ERR (Proc 2 Bank 0)

You have a cache error:

AMD Phenom II X4 B50 Processor. It looks like you unlocked some CPU cores, which may be damaged. Undo the unlock, go back to X2, and check whether your PC is stable again.
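As a side note on the record above, the "CPU Version" value can be split into family, model and stepping. The sketch below assumes that this field holds the CPUID function 1 EAX signature and applies the AMD rule that the extended fields only count when the base family is 0xF; the function name is made up for illustration.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID function-1 EAX signature into family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    if base_family == 0xF:
        # Extended fields are only meaningful when the base family is 0xF.
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


# "CPU Id : 42 0f 10 00 ..." holds the same value as little-endian bytes.
signature = int.from_bytes(bytes.fromhex("420f1000"), "little")
info = decode_cpuid_signature(signature)
print(hex(signature), {name: hex(value) for name, value in info.items()})
```

Under those assumptions the value decodes to family 0x10, model 4, stepping 2, which is consistent with a Phenom II part.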


Agreed - DCACHE errors can indicate RAM errors as well, but usually only on Intel CPUs (due to the way they are capable of deferring an instruction on the register or partially in RAM, depending on load). I do not think this is something AMD CPUs do, so this error on an AMD CPU should indicate a CPU error 100% of the time, or thereabouts.


Yes indeed, there were a lot of WHEA events in the Event Viewer as well, but those were corrected hardware errors. When they couldn't be corrected, the result was a BSOD.

I already knew about that, and I have disabled the core with the damaged L1 or L2 cache (the fourth core).

So that part has been solved; I should probably have posted a different crash dump. But that's one cause fewer.

I am more intrigued by those F4 BSODs involving csrss, as there is no LPC message to trace back and I have no clue what could be crashing csrss.

It does say VISTA_DRIVER_FAULT, but with no indication of a possible suspect.

Will this require a full memory dump (4GB+)?


CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been
terminated.
Several processes and threads are necessary for the operation of the
system; when they are terminated (for any reason), the system can no
longer function.
Arguments:
Arg1: 0000000000000003, Process
Arg2: fffffa80055f2b30, Terminating object
Arg3: fffffa80055f2e10, Process image file name
Arg4: fffff80002f96db0, Explanatory message (ascii)

The dump only shows this:

fffff800`02f96db0 "Terminating critical process 0x%"
fffff800`02f96dd0 "p (%s)."
fffffa80`055f2e10 "csrss.exe"

csrss.exe was terminated, and this caused the bug check:

fffff880`027f50e8 fffff800`0301b982 nt!KeBugCheckEx
fffff880`027f50f0 fffff800`02fc90ab nt!PspCatchCriticalBreak+0x92
fffff880`027f5130 fffff800`02f4c698 nt! ?? ::NNGAKEGL::`string'+0x17ad6
fffff880`027f5180 fffff800`02c928d3 nt!NtTerminateProcess+0xf4
fffff880`027f5200 fffff800`02c8ee70 nt!KiSystemServiceCopyEnd+0x13
fffff880`027f5398 fffff800`02cdf11f nt!KiServiceLinkage
fffff880`027f53a0 fffff800`02c92cc2 nt! ?? ::FNODOBFM::`string'+0x49974
fffff880`027f5a40 fffff800`02c9183a nt!KiExceptionDispatch+0xc2
fffff880`027f5c20 00000000`76fc8e3d nt!KiPageFault+0x23a
00000000`00d609a0 00000000`00000000 0x76fc8e3d
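One way to read the stack above is to check which half of the x64 address space each return address falls in. The sketch below is only a reading aid, not a diagnosis; it assumes the usual debugger column order (child stack pointer, return address, call site) and the helper names are invented for illustration.

```python
KERNEL_BASE = 0xFFFF_8000_0000_0000  # lowest canonical kernel-half address on x64
USER_TOP = 0x0000_7FFF_FFFF_FFFF     # highest canonical user-half address on x64


def parse_address(text: str) -> int:
    """Convert a debugger-style 'xxxxxxxx`xxxxxxxx' address to an integer."""
    return int(text.replace("`", ""), 16)


def classify(address: int) -> str:
    """Name the half of the canonical x64 address space an address lies in."""
    if address >= KERNEL_BASE:
        return "kernel"
    if address <= USER_TOP:
        return "user"
    return "non-canonical"


# Two frames copied from the stack listing in this post.
frames = [
    "fffff880`027f5a40 fffff800`02c9183a nt!KiExceptionDispatch+0xc2",
    "fffff880`027f5c20 00000000`76fc8e3d nt!KiPageFault+0x23a",
]

for line in frames:
    child_sp, return_address, call_site = line.split(maxsplit=2)
    side = classify(parse_address(return_address))
    print(f"{call_site} returns to {side} space")
```

The nt!KiPageFault frame returns to a user-half address, which suggests the page fault was raised by user-mode code in the process rather than by a kernel routine.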

3: kd> !process fffffa80055f2b30 3
GetPointerFromAddress: unable to read from fffff80002ec4000
PROCESS fffffa80055f2b30
SessionId: none Cid: 0198 Peb: 7fffffd7000 ParentCid: 0190
DirBase: 9ee49000 ObjectTable: fffff8a0073ab0e0 HandleCount: <Data Not Accessible>
Image: csrss.exe
VadRoot fffffa8005231170 Vads 89 Clone 0 Private 463. Modified 311. Locked 0.
DeviceMap fffff8a000006090
Token fffff8a001600550
ReadMemory error: Cannot get nt!KeMaximumIncrement value.
fffff78000000000: Unable to get shared data
ElapsedTime 00:00:00.000
UserTime 00:00:00.000
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (1139, 50, 345) (4556KB, 200KB, 1380KB)
PeakWorkingSetSize 1139
VirtualSize 45 Mb
PeakVirtualSize 46 Mb
PageFaultCount 1586
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 589
*** Error in reading nt!_ETHREAD @ fffffa800523f6d0

If the RAM is OK, run sfc /scannow; the exe may be damaged on the HDD. The best option would be to provide a full dump. Zip it with 7z (LZMA2 with Ultra compression, which shrinks the .dmp considerably) and upload it to mediafire.com.


The problems have carried over across three systems: this is the third Windows installation (so csrss.exe corruption is unlikely). I have used different HDDs, scanned the RAM with bootable Memtest countless times, and done everything else I could think of.

I do not have a full memory dump; I will have to reconfigure the system and hope for a crash to collect the file.

Edit: sfc scan done, everything OK.

Edited by wlw

What is in common (other than Windows) amongst your installs? Is it the install media, the apps you install, etc.? A CSRSS, LSASS, or SMSS crash will cause an F4 bugcheck, and given that these binaries are responsible for security and (very) base OS functionality, a nice clean pipe or RPC trace back isn't likely. Also note that even with a kernel or complete dump there is no guarantee of figuring out who or what specifically caused it, but I'd still suggest capturing a full dump (so we can see both user and kernel memory) the next time an F4 bugcheck occurs.


The media: no. I have used different DVDs, one made from an MSDN download and one retail.

Apps: yes, to some extent. The second installation was purely experimental, and I didn't even install system updates (I thought that SP1 was originally responsible).

However, I cannot find a trigger, so to speak, that would cause the F4 or 3B (win32k.sys) to pop up. Sometimes it happens when the PC is idle and none of the usual background apps, such as SpeedFan or Trixx, are running.

I have configured the system for a full crash dump with a 4146MB page file. Now all I can do is wait for it to happen, as I don't know any particular way of inducing it.
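For anyone wanting to double-check that the dump type really is set as intended, here is a minimal sketch that reads the CrashDumpEnabled value from the CrashControl registry key. The function names are made up for illustration, the registry read only works on Windows, and value 7 (automatic memory dump) only applies to Windows versions newer than the one discussed here.

```python
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Documented meanings of the CrashDumpEnabled registry value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_type(value: int) -> str:
    """Translate a CrashDumpEnabled value into a readable name."""
    return DUMP_TYPES.get(value, f"unknown ({value})")


def read_crash_dump_setting() -> str:
    """Read CrashDumpEnabled from the registry. Windows only."""
    import winreg  # imported lazily so describe_dump_type works on any OS

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return describe_dump_type(value)
```

On the affected machine, calling read_crash_dump_setting() should report a complete memory dump if the configuration described above took effect.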


This machine now officially hates me.

It's set up for a full crash dump. I fired up TotalCMD to delete some files from C: to free up space. I selected some files I don't need from the root of C:, pressed shift+del, and the machine froze before the confirmation window showed up. The blue circle kept spinning and I could highlight the icons on the desktop, but neither the windows nor the taskbar were responsive.

So I pressed ctrl+alt+del hoping to get Task Manager up, but it got stuck at "preparing security options" (or whatever it says in English; I have the PL version installed this time).

Then, after some 10 to 20 seconds boom:

STOP 0xF4 (0x3, 0xFFFFFA8004DC4B30, 0xFFFFFA8004DC4E10, 0xFFFFF80002F84DB0)

Collecting data for crash dump ...

Initializing disk for crash dump ...

and nothing! No crash dump was created.

So I pressed the reset button, the system started to boot and it hung on the colorful logo.

It used to do that after F4s, except it usually crashed while booting with a CI.dll BSOD, which I believe means the system file images loaded into memory were corrupt.

After another reset it booted. I launched TotalCMD, selected those files again, pressed shift+del again, and it froze again.

Only this time it resumed operation after about 10 seconds.

Is there anything else I can do besides waiting for another F4 and praying that the dump is saved?
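Since the dump was not written, the STOP line copied from the screen is the only record of this crash. A minimal sketch of pulling it apart follows; the lookup table only covers the bug checks mentioned in this thread, and the function name is hypothetical.

```python
import re

# Bug check codes that come up in this thread.
BUGCHECK_NAMES = {
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0xF4: "CRITICAL_OBJECT_TERMINATION",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
}

# For bug check 0xF4, the first argument names the kind of terminated object.
F4_OBJECT_TYPES = {0x3: "process", 0x6: "thread"}

STOP_PATTERN = re.compile(r"STOP\s+(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_stop_line(line: str) -> dict:
    """Extract the bug check code and its arguments from a STOP line."""
    match = STOP_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a STOP line: {line!r}")
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    result = {"code": code, "name": BUGCHECK_NAMES.get(code, "unknown"), "args": args}
    if code == 0xF4 and args:
        result["object_type"] = F4_OBJECT_TYPES.get(args[0], "unknown")
    return result


stop = parse_stop_line(
    "STOP 0xF4 (0x3, 0xFFFFFA8004DC4B30, 0xFFFFFA8004DC4E10, 0xFFFFF80002F84DB0)"
)
print(stop["name"], stop["object_type"], [hex(arg) for arg in stop["args"]])
```

For this line the first argument is 0x3, so the terminated object is again a process, the same pattern as in the earlier minidump.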


Another one: I just had a BSOD 1E during Internet browsing, triggered by nothing more than a mouse click. Unfortunately, saving the full memory dump only got to 75%, and the file isn't there at all...


It's playing with me now! Two crashes in a row that look like this:

[Attached screenshot: post-323241-0-80972100-1304433120_thumb.]

Interestingly, the mouse pointer always stays on top of it, which leads me to the conclusion that it's not the GPU's fault (it's not frame buffer corruption).

It stays like this for a while and then the PC reboots itself.

I have completely disengaged the UCC module, which is a hardware CPU unlocking solution, so the CPU is back to being a Phenom II X2 550BE at stock clocks and voltage.

The screenshot above was taken with the CPU locked back like this.

I also have 2 reports in Event Viewer saying that the Software branch of the system registry was damaged and has been recovered. Besides that, nothing unusual.

RAM checked from a bootable Memtest USB stick, disk checked.


I do not have a different AM3 CPU to test with. However, this one has been with me since my previous AM2+ board, where it worked as a dual core and never caused any problems.

The GPU is a Sapphire HD6850 1GB. I have tortured it with Furmark and the like, and it gently lets you know when something bad is going on. When overheating or overclocked, it first shows some slight shader artifacts (like green dots); if you push it further, the display driver resets itself ("The video driver regained stability", or however this translates); and when you drive it way over the edge, it crashes, but never like that. The screen just goes black and that's all, with no fancy patterns like the one in the screenshot.

This pattern, however, I have seen before. All I have to do is launch some Flash videos; the easiest way to cause it is to launch Steam, go to the Store and play any game trailer. A couple of seconds and it's dead.

And again, the strangest thing of all is that it works for a week, then goes mad like this (crashing, not booting, freezing), and then works for another couple of days no matter what you do...

