STOP 0xF4 - CSRSS is crashing


Posted

One of my friend's machines started getting this STOP 0xF4 error this morning. The machine starts normally, but after it's been on for a few minutes, it blue screens with this error. This error does NOT occur in Safe Mode - the machine happily stays on. The machine is an HP designed for Vista (with an ICH9 chipset, I believe), but I installed XP SP2 for them. This was a year ago, and they've been running happily until now.

The error indicates that CSRSS has terminated. This is probably because some driver is causing this error I see in the dump:

The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

but I can't figure out which driver it is.

I've tried turning on verifier on all the non-Microsoft drivers, but that didn't help at all (it just caused IRQL problems for NOD32 until I uninstalled it).

I've run a CHKDSK /R (it came up clean), uninstalled NOD32, TightVNC and Virtual CloneDrive. I've upgraded the video driver and the sound driver, all to no avail. I have no idea what to try next. Any recommendations would be greatly appreciated.

Note that I tried to upload the 126K kernel dump 2 separate times, but it never got past "Uploading File..." Please let me know if I can send you the dump some alternate way.

Thanks in advance,

TanMan


Posted

Wow, it was really late last night. I was more punchy than I thought. The dump is 124MB, not 124KB. Jeez! No wonder I couldn't upload it!

cluberti, I uploaded the kernel dump to a new folder - TanMan.

A couple more notes that I seem to have forgotten last night. :) The PC has a built-in diagnostic from PC-Doctor, and I ran that - no errors on anything it tested: processor, RAM, or disk. The machine is perfectly stable in Safe Mode, and it consistently blue screens in normal mode within a couple of minutes.

Device Manager showed an exclamation point next to the DVD burner. I disconnected that from the system, but there was no change.

Well, I think that's everything. I really, really appreciate your help!

TanMan

Posted

// The thread crashing - all you can see is the kernel-mode portion of the stack, and once you're into
// nt!KiFastCallEntry with the parameters of the crash, it's too late. Note we came out of user mode
// already crashing:
1: kd> !thread
THREAD 88de29e0 Cid 03e8.03f8 Teb: 7ffda000 Win32Thread: e27a55a0 RUNNING on processor 1
Impersonation token: e400f420 (Level Impersonation)
Owning Process 0 Image: <Unknown>
Attached Process 88e3a240 Image: csrss.exe
Wait Start TickCount 8518 Ticks: 0
Context Switch Count 970 LargeStack
UserTime 00:00:00.234
KernelTime 00:00:00.140
Win32 Start Address 0x00004998
LPC Server thread working on message Id 4998
Start Address 0x75b44616
Stack Init b909e000 Current b909d6fc Base b909e000 Limit b909b000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr Args to Child
b909dd00 805d03ab 000000f4 00000003 88e3a240 nt!KeBugCheckEx+0x1b (FPO: [Non-Fpo]) (CONV: stdcall)
b909dd24 805d12af 805d1204 88e3a240 88e3a3b4 nt!PspCatchCriticalBreak+0x75 (FPO: [Non-Fpo]) (CONV: stdcall)
b909dd54 8054088c 88e3a488 c0000005 0052ebcc nt!NtTerminateProcess+0x7d (FPO: [Non-Fpo]) (CONV: stdcall)
b909dd54 7c90eb94 88e3a488 c0000005 0052ebcc nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b909dd64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0052ebcc 00000000 00000000 00000000 00000000 0x7c90eb94

// Walking the LPC message back, we find the client thread to this server thread:
1: kd> !lpc message 4998
Searching message 4998 in threads ...
Server thread 88de29e0 is working on message 4998
Client thread 8879e228 waiting a reply from 4998
Searching thread 8879e228 in port rundown queues ...

Server communication port 0xe364f780
Handles: 1 References: 1
The LpcDataInfoChainHead queue is empty
Connected port: 0xe2dcaa78 Server connection port: 0xe1740cc8

Client communication port 0xe2dcaa78
Handles: 1 References: 3
The LpcDataInfoChainHead queue is empty

Server connection port e1740cc8 Name: ApiPort
Handles: 1 References: 155
Server process : 88e3a240 (csrss.exe)
Queue semaphore : 891d3e08
Semaphore state 0 (0x0)
The message queue is empty
The LpcDataInfoChainHead queue is empty
Done.

// The thread at the other end of this LPC chain (the "client"):
1: kd> !thread 8879e228
THREAD 8879e228 Cid 0e10.0e18 Teb: 7ffdf000 Win32Thread: e400a368 WAIT: (WrLpcReply) UserMode Non-Alertable
8879e41c Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00004998:
Current LPC port e2dcaa78
Not impersonating
DeviceMap e1001130
Owning Process 0 Image: <Unknown>
Attached Process 8879e5c8 Image: sqlcmd.exe
Wait Start TickCount 8517 Ticks: 1 (0:00:00:00.015)
Context Switch Count 64 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x01019521
Start Address 0x7c810867
Stack Init 9f973000 Current 9f972c50 Base 9f973000 Limit 9f96f000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
9f972c68 80502d46 8879e298 8879e228 804faf40 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
9f972c74 804faf40 8879e41c 8879e3f0 8879e228 nt!KiSwapThread+0x8a (FPO: [0,0,0]) (CONV: fastcall)
9f972c9c 805a1e87 00000001 00000011 0006d401 nt!KeWaitForSingleObject+0x1c2 (FPO: [Non-Fpo]) (CONV: stdcall)
9f972d50 8054088c 000007ec 0006d4a0 0006d4a0 nt!NtRequestWaitReplyPort+0x63d (FPO: [Non-Fpo]) (CONV: stdcall)
9f972d50 7c90eb94 000007ec 0006d4a0 0006d4a0 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ 9f972d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006d46c 00000000 00000000 00000000 00000000 0x7c90eb94

I can't see the user-mode portion of this stack, but something in user mode is interfering with LPC on this box badly enough to cause CSRSS to crash. Since I can't see what's happening in user mode, I can't say for sure what is interfering with LPC on the machine. It's definitely a driver, but because you didn't configure the machine for a complete memory dump (this is only a kernel dump), I can't see anything outside the kernel-mode stack, and whatever happened in this thread happened in user mode first. You'll need a *complete* memory dump (see the sticky at the top of this section) before I can really help you further. And, honestly, making sure all of the drivers on the system are up to date (including the filter drivers, not just hardware drivers - filter drivers like any antivirus, antispyware, etc.) will probably stabilize the box.

If you get a complete dump, upload it and we'll take a look.
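The LPC walk above (server thread, then message ID, then client thread) can be sketched as a small parser over saved `!thread` output. This is only an illustration in Python: the function names and the trimmed sample strings are assumptions made for the example, and it relies on the line layout shown in this thread rather than on any documented output format.

```python
import re

# Patterns for the `!thread` lines that matter for pairing LPC threads.
THREAD_RE = re.compile(r"^THREAD\s+([0-9a-f]{8})\s+Cid\s+[0-9a-f]+\.[0-9a-f]+", re.I)
IMAGE_RE = re.compile(r"^Attached Process\s+[0-9a-f]{8}\s+Image:\s+(\S+)", re.I)
SERVER_RE = re.compile(r"LPC Server thread working on message Id\s+([0-9a-f]+)", re.I)
CLIENT_RE = re.compile(r"Waiting for reply to LPC MessageId\s+([0-9a-f]+)", re.I)

# Trimmed copies of the two `!thread` outputs from the kernel dump.
SERVER_SAMPLE = """THREAD 88de29e0 Cid 03e8.03f8 Teb: 7ffda000 Win32Thread: e27a55a0 RUNNING on processor 1
Owning Process 0 Image: <Unknown>
Attached Process 88e3a240 Image: csrss.exe
LPC Server thread working on message Id 4998"""

CLIENT_SAMPLE = """THREAD 8879e228 Cid 0e10.0e18 Teb: 7ffdf000 Win32Thread: e400a368 WAIT: (WrLpcReply) UserMode Non-Alertable
Waiting for reply to LPC MessageId 00004998:
Owning Process 0 Image: <Unknown>
Attached Process 8879e5c8 Image: sqlcmd.exe"""


def summarize_thread(text):
    """Extract thread address, attached image, LPC role and message ID."""
    info = {"thread": None, "image": None, "role": None, "message_id": None}
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD_RE.search(line)
        if m:
            info["thread"] = m.group(1).lower()
        m = IMAGE_RE.search(line)
        if m:
            info["image"] = m.group(1)
        m = SERVER_RE.search(line)
        if m:
            info["role"], info["message_id"] = "server", int(m.group(1), 16)
        m = CLIENT_RE.search(line)
        if m:
            info["role"], info["message_id"] = "client", int(m.group(1), 16)
    return info


def same_lpc_message(a, b):
    """True when one thread is the server and the other the client of one message."""
    roles = {a["role"], b["role"]}
    return (roles == {"server", "client"}
            and a["message_id"] is not None
            and a["message_id"] == b["message_id"])


if __name__ == "__main__":
    server = summarize_thread(SERVER_SAMPLE)
    client = summarize_thread(CLIENT_SAMPLE)
    if same_lpc_message(server, client):
        print(client["image"], "is waiting on", server["image"])
```

Message IDs are compared as integers because the server line prints `4998` while the client line prints the zero-padded `00004998`.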

Posted

I took the full dump and am uploading the full 2GB to your server now. Even though I pay for 5Mb up, I'm only getting about 360KB/s, so it'll still be another 70 minutes or so until it's done.

However.

I saw what you did in WinDBG (THANK YOU!!!!), so I did the same thing with the new dump. I found the same program at the requesting end of the LPC call - sqlcmd.exe. The PC has MS SQL Server 2005 installed on it, so this makes sense.

I saw that the machine had updates pending, but it couldn't get through the updates since it would blue-screen before it could complete. I thought nothing of it, but I did notice that MS SQL Server was one of the updates waiting to be installed. So I'm thinking that perhaps the SQL Server update did not complete normally (she's been known to turn off the machine while it was busy), and SQL Server got corrupted.

I would not have thought SQL Server could blue-screen the machine, but maybe...

Since Windows Update doesn't work in Safe Mode (the ActiveX control won't activate), I'm downloading the SQL Server 2005 SP3 and assorted tools now. I'll be done with my updates before the dump is done uploading. I'll update this ticket with my results.

Thanks again, cluberti!

Posted

Yup, that was it! A corrupt SQL Server Express 2005 was causing CSRSS to abend. Nice. Good job, MS!!!

I was not able to update SQL Server from Safe Mode. So I changed all the SQL Server services to DISABLED and started Windows. When Windows didn't blue-screen, I knew I had it. Windows Update started at that point, and wanted to update SQL Server, so I let it. After that was done, I changed the services back to Automatic and started them. The system still hasn't blue-screened, so that was it.
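For anyone repeating the disable, update, re-enable sequence, the service changes can be scripted with `sc config`. The sketch below only builds and prints the command lines. The service names are assumptions (typical for a default SQL Server 2005 Express instance), so check the Services console for the names actually present before running anything.

```python
# Hypothetical service names: a default SQL Server 2005 Express install usually
# registers "MSSQL$SQLEXPRESS", but confirm the real names in services.msc.
SQL_SERVICES = ["MSSQL$SQLEXPRESS", "SQLBrowser", "SQLWriter"]

VALID_START_TYPES = ("auto", "demand", "disabled")


def sc_config_commands(services, start_type):
    """Build one `sc config <service> start= <type>` argument list per service."""
    if start_type not in VALID_START_TYPES:
        raise ValueError("unsupported start type: " + start_type)
    # `start=` and its value are passed as separate arguments, as sc.exe expects.
    return [["sc", "config", name, "start=", start_type] for name in services]


if __name__ == "__main__":
    # Print rather than execute: "disabled" before the reboot into normal mode,
    # "auto" once the SQL Server update has installed cleanly.
    for step in ("disabled", "auto"):
        for cmd in sc_config_commands(SQL_SERVICES, step):
            print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into an administrator command prompt.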

My guess is she turned off the computer in the middle of the Windows Update, so SQL Server got corrupted. How this in fact caused CSRSS to abend, I still don't understand (what a crappy system design!), but at least it's fixed now.

The dump will be done uploading in about 12 minutes, so I'll let it finish. I'd still like to know how you use this to get more information. So if you have time, I'd appreciate an update on how you analyzed this dump.

cluberti, you are a scholar and a gentleman. If you're ever in New Jersey, drop me a line and I'll buy you a beer.

Thanks again,

TanMan

Posted

Looks better - still the same crash, obviously, but it makes more sense now - you can see sqlcmd.exe is somehow corrupt here:

1: kd> da 88efe4cc
88efe4cc "csrss.exe"
1: kd> da 805d1204
805d1204 "Terminating critical process 0x%"
805d1224 "p (%s)."
1: kd> !thread
THREAD 89114420 Cid 0404.048c Teb: 7ffd5000 Win32Thread: e2bd43c8 RUNNING on processor 1
Impersonation token: e2e039e0 (Level Impersonation)
Owning Process 0 Image: <Unknown>
Attached Process 88efe358 Image: csrss.exe
Wait Start TickCount 8079 Ticks: 0
Context Switch Count 401 LargeStack
UserTime 00:00:00.078
KernelTime 00:00:00.031
Win32 Start Address 0x00001cf7
LPC Server thread working on message Id 1cf7
Start Address CSRSRV!CsrApiRequestThread (0x75b44616)
Stack Init a509e000 Current a509d744 Base a509e000 Limit a509b000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr Args to Child
a509dd00 805d03ab 000000f4 00000003 88efe358 nt!KeBugCheckEx+0x1b (FPO: [5,0,0])
a509dd24 805d12af 805d1204 88efe358 88efe4cc nt!PspCatchCriticalBreak+0x75 (FPO: [3,0,0])
a509dd54 8054088c 88efe5a0 c0000005 00c7ebcc nt!NtTerminateProcess+0x7d (FPO: [2,4,4])
a509dd54 7c90eb94 88efe5a0 c0000005 00c7ebcc nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ a509dd64)
00c7eb88 7c90e89a 75b432c4 ffffffff c0000005 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
00c7eb8c 75b432c4 ffffffff c0000005 00000000 ntdll!ZwTerminateProcess+0xc (FPO: [2,0,0])
00c7ebcc 75b44aea 00c7ebf4 75b468b1 00c7ebfc CSRSRV!CsrUnhandledExceptionFilter+0xc0 (FPO: [1,10,0])
00c7ebd4 75b468b1 00c7ebfc 00000001 00c7ebfc CSRSRV!CsrApiRequestThread+0x4d4 (FPO: [Non-Fpo])
00c7ebfc 7c9037bf 00c7ece8 00c7ffe4 00c7ed04 CSRSRV!_except_handler3+0x61 (FPO: [Uses EBP] [3,0,7])
00c7ec20 7c90378b 00c7ece8 00c7ffe4 00c7ed04 ntdll!ExecuteHandler2+0x26
00c7ecd0 7c90eafa 00000000 00c7ed04 00c7ece8 ntdll!ExecuteHandler+0x24
00c7ecd0 7c9106c3 00000000 00c7ed04 00c7ece8 ntdll!KiUserExceptionDispatcher+0xe (FPO: [2,0,0]) (CONTEXT @ 00c7ed04)
00c7f1f0 75ea2137 00160000 00000000 0000009c ntdll!RtlAllocateHeap+0x1da (FPO: [Non-Fpo])
00c7f238 75e92f21 75e92f38 0000005b 75e9c578 sxs!CSxsPointerBase<CXMLNamespaceManager::CNamespacePrefix,CSxsPointer<CXMLNamespaceManager::CNamespacePrefix,CXMLNamespaceManager::CNamespacePrefix::ms_szTypeName> >::HrAllocateBase+0x59 (FPO: [3,11,0])
00c7f4bc 75e938d2 00179c38 00000000 00000005 sxs!CXMLNamespaceManager::OnCreateNode+0x12e (FPO: [4,153,4])
00c7f520 75e9435f 0017ac28 00179c38 00000000 sxs!CNodeFactory::CreateNode+0xa3 (FPO: [5,16,4])
00c7f5a8 75e98baa 00179c38 00000005 00177580 sxs!XMLParser::Run+0x2fc (FPO: [2,24,4])
00c7f914 75e99a0f 00177580 00173e68 00177580 sxs!SxspIncorporateAssembly+0x8b8 (FPO: [2,212,4])
00c7f960 75e998cd 00177580 00000000 00c7ff14 sxs!SxspCloseManifestGraph+0x98 (FPO: [1,12,4])
00c7fdfc 75b5a5ed 00c7fe5c 00000004 00c7ff14 sxs!SxsGenerateActivationContext+0x54c (FPO: [1,289,4])
00c7fe9c 75b5a760 00000054 000006f4 01c7ff14 basesrv!BaseSrvSxsCreateActivationContextFromStruct+0x194 (FPO: [4,34,4])
00c7fed0 75b44a47 00c7feec 00c7ffd8 00000005 basesrv!BaseSrvSxsCreateActivationContextFromMessage+0x79 (FPO: [2,4,4])
00c7fff4 00000000 00000000 000000c8 000001e6 CSRSRV!CsrApiRequestThread+0x431 (FPO: [Non-Fpo])

1: kd> !lpc message 1cf7
Searching message 1cf7 in threads ...
Server thread 89114420 is working on message 1cf7
Client thread 8876e6e0 waiting a reply from 1cf7
Searching thread 8876e6e0 in port rundown queues ...

Server communication port 0xe118b870
Handles: 1 References: 1
The LpcDataInfoChainHead queue is empty
Connected port: 0xe2ecabf0 Server connection port: 0xe1742b28

Client communication port 0xe2ecabf0
Handles: 1 References: 3
The LpcDataInfoChainHead queue is empty

Server connection port e1742b28 Name: ApiPort
Handles: 1 References: 99
Server process : 88efe358 (csrss.exe)
Queue semaphore : 891ce628
Semaphore state 0 (0x0)
The message queue is empty
The LpcDataInfoChainHead queue is empty
Done.

1: kd> !thread 8876e6e0
THREAD 8876e6e0 Cid 0bd0.0bd4 Teb: 7ffdf000 Win32Thread: e13262d0 WAIT: (WrLpcReply) UserMode Non-Alertable
8876e8d4 Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00001cf7:
Current LPC port e2ecabf0
Not impersonating
DeviceMap e1001130
Owning Process 0 Image: <Unknown>
Attached Process 89160020 Image: sqlcmd.exe
Wait Start TickCount 8078 Ticks: 1 (0:00:00:00.015)
Context Switch Count 62 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x01019521
Start Address KERNEL32!BaseProcessStartThunk (0x7c810867)
Stack Init ba198000 Current ba197c50 Base ba198000 Limit ba194000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
ba197c68 80502d46 8876e750 8876e6e0 804faf40 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
ba197c74 804faf40 8876e8d4 8876e8a8 8876e6e0 nt!KiSwapThread+0x8a (FPO: [0,0,0])
ba197c9c 805a1e87 00000001 00000011 0006d401 nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
ba197d50 8054088c 000007ec 0006d4a0 0006d4a0 nt!NtRequestWaitReplyPort+0x63d (FPO: [Non-Fpo])
ba197d50 7c90eb94 000007ec 0006d4a0 0006d4a0 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ ba197d64)
0006d448 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd ntdll!KiFastSystemCallRet (FPO: [0,0,0])
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006d46c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d470 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d474 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d478 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d47c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d480 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d484 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d488 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d48c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d490 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d494 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d498 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d49c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4a0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4a4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4a8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4ac cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4b0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4b4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4b8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4bc cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4c0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4c4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4c8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4cc cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4d0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4d4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4d8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4dc cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4e0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4e4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4e8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4ec cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4f0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd

// Looks like a SQL update *was* actually running at the time - very interesting indeed:
PROCESS 885bb020 SessionId: 0 Cid: 0b48 Peb: 7ffde000 ParentCid: 0b10
DirBase: 0a6403a0 ObjectTable: e154e5f0 HandleCount: 42.
Image: SQLServer2005ExpressSP3-KB955706-x86-ENU.exe

It looks like something has gone in and corrupted memory. Knowing that stopping SQL Server and updating it fixed the problem, I'd say sqlcmd.exe itself was probably corrupted in some way (I can't check, because the dump doesn't contain binary data about processes, so I can't !chkimg it, but the above is pretty damning). I guess the update didn't work and caused the crash.
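The first frames of the crashing stack above already say most of what the blue screen says. As a rough sketch, the Python below splits the `nt!KeBugCheckEx` and `nt!NtTerminateProcess` lines into their columns and names the values. The helper names are made up for the example, the lookup tables cover only the values seen in this thread, and it assumes the second raw stack word on the `NtTerminateProcess` frame really is the exit status.

```python
# Parameter 1 values documented for bug check 0xF4 (CRITICAL_OBJECT_TERMINATION).
OBJECT_TYPES = {0x3: "process", 0x6: "thread"}

# Only the status code seen in this thread, not a complete NTSTATUS table.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

# The two frames copied from the complete dump's `!thread` output.
BUGCHECK_LINE = "a509dd00 805d03ab 000000f4 00000003 88efe358 nt!KeBugCheckEx+0x1b (FPO: [5,0,0])"
TERMINATE_LINE = "a509dd54 8054088c 88efe5a0 c0000005 00c7ebcc nt!NtTerminateProcess+0x7d (FPO: [2,4,4])"


def parse_frame(line):
    """Split a stack line into ChildEBP, RetAddr, three argument words and symbol."""
    parts = line.split()
    return {
        "child_ebp": int(parts[0], 16),
        "ret_addr": int(parts[1], 16),
        "args": [int(p, 16) for p in parts[2:5]],
        "symbol": parts[5].split("+")[0],
    }


def describe_crash(bugcheck_line, terminate_line):
    """Name the bug check code, the terminated object and the exit status."""
    bug = parse_frame(bugcheck_line)
    term = parse_frame(terminate_line)
    code, object_type, object_addr = bug["args"]
    status = term["args"][1]  # assumed to be the ExitStatus argument
    return {
        "bugcheck": code,
        "object_type": OBJECT_TYPES.get(object_type, "unknown"),
        "object_address": object_addr,
        "exit_status": NTSTATUS_NAMES.get(status, hex(status)),
    }


if __name__ == "__main__":
    print(describe_crash(BUGCHECK_LINE, TERMINATE_LINE))
```

The object address it reports, 88efe358, is the same value `!thread` lists as the attached csrss.exe process, which matches the "Terminating critical process" string dumped with `da`.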

Posted

Interesting that someone else had the same problem, and I'm glad my solution worked for him, too. Perhaps there's a bigger problem going on.

The machine has SQL Server 2005 Express installed. After I got the machine running again, I tried running Windows Update again, and along with a bunch of other updates, it again offered an SP3 update for SQL Server 2005 (this was after I had manually installed SQL Server 2005 Express SP3 and fixed the machine). Curious, I attempted to install the update, and the machine crashed with the same STOP in CSRSS. So maybe she didn't turn off the computer mid-update; maybe it's the update itself that's causing the problem. I turned off Windows Update.

Note that I rebuilt the machine for her last year with XP SP2 (she had too many problems with the Vista that came pre-installed), and I had delivered it with Windows Update turned off. Somehow, Windows Update got turned on recently, and that appears to be when this problem happened. The machine still has XP SP2, not SP3. It may have some post-SP2 updates (I don't know what else Windows Update installed before the problem), but I know there are still a bunch of other updates Windows Update wanted to install.

I noticed there were about 25 folders with GUID names on the external drive, and each folder appeared to have the same contents (they all had the same files named with SQL, like SQLCODE.EXE). So when the machine was set to reboot after a STOP, it appears Windows Update redownloaded the same SQL Server 2005 SP3 update and tried to install it after every reboot. That's why it took several minutes to crash. I tried to delete all the GUID folders, but the delete failed because something was in use. So I gave up, made sure the system was still stable, and just returned the machine.
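Spotting those repeated download folders by eye is tedious, so here is a minimal sketch of how one might group them. It assumes the folder names follow the usual 8-4-4-4-12 GUID layout (with or without braces) and treats two folders as "the same" when they contain the same file names. The function name is made up for the example, and the pattern may need adjusting to whatever the names really look like.

```python
import os
import re

# Assumed naming: a conventional GUID, optionally wrapped in braces.
GUID_RE = re.compile(r"^\{?[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\}?$", re.I)


def guid_folders_by_contents(root):
    """Group GUID-named subfolders of `root` by the set of entry names inside."""
    groups = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and GUID_RE.match(entry):
            names = frozenset(os.listdir(path))
            groups.setdefault(names, []).append(entry)
    return groups


if __name__ == "__main__":
    import sys

    # Pass the drive or folder to scan as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for names, folders in guid_folders_by_contents(target).items():
        if len(folders) > 1:
            print(len(folders), "folders share:", ", ".join(sorted(names)))
```

It only reads directory listings and never deletes anything, so it is safe to run while something still has the folders in use.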

So I think the crash comes from Windows Update trying to install the SQL Server 2005 SP3 update, not from SQL Server just running. I think Windows Update is downloading a bad update, and not that it was just a bad download - perhaps it's downloading SQL Server 2005 SP3 for the full version, not the Express version. Or perhaps the copy of SQL Server 2005 Express SP3 on Windows Update is corrupted. Either way, the downloaded version of the update appears to be what's causing the problem.

HTH,

TanMan

Posted
For what it's worth, it looks like this was documented

Well, a blog entry describing the problem is not exactly documenting the problem. ;) Acknowledging the problem, yes, documenting it, no. Since the blog entry was made on Jan 7, I would have thought Microsoft would have fixed this by now. Especially since 921337 identifies the manifest problem as being caused by Visual Studio 2005, and that ticket was opened in 2006. :o

Thanks for finding the blog entry, though. My searches had not uncovered this post.
