TanMan Posted March 20, 2009

One of my friend's machines started getting this STOP 0xF4 error this morning. The machine starts normally, but after it's been on for a few minutes, it blue screens with this error. This error does NOT occur in Safe Mode - the machine happily stays on.

The machine is an HP designed for Vista (with an ICH9 chipset, I believe), but I installed XP SP2 for them. This was a year ago, and they've been running happily until now.

The error indicates that CSRSS has terminated. This is probably because some driver is causing this error I see in the dump:

The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

but I can't figure out which driver it is.

I've tried turning on verifier on all the non-Microsoft drivers, but that didn't help at all (it just caused IRQL problems for NOD32 until I uninstalled it).

I've run a CHKDSK /R (it came up clean), uninstalled NOD32, TightVNC and Virtual CloneDrive. I've upgraded the video driver and the sound driver, all to no avail. I have no idea what to try next. Any recommendations would be greatly appreciated.

Note that I tried to upload the 126K kernel dump 2 separate times, but it never got past "Uploading File..." Please let me know if I can send you the dump some alternate way.

Thanks in advance,
TanMan

TanMan Posted March 20, 2009

Wow, it was really late last night. I was more punchy than I thought. The dump is 124MB, not 124KB. Jeez! No wonder I couldn't upload it!

cluberti, I uploaded the kernel dump to a new folder - TanMan.

A couple more notes that I seem to have forgotten last night. The PC has a built-in diagnostic from PC-Doctor, and I ran that - no errors on anything it tested - processor, RAM, or disk. The machine is perfectly stable in Safe Mode, and it consistently blue screens in normal mode within a couple of minutes.

Device Manager showed an exclamation point next to the DVD burner. I disconnected that from the system, but there was no change.

Well, I think that's everything. I really, really appreciate your help!

TanMan

cluberti Posted March 20, 2009

// The thread crashing - all you can see is the kernel-mode portion of the stack, and once you're into
// nt!KiFastCallEntry with the parameters of the crash, it's too late. Note we came out of user mode
// already crashing:
1: kd> !thread
THREAD 88de29e0 Cid 03e8.03f8 Teb: 7ffda000 Win32Thread: e27a55a0 RUNNING on processor 1
Impersonation token: e400f420 (Level Impersonation)
Owning Process 0 Image: <Unknown>
Attached Process 88e3a240 Image: csrss.exe
Wait Start TickCount 8518 Ticks: 0
Context Switch Count 970 LargeStack
UserTime 00:00:00.234
KernelTime 00:00:00.140
Win32 Start Address 0x00004998
LPC Server thread working on message Id 4998
Start Address 0x75b44616
Stack Init b909e000 Current b909d6fc Base b909e000 Limit b909b000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr Args to Child
b909dd00 805d03ab 000000f4 00000003 88e3a240 nt!KeBugCheckEx+0x1b (FPO: [Non-Fpo]) (CONV: stdcall)
b909dd24 805d12af 805d1204 88e3a240 88e3a3b4 nt!PspCatchCriticalBreak+0x75 (FPO: [Non-Fpo]) (CONV: stdcall)
b909dd54 8054088c 88e3a488 c0000005 0052ebcc nt!NtTerminateProcess+0x7d (FPO: [Non-Fpo]) (CONV: stdcall)
b909dd54 7c90eb94 88e3a488 c0000005 0052ebcc nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b909dd64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0052ebcc 00000000 00000000 00000000 00000000 0x7c90eb94

// Walking the LPC message back, we find the client thread to this server thread:
1: kd> !lpc message 4998
Searching message 4998 in threads ...
    Server thread 88de29e0 is working on message 4998
    Client thread 8879e228 waiting a reply from 4998
Searching thread 8879e228 in port rundown queues ...

Server communication port 0xe364f780
    Handles: 1 References: 1
    The LpcDataInfoChainHead queue is empty
    Connected port: 0xe2dcaa78 Server connection port: 0xe1740cc8

Client communication port 0xe2dcaa78
    Handles: 1 References: 3
    The LpcDataInfoChainHead queue is empty

Server connection port e1740cc8 Name: ApiPort
    Handles: 1 References: 155
    Server process : 88e3a240 (csrss.exe)
    Queue semaphore : 891d3e08
    Semaphore state 0 (0x0)
    The message queue is empty
    The LpcDataInfoChainHead queue is empty
Done.

// The thread at the other end of this LPC chain (the "client"):
1: kd> !thread 8879e228
THREAD 8879e228 Cid 0e10.0e18 Teb: 7ffdf000 Win32Thread: e400a368 WAIT: (WrLpcReply) UserMode Non-Alertable
    8879e41c Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00004998:
Current LPC port e2dcaa78
Not impersonating
DeviceMap e1001130
Owning Process 0 Image: <Unknown>
Attached Process 8879e5c8 Image: sqlcmd.exe
Wait Start TickCount 8517 Ticks: 1 (0:00:00:00.015)
Context Switch Count 64 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x01019521
Start Address 0x7c810867
Stack Init 9f973000 Current 9f972c50 Base 9f973000 Limit 9f96f000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
9f972c68 80502d46 8879e298 8879e228 804faf40 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
9f972c74 804faf40 8879e41c 8879e3f0 8879e228 nt!KiSwapThread+0x8a (FPO: [0,0,0]) (CONV: fastcall)
9f972c9c 805a1e87 00000001 00000011 0006d401 nt!KeWaitForSingleObject+0x1c2 (FPO: [Non-Fpo]) (CONV: stdcall)
9f972d50 8054088c 000007ec 0006d4a0 0006d4a0 nt!NtRequestWaitReplyPort+0x63d (FPO: [Non-Fpo]) (CONV: stdcall)
9f972d50 7c90eb94 000007ec 0006d4a0 0006d4a0 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ 9f972d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006d46c 00000000 00000000 00000000 00000000 0x7c90eb94

I can't see the user-mode section of this, but something is interfering with the RPC on this box (in user-mode) enough to cause CSRSS to crash. Since I can't see what's happening in user-mode, I can't say for sure what would be interfering with LRPC on the machine. It's definitely a driver, but because you didn't configure for a complete memory dump (this is only a kernel dump) I can't see anything outside the kernel-mode stack, and whatever happened in this thread happened in user-mode first. You'll need a *complete* memory dump (see the sticky at the top of this section) before I can help you further, really. And, honestly, making sure all of the drivers on the system (including the filter drivers, not just hardware drivers - filter drivers like any antivirus, antispyware, etc) are up to date will probably stabilize the box.

If you get a complete dump, upload it and we'll take a look.

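For anyone following along, the arguments in the nt!KeBugCheckEx and nt!NtTerminateProcess frames above can be read off mechanically. The following is a minimal Python sketch, assuming the publicly documented layout of bug check 0xF4 (CRITICAL_OBJECT_TERMINATION), where the first parameter is the object type (0x3 for a process, 0x6 for a thread) and the second is the address of the terminating object. The helper names decode_f4 and status_name are hypothetical and exist only for illustration.

```python
# Sketch only: interprets values copied by hand from the stack in this thread.
# Parameter layout assumed from the documented description of bug check 0xF4.

OBJECT_TYPES = {0x3: "process", 0x6: "thread"}

# Only the one status code that appears in the stack is listed here.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def decode_f4(code, object_type, object_address):
    """Describe the first two parameters of a 0xF4 bug check."""
    if code != 0xF4:
        raise ValueError("not a CRITICAL_OBJECT_TERMINATION bug check")
    kind = OBJECT_TYPES.get(object_type, "unknown object type")
    return f"critical {kind} at {object_address:#010x} terminated"


def status_name(status):
    """Map an NTSTATUS value to a name, if this sketch knows it."""
    return NTSTATUS_NAMES.get(status, f"unknown status {status:#010x}")


# Values from the KeBugCheckEx frame: 000000f4 00000003 88e3a240
print(decode_f4(0xF4, 0x3, 0x88E3A240))
# Exit status passed to NtTerminateProcess: c0000005
print(status_name(0xC0000005))
```

The address 88e3a240 matches the "Attached Process" shown for csrss.exe in the !thread output, which is consistent with reading the crash as CSRSS being terminated after an access violation.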
TanMan Posted March 21, 2009

I took the full dump and am uploading the full 2GB to your server now. Even though I pay for 5Gb up, I'm only getting about 360KB/s, so it'll still be another 70 minutes or so until it's done.

However.

I saw what you did in WinDBG (THANK YOU!!!!), so I did the same thing with the new dump. I found the same program at the requesting end of the RPC - sqlcmd.exe. The PC has MS SQL Server 2005 installed on it, so this makes sense.

I saw that the machine had updates pending, but it couldn't get through the updates since it would blue-screen before it could complete. I thought nothing of it, but I did notice that MS SQL Server was one of the updates waiting to be installed. So I'm thinking that perhaps the SQL Server update did not complete normally (she's been known to turn off the machine while it was busy), and SQL Server got corrupted.

I would not have thought SQL Server could blue-screen the machine, but maybe...

Since Windows Update doesn't work in Safe Mode (the ActiveX control won't activate), I'm downloading the SQL Server 2005 SP3 and assorted tools now. I'll be done with my updates before the dump is done uploading. I'll update this ticket with my results.

Thanks again, cluberti!

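The upload estimate can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming "2GB" means 2 GiB and "360KB/s" means 360 KiB/s (the post does not say which units were meant); upload_minutes is a hypothetical helper name.

```python
def upload_minutes(size_gib, rate_kib_per_s):
    """Minutes needed to send size_gib GiB at a steady rate_kib_per_s KiB/s."""
    size_kib = size_gib * 1024 * 1024
    return size_kib / rate_kib_per_s / 60


total = upload_minutes(2, 360)
print(f"full upload: about {total:.0f} minutes")  # prints: full upload: about 97 minutes
```

Under those assumptions the whole transfer takes roughly 97 minutes, so "another 70 minutes or so" would mean a bit over a quarter of the dump had already been sent.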
TanMan Posted March 21, 2009

Yup, that was it! A corrupt SQL Server Express 2005 was causing CSRSS to abend. Nice. Good job, MS!!!

I was not able to update SQL Server from Safe Mode. So I changed all the SQL Server services to DISABLED and started Windows. When Windows didn't blue-screen, I knew I had it. Windows Update started at that point, and wanted to update SQL Server, so I let it. After that was done, I changed the services back to Automatic and started them. The system still hasn't blue-screened, so that was it.

My guess is she turned off the computer in the middle of the Windows Update, so SQL Server got corrupted. How this in fact caused CSRSS to abend, I still don't understand (what a crappy system design!), but at least it's fixed now.

The dump will be done uploading in about 12 minutes, so I'll let it finish. I'd still like to know how you use this to get more information. So if you have time, I'd appreciate an update on how you analyzed this dump.

cluberti, you are a scholar and a gentleman. If you're ever in New Jersey, drop me a line and I'll buy you a beer.

Thanks again,
TanMan

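The disable, update, re-enable sequence described above can also be done from a command prompt with sc config instead of the Services console. The sketch below only builds the command lines and does not run them. The service names are assumptions: MSSQL$SQLEXPRESS is the usual name for a default SQL Server Express instance and SQLBrowser for the browser service, but the post does not say which services were actually changed. sc_config_command is a hypothetical helper.

```python
# Assumed service names; check the Services console for the real ones.
SQL_SERVICES = ["MSSQL$SQLEXPRESS", "SQLBrowser"]

VALID_START_TYPES = ("auto", "demand", "disabled")


def sc_config_command(service, start_type):
    """Build the argument list for: sc config <service> start= <type>."""
    if start_type not in VALID_START_TYPES:
        raise ValueError(f"unsupported start type: {start_type}")
    # sc expects "start=" and its value as separate tokens.
    return ["sc", "config", service, "start=", start_type]


def plan(services):
    """Commands to disable the services, then to restore automatic start."""
    disable = [sc_config_command(s, "disabled") for s in services]
    restore = [sc_config_command(s, "auto") for s in services]
    return disable, restore


disable, restore = plan(SQL_SERVICES)
for cmd in disable + restore:
    print(" ".join(cmd))
```

Printing the commands first, rather than executing them, makes it easy to review the service names before anything is changed on the machine.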
TanMan Posted March 21, 2009

That upload is finished should you care to show me more of your magic.

cluberti Posted March 21, 2009

Looks better - still the same crash, obviously, but it makes more sense now - you can see sqlcmd.exe is somehow corrupt here:

1: kd> da 88efe4cc
88efe4cc "csrss.exe"
1: kd> da 805d1204
805d1204 "Terminating critical process 0x%"
805d1224 "p (%s)."
1: kd> !thread
THREAD 89114420 Cid 0404.048c Teb: 7ffd5000 Win32Thread: e2bd43c8 RUNNING on processor 1
Impersonation token: e2e039e0 (Level Impersonation)
Owning Process 0 Image: <Unknown>
Attached Process 88efe358 Image: csrss.exe
Wait Start TickCount 8079 Ticks: 0
Context Switch Count 401 LargeStack
UserTime 00:00:00.078
KernelTime 00:00:00.031
Win32 Start Address 0x00001cf7
LPC Server thread working on message Id 1cf7
Start Address CSRSRV!CsrApiRequestThread (0x75b44616)
Stack Init a509e000 Current a509d744 Base a509e000 Limit a509b000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr Args to Child
a509dd00 805d03ab 000000f4 00000003 88efe358 nt!KeBugCheckEx+0x1b (FPO: [5,0,0])
a509dd24 805d12af 805d1204 88efe358 88efe4cc nt!PspCatchCriticalBreak+0x75 (FPO: [3,0,0])
a509dd54 8054088c 88efe5a0 c0000005 00c7ebcc nt!NtTerminateProcess+0x7d (FPO: [2,4,4])
a509dd54 7c90eb94 88efe5a0 c0000005 00c7ebcc nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ a509dd64)
00c7eb88 7c90e89a 75b432c4 ffffffff c0000005 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
00c7eb8c 75b432c4 ffffffff c0000005 00000000 ntdll!ZwTerminateProcess+0xc (FPO: [2,0,0])
00c7ebcc 75b44aea 00c7ebf4 75b468b1 00c7ebfc CSRSRV!CsrUnhandledExceptionFilter+0xc0 (FPO: [1,10,0])
00c7ebd4 75b468b1 00c7ebfc 00000001 00c7ebfc CSRSRV!CsrApiRequestThread+0x4d4 (FPO: [Non-Fpo])
00c7ebfc 7c9037bf 00c7ece8 00c7ffe4 00c7ed04 CSRSRV!_except_handler3+0x61 (FPO: [Uses EBP] [3,0,7])
00c7ec20 7c90378b 00c7ece8 00c7ffe4 00c7ed04 ntdll!ExecuteHandler2+0x26
00c7ecd0 7c90eafa 00000000 00c7ed04 00c7ece8 ntdll!ExecuteHandler+0x24
00c7ecd0 7c9106c3 00000000 00c7ed04 00c7ece8 ntdll!KiUserExceptionDispatcher+0xe (FPO: [2,0,0]) (CONTEXT @ 00c7ed04)
00c7f1f0 75ea2137 00160000 00000000 0000009c ntdll!RtlAllocateHeap+0x1da (FPO: [Non-Fpo])
00c7f238 75e92f21 75e92f38 0000005b 75e9c578 sxs!CSxsPointerBase<CXMLNamespaceManager::CNamespacePrefix,CSxsPointer<CXMLNamespaceManager::CNamespacePrefix,CXMLNamespaceManager::CNamespacePrefix::ms_szTypeName> >::HrAllocateBase+0x59 (FPO: [3,11,0])
00c7f4bc 75e938d2 00179c38 00000000 00000005 sxs!CXMLNamespaceManager::OnCreateNode+0x12e (FPO: [4,153,4])
00c7f520 75e9435f 0017ac28 00179c38 00000000 sxs!CNodeFactory::CreateNode+0xa3 (FPO: [5,16,4])
00c7f5a8 75e98baa 00179c38 00000005 00177580 sxs!XMLParser::Run+0x2fc (FPO: [2,24,4])
00c7f914 75e99a0f 00177580 00173e68 00177580 sxs!SxspIncorporateAssembly+0x8b8 (FPO: [2,212,4])
00c7f960 75e998cd 00177580 00000000 00c7ff14 sxs!SxspCloseManifestGraph+0x98 (FPO: [1,12,4])
00c7fdfc 75b5a5ed 00c7fe5c 00000004 00c7ff14 sxs!SxsGenerateActivationContext+0x54c (FPO: [1,289,4])
00c7fe9c 75b5a760 00000054 000006f4 01c7ff14 basesrv!BaseSrvSxsCreateActivationContextFromStruct+0x194 (FPO: [4,34,4])
00c7fed0 75b44a47 00c7feec 00c7ffd8 00000005 basesrv!BaseSrvSxsCreateActivationContextFromMessage+0x79 (FPO: [2,4,4])
00c7fff4 00000000 00000000 000000c8 000001e6 CSRSRV!CsrApiRequestThread+0x431 (FPO: [Non-Fpo])

1: kd> !lpc message 1cf7
Searching message 1cf7 in threads ...
    Server thread 89114420 is working on message 1cf7
    Client thread 8876e6e0 waiting a reply from 1cf7
Searching thread 8876e6e0 in port rundown queues ...

Server communication port 0xe118b870
    Handles: 1 References: 1
    The LpcDataInfoChainHead queue is empty
    Connected port: 0xe2ecabf0 Server connection port: 0xe1742b28

Client communication port 0xe2ecabf0
    Handles: 1 References: 3
    The LpcDataInfoChainHead queue is empty

Server connection port e1742b28 Name: ApiPort
    Handles: 1 References: 99
    Server process : 88efe358 (csrss.exe)
    Queue semaphore : 891ce628
    Semaphore state 0 (0x0)
    The message queue is empty
    The LpcDataInfoChainHead queue is empty
Done.

1: kd> !thread 8876e6e0
THREAD 8876e6e0 Cid 0bd0.0bd4 Teb: 7ffdf000 Win32Thread: e13262d0 WAIT: (WrLpcReply) UserMode Non-Alertable
    8876e8d4 Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00001cf7:
Current LPC port e2ecabf0
Not impersonating
DeviceMap e1001130
Owning Process 0 Image: <Unknown>
Attached Process 89160020 Image: sqlcmd.exe
Wait Start TickCount 8078 Ticks: 1 (0:00:00:00.015)
Context Switch Count 62 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x01019521
Start Address KERNEL32!BaseProcessStartThunk (0x7c810867)
Stack Init ba198000 Current ba197c50 Base ba198000 Limit ba194000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
ba197c68 80502d46 8876e750 8876e6e0 804faf40 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
ba197c74 804faf40 8876e8d4 8876e8a8 8876e6e0 nt!KiSwapThread+0x8a (FPO: [0,0,0])
ba197c9c 805a1e87 00000001 00000011 0006d401 nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
ba197d50 8054088c 000007ec 0006d4a0 0006d4a0 nt!NtRequestWaitReplyPort+0x63d (FPO: [Non-Fpo])
ba197d50 7c90eb94 000007ec 0006d4a0 0006d4a0 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ ba197d64)
0006d448 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd ntdll!KiFastSystemCallRet (FPO: [0,0,0])
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006d46c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d470 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d474 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d478 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d47c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d480 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d484 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d488 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d48c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d490 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d494 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d498 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d49c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4a0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4a4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4a8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4ac cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4b0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4b4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4b8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4bc cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4c0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4c4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4c8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4cc cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4d0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4d4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4d8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4dc cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4e0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4e4 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4e8 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4ec cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd
0006d4f0 cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd

// Looks like a SQL update *was* actually running at the time - very interesting indeed:
PROCESS 885bb020 SessionId: 0 Cid: 0b48 Peb: 7ffde000 ParentCid: 0b10
    DirBase: 0a6403a0 ObjectTable: e154e5f0 HandleCount: 42.
    Image: SQLServer2005ExpressSP3-KB955706-x86-ENU.exe

It looks like something has gone in and corrupted memory - knowing stopping SQL and updating it fixed it, I'd say sqlcmd.exe itself was probably corrupted in some way (I can't check, as the dump doesn't contain binary data about processes, so I can't !chkimg it, but the above is pretty damning). I guess the update didn't work and caused the crash.

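The run of 0xcdcdcdcd words in the sqlcmd.exe stack is a recognizable fill pattern. As a hedged aside, here is a small Python sketch that flags well-known Windows and MSVC fill values in a line of debugger output. The table lists conventional meanings of these values in general; whether any of them explains what happened in this particular dump cannot be determined from the output shown. classify_words is a hypothetical helper.

```python
# Conventional meanings of common fill values on Windows. This is general
# background, not a diagnosis of the dump discussed in this thread.
FILL_PATTERNS = {
    0xCDCDCDCD: "MSVC debug CRT: allocated but uninitialized heap memory",
    0xCCCCCCCC: "MSVC debug build: uninitialized stack memory",
    0xDDDDDDDD: "MSVC debug CRT: freed heap memory",
    0xFEEEFEEE: "Windows debug heap: freed memory",
    0xBAADF00D: "Windows debug heap: allocated but uninitialized memory",
}


def classify_words(line):
    """Return {value: meaning} for known fill patterns found in one line."""
    found = {}
    for token in line.split():
        try:
            value = int(token, 16)  # accepts both "cdcdcdcd" and "0xcdcdcdcd"
        except ValueError:
            continue  # symbol names, FPO annotations, and so on
        if value in FILL_PATTERNS:
            found[value] = FILL_PATTERNS[value]
    return found


line = "0006d46c cdcdcdcd cdcdcdcd cdcdcdcd cdcdcdcd 0xcdcdcdcd"
for value, meaning in classify_words(line).items():
    print(f"{value:#010x}: {meaning}")
```

A lookup like this only says that the words match a known filler value; it does not say which component wrote them or why.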
cluberti Posted March 23, 2009

Note that a day after working on this, there was another system with the exact same problem.

TanMan Posted March 23, 2009

Interesting that someone else had the same problem, and I'm glad my solution worked for him, too. Perhaps there's a bigger problem going on.

The machine has SQL Server 2005 Express installed. After I got the machine running again, I tried running Windows Update again, and along with a bunch of other updates, it again offered an SP3 update for SQL Server 2005 (this is after I manually installed SQL Server 2005 Express SP3 and fixed the machine). Curious, I attempted to install the update, and the machine crashed with the same STOP in CSRSS. So maybe she didn't turn off the computer mid-update, maybe it's the update itself that's causing the problem. I turned off Windows Update.

Note that I rebuilt the machine for her last year with XP SP2 (she had too many problems with the Vista that came pre-installed), and I had delivered it with Windows Update turned off. Somehow, Windows Update got turned on recently, and that appears to be when this problem happened. The machine still has XP SP2, not SP3. It may have some post-SP2 updates (I don't know what else Windows Update installed before the problem), but I know there are still a bunch of other updates Windows Update wanted to install.

I noticed there were about 25 folders with GUID names on the external drive, and each folder appeared to have the same contents (they all had the same files named with SQL, like SQLCODE.EXE). So when the machine was set to reboot after a STOP, it appears Windows Update redownloaded the same SQL Server 2005 SP3 update and tried to install it after every reboot. That's why it took several minutes to crash. I tried to delete all the GUID folders, but the delete failed because something was in use. So I gave up, made sure the system was still stable, and just returned the machine.

So I think the crash is happening from Windows Update trying to install the SQL Server 2005 SP3 update, not from SQL Server just running. I think Windows Update is downloading a bad update, and not that it was just a bad download - I think perhaps it's downloading SQL Server 2005 SP3 for the full version, not the Express version. Or perhaps the version of SQL Server 2005 Express SP3 on Windows Update is corrupted. Either way, I think the downloaded version of the update appears to be what's causing the problem.

HTH,
TanMan

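Leftover extraction folders like the ones described above can be listed with a short script before deciding what to delete. This is a sketch only, assuming the folder names follow the standard 8-4-4-4-12 hexadecimal GUID layout, optionally wrapped in braces; the post does not show the actual names, so the pattern and the helper guid_named are assumptions.

```python
import re

# Standard textual GUID form, with optional surrounding braces (assumed).
GUID_RE = re.compile(
    r"\{?[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}?"
)


def guid_named(names):
    """Return the names that consist entirely of a GUID."""
    return [name for name in names if GUID_RE.fullmatch(name)]


# Made-up example names, not taken from the machine in this thread.
sample = [
    "{3F2504E0-4F89-11D3-9A0C-0305E82C3301}",
    "Documents",
    "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
    "SQLCODE.EXE",
]
print(guid_named(sample))
```

To apply it to a real drive, the list passed in could come from os.listdir on the drive's root, filtered to directories.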
cluberti Posted March 23, 2009

For what it's worth, it looks like this was documented:
http://blogs.msdn.com/psssql/archive/2009/...ermination.aspx

TanMan Posted March 23, 2009

"For what it's worth, it looks like this was documented"

Well, a blog entry describing the problem is not exactly documenting the problem. Acknowledging the problem, yes, documenting it, no. Since the blog entry was made on Jan 7, I would have thought Microsoft would have fixed this by now. Especially since 921337 identifies the manifest problem as being caused by Visual Studio 2005, and that ticket was opened in 2006.

Thanks for finding the blog entry, though. My searches had not uncovered this post.