mongo66 Posted December 23, 2008

I've been given the wonderful task of troubleshooting an older machine. Without further ado, here goes...

System specs:
- Athlon XP 2600+ (Barton)
- Gigabyte GA-7N400S
- 2x256MB DDR PC3200 RAM
- ATI Radeon 9550 (RV360) AGP
- C-Media CMI8738 6CH PCI
- 80GB Seagate ATA-100
- 400W PSU
- Windows XP Service Pack 3

Initial observations:
Upon turning on the computer, during BIOS POST, I noticed the following:

TRAP 00000006 ================== EXCEPTION =================================
tr=0028 cr0=00000011 cr2=00F0836B cr3=00000000
gdt limit=03FF base=00017000 idt limit=07FF
cs:eip=0008:00060EC4 ss:eip=0010:00060E6C errcode=0000
flags=00010002 NoCy NoZr IntDis Down TrapDis
eax=00000026 ebx=00000FFF ecx=00000413 edx=0000046B ds=0010 es=0010
edi=0046A165 esi=00000000 ebp=00060E62 cr0=00000011 fs=0030 gs=0000

This has appeared 3-4 times since I got the computer a week ago. I don't know exactly what it means, but it seems to appear only after changing the memory frequency setting in the BIOS with dual channel enabled.

On the software side of things, I cannot install Windows if the RAM is operating in dual channel mode; the system hangs during text-mode setup (specifically while copying driver.cab). The only workaround is to disable dual channel. I've also swapped the memory modules around and tried a single 256MB stick of DDR. The computer doesn't experience any startup issues in single channel mode, whether I use one or both sticks of RAM.
On the other hand, Windows will not boot if dual channel is enabled AND the memory frequency is set by SPD (400MHz):

Windows could not start because the following file is missing or corrupt:
\WINDOWS\SYSTEM32\CONFIG\SYSTEM

If I change the memory frequency to 100% or Auto (333MHz) in the BIOS, Windows starts up just fine with dual channel enabled.

Crash n' burn!!
The crash scenarios below occur regardless of memory configuration and without any overclocking.

(1) Music playback (sound card; onboard AC97 disabled): occasionally the mouse, keyboard and screen freeze after a while. Playback, however, continues uninterrupted. The computer can still be shut down normally by pressing the power button. Moving the sound card to a different PCI slot produced the same results. The sound card isn't faulty, btw.

(2) Music playback (integrated 6-channel AC97 audio): the system comes to a halt with a loud, constant screeching noise. The computer cannot be shut down normally; I must hold down the power button to power off...

(3) Video playback (WMP, MPC, etc.) results in a BSOD after a few minutes:
STOP 0x0000007F (UNEXPECTED_KERNEL_MODE_TRAP)
or
TRAP_CAUSE_UNKNOWN
STOP 0x00000012 (0x00000001,0x00000000,0x00000000,0x00000000)

*** Note: Using the Windows default display drivers, I can *almost* watch an entire movie before a BSOD or system freeze. I also tested the Radeon card in another machine with Catalyst / Omega drivers installed. No video playback issues to speak of... Definitely nothing wrong with the graphics card.

(4) Web browsing: if I have multiple tabs open, or visit Flash-intensive web pages, the system either freezes or loses internet connectivity. The system bogs down and eventually crashes... Only after a "forced" shutdown or reset do things return to normal so I can browse again.

Steps taken (so far):
- Reset BIOS to default settings
- Ran Memtest86... RESULT: no errors.
- Checked PSU voltages... RESULT: nominal readings.
- Checked hard disk for bad sectors... RESULT: none found.
- Reinstalled Windows XP and updated device drivers, with no improvement...

Also, there are no heat issues as far as I can tell, nor are there any leaking or bulging capacitors on the motherboard itself. The system is already running the latest BIOS version. After nearly a week of troubleshooting, I suspect the CPU and/or motherboard is "dying"...

What are your thoughts?
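For anyone reading trap screens and STOP codes like the ones above: the number on an x86 trap screen is an exception vector. The Python sketch below maps vectors to their architectural names. It assumes the "TRAP 00000006" screen is the exception dump printed by the Windows loader (NTLDR), which it closely resembles, and that its number is the raw vector. The same table would decode the first parameter of a STOP 0x7F, which Microsoft documents as the trap number (the 0x7F parameters weren't posted here). Don't confuse STOP code 0x12 (TRAP_CAUSE_UNKNOWN) with vector 0x12 (machine check); they are unrelated numbering schemes. The helper name decode_trap is hypothetical.

```python
# Architectural x86 exception vectors (numbering from the Intel/AMD manuals).
X86_EXCEPTIONS = {
    0x00: "#DE Divide Error",
    0x01: "#DB Debug",
    0x02: "NMI Interrupt",
    0x03: "#BP Breakpoint",
    0x04: "#OF Overflow",
    0x05: "#BR BOUND Range Exceeded",
    0x06: "#UD Invalid Opcode",
    0x07: "#NM Device Not Available",
    0x08: "#DF Double Fault",
    0x09: "Coprocessor Segment Overrun",
    0x0A: "#TS Invalid TSS",
    0x0B: "#NP Segment Not Present",
    0x0C: "#SS Stack-Segment Fault",
    0x0D: "#GP General Protection",
    0x0E: "#PF Page Fault",
    0x10: "#MF x87 Floating-Point Error",
    0x11: "#AC Alignment Check",
    0x12: "#MC Machine Check",
    0x13: "#XM SIMD Floating-Point Exception",
}


def decode_trap(text: str) -> str:
    """Decode a hex vector such as '00000006' or '0x00000008' (hypothetical helper)."""
    vector = int(text, 16)  # int() accepts an optional 0x prefix when base is 16
    return X86_EXCEPTIONS.get(
        vector, f"vector {vector:#x} (reserved or not an architectural exception)"
    )


print(decode_trap("00000006"))  # the 'TRAP 00000006' screen
```

Under that assumption, vector 6 is an invalid opcode: the CPU fetched bytes that are not a valid instruction. Corrupted code or a misbehaving CPU/memory path can produce that, but on its own it doesn't say which component is to blame.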
cluberti Posted December 23, 2008

Let's see a dump file to see if there's anything obvious, or if it looks like hardware.
mongo66 (Author) Posted December 23, 2008

After my initial post, the computer seems to have behaved somewhat... I actually managed to shut down the computer through the Start menu for a change. I've worked with GA-7N400 series boards before, both PRO and entry-level boards like this one, and never ran into any problems until now. With this particular machine, it certainly doesn't take much effort to trigger a system crash... Playing a game of Spider Solitaire (with no other programs running) causes a hard freeze.

Something I forgot to mention earlier... While checking system temps, I noticed the CPU was running rather cool, around 42-46C. This is quite unusual for an Athlon XP 2600+ with "stock cooling"; idle temps should be in the 50-55C range. Perhaps the temp sensors have gone mad as well...

Anyway, here are the dump files as requested (16 in total).
(attachment: Minidump.7z)
cluberti Posted December 23, 2008

Well, you aren't likely to enjoy reading this analysis, but...

// From the very first dump, I saw something very odd:

kd> !thread
GetPointerFromAddress: unable to read from 8055fbd4
THREAD 817e26a0 Cid 0158.0184 Teb: 7ffd8000 Win32Thread: e1778be0 RUNNING on processor 0
IRP List:
Unable to read nt!_IRP @ 817ece20
Not impersonating
GetUlongFromAddress: unable to read from 8055fc6c
Owning Process 817afda0 Image: csrss.exe
Attached Process N/A Image: N/A
ffdf0000: Unable to get shared data
Wait Start TickCount 13394
Context Switch Count 5316 LargeStack
ReadMemory error: Cannot get nt!KeMaximumIncrement value.
UserTime 00:00:00.000
KernelTime 00:00:00.000
Start Address 0x75b67cdf
Stack Init f98a4000 Current f98a39c8 Base f98a4000 Limit f98a0000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
f98a3998 804f170c 00000100 806f02d0 817e2710 hal!KfLowerIrql+0x17 (FPO: [0,0,0])
f98a39d4 804ecae9 00000000 00000000 00000000 nt!KiDeliverApc+0x118 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a39ec 804e3b7d 804e3a0d e1778be0 00000000 nt!KiSwapThread+0x64 (FPO: [0,0,0]) (CONV: fastcall)
f98a3a24 bf807aec 00000003 817c8af0 00000001 nt!KeWaitForMultipleObjects+0x284 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3a5c bf89b7c4 00000002 817c8af0 bf89e712 win32k!xxxMsgWaitForMultipleObjects+0xb0 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d30 bf884773 bf9aae80 00000001 f98a3d54 win32k!xxxDesktopThread+0x339 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d40 bf80110a bf9aae80 f98a3d64 0073fff4 win32k!xxxCreateSystemThreads+0x6a (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d54 804de7ec 00000000 00000022 00000000 win32k!NtUserCallOneParam+0x23 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d54 7c90e4f4 00000000 00000022 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f98a3d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 0x7c90e4f4

kd> r
Last set context:
eax=00000000 ebx=804e3a0d ecx=804ecae9 edx=f98a3a24 esi=804e3b7d edi=817e26a0
eip=00000001 esp=f98a3a0c ebp=e1778be0 iopl=1 nv up di pl nz ac pe cy
cs=3a18 ss=0010 ds=562d es=0000 fs=0000 gs=7c40 efl=bf80101d
3a18:00000001 ?? ???

// Note that the ESP and EBP registers look suspicious - both should fall within the thread's stack range (between Limit and Base), yet EBP points far outside it:
Base f98a4000 Limit f98a0000
esp=f98a3a0c ebp=e1778be0 <-!!

// A dps to the base address:
...
f98a3fc8 8181adf0
f98a3fcc 00000004
f98a3fd0 80562340 nt!ExWorkerQueue+0x80
f98a3fd4 00167398
f98a3fd8 00160168
f98a3fdc 00000000
f98a3fe0 00000000
f98a3fe4 00167398
f98a3fe8 00000040
f98a3fec 001673a0
f98a3ff0 00000000
f98a3ff4 00000000
f98a3ff8 00000000
f98a3ffc 00000000
f98a4000 ???????? <- Base address looks to be corrupt

If you've tested the memory, then this indicates a bad CPU or motherboard (or both) - this wouldn't happen unless the hardware underneath Windows had some issue, and register issues are almost always CPU-related or motherboard-related problems...
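The stack-range check in the analysis can be written out explicitly. This Python sketch uses only the Base, Limit, ESP and EBP values quoted in the dump, and assumes the usual Windows kernel-stack convention: Limit is the lowest valid address, Base is the exclusive top, and x86 stacks grow downward from Base toward Limit. The helper name in_stack is illustrative.

```python
# Values copied from the !thread and r output in the dump.
STACK_BASE = 0xF98A4000   # "Base": exclusive top of the stack (x86 stacks grow down)
STACK_LIMIT = 0xF98A0000  # "Limit": lowest valid stack address

registers = {"esp": 0xF98A3A0C, "ebp": 0xE1778BE0}


def in_stack(addr: int, limit: int = STACK_LIMIT, base: int = STACK_BASE) -> bool:
    """True if addr lies inside the kernel stack range [limit, base)."""
    return limit <= addr < base


for name, value in registers.items():
    status = "inside" if in_stack(value) else "OUTSIDE"
    print(f"{name}={value:08x} is {status} the stack range {STACK_LIMIT:08x}-{STACK_BASE:08x}")
```

Run as-is, it reports ESP inside the f98a0000-f98a4000 range and EBP far outside it, which is the anomaly marked "<-!!" in the analysis.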
mongo66 (Author) Posted December 24, 2008

@cluberti,
Thanks for your analysis, I really appreciate it! The analysis seems to have confirmed my initial suspicions...

"If you've tested the memory, then this indicates a bad CPU or motherboard (or both) - this wouldn't happen if the hardware underneath windows didn't have some issue, and register issues are almost always CPU-related or motherboard-related problems..."

I did test the memory, albeit only for a couple of hours. Given the recurring system crashes, I figured that if the memory was at fault, memtest86 would have detected errors in a short amount of time... I'll test the RAM again later today, just to be sure. Only this time, I'll let it run for 12 hours.

Other interesting developments...
As I was using the machine today, it froze up again (no surprise there). After a forced shutdown and restart, I was greeted with:

NTLDR is missing
Press Ctrl-Alt-Del to restart

WTF? So I fired up BartPE just to see what was going on... I couldn't access drive C: but was able to access the logical drive (D:). I ran chkdsk on drive C: with the following results:

CHKDSK.CMD: Starting...
Please enter the drive, mount point or volume name to check (for example "c:")...
Enter drive:c:
Do you want to fix errors on the disk *and* locate bad sectors and recover readable information (Yes/No)...
Enter "y" or "n":n
Do you want to fix errors on the disk (Yes/No)...
Enter "y" or "n":y
----------------------------------------------------------------
You have specified to check drive/volume c:
With the following options:
- Fix errors on the disk
----------------------------------------------------------------
Start check disk? (Yes/No)...
Enter "y" or "n":y
Running: chkdsk.exe c: /f
The type of the file system is NTFS.
CHKDSK is verifying files (stage 1 of 3)...
File verification completed.
CHKDSK is verifying indexes (stage 2 of 3)...
Correcting error in index $I30 for file 5.
Correcting error in index $I30 for file 5.
Sorting index $I30 in file 5.
Index verification completed.
CHKDSK is recovering lost files.
Recovering orphaned file $MFT (0) into directory file 5.
Recovering orphaned file $MFTMirr (1) into directory file 5.
Recovering orphaned file $LogFile (2) into directory file 5.
Recovering orphaned file $Volume (3) into directory file 5.
Recovering orphaned file $AttrDef (4) into directory file 5.
Recovering orphaned file . (5) into directory file 5.
Recovering orphaned file $Bitmap (6) into directory file 5.
Recovering orphaned file $Boot (7) into directory file 5.
Recovering orphaned file $BadClus (8) into directory file 5.
Recovering orphaned file $Secure (9) into directory file 5.
Recovering orphaned file $UpCase (10) into directory file 5.
Recovering orphaned file $Extend (11) into directory file 5.
Recovering orphaned file SYSTEM~1 (27) into directory file 5.
Recovering orphaned file System Volume Information (27) into directory file 5.
Recovering orphaned file WINDOWS (30) into directory file 5.
Recovering orphaned file ntldr (3376) into directory file 5.
Recovering orphaned file NTDETECT.COM (3380) into directory file 5.
Recovering orphaned file boot.ini (3413) into directory file 5.
Recovering orphaned file PROFILES (3419) into directory file 5.
Recovering orphaned file VOLUMEID.EXE (3722) into directory file 5.
Recovering orphaned file PROGRA~1 (3732) into directory file 5.
Recovering orphaned file Program Files (3732) into directory file 5.
Recovering orphaned file RECYCLER (9223) into directory file 5.
Recovering orphaned file pagefile.sys (9245) into directory file 5.
Recovering orphaned file symstore (9318) into directory file 5.
CHKDSK is verifying security descriptors (stage 3 of 3)...
Security descriptor verification completed.
CHKDSK is verifying Usn Journal...
Usn Journal verification completed.
Correcting errors in the Master File Table (MFT) mirror.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.

15631213 KB total disk space.
2820732 KB in 11067 files.
3200 KB in 1470 indexes.
0 KB in bad sectors.
92901 KB in use by the system.
65536 KB occupied by the log file.
12714380 KB available on disk.

4096 bytes in each allocation unit.
3907803 total allocation units on disk.
3178595 allocation units available on disk.
Unable to obtain a handle to the event log.
CHKDSK.CMD: Check disk done...
Press any key to continue . . .

What else could go wrong?! lol. Luckily enough, chkdsk fixed the errors and I was able to get into Windows...
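The chkdsk summary can be sanity-checked with a little arithmetic. This Python sketch uses only the figures from that output. It shows that free space matches the free cluster count at 4 KB per cluster, and that total minus available equals files + indexes + system use exactly, so the 65536 KB log file appears to be counted inside "in use by the system" rather than separately. It also confirms 0 KB in bad sectors, consistent with the earlier bad-sector scan. The repairs were to NTFS metadata: MFT record 5 is the root directory, and its $I30 index is the one chkdsk corrected, which fits ntldr, NTDETECT.COM and boot.ini (all root-level files) being unreachable until the orphans were relinked.

```python
# Figures copied from the chkdsk summary (sizes in KB unless noted otherwise).
total_kb = 15631213
files_kb = 2820732
indexes_kb = 3200
bad_sectors_kb = 0
system_kb = 92901          # the 65536 KB log file is not added separately below
available_kb = 12714380
cluster_bytes = 4096
total_clusters = 3907803
free_clusters = 3178595

kb_per_cluster = cluster_bytes // 1024  # 4 KB allocation units

# Free space in KB equals free clusters x 4 KB.
assert free_clusters * kb_per_cluster == available_kb

# Used space (total - available) equals files + indexes + system + bad sectors.
used_kb = total_kb - available_kb
assert used_kb == files_kb + indexes_kb + system_kb + bad_sectors_kb

# Size from the cluster count versus the reported total.
print(total_clusters * kb_per_cluster, total_kb)
print(f"C: is about {total_kb / 1024**2:.1f} GiB, {used_kb / 1024**2:.1f} GiB used")
```

The cluster-based size comes out 1 KB below the reported total; the summary itself doesn't account for that difference, so the sketch simply prints both values.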
cluberti Posted December 26, 2008

Well, an unclean shutdown during any NTFS flush operation can cause it, so if you had to kill the power at a "bad" time, that could do it (and since chkdsk cleaned it up, that's a very likely scenario).
mongo66 (Author) Posted December 28, 2008

I completed another round of memory tests over the weekend. Memtest86 ran for almost 16 hours without a single error, so it's safe to assume the RAM modules aren't faulty. These tests were carried out at the default front side bus (FSB) clock of 166MHz (no overclocking). Note: I did not run any tests with dual channel enabled, as I felt it was unnecessary.

DRAM Frequency: By SPD << BIOS default setting
Memory timings: RAM 200MHz (DDR400) / CAS: 2.5-3-3-8 / Single Channel (64-bit)

With all the problems I've encountered with this machine, including the minidump analysis, I'm forced to conclude that the processor, the motherboard, or a combination thereof has gone "bad". Now I have to figure out how to explain this to the owner of the computer when I return it to him tomorrow.

Case closed.
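The memory settings discussed in this thread boil down to clock arithmetic. The Python sketch below assumes the "166MHz" FSB is the standard 166 2/3 MHz Barton bus clock, and that "100%" means a memory clock equal to the FSB clock, as the "100% or Auto (333MHz)" remark suggests. DDR performs two transfers per clock over a 64-bit (8-byte) channel, so DDR400 needs a 200 MHz memory clock, a 6:5 ratio against the FSB (asynchronous operation), while DDR333 runs 1:1 with it. The peak figures are theoretical maximums, not measured bandwidth, and the function name describe is hypothetical.

```python
from fractions import Fraction

# Assumption: "166MHz" is the standard 166 2/3 MHz Barton FSB clock (333 MT/s).
FSB_MHZ = Fraction(500, 3)
CHANNEL_BYTES = 8  # each DDR channel is 64 bits (8 bytes) wide


def describe(mem_clock_mhz, channels):
    """Relate a memory clock (MHz) to the FSB and compute theoretical peak bandwidth."""
    mem_clock = Fraction(mem_clock_mhz)
    mega_transfers = 2 * mem_clock            # DDR: two transfers per clock
    ratio = mem_clock / FSB_MHZ               # memory clock : FSB clock
    return {
        "ddr_rating": round(mega_transfers),  # 333 or 400
        "mem_to_fsb": f"{ratio.numerator}:{ratio.denominator}",
        "synchronous": ratio == 1,
        "peak_mb_per_s": round(mega_transfers * CHANNEL_BYTES * channels),
    }


# "100%" / "Auto" (memory clock = FSB clock) versus "By SPD" (200 MHz, DDR400).
for setting, clock in [("100% / Auto", FSB_MHZ), ("By SPD", 200)]:
    for channels in (1, 2):
        print(f"{setting:12} {channels}ch {describe(clock, channels)}")
```

By this arithmetic, the one combination that would not boot (dual channel with By SPD) is also the only one pairing asynchronous 6:5 clocking with the highest theoretical peak of 6400 MB/s; that is an observation about the settings, not a diagnosis.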
cluberti Posted December 31, 2008

With my dump analysis above and your thorough memory test, I think it's pretty safe to say that's enough to go on.