
Having some serious hardware issues


mongo66


I've been given the wonderful task of troubleshooting an older machine. Without further ado, here goes...

System Specs:

Athlon XP 2600+ (Barton)
Gigabyte GA-7N400S
2x 256MB DDR PC3200 RAM
ATI Radeon 9550 (RV360) AGP
C-Media CMI8738 6CH-PCI
80GB Seagate ATA-100
400W PSU
Windows XP Service Pack 3

Initial observations:

When I turned on the computer, the following appeared during the BIOS POST:

TRAP 00000006 ================== EXCEPTION =================================
tr=0028 cr0=00000011 cr2=00F0836B cr3=00000000
gdt limit=03FF base=00017000 idt limit=07FF
cs:eip=0008:00060EC4 ss:eip=0010:00060E6C errcode=0000
flags=00010002 NoCy NoZr IntDis Down TrapDis
eax=00000026 ebx=00000FFF ecx=00000413 edx=0000046B ds=0010 es=0010
edi=0046A165 esi=00000000 ebp=00060E62 cr0=00000011 fs=0030 gs=0000

This has appeared 3-4 times since I got the computer a week ago. I don't know exactly what it means, but it seems to appear only after I change the memory frequency setting in the BIOS with dual channel enabled.
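
Two fields of that screen can be read without a debugger. The number after TRAP is the x86 exception vector, and vector 6 is the invalid-opcode fault (#UD), i.e. the processor tried to execute something it treated as an invalid instruction. CR0=00000011 has only PE (bit 0) and ET (bit 4) set, so paging was not enabled when the fault occurred. A minimal sketch that decodes both fields; decode_trap and decode_cr0 are hypothetical helper names, and the tables follow the IA-32 architectural definitions.

```python
# Hypothetical helpers: decode two header fields of the boot-time TRAP screen.
# Vector names follow the IA-32 architectural exceptions 0-19 (15 is reserved).
X86_EXCEPTIONS = {
    0: "Divide Error (#DE)",
    1: "Debug (#DB)",
    2: "Non-Maskable Interrupt (NMI)",
    3: "Breakpoint (#BP)",
    4: "Overflow (#OF)",
    5: "BOUND Range Exceeded (#BR)",
    6: "Invalid Opcode (#UD)",
    7: "Device Not Available (#NM)",
    8: "Double Fault (#DF)",
    9: "Coprocessor Segment Overrun",
    10: "Invalid TSS (#TS)",
    11: "Segment Not Present (#NP)",
    12: "Stack-Segment Fault (#SS)",
    13: "General Protection (#GP)",
    14: "Page Fault (#PF)",
    16: "x87 Floating-Point Error (#MF)",
    17: "Alignment Check (#AC)",
    18: "Machine Check (#MC)",
    19: "SIMD Floating-Point Exception (#XM)",
}

# Control register CR0 bit positions (IA-32).
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}


def decode_trap(vector_hex: str) -> str:
    """Map the hex number printed after 'TRAP' to an exception name."""
    vector = int(vector_hex, 16)
    return X86_EXCEPTIONS.get(vector, f"Unknown/reserved vector {vector}")


def decode_cr0(cr0_hex: str) -> list[str]:
    """Return the names of the CR0 flags that are set, lowest bit first."""
    value = int(cr0_hex, 16)
    return [name for bit, name in sorted(CR0_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_trap("00000006"))          # Invalid Opcode (#UD)
    cr0 = decode_cr0("00000011")
    print(cr0)                              # ['PE', 'ET']
    print("paging enabled:", "PG" in cr0)   # paging enabled: False
```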

On the software side, I cannot install Windows while the RAM is running in dual channel mode; the system hangs during text-mode setup (specifically while copying driver.cab). The only workaround is to disable dual channel.

I've also swapped the memory modules around and tried a single 256MB DDR stick. The computer doesn't have any startup issues in single channel mode, whether I use one or both sticks. On the other hand, Windows will not boot if dual channel is enabled AND the memory frequency is set by SPD (400MHz):

Windows could not start because the following file is missing or corrupt:

\WINDOWS\SYSTEM32\CONFIG\SYSTEM

If I change the memory frequency to 100% or Auto (333MHz) in the BIOS, Windows starts up just fine with dual channel enabled.
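
A bit of arithmetic on the two settings, assuming the 166MHz FSB reported later in the thread (nominally 166.67MHz, i.e. 500/3MHz): at 100%/Auto the memory clock equals the FSB, which gives DDR333, while By SPD runs the RAM at 200MHz (DDR400), a 6:5 asynchronous ratio to the FSB. That doesn't identify the faulty component, but it shows the failing setting is the one where the memory and FSB clocks differ. The helper names are illustrative only.

```python
from fractions import Fraction

# Assumption: the "166MHz" FSB is the nominal 166.67MHz (500/3 MHz) clock.
FSB_MHZ = Fraction(500, 3)


def ddr_rating(mem_clock_mhz: Fraction) -> int:
    """DDR transfers twice per clock, so the DDRxxx rating is 2x the clock."""
    return round(2 * mem_clock_mhz)


def mem_to_fsb_ratio(mem_clock_mhz: Fraction) -> Fraction:
    """Memory clock divided by FSB clock; 1 means synchronous operation."""
    return mem_clock_mhz / FSB_MHZ


settings = {
    "100% / Auto": FSB_MHZ,     # memory clock locked to the FSB
    "By SPD": Fraction(200),    # DDR400 modules run at a 200MHz clock
}

for name, clock in settings.items():
    ratio = mem_to_fsb_ratio(clock)
    print(f"{name}: DDR{ddr_rating(clock)}, memory:FSB = "
          f"{ratio.numerator}:{ratio.denominator}")
# 100% / Auto: DDR333, memory:FSB = 1:1
# By SPD: DDR400, memory:FSB = 6:5
```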

Crash n' burn !!

The crash scenarios mentioned below occur regardless of memory configuration and without any overclocking.

(1) Music playback (sound card; onboard AC97 disabled)

Occasionally, the mouse, keyboard and screen freeze after a while. Playback, however, continues uninterrupted, and the computer can still be shut down normally by pressing the power button. Moving the sound card to a different PCI slot produced the same results. The sound card isn't faulty, btw.

(2) Music playback (integrated 6-channel AC97 audio)

The system comes to a halt with a loud, constant screeching noise. The computer cannot be shut down normally; I have to hold down the power button to power it off...

(3) Video playback (WMP, MPC, etc.) results in a BSOD after a few minutes.

STOP 0x0000007F (UNEXPECTED_KERNEL_MODE_TRAP)

or,

TRAP_CAUSE_UNKNOWN

STOP 0x00000012 (0x00000001,0x00000000,0x00000000,0x00000000)
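
The two STOP screens can be split into a bug check code and its parameters with a few lines of Python. This is a minimal sketch: the lookup table holds only the two codes reported here, and parse_stop is a hypothetical helper name. For 0x7F, Microsoft's bug check reference documents the first parameter as the trap number (for example 8 for a double fault), but the screen quoted above doesn't include the parameters, so the specific trap isn't known.

```python
import re

# Only the two bug checks reported in this thread.
BUGCHECK_NAMES = {
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x12: "TRAP_CAUSE_UNKNOWN",
}


def parse_stop(line: str):
    """Split a 'STOP 0x...' line into (code, name, parameters)."""
    match = re.search(r"STOP\s+0x([0-9A-Fa-f]+)", line)
    if match is None:
        raise ValueError(f"not a STOP line: {line!r}")
    code = int(match.group(1), 16)
    # Parameters are the hex values after the code; a symbolic name in
    # parentheses (as on the 0x7F screen) yields no parameters.
    rest = line[match.end():]
    params = [int(p, 16) for p in re.findall(r"0x[0-9A-Fa-f]+", rest)]
    return code, BUGCHECK_NAMES.get(code, "unknown"), params


print(parse_stop("STOP 0x00000012 (0x00000001,0x00000000,0x00000000,0x00000000)"))
# (18, 'TRAP_CAUSE_UNKNOWN', [1, 0, 0, 0])
```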

*** Note: With the default Windows display drivers, I can *almost* watch an entire movie before any BSOD or system freeze. I also tested the Radeon card in another machine with Catalyst / Omega drivers installed, with no video playback issues to speak of... Definitely nothing wrong with the graphics card.

(4) Web browsing: if I have multiple tabs open or visit Flash-intensive web pages, the system either freezes or loses internet connectivity. The system bogs down and eventually crashes... Only after a "forced" shutdown or reset do things return to normal and I'm able to browse again.

Steps taken (so far):

- Reset bios to default settings

- Ran Memtest86... RESULT: no errors.

- Checked PSU voltages... RESULT: nominal readings.

- Checked hard disk for bad sectors... RESULT: none found.

- Reinstalled Windows XP and updated device drivers, with no improvement...

Also, there are no heat issues as far as I can tell, nor are there any leaking or bulging capacitors on the motherboard. The system is already running the latest BIOS version. After nearly a week of troubleshooting, I suspect the CPU and/or motherboard is "dying"...

What are your thoughts?



After my initial post, the computer seems to have behaved itself somewhat... I actually managed to shut down the computer through the Start menu for a change. :P

I've worked with GA-7N400 series boards before, both the PRO and entry-level boards like this one, and never ran into any problems until now. With this particular machine, it certainly doesn't take much to trigger a system crash... Playing a game of Spider Solitaire (with no other programs running) causes a hard freeze. :blink:

Something I forgot to mention earlier... While checking system temps, I noticed the CPU was running rather cool, around 42-46°C. This is quite unusual for an Athlon XP 2600+ with "stock cooling"; idle temps should be in the 50-55°C range. Perhaps the temp sensors have gone mad as well...

Anyway, here are the dump files as requested (16 in total).

Minidump.7z


Well, you aren't likely to enjoy reading this analysis, but...

// From the very first dump, I saw something very odd:
kd> !thread
GetPointerFromAddress: unable to read from 8055fbd4
THREAD 817e26a0 Cid 0158.0184 Teb: 7ffd8000 Win32Thread: e1778be0 RUNNING on processor 0
IRP List:
Unable to read nt!_IRP @ 817ece20
Not impersonating
GetUlongFromAddress: unable to read from 8055fc6c
Owning Process 817afda0 Image: csrss.exe
Attached Process N/A Image: N/A
ffdf0000: Unable to get shared data
Wait Start TickCount 13394
Context Switch Count 5316 LargeStack
ReadMemory error: Cannot get nt!KeMaximumIncrement value.
UserTime 00:00:00.000
KernelTime 00:00:00.000
Start Address 0x75b67cdf
Stack Init f98a4000 Current f98a39c8 Base f98a4000 Limit f98a0000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
f98a3998 804f170c 00000100 806f02d0 817e2710 hal!KfLowerIrql+0x17 (FPO: [0,0,0])
f98a39d4 804ecae9 00000000 00000000 00000000 nt!KiDeliverApc+0x118 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a39ec 804e3b7d 804e3a0d e1778be0 00000000 nt!KiSwapThread+0x64 (FPO: [0,0,0]) (CONV: fastcall)
f98a3a24 bf807aec 00000003 817c8af0 00000001 nt!KeWaitForMultipleObjects+0x284 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3a5c bf89b7c4 00000002 817c8af0 bf89e712 win32k!xxxMsgWaitForMultipleObjects+0xb0 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d30 bf884773 bf9aae80 00000001 f98a3d54 win32k!xxxDesktopThread+0x339 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d40 bf80110a bf9aae80 f98a3d64 0073fff4 win32k!xxxCreateSystemThreads+0x6a (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d54 804de7ec 00000000 00000022 00000000 win32k!NtUserCallOneParam+0x23 (FPO: [Non-Fpo]) (CONV: stdcall)
f98a3d54 7c90e4f4 00000000 00000022 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f98a3d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 0x7c90e4f4

kd> r
Last set context:
eax=00000000 ebx=804e3a0d ecx=804ecae9 edx=f98a3a24 esi=804e3b7d edi=817e26a0
eip=00000001 esp=f98a3a0c ebp=e1778be0 iopl=1 nv up di pl nz ac pe cy
cs=3a18 ss=0010 ds=562d es=0000 fs=0000 gs=7c40 efl=bf80101d
3a18:00000001 ?? ???

// Note that EBP looks suspicious: it should fall within the thread's stack range (between Limit and Base). ESP does, but EBP does not:
Base f98a4000 Limit f98a0000
esp=f98a3a0c ebp=e1778be0 <-!!

// dps (display pointer-sized values with symbols) up to the stack base address:
...
f98a3fc8 8181adf0
f98a3fcc 00000004
f98a3fd0 80562340 nt!ExWorkerQueue+0x80
f98a3fd4 00167398
f98a3fd8 00160168
f98a3fdc 00000000
f98a3fe0 00000000
f98a3fe4 00167398
f98a3fe8 00000040
f98a3fec 001673a0
f98a3ff0 00000000
f98a3ff4 00000000
f98a3ff8 00000000
f98a3ffc 00000000
f98a4000 ???????? <- Base address looks to be corrupt

If you've tested the memory, then this indicates a bad CPU or motherboard (or both) - this wouldn't happen if the hardware underneath windows didn't have some issue, and register issues are almost always CPU-related or motherboard-related problems...
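
The stack-range check in the annotations above is easy to script: a register that should point into the current thread's kernel stack must fall between Limit (the lowest address) and Base (where the stack starts before growing downward). A minimal sketch with the values from this dump; on_kernel_stack is a hypothetical name, and Base is treated as exclusive. Run against these numbers, it places ESP inside the range and EBP well outside it.

```python
# Values copied from the !thread and r output in the dump above.
STACK_BASE = 0xF98A4000   # Base: the stack grows down from here (exclusive)
STACK_LIMIT = 0xF98A0000  # Limit: lower bound reported by !thread


def on_kernel_stack(addr: int, base: int = STACK_BASE,
                    limit: int = STACK_LIMIT) -> bool:
    """True if addr lies in [limit, base), the thread's kernel stack range."""
    return limit <= addr < base


registers = {"esp": 0xF98A3A0C, "ebp": 0xE1778BE0}
for name, value in registers.items():
    status = "inside" if on_kernel_stack(value) else "OUTSIDE"
    print(f"{name}={value:08x} is {status} the stack range")
# esp=f98a3a0c is inside the stack range
# ebp=e1778be0 is OUTSIDE the stack range
```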


@cluberti,

Thanks for your analysis, I really appreciate it :) The analysis seems to have confirmed my initial suspicions...

cluberti wrote: "If you've tested the memory, then this indicates a bad CPU or motherboard (or both) - this wouldn't happen if the hardware underneath windows didn't have some issue, and register issues are almost always CPU-related or motherboard-related problems..."

I did test the memory, albeit only for a couple of hours. Given the recurring system crashes, I figured that if the memory were at fault, Memtest86 would have detected errors fairly quickly... I'll test the RAM again later today, just to be sure. Only this time, I'll let it run for 12 hours.

Other interesting developments...

As I was using the machine today, it froze up again (no surprise there). After a forced shutdown and restart, I was greeted with:

NTLDR is missing ... 
Press Ctrl-Alt-Del to restart

WTF? So I fired up BartPE just to see what was going on... I couldn't access drive C:, but I was able to access the logical drive (D:). I ran chkdsk on drive C: with the following results:

CHKDSK.CMD: Starting...

Please enter the drive, mount point or volume name to check (for example "c:")...
Enter drive:c:

Do you want to fix errors on the disk *and*
locate bad sectors and recover readable information (Yes/No)...
Enter "y" or "n":n

Do you want to fix errors on the disk (Yes/No)...
Enter "y" or "n":y

----------------------------------------------------------------
You have specified to check drive/volume c:

With the following options:
- Fix errors on the disk
----------------------------------------------------------------

Start check disk? (Yes/No)...
Enter "y" or "n":y
Running: chkdsk.exe c: /f
The type of the file system is NTFS.

CHKDSK is verifying files (stage 1 of 3)...
File verification completed.
CHKDSK is verifying indexes (stage 2 of 3)...
Correcting error in index $I30 for file 5.
Correcting error in index $I30 for file 5.
Sorting index $I30 in file 5.
Index verification completed.
CHKDSK is recovering lost files.
Recovering orphaned file $MFT (0) into directory file 5.
Recovering orphaned file $MFTMirr (1) into directory file 5.
Recovering orphaned file $LogFile (2) into directory file 5.
Recovering orphaned file $Volume (3) into directory file 5.
Recovering orphaned file $AttrDef (4) into directory file 5.
Recovering orphaned file . (5) into directory file 5.
Recovering orphaned file $Bitmap (6) into directory file 5.
Recovering orphaned file $Boot (7) into directory file 5.
Recovering orphaned file $BadClus (8) into directory file 5.
Recovering orphaned file $Secure (9) into directory file 5.
Recovering orphaned file $UpCase (10) into directory file 5.
Recovering orphaned file $Extend (11) into directory file 5.
Recovering orphaned file SYSTEM~1 (27) into directory file 5.
Recovering orphaned file System Volume Information (27) into directory file 5.
Recovering orphaned file WINDOWS (30) into directory file 5.
Recovering orphaned file ntldr (3376) into directory file 5.
Recovering orphaned file NTDETECT.COM (3380) into directory file 5.
Recovering orphaned file boot.ini (3413) into directory file 5.
Recovering orphaned file PROFILES (3419) into directory file 5.
Recovering orphaned file VOLUMEID.EXE (3722) into directory file 5.
Recovering orphaned file PROGRA~1 (3732) into directory file 5.
Recovering orphaned file Program Files (3732) into directory file 5.
Recovering orphaned file RECYCLER (9223) into directory file 5.
Recovering orphaned file pagefile.sys (9245) into directory file 5.
Recovering orphaned file symstore (9318) into directory file 5.
CHKDSK is verifying security descriptors (stage 3 of 3)...
Security descriptor verification completed.
CHKDSK is verifying Usn Journal...
Usn Journal verification completed.
Correcting errors in the Master File Table (MFT) mirror.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.

15631213 KB total disk space.
2820732 KB in 11067 files.
3200 KB in 1470 indexes.
0 KB in bad sectors.
92901 KB in use by the system.
65536 KB occupied by the log file.
12714380 KB available on disk.

4096 bytes in each allocation unit.
3907803 total allocation units on disk.
3178595 allocation units available on disk.
Unable to obtain a handle to the event log.

CHKDSK.CMD: Check disk done...
Press any key to continue . . .
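
As an aside, the summary figures are internally consistent. Free space is exactly the free allocation units times 4KB, and files + indexes + system usage account for all of the used space, with 0KB in bad sectors. The 65536KB log file is not added separately; the totals only balance if it is included in the 92901KB system figure. The reported total is 1KB more than total clusters times 4KB, which the output doesn't explain. Together with the orphaned entries in file 5 (NTFS's root directory), this looks like logical file-system damage rather than a failing disk surface, in line with the earlier bad-sector check. The sketch below simply redoes that arithmetic with the numbers copied from the output.

```python
# Figures copied from the chkdsk summary above.
CLUSTER_KB = 4096 // 1024          # 4096 bytes in each allocation unit
total_kb = 15631213
total_units = 3907803
free_units = 3178595
free_kb_reported = 12714380
files_kb, indexes_kb, system_kb, bad_kb = 2820732, 3200, 92901, 0

free_kb = free_units * CLUSTER_KB
used_kb = total_kb - free_kb_reported
accounted_kb = files_kb + indexes_kb + system_kb + bad_kb

print("free space matches free clusters:", free_kb == free_kb_reported)  # True
print("used space fully accounted for:", used_kb == accounted_kb)       # True
print("clusters x 4KB vs total:", total_units * CLUSTER_KB, "vs", total_kb)
# clusters x 4KB vs total: 15631212 vs 15631213
```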

What else could go wrong?! lol. Luckily, chkdsk fixed the errors and I was able to get into Windows...
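
The BartPE wrapper's two prompts correspond to ordinary chkdsk switches: /f fixes file-system errors, and /r additionally locates bad sectors and recovers readable information (it implies /f). The sketch below shows one plausible mapping; CHKDSK.CMD's actual script isn't shown in the thread, so chkdsk_command is a hypothetical reconstruction. With the answers given above (n, then y) it produces the same "chkdsk.exe c: /f" line that appears in the log.

```python
def chkdsk_command(drive: str, scan_surface: bool, fix_errors: bool) -> list[str]:
    """Hypothetical mapping of the two CHKDSK.CMD prompts to chkdsk switches.

    /r locates bad sectors and recovers readable information and implies /f,
    so /f is only added on its own when the surface scan was declined.
    """
    args = ["chkdsk.exe", drive]
    if scan_surface:
        args.append("/r")
    elif fix_errors:
        args.append("/f")
    return args


# Answers from the session above: "n" to the surface scan, "y" to fixing errors.
print(" ".join(chkdsk_command("c:", scan_surface=False, fix_errors=True)))
# chkdsk.exe c: /f
```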


I've completed another round of memory tests over the weekend. Memtest86 ran for almost 16 hours without a single error, so it's safe to assume the RAM modules aren't faulty. These tests were carried out at the default Front Side Bus (FSB) speed of 166MHz (no overclocking). Note: I did not run any tests with dual channel enabled, as I felt it was unnecessary.

DRAM Frequency : By SPD   << BIOS default setting
Memory timings : RAM 200MHz (DDR400) / CAS : 2.5-3-3-8 / Single Channel (64-bit)
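
For reference, the timing line translates into a few concrete numbers. DDR transfers twice per clock, so 200MHz on a 64-bit channel peaks at 200 x 2 x 8 bytes = 3200MB/s, which is where the PC3200 name comes from; a dual-channel (128-bit) setup doubles the theoretical peak to 6400MB/s. CAS 2.5 at a 5ns clock period works out to 12.5ns. These are theoretical peaks, and the helper names below are illustrative.

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_bits: int) -> float:
    """Peak DDR bandwidth: two transfers per clock times bus width in bytes."""
    return clock_mhz * 2 * bus_bits / 8


def cas_latency_ns(cas_cycles: float, clock_mhz: float) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds."""
    return cas_cycles * 1000 / clock_mhz


print(peak_bandwidth_mb_s(200, 64))    # 3200.0 (single channel, "PC3200")
print(peak_bandwidth_mb_s(200, 128))   # 6400.0 (theoretical dual-channel peak)
print(cas_latency_ns(2.5, 200))        # 12.5
```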

Given all the problems I've encountered with this machine, along with the minidump analysis, I'm forced to conclude that the processor, the motherboard, or both have gone "bad".

Now I have to figure out how to explain this to the owner of the computer when I return it to him tomorrow.

Case closed.
