[Help] - WindowsXP random reboots.


Synomenon

Are you sure that XP is not set to restart on system errors? (System Properties, Startup and Recovery section)
The system is bugchecking and auto-restarting - this was apparent from the SaveDump event reported earlier.

There are also many minidumps being created, and their STOP error codes are not consistent, which usually indicates a hardware problem. Here are the last few dump analysis summaries:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007e)

This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem.

Always note this address as well as the link date of the driver/image that contains this address.

Some common problems are exception code 0x80000003. This means a hardcoded breakpoint or assertion was hit, but this system was booted /NODEBUG.

This is not supposed to happen as developers should never have hardcoded breakpoints in retail code, but ...

If this happens, make sure a debugger gets connected, and the system is booted /DEBUG. This will let us see why this breakpoint is happening.

Arguments:

Arg1: c0000005, The exception code that was not handled

Arg2: 80564dce, The address that the exception occurred at

Arg3: f7b6eb2c, Exception Record Address

Arg4: f7b6e828, Context Record Address

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

CUSTOMER_CRASH_COUNT: 4

SYMBOL_NAME: NDIS!ndisWorkerThread+4b

IMAGE_NAME: NDIS.sys
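
As a side note on reading the exception codes in these summaries: an NTSTATUS value packs a severity, a facility and a code into 32 bits. The sketch below splits the two values mentioned above, 0xc0000005 (STATUS_ACCESS_VIOLATION) and 0x80000003 (STATUS_BREAKPOINT), into those fields. The function name `decode_ntstatus` is made up for illustration and is not part of any debugger.

```python
SEVERITY_NAMES = ("success", "informational", "warning", "error")

def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),            # bit 29
        "facility": (value >> 16) & 0xFFF,                # bits 27-16
        "code": value & 0xFFFF,                           # bits 15-0
    }

# The two exception codes that appear in the dump text above.
for status in (0xC0000005, 0x80000003):
    print(f"{status:#010x}", decode_ntstatus(status))
```

Both values use facility 0; the access violation is an error-severity status, while the breakpoint is only warning severity.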

----------

UNEXPECTED_KERNEL_MODE_TRAP_M (1000007f)

This means a trap occurred in kernel mode, and it's a trap of a kind that the kernel isn't allowed to have/catch (bound trap) or that is always instant death (double fault).

The first number in the bugcheck params is the number of the trap (8 = double fault, etc)

Consult an Intel x86 family manual to learn more about what these traps are. Here is a *portion* of those codes:

If kv shows a taskGate

use .tss on the part before the colon, then kv.

Else if kv shows a trapframe

use .trap on that value

Else

.trap on the appropriate frame will show where the trap was taken

(on x86, this will be the ebp that goes with the procedure KiTrap)

Endif

kb will then show the corrected stack.

Arguments:

Arg1: 00000008, EXCEPTION_DOUBLE_FAULT

Arg2: 80042000

Arg3: 00000000

Arg4: 00000000

CUSTOMER_CRASH_COUNT: 3

SYMBOL_NAME: ati2cqag+1d0a8

IMAGE_NAME: ati2cqag.dll

----------

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high.

This is usually caused by drivers using improper addresses.

If kernel debugger is available get stack backtrace.

Arguments:

Arg1: ccc35d5e, memory referenced

Arg2: 00000007, IRQL

Arg3: 00000000, value 0 = read operation, 1 = write operation

Arg4: ccc35d5e, address which referenced memory

CUSTOMER_CRASH_COUNT: 2

IMAGE_NAME: memory_corruption

----------

IRQL_NOT_LESS_OR_EQUAL (a)

An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high.

This is usually caused by drivers using improper addresses.

If a kernel debugger is available get the stack backtrace.

Arguments:

Arg1: ffffffef, memory referenced

Arg2: 000000ff, IRQL

Arg3: 00000001, value 0 = read operation, 1 = write operation

Arg4: 804e0417, address which referenced memory

CUSTOMER_CRASH_COUNT: 1

SYMBOL_NAME: win32k!EnterCrit+21

IMAGE_NAME: win32k.sys

----------

BAD_POOL_CALLER (c2)

The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.

Arguments:

Arg1: 00000007, Attempt to free pool which was already freed

Arg2: 00000cd4, (reserved)

Arg3: 01010101, Memory contents of the pool block

Arg4: 806ec6b8, Address of the block of pool being deallocated

CUSTOMER_CRASH_COUNT: 5

SYMBOL_NAME: USBSTOR!USBSTOR_IssueBulkOrInterruptRequest+9c

IMAGE_NAME: USBSTOR.SYS

----------

PAGE_FAULT_IN_NONPAGED_AREA (50)

Invalid system memory was referenced. This cannot be protected by try-except, it must be protected by a Probe.

Typically the address is just plain bad or it is pointing at freed memory.

Arguments:

Arg1: 9a8b4435, memory referenced.

Arg2: 00000000, value 0 = read operation, 1 = write operation.

Arg3: bf814678, If non-zero, the instruction address which referenced the bad memory address.

Arg4: 00000000, (reserved)

CUSTOMER_CRASH_COUNT: 3

SYMBOL_NAME: win32k!RGNOBJ::iCombine+37

IMAGE_NAME: win32k.sys

----------

KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)

This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem.

Always note this address as well as the link date of the driver/image that contains this address.

Some common problems are exception code 0x80000003. This means a hard coded breakpoint or assertion was hit, but this system was booted /NODEBUG.

This is not supposed to happen as developers should never have hardcoded breakpoints in retail code, but ...

If this happens, make sure a debugger gets connected, and the system is booted /DEBUG. This will let us see why this breakpoint is happening.

Arguments:

Arg1: c0000005, The exception code that was not handled

Arg2: 80564dce, The address that the exception occurred at

Arg3: f74e4af8, Trap Frame

Arg4: 00000000

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

CUSTOMER_CRASH_COUNT: 2

SYMBOL_NAME: nt!ObGetObjectSecurity+34

IMAGE_NAME: ntoskrnl.exe

----------

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high.

This is usually caused by drivers using improper addresses.

If kernel debugger is available get stack backtrace.

Arguments:

Arg1: 00000033, memory referenced

Arg2: 00000005, IRQL

Arg3: 00000000, value 0 = read operation, 1 = write operation

Arg4: f762b5ea, address which referenced memory

CUSTOMER_CRASH_COUNT: 1

SYMBOL_NAME: atapi!IdePortInterrupt+a

IMAGE_NAME: atapi.sys

Driver faults are usually consistent in their bugchecks, producing one or two different STOP codes, but even the two 0xD1 bugchecks above are not consistent in their stack traces.
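
To make the "not consistent" point concrete, here is a rough sketch that tallies the bugcheck codes and faulting images from the summaries above. The header and IMAGE_NAME lines are copied from this post; the function name `tally` and the regular expressions are only illustrative. It also assumes the `_M` minidump codes such as 1000007e carry the same meaning as the base code (7e), so the extra flag is masked off before counting.

```python
import re
from collections import Counter

# Bugcheck headers and faulting images copied from the summaries above.
SUMMARIES = """\
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007e)
IMAGE_NAME: NDIS.sys
UNEXPECTED_KERNEL_MODE_TRAP_M (1000007f)
IMAGE_NAME: ati2cqag.dll
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
IMAGE_NAME: memory_corruption
IRQL_NOT_LESS_OR_EQUAL (a)
IMAGE_NAME: win32k.sys
BAD_POOL_CALLER (c2)
IMAGE_NAME: USBSTOR.SYS
PAGE_FAULT_IN_NONPAGED_AREA (50)
IMAGE_NAME: win32k.sys
KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
IMAGE_NAME: ntoskrnl.exe
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
IMAGE_NAME: atapi.sys
"""

HEADER = re.compile(r"^([A-Z_]+) \(([0-9a-f]+)\)$")
IMAGE = re.compile(r"^IMAGE_NAME:\s+(\S+)$")

def tally(text):
    """Count distinct bugcheck codes and faulting images in summary text."""
    codes, images = Counter(), Counter()
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # Assumption: mask the minidump flag so 0x1000007E counts as 0x7E.
            codes[int(header.group(2), 16) & 0x0FFFFFFF] += 1
            continue
        image = IMAGE.match(line)
        if image:
            images[image.group(1).lower()] += 1
    return codes, images

codes, images = tally(SUMMARIES)
print("dumps:", sum(codes.values()))
print("distinct STOP codes:", sorted(hex(c) for c in codes))
print("distinct images:", sorted(images))
```

A single misbehaving driver would normally give one or two codes and one image in a tally like this; a long list of both is what points away from software.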
According to the spec link you gave it's 180 watts? That's a bit on the small side... even small PCs should have at least 200W.
Integrated GPU, NIC & audio and high quality PSUs in Shuttles usually mean there isn't a problem providing enough juice - I ran an SN41G2 system with a 9800 Pro AGP card, a 1Gbps NIC, a DVD-RW and 2x HDDs without a problem on the stock PSU.

From what I understand, the problem with cheap PSUs isn't their maximum wattage but their ability to provide "clean" power without loads of fluctuation.

Heat, however, can be an issue with Shuttles - though poorly mounted CPU heatsinks usually mean the system refuses to even present the POST screen (I have had this a couple of times).

@IsLNdbOi:

What other hardware is installed inside the Shuttle?

What do you have for RAM (amount, number of DIMMs & type), CPU, HDD(s) & CD/DVD?

Anything plugged in the PCI slot?

I suspect faulty RAM, CPU or mainboard - but I would install and run Motherboard Monitor Lite and log all temperature sensors and fan speeds - it could be a chipset fan failure or something.

Also check that the fans (chipset(s) & ICE) are clear of dust and all spin up correctly.

These are all the components I put in the ST62K:

INTEL CELERON D 345 3.06GHZ W/256KB CACHE 533MHZ 478-PIN RETAIL BOXED W/COOLING FAN

http://www.mwave.com/mwave/viewspec.hmx?scriteria=BA21471

KINGSTON KVR400X64C3AK2/1G 2X512MB (MATCH PAIR) PC3200 400MHZ CL3 (3-3-3) DDR DIMM

http://www.mwave.com/mwave/viewspec.hmx?scriteria=BA19405

WD 80GB WD800JB EIDE UATA 66-100 8.9MS 7200RPM 8MB (Bare drive)

http://www.mwave.com/mwave/viewspec.hmx?scriteria=AA17640

LINKSYS WIRELESS-G WMP54GS 802.11G WIRELESS PCI ADAPTER W/SPEEDBOOSTER

http://www.mwave.com/mwave/viewspec.hmx?scriteria=3420198

NEC ND3520 DVD+-R/RW burner

http://www.mwave.com/mwave/viewspec.hmx?scriteria=AA35710

Could these random reboot problems be caused by incorrect application of thermal compound on the CPU?

Edited by IsLNdbOi

I've only heard of a couple of people having issues with thermal compound, from applying too much paste. That would show up as a heat issue anyway, so it is something Motherboard Monitor Lite should show you.

Did you use the fan that came with the CPU, or the ICE heatpipe from the Shuttle?

I would take the following troubleshooting approach:

- verify the fans in the system are operational

- install MBM Lite and configure it to show:

> fan speeds

> temperatures

> voltages

If the problem does not appear to be rising temperatures, failing fans, or fluctuating power on the rails then I would start stripping the system to the bare minimum and see if the problem is still present.
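
As a rough illustration of what "fluctuating power on the rails" could look like in numbers, the sketch below checks a few voltage readings against a +/-5% band around each nominal rail, which is the usual ATX tolerance for the positive rails. The readings are made up and typed in by hand, and `rail_report` is a hypothetical helper; it does not read MBM's log files.

```python
# Assumed tolerance: ATX allows roughly +/-5% on the +3.3V, +5V and +12V rails.
TOLERANCE = 0.05

def rail_report(nominal, readings):
    """Summarise readings for one rail and list any outside the tolerance band."""
    low = nominal * (1 - TOLERANCE)
    high = nominal * (1 + TOLERANCE)
    return {
        "min": min(readings),
        "max": max(readings),
        "spread": round(max(readings) - min(readings), 3),
        "out_of_range": [v for v in readings if not low <= v <= high],
    }

# Hypothetical readings copied by hand from a monitoring log.
samples = {
    12.0: [12.10, 11.95, 11.20, 12.05],
    5.0: [5.02, 4.98, 5.01],
    3.3: [3.31, 3.28, 3.30],
}

for nominal, readings in samples.items():
    report = rail_report(nominal, readings)
    status = "SUSPECT" if report["out_of_range"] else "ok"
    print(f"+{nominal}V: {status} {report}")
```

Motherboard sensors are not precise instruments, so a single odd reading means less than a rail that sags repeatedly when the system is under load.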

Take out the wifi card, 1 of the DIMMs and the DVD-RW drive - if the problem still persists, swap the DIMM for the one you removed.

Depending on whether the system is then stable, this will let you rule out (or identify) one of the components.

I'm using the ICE heatpipe that came with the Shuttle as it cools better than the stock Celeron D heatsink. As soon as I get a chance, I'll install Prime95 and MBM Lite to check the temps, voltages and fan speeds.

The reason I asked about the thermal paste thing is that I may have applied too much Arctic Silver 5.

Edited by IsLNdbOi

Intel CPUs don't have anything to short out on the top of the package. Unless you really did put so much that it ran down the sides and onto the mobo, there should be no problem.

All those random bugchecks do point to a hardware problem though... software problems usually don't cause such a wide variety of errors.

Edited by LLXX

And CPU problems generally don't produce bugchecks like a memory error would - usually you get complete shutdowns or lockups if the CPU has caused the issue (not always, but commonly). I'd say a good memory test is in order here - the only thing in common about all of the errors is that they seem to occur in processes that would run in kernel nonpaged pool, which would have its virtual address space mapped to physical RAM at all times.

Edited by cluberti

Actually, BSODs can occur if the motherboard has a Vcore issue.

Another thing to try - which I'm sure you have done - is to check for an updated BIOS.

That is something I had to do to get a Shuttle I had to "calm down". I also changed the RAM [to Kingston ValueRAM from some other cheap brand].

Check their site for BIOS updates, I'd strongly advise :)

Also, is the target machine capable of supporting the CPU you have [in an HSF sense as well, that is]? And have you got the memory timings as relaxed as possible?

Regards,

N.
