
Mr Snrub


Posts posted by Mr Snrub

  1. Did you test uninstalling Symantec AV?

    >> No, I haven't tested that yet as I wanted to get to the root cause before attempting any workaround.

    For easily reproducible issues it can be quicker to do simple "one at a time" tests, so I consider this part of root cause analysis (even if the test only rules the component out, because the problem is still present without it).
    Yes, I agree that the nonpaged pool is exhausted through allocations to "Irp "

    Can you throw some light on what exactly points to Symantec AV?

    Experience :)
    The I/Os themselves are completed, but the pool allocations are not freed, most likely due to some driver.

    >> can we determine exactly which drivers?

    Smarter people than me might be able to, but due to the way device and filter drivers work it's more of a "go with your gut" from me ;)
  2. Did you test uninstalling Symantec AV?

    The dump still has it loaded, with those modules from 2006 present...

    The pool tagging just confirms what we suspected - the nonpaged pool is exhausted through allocations to "Irp ", which is from I/O request packets.

    The I/Os themselves are completed, but the pool allocations are not freed, most likely due to some driver.

    The I/Os also seem to be aimed at the various USB root hubs, which is why I also asked about any USB devices that may have been connected to the system recently.

    If I was a betting man, I would say it's Symantec AV causing the problem from the information we have so far - I would start by uninstalling that and watching the system for ~20 hours (the dumps so far seem to take 16-19 hours to get to the point where they crash).

  3. I don't know how conclusive it is, but I tried launching \Windows\explorer.exe from my Vista x64 partition whilst booted into Windows 7 x64, and it just throws error 0xc0000142 immediately - I had no intention of trying to replace any system/shell DLLs to test further.

    Personally the taskbar in Windows 7 has really grown on me, and I no longer miss the quicklaunch bar.

  4. While we wait for the dump with pool tagging enabled...

    2) Can you explain in detail what the below means? Curious to know what those numbers indicate too.

    ========

    NonPagedPool Usage: 65534 ( 262136 Kb)

    NonPagedPool Max: 65536 ( 262144 Kb)

    ********** Excessive NonPaged Pool Usage *****

    ===========

    Nonpaged (or nonpageable) pool memory is for dynamic memory allocations in the kernel that cannot be paged out to disk - drivers have to use this pool for data that must be available at all times, as a page fault (a request for a virtual page that is not resident in physical RAM, but in the page file on disk) is not allowed while they are running at elevated IRQL (DISPATCH_LEVEL or above).

    Touching pageable memory in that state is the classic cause of the IRQL_NOT_LESS_OR_EQUAL bugcheck, if the driver developer wrongly assumes the data will always be resident.

    Because the nonpaged pool region has to take physical memory, and is a subset of the 2GB kernel space, its absolute maximum is capped at 256MB (but systems with less than ~768MB RAM, or using /3GB would have less than this as their limit).

    Because it is a finite system resource, once an allocation is no longer required it is meant to be returned to the pool by marking it as free.

    (The other, larger pool is paged pool - this is the same concept of dynamic memory allocations in the kernel, but these ones are non-critical data that we can put into the page file as needed to free physical memory.)
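
    As a rough illustration of where the numbers in the quoted output come from: the first figure on each line reads as a count of pages and the bracketed figure as the same amount in kilobytes, assuming the 4 KB page size of 32-bit x86. A minimal Python sketch of that arithmetic (the function and variable names are made up for this example):

```python
# Assumption: 4 KB pages, as on 32-bit x86 without large pages.
PAGE_SIZE_KB = 4


def pages_to_kb(pages: int) -> int:
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE_KB


usage_pages = 65534  # "NonPagedPool Usage" from the quoted output
max_pages = 65536    # "NonPagedPool Max" from the quoted output

usage_kb = pages_to_kb(usage_pages)
max_kb = pages_to_kb(max_pages)

print(usage_kb)           # 262136
print(max_kb)             # 262144
print(max_kb // 1024)     # 256 (MB, the cap mentioned above)
print(max_kb - usage_kb)  # 8 (KB left, i.e. two pages)
```

    Under that assumption only two pages of nonpaged pool were left when the dump was taken.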

    What do you have in the way of USB devices connected to the machine?

    I ask because I had a poke around the nonpaged pool region to see if there are any clues, and saw a lot of Irps (I/O request packets), and so ran the !irpfind command to get a summary:

    1: kd> !irpfind
    unable to get large pool allocation table - either wrong symbols or pool tagging is disabled
    Searching NonPaged pool (827b6000 : 8a7b6000) for Tag: Irp?
    Irp [ Thread ] irpStack: (Mj,Mn) DevObj [Driver] MDL Process
    827b64a8 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827b6b28 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827b8008 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827b83c0 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827b8b20 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827b9008 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827b9d98 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827bad98 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    827bb008 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    ...
    ffbddb28 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    ffbde008 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    ffbde3d8 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    ffbde648 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    ffbdeb28 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
    ffbded98 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)

    There are 148,962 Irps listed in the output in total.
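
    Counting that many lines by hand is not practical. One possible way to get the total, assuming the debugger output has been saved to a text file or string, is a small Python sketch like the one below (count_completed_irps is a hypothetical helper, not a debugger feature, and the sample is just the first three lines from the listing above):

```python
import re

# Matches lines such as:
#   827b64a8 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
IRP_COMPLETE = re.compile(
    r"^\s*[0-9a-fA-F]{8}\s+\[[0-9a-fA-F]{8}\]\s+Irp is complete"
)


def count_completed_irps(text: str) -> int:
    """Count the lines of !irpfind output that report a completed IRP."""
    return sum(1 for line in text.splitlines() if IRP_COMPLETE.match(line))


sample = """\
Irp [ Thread ] irpStack: (Mj,Mn) DevObj [Driver] MDL Process
827b64a8 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
827b6b28 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
827b8008 [00000000] Irp is complete (CurrentLocation 4 > StackCount 3)
"""

print(count_completed_irps(sample))  # 3
```

    The header line is skipped because it does not start with an 8-digit hex address.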

    Taking a look at the first in the list... !pool lets us confirm the allocation is from nonpaged pool and is an IRP, then !irp can give us some details on the I/O taking place, and !devstack lets us see the underlying device:

    1: kd> !pool 827b64a8 
    Pool page 827b64a8 region is Nonpaged pool
    827b6000 size: 270 previous size: 0 (Allocated) P_. (Protected)
    827b6270 size: 230 previous size: 270 (Free) ....
    *827b64a0 size: 270 previous size: 230 (Allocated) *Irp
    Pooltag Irp : Io, IRP packets
    827b6710 size: 270 previous size: 270 (Allocated) ..3. (Protected)
    827b6980 size: 1a0 previous size: 270 (Free) Attv
    827b6b20 size: 270 previous size: 1a0 (Allocated) Irp
    827b6d90 size: 270 previous size: 270 (Allocated) P_. (Protected)

    1: kd> !irp 827b64a8
    Irp is active with 3 stacks 4 is current (= 0x827b6584)
    No Mdl: No System Buffer: Thread 00000000: Irp is completed.
    cmd flg cl Device File Completion-Context
    [ 0, 0] 0 0 00000000 00000000 00000000-00000000
    Args: 00000000 00000000 00000000 00000000
    [ 0, 0] 0 0 00000000 00000000 00000000-00000000
    Args: 00000000 00000000 00000000 00000000
    [ f, 0] 0 0 89764618 00000000 bad750ac-89763748
    \Driver\usbuhci usbhub!USBH_FdoIdleNotificationRequestComplete
    Args: 00000000 00000000 00000000 00000000

    1: kd> !devstack 89764618
    !DevObj !DrvObj !DevExt ObjectName
    89763690 \Driver\usbhub 89763748 000000f6
    > 89764618 \Driver\usbuhci 897646d0 USBPDO-0
    !DevNode 89b2fa90 :
    DeviceInst is "USB\ROOT_HUB\4&56cb44e&0"
    ServiceName is "usbhub"

    I can see some processes that hint at something related to communications (USB, IrDA, Bluetooth):

    PROCESS 884f5020 SessionId: 0 Cid: 0554 Peb: 7ffd9000 ParentCid: 0400
    DirBase: 2f333000 ObjectTable: e15bd2e8 HandleCount: 62.
    Image: btwdins.exe

    PROCESS 88043430 SessionId: 0 Cid: 0c84 Peb: 7ffdf000 ParentCid: 04d4
    DirBase: 3dc31000 ObjectTable: e7f42c78 HandleCount: 235.
    Image: BTSTAC~1.EXE

    PROCESS facf5020 SessionId: 0 Cid: 1908 Peb: 7ffde000 ParentCid: 1560
    DirBase: 5d729000 ObjectTable: e8e88850 HandleCount: 67.
    Image: NclUSBSrv.exe

    PROCESS fa91c8c0 SessionId: 0 Cid: 1954 Peb: 7ffd9000 ParentCid: 1560
    DirBase: 47e38000 ObjectTable: e16a4260 HandleCount: 145.
    Image: NclBCBTSrv.exe

    PROCESS f9f7c020 SessionId: 0 Cid: 1708 Peb: 7ffd8000 ParentCid: 1560
    DirBase: 7ed19000 ObjectTable: e17ba830 HandleCount: 47.
    Image: NclIrSrv.exe

    PROCESS facf0020 SessionId: 0 Cid: 1504 Peb: 7ffdf000 ParentCid: 1560
    DirBase: 46b65000 ObjectTable: e67a6b60 HandleCount: 45.
    Image: NclRSSrv.exe

    And then there's always AV to consider:

    a6c30000 a6c441e0 naveng \??\C:\PROGRA~1\COMMON~1\SYMANT~1\VIRUSD~1\20090705.003\naveng.sys
    a6c45000 a6d19440 navex15 \??\C:\PROGRA~1\COMMON~1\SYMANT~1\VIRUSD~1\20090705.003\navex15.sys
    a9c23000 a9c40000 EraserUtilRebootDrv \??\C:\Program Files\Common Files\Symantec Shared\EENGINE\EraserUtilRebootDrv.sys
    a9c40000 a9c9e000 eeCtrl \??\C:\Program Files\Common Files\Symantec Shared\EENGINE\eeCtrl.sys
    a9d60000 a9dc2000 SPBBCDrv \??\C:\Program Files\Common Files\Symantec Shared\SPBBC\SPBBCDrv.sys
    a9e2c000 a9e6e000 symidsco \??\C:\PROGRA~1\COMMON~1\SYMANT~1\SymcData\SCFIDS~1\20090625.001\symidsco.sys
    a9e6e000 a9e97000 SYMFW \SystemRoot\System32\Drivers\SYMFW.SYS
    aa19a000 aa1ae000 Savrtpel \??\C:\Program Files\Symantec Client Security\Symantec AntiVirus\Savrtpel.sys
    aa1ae000 aa1d0000 SYMEVENT \??\C:\Program Files\Symantec\SYMEVENT.SYS
    aa1d0000 aa228000 savrt \??\C:\Program Files\Symantec Client Security\Symantec AntiVirus\savrt.sys

    First rule of troubleshooting a new problem - did you change or install anything recently?

    In particular anything related to USB, Bluetooth or chipset drivers?

    Maybe mobile phone sync software, or even fingerprint scanner drivers?

    Secondly, try to reduce the problem to its bare minimum - is there a particular piece of software that causes the problem to occur?

    Whilst running without AV is not a long-term solution, it's a valid test for problems that occur routinely - I would uninstall the Symantec software and see if the symptom disappears (note: disabling is not the same as uninstalling, the kernel drivers are still present and get involved in I/O).

  5. NonPagedPool Usage: 65534 ( 262136 Kb)

    NonPagedPool Max: 65536 ( 262144 Kb)

    ********** Excessive NonPaged Pool Usage *****

    ********** 19498 pool allocations have failed **********

    Nonpaged pool totally exhausted, something has leaked.

    The output from !poolused 7 will be long - it is sorted in descending order by nonpaged bytes, so the first few lines are the most interesting.

    This will give a clue as to the pooltags used for the allocations, and maybe a direct indicator as to who might have made them.

    AV filter drivers are common leakers of pool memory - what AV do you have installed?

    My comment on SP3 was intended as: "why isn't SP3 installed?" ;)

  6. I think given the speed of Vista installation on a newly-created partition, it's a quick format used in that GUI stage.

    When I use a completely brand-spanking-new hard disk and first partition it, I tend to do a full format, and that's the only time I do.

    During Vista or Win7 setup, at the partition selection/setup stage, I hit Shift-F10 to get the command prompt up and use format from there before selecting the target partition.

    Newly created partitions on a brand new disk - full format.

    Existing partition which contains data - quick format.

  7. Actually, it might be some system resource getting exhausted... as you found, csrss.exe was the critical process that got killed:

    CRITICAL_OBJECT_TERMINATION (f4)

    A process or thread crucial to system operation has unexpectedly exited or been terminated.

    Several processes and threads are necessary for the operation of the system; when they are terminated (for any reason), the system can no longer function.

    Arguments:

    Arg1: 00000003, Process

    Arg2: 88575da0, Terminating object

    Arg3: 88575f14, Process image file name

    Arg4: 80604528, Explanatory message (ascii)

    PROCESS_OBJECT: 88575da0

    1: kd> !process ffffffff88575da0 3

    PROCESS 88575da0 SessionId: 0 Cid: 03bc Peb: 7ffd8000 ParentCid: 038c

    DirBase: 20fd0000 ObjectTable: e194ee90 HandleCount: 996.

    Image: csrss.exe

    The line I think of interest, and its breakdown:

    Inpage operation failed at 75b7b399, due to I/O error c000009a

    EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The required data was not placed into memory because of an I/O error status of "0x%08lx".

    IO_ERROR: (NTSTATUS) 0xc000009a - Insufficient system resources exist to complete the API.
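
    For reference, both codes above follow the standard NTSTATUS bit layout: the top two bits are the severity, bit 29 is the customer flag, bits 16-27 are the facility and the low 16 bits are the code. A minimal Python sketch of that decoding (decode_ntstatus is a hypothetical helper written for this example, not part of any debugger):

```python
def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    severities = ["success", "informational", "warning", "error"]
    return {
        "severity": severities[(status >> 30) & 0x3],  # bits 31-30
        "customer": bool((status >> 29) & 0x1),        # bit 29
        "facility": (status >> 16) & 0xFFF,            # bits 27-16
        "code": status & 0xFFFF,                       # bits 15-0
    }


# The two values from the analysis above.
print(decode_ntstatus(0xC0000006))
print(decode_ntstatus(0xC000009A))
```

    Both decode as error severity with facility 0, so the interesting part is the low word: 0x0006 for the in-page error and 0x009A for the insufficient resources status.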

    And the "failed at" address is the module address in the thread that raised the exception (the process, csrss.exe):

    STACK_TEXT:

    a92be520 80634281 000000f4 00000003 88575da0 nt!KeBugCheckEx+0x1b

    a92be544 806044e6 80604528 88575da0 88575f14 nt!PspCatchCriticalBreak+0x75

    a92be574 804dd99f 88575fe8 c0000006 a92be9b0 nt!NtTerminateProcess+0x7d

    a92be574 804e46a7 88575fe8 c0000006 a92be9b0 nt!KiFastCallEntry+0xfc

    a92be5f4 80522128 ffffffff c0000006 a92be9f8 nt!ZwTerminateProcess+0x11

    a92be9b0 80505460 a92be9d8 00000000 a92bed64 nt!KiDispatchException+0x3a0

    a92bed34 804e12a8 0375fbe8 0375fc08 00000000 nt!KiRaiseException+0x175

    a92bed50 804dd99f 0375fbe8 0375fc08 00000000 nt!NtRaiseException+0x33

    a92bed50 75b7b399 0375fbe8 0375fc08 00000000 nt!KiFastCallEntry+0xfc

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    0375fff4 00000000 00000000 00000000 00000000 0x75b7b399

    I would guess the page in the virtual address space for csrss.exe was paged out to disk, then at some point a context switch occurred to continue executing which incurred the inpage operation - but when pulling the data from disk the I/O failed, making the thread go boom, which terminates the process, and it was a critical process so we bugcheck.

    Most commonly in my experience the cause of failing inpage operations is a disk or disk controller failure (the device suddenly vanishes from the system), sometimes due to a driver fault or an I/O mode setting in the BIOS (e.g. AHCI being used)... however here there is the extra bit of info "Insufficient system resources exist to complete the API".

    The output from !vm might be useful, to see if it's pool memory or PTE shortage - of course there's a chance it could be a bogus status code if the origin is a dodgy CPU or heat related...

    Not running SP3?

  8. Thanks for pointing that out!

    Been a while since I cracked the Russinovich book. Got to make some time (sigh).

    No worries, I see this a lot due to the unfortunate naming.

    OT - you know the 5th Edition of Windows Internals is out now, covering NT 6? Waiting for my copy to arrive :)

  9. I am pretty sure that modifying the registry branches for the .default user will only affect new users that are created later (after such registry modifications are entered).
    The .DEFAULT key under HKEY_USERS is actually used by the Local System user account, it has nothing to do with interactive or default users.

    The NTUSER.DAT in the Default User profile (on disk, not in the registry) is the template user profile registry hive used when users log on for the first time.

  10. There is a reference to AntiVir in there too (though not in the list of running processes at the time of the report - maybe tested and removed?).

    If you have more than one of Prevx, Avast! and AntiVir in the Add/Remove Programs list, uninstall (don't just disable) all but 1 to ensure the kernel filter drivers are not loaded.

    If the problem continues after a reboot, use Process Explorer to hover the mouse over the svchost.exe with high CPU utilization and the tooltip will show the services hosted by the process.

    Make a note of the list of services, then from an elevated command prompt you can enter the following (where XXX is the service name):

    sc config XXX type= own

    When you restart the service it will now create a separate svchost.exe for it - now you can track the CPU time back to an individual service in Process Explorer.

    To restore a service back to its default shared state, enter (in an elevated command prompt again):

    sc config XXX type= share
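
    If the svchost.exe in question hosts many services, typing the command for each one gets tedious. A minimal Python sketch that only generates the command lines to paste into the elevated prompt is shown below. It does not run anything itself, the service names in the usage example are purely illustrative, and it uses the value share, which is how the sc.exe usage text spells the shared-process type:

```python
from typing import List


def sc_config_commands(services: List[str], own_process: bool) -> List[str]:
    """Build one 'sc config' command line per service name.

    own_process=True  -> give each service its own svchost.exe
    own_process=False -> put each service back into a shared svchost.exe
    """
    value = "own" if own_process else "share"
    # sc.exe expects a space between "type=" and the value.
    return [f"sc config {name} type= {value}" for name in services]


# Illustrative service names only; use the list noted from Process Explorer.
noted_services = ["wuauserv", "BITS"]

for line in sc_config_commands(noted_services, own_process=True):
    print(line)
```

    Calling it again with own_process=False gives the matching commands to undo the change once the culprit has been found.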

  11. @Glen9999:

    By far the best mitigation you have already mentioned - running as a standard user rather than Administrator.

    The vast majority of malicious activity in my experience has been through social engineering and users not understanding the implications of clicking flashy things on the screen - reduce the user's power and the system becomes more secure implicitly.

    This has much more value when NTFS is used as the file system, otherwise there is no way to protect the OS files from any user able to log on (I've not seen first-hand any malware employing alternate data streams or locking down ACLs that the user could not unlock that would warrant using no form of protection on the file system).

    Deploying a client behind a NAT router (basically any home broadband router on the market these days) should provide protection against drive-by scans, but it's still worth having the Windows Firewall service running as it's so lightweight.

    Reading between the lines it looks like you may be setting up a PC for a not-so-IT-literate person and want to keep the system ticking over by itself - I would enable Automatic Updates to install hotfixes as it detects them, and have an AV product with realtime scanning and automatic updates (set up for a weekly full system scan too).

    "Security is the enemy of usability", as a colleague of mine loves to say, so it depends on how far you want to go to protect the system from the user - if there are USB ports present and there will never be any USB devices connected, you can consider disabling them in the BIOS to cover another potential back door, for example.

    Automating cleaning of temp files can be dangerous, due to how they may be present during the lifetime of an application, or until they are cleaned up after a reboot - if you clean out *.TMP, for example, on a scheduled basis then you may run into a problem only after restarting (typically this can be seen for anything doing a self update).

    Teaching the user how to make backups could be useful too - a system restore to a known good point in time can be much quicker than a reinstall of the OS and all the apps (though this is more a "reactive recovery" point in the event the system has been compromised or become unstable).

    @JustinStacey:

    The DNS Client service is the DNS name resolution cache, it's not a listening service - just curious as to what security hardening this achieves?

    Also, the Client for Microsoft Networks is the plumbing of the Workstation service on a per-interface basis, so it's necessary for outbound SMB and disabling this would break the machine's ability to browse other machines, if there are any.

    The File and Printer Sharing setting is the per-interface SMB plumbing for the Server service, so I agree it can be useful to disable this if you don't share resources on the LAN.

  12. let me rephrase that: explorer is back, but I cannot enter any of my drives... :(

    (though I can now see the drive's properties)

    By "explorer" do you mean the desktop & icons, or you can now successfully start an explorer.exe process and get a window up, or you can click on your user name from the Start menu and actually get a window up in which you can navigate between folders?

    And when you say you cannot enter any of your drives but can see the properties (through the right-click context menu?), do you get any error when you double-click on a drive letter, or does nothing happen, or does the window process hang?

  13. I already tried that when I activated the Administrator account - it took a few minutes to create the desktop, but it was still the same.

    but now I've tried again, with a new test user, the Explorer is back!!!!

    Creating a new user account is always a useful test, as the Administrator account is already present, only disabled.

    Having a new, never-logged-on-before and non-well-known-GUID user account log on is a useful method of distinguishing between a user profile issue and a system issue.

    So is the desktop back for all users, or just when you log on as this new test user?

  14. Can you create a new user account and log on as that user to verify if they have the same problem?

    Also, did I read correctly that if you boot from the Vista DVD you get nothing but the blue-ish background and a mouse pointer, you don't even get any menu at all?

  15. It sounds like one or more of the many shell DLLs has a problem, or maybe some shell extension - did you do any kind of "takeown" under %systemroot%, or clean up of the WinSxS folder at some point in the past?

    Was the installation vLite'd?

    Or was there some custom/unattended installation used? I see a fair bit of file recovery being done in your Component Based Servicing log...

  16. Is there anything special about the location of gallery.exe?

    Is it on a removable drive, a UNC path, a folder with a custom ACL?

    I would test by renaming gallery.exe to temp.exe, then copy notepad.exe to where gallery.exe was and go through the test again - if the second Notepad entry appears then it would appear to be something up with gallery.exe itself (and you could afterwards close the Notepad-that-is-gallery.exe, delete it and rename temp.exe back to gallery.exe).

    essentially I need a command line command that will run a batch file as administrator (or with admin rights) without the need to know user name or password
    As would every malware author out there, I bet.

    So the logged-on user isn't an admin, and each of the programs called from the batch file is attempting to run elevated (hence the multiple prompts) from a batch file that was not launched elevated... this is expected behaviour.

    The only way I would expect a user to be able to select such an option and not have privilege issues would be if a call was being made to a service running under a privileged user account - then the idea would be that the service does the job on behalf of the user (hopefully in a secured, non-exploitable manner).

    Similar to how AV interfaces allow a regular user to "clean" an infection when they would not have the explicit privilege to access the file.

    What exactly are these programs trying to do with the contents of the Recycle Bin?

  18. When I double click the .gal file, Windows does prompt me to select a program to open it. However, after clicking the browse button and find the Gallery.exe, after selecting it and double clicking it, nothing happened. It just returns me to the previous dialog box, asking me to select a program again.
    Admittedly I'm doing this test on Win7, not Vista, but the principle should be the same...

    I created a file on my desktop named test.gal, then copied Notepad.exe to gallery.exe on my desktop.

    I double-clicked test.gal and got prompted with the following:

    Windows can't open this file:

    File: test.gal

    To open this file, Windows needs to know what program you want to use to open it. Windows can go online to look it up automatically, or you can manually select from a list of programs that are installed on your computer.

    What do you want to do?

    (x) Use the Web service to find the correct program

    ( ) Select a program from a list of installed programs

    I selected the second option and clicked OK, then I got the "Open with" dialogue window:

    Choose the program you want to use to open this file:

    File: test.gal

    HHD Software Hex Editor Neo (x64)

    Internet Explorer

    IrfanView

    Notepad

    Paint

    Rights Management Add-on for Internet Explorer

    Windows Media Center

    Windows Media Player

    Windows Photo Viewer

    WordPad

    Type a description that you want to use for this kind of file:

    [x] Always use the selected program to open this kind of file

    [browse]

    (The list of programs is built from apps registered in Windows through an installer, so it's unlikely to be the same on any 2 machines.)

    I clicked the Browse button and navigated to my Desktop folder, then double-clicked on the gallery.exe icon - I was returned to the previous "Open with" dialogue, now with the additional icon selected (it had the icon and name for "Notepad" as this comes out of the PE header, not the executable name).

    I clicked OK and the icon for the file on my desktop changed to that of Notepad, indicating the new association had worked.

    I double-clicked the file and it opened in Notepad.

    I ran Task Manager and on the Processes tab I verified the command line for the process was C:\Users\{user}\Desktop\gallery.exe (not C:\Windows\Notepad.exe).

    So when you go through the process of selecting gallery.exe and get returned to the "Open with" dialogue, it hasn't added Gallery to that list and selected it for you?

  19. Bit of a change from August 2008, now looks like this:

    computer-network-091006.png

    Main machines for myself and my wife were upgraded to Core i7 w/12GB - my wife likes to play with rendering in Poser, and I play with games and virtual machines so we get the use of the RAM.

    My wife's old machine now acts as the server, Virtual Server replaced with Hyper-V and the web server finally migrated from a virtual W2K3 SP2 to virtual W2K8 SP2.

    With the higher-spec, Hyper-V capable server it made sense to set up a domain and run the DC as 1 virtual machine and a separate virtual machine for a file (and Squeezebox) server.

    The host machine is still standalone, but the VMs and clients 1 & 2 are now domain-joined (makes life easier for roaming profiles & folder redirection when testing builds of Windows 7).

    Client3 will likely end up being a pet project for having clustered Hyper-V hosts for highly available VMs, though I'd need to figure out something for the iSCSI targets...

    Edit:

    6th October - upgraded Internet connection to 50/10, changed details for servers upgraded from W2K8 SP2 to W2K8 R2

  20. By the way, can't I do an association with registry twist if it does not support command line parameters?
    Well poking the registry directly should be a last resort - it's only a database which has APIs specifically to update it in a controlled manner.

    If you double-click a .gal file, do you get prompted to select a program with which to open the file?

    If nothing at all happens, do you have an "Open with" option if you right-click the file?
