nmX.Memnoch Posted April 25, 2006

We've been having an issue that I hope some of you can help me figure out.

We have two servers set up in a cluster running Windows Server 2003, Enterprise Edition SP1 and SQL Server 2000 SP4. These servers were originally running Windows 2000 Advanced Server SP4, but were upgraded to Windows Server 2003, Enterprise Edition using the rolling upgrade method.

Server Clusters: Rolling Upgrades. Upgrading to Windows Server 2003
http://www.microsoft.com/technet/prodtechn...g/rllupnet.mspx
http://technet2.microsoft.com/WindowsServe...3bea3a1033.mspx

Upgrading to Windows Server 2003, Enterprise Edition, and Windows Server 2003, Datacenter Edition on Cluster Nodes
http://technet2.microsoft.com/WindowsServe...a406de1033.mspx

We went through an "audit" of sorts last year where we basically had to certify and document that the security settings meet the policy requirements. During this "audit", if anything was found non-compliant we had to either fix it or document why we couldn't fix it. We started this process with Windows 2000 Advanced Server, so I do not remember if this issue appeared before or after the upgrade (although I'm pretty sure it was before). Anyway, on to the actual problem.

When I try to make a remote connection to the cluster using Cluster Administrator I receive the following error:

An error occurred attempting to open cluster node "XXXXXXXX".
Not enough server storage is available to process this command.
Error ID: 1130 (0000046a)

If anyone has any insight please share it. If you require more information please let me know and I'll provide what I can.
fizban2 Posted April 26, 2006

No luck, I only found issues with older NT versions. I've seen others have that issue, but never saw a resolution for it... Have you tried the KB articles that apply to the error? I know they state they are for NT and 2000, but they might help. I will try to find the KB articles I saw again.

Is there a backup client being used on these machines? Backup Exec etc.?

Also, what AV do you guys run? Norton? McAfee?
nmX.Memnoch (Author) Posted April 26, 2006

"seen others have that issue, but never saw a resolution for it..."

That's pretty much the same result I had...

"have you tried the KB articles that apply to the error."

I don't think I ever found any.

"is there a backup client being used on these machines? backup exec etc?"

Nope...just Server 2003, Enterprise Edition SP1, SQL Server 2000 SP4, Office XP SP3 and a few other apps that pre-date the problem.

We are, however, also getting a ton of Event 537 and 560 Failure Audits.

Event Type: Failure Audit
Event Source: Security
Event Category: Logon/Logoff
Event ID: 537
Date: 4/26/2006
Time: 10:06:48 AM
User: NT AUTHORITY\SYSTEM
Computer: XXXXXXXXXX
Description:
Logon Failure:
Reason: An error occurred during logon
User Name:
Domain:
Logon Type: 3
Logon Process: ÐùWD
Authentication Package: NTLM
Workstation Name:
Status code: 0x80090302
Substatus code: 0x0
Caller User Name: -
Caller Domain: -
Caller Logon ID: -
Caller Process ID: -
Transited Services: -
Source Network Address: -
Source Port: -

Event Type: Failure Audit
Event Source: Security
Event Category: Object Access
Event ID: 560
Date: 4/26/2006
Time: 10:06:35 AM
User: NT AUTHORITY\SYSTEM
Computer: XXXXXXXXXXXX
Description:
Object Open:
Object Server: Security
Object Type: Key
Object Name: \REGISTRY\USER\.DEFAULT
Handle ID: -
Operation ID: {0,568296711}
Process ID: 1396
Image File Name: C:\WINNT\system32\svchost.exe
Primary User Name: XXXXXXXXXXXX$
Primary Domain: XXXXXXXX
Primary Logon ID: (0x0,0x3E7)
Client User Name: XXXXXXXXXXXX$
Client Domain: XXXXXXXX
Client Logon ID: (0x0,0x3E7)
Accesses: MAX_ALLOWED
Privileges: -
Restricted Sid Count: 0
Access Mask: 0x2000000
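For anyone reading the two events above, here is a minimal Python sketch of how the hex values can be picked apart. It assumes the 537 status code follows the standard HRESULT bit layout, and that 0x80090302 corresponds to SEC_E_UNSUPPORTED_FUNCTION and 0x02000000 to MAXIMUM_ALLOWED, as I recall them from the Windows SDK headers. The function and table names are made up for illustration and are not part of any Windows tool.

```python
# Sketch only: decode the hex values shown in events 537 and 560.
# The constant names below are recalled from the Windows SDK headers;
# treat them as assumptions and verify against winerror.h / winnt.h.

KNOWN_STATUS = {0x80090302: "SEC_E_UNSUPPORTED_FUNCTION"}
MAXIMUM_ALLOWED = 0x02000000  # matches "Accesses: MAX_ALLOWED" in event 560


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT-style value into its standard fields."""
    return {
        "failure": bool(value & 0x80000000),  # top bit set means failure
        "facility": (value >> 16) & 0x7FF,    # 9 is the SSPI/security facility
        "code": value & 0xFFFF,               # facility-specific error code
    }


status = decode_hresult(0x80090302)
print(status, KNOWN_STATUS.get(0x80090302, "unknown"))
print("access mask is MAXIMUM_ALLOWED:", 0x2000000 == MAXIMUM_ALLOWED)
```

The facility value of 9 at least suggests the 537 failure originates in the security support provider layer, not in SQL Server itself.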
cluberti Posted April 26, 2006

Usually, when you get an 1130 or 1450, you're out of kernel nonpaged pool or kernel paged pool memory on one of the nodes (or you have some, but not enough available to service the request). I've got a poolmon utility script you can run to get that information from these machines. PM me and I'll provide it for you (it's much better than poolmon.exe, before anyone complains that you can just run poolmon.exe).
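As a quick cross-check of the error numbers mentioned above, the following Python sketch converts them to the hex form shown in the Cluster Administrator dialog. The symbolic names are the ones I believe winerror.h uses for these codes; the `describe` helper is hypothetical and exists only for this illustration.

```python
# Sketch only: map the Win32 error numbers from this thread to hex and to
# their symbolic names (names assumed from winerror.h, please verify).

WIN32_ERRORS = {
    1130: "ERROR_NOT_ENOUGH_SERVER_MEMORY",
    1450: "ERROR_NO_SYSTEM_RESOURCES",
}


def describe(code: int) -> str:
    """Format a Win32 error as 'decimal (hex) NAME'."""
    name = WIN32_ERRORS.get(code, "UNKNOWN")
    return f"{code} (0x{code:08x}) {name}"


print(describe(1130))
print(describe(1450))
```

The first line reproduces the "1130 (0000046a)" pair from the original error dialog, which confirms both numbers refer to the same error.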
nmX.Memnoch (Author) Posted April 26, 2006

PM sent.

After reading what you said about the kernel paged pool memory I have a feeling that the /3GB switch may be doing it. The servers have 8GB RAM each so I need the /PAE switch, but I keep forgetting to remove the /3GB switch. I honestly don't remember if the issue appeared before or after I added the /3GB for testing. It didn't "seem" to cause any problems so I wasn't in any real rush to remove it.
cluberti Posted April 26, 2006

The use of /3GB means you have 50% less kernel paged and nonpaged pool memory available from the start, and if you do a lot of heavy lifting via SQL, you'll find yourself running out of one (or both) relatively quickly. Since SQL is pretty good at its own memory management, your servers haven't crashed or hung with 2019s or 2020s, but you're probably dangerously close all of the time.
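The "50% less" figure above follows from how 32-bit Windows divides its 4GB virtual address space. The Python sketch below shows only that arithmetic. It models the kernel's share of the address space, not the actual pool sizes, which Windows chooses at boot and which depend on installed RAM and other settings. The `split` helper is a made-up name for this illustration.

```python
# Sketch only: user/kernel split of the 32-bit virtual address space,
# with and without /3GB. This is address-space arithmetic, not a
# measurement of real paged or nonpaged pool sizes.

GIB = 1024 ** 3
TOTAL_VA = 4 * GIB  # a 32-bit pointer can address 4 GiB


def split(three_gb: bool) -> tuple:
    """Return (user_bytes, kernel_bytes) for the chosen boot option."""
    user = 3 * GIB if three_gb else 2 * GIB
    return user, TOTAL_VA - user


default_user, default_kernel = split(three_gb=False)
tuned_user, tuned_kernel = split(three_gb=True)
reduction = 1 - tuned_kernel / default_kernel

print(f"default: user {default_user // GIB} GiB, kernel {default_kernel // GIB} GiB")
print(f"/3GB:    user {tuned_user // GIB} GiB, kernel {tuned_kernel // GIB} GiB")
print(f"kernel address space reduced by {reduction:.0%}")
```

Everything the kernel needs, including both pools, has to fit in the smaller kernel range, which is why a busy cluster node can hit the limit sooner with /3GB set.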
nmX.Memnoch (Author) Posted April 27, 2006

I'll edit the boot.ini tomorrow and reboot the servers this weekend.

Do you think it'd be worthwhile to run the poolmon script before and after? I may do it anyway if for no other reason than my own education.
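For illustration, the boot.ini change discussed above amounts to dropping one switch from the operating system entry while leaving /PAE in place. The Python sketch below shows that text edit on a hypothetical entry. The ARC path, description, and /fastdetect switch are assumptions, not the real contents of these servers' boot.ini, and `remove_switch` is a made-up helper. In practice the file is normally edited by hand or with the system tools.

```python
import re

# Hypothetical boot.ini entry; the real entry on these servers is not
# shown anywhere in the thread.
SAMPLE_ENTRY = (
    r'multi(0)disk(0)rdisk(0)partition(1)\WINNT='
    r'"Windows Server 2003, Enterprise" /fastdetect /3GB /PAE'
)


def remove_switch(entry: str, switch: str) -> str:
    """Remove one boot switch (case-insensitive) and its leading whitespace."""
    pattern = re.compile(r"\s+" + re.escape(switch) + r"(?=\s|$)", re.IGNORECASE)
    return pattern.sub("", entry)


print(remove_switch(SAMPLE_ENTRY, "/3GB"))
```

The lookahead keeps the pattern from touching a longer switch that merely starts with the same characters, and /PAE is left untouched so the memory above 4GB stays usable.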
cluberti Posted April 27, 2006

Yes, especially before and after - the error should go away when you remove /3GB, but it'd be wise to have the numbers just in case.

I've heard so many differing opinions regarding /3GB and /PAE when running a SQL server: some say you need /3GB below 16GB of RAM, some say you don't need /3GB at all, etc. It's true that you may need it if you do find your SQL processes running out of their default 2GB process space, but otherwise /3GB shouldn't be used on a SQL server (especially if no other applications on the server can take advantage of the additional 1GB of process space) due to the kernel memory restrictions, which can be exacerbated in a cluster environment, which requires more kernel memory to run efficiently.

Ultimately, my stance is that I would not recommend use of the /3GB switch with /PAE because of the limitation of kernel mode memory with the /3GB switch; /3GB with /PAE should only be used when absolutely necessary. With SQL 2005 64-bit and Windows Server 2003 x64, it is now much more efficient to upgrade to a 64-bit database product on a 64-bit OS than to fight with the limitations of the 32-bit architecture. This will also apply to Exchange 12 when it's released - we've reached a wall with 32-bit architectures and the way Windows is able to manage memory, and the only real option for memory-hungry databases and email servers (etc.) is to go to 64-bit. Even SQL 2000 SP4 on a Server 2003 x64 box gets the benefit of all 4GB of process space, because it is not split into 2GB/2GB or 3GB/1GB as with 32-bit Windows.
nmX.Memnoch (Author) Posted April 27, 2006

"Yes, especially before and after - the error should go away when you remove /3GB, but it'd be wise to have the numbers just in case."

Sounds like a plan.

"I've heard so many differing opinions regarding /3GB and /PAE when running a SQL server, and I've heard some say you need /3GB below 16GB of RAM, some say you don't need /3GB at all, etc."

Same here...which is why I wanted to test it for myself. We're about to upgrade both nodes to 16GB anyway.

"Ultimately, my stance is that I would not recommend use of the /3GB switch with /PAE because of the limitation of kernel mode memory with the /3GB switch, and the use of /3GB with /PAE should only be used when absolutely necessary."

After learning more about /3GB over the last few months I'd have to say I agree with that completely.

"With SQL 2005 64bit and Windows Server 2003 x64, it is now much more efficient to upgrade to a 64bit database product on a 64bit OS than to fight with the limitations of the 32bit architecture."

We have plans to make that move about this time next year (no funding this year). We're doing a complete cluster replacement to either a pair of quad-socket dual-core 64-bit systems or quad-socket quad-core 64-bit systems...depending on what's out and costs. I've also talked them into going fiber channel with the drives. They're starting to see I/O errors on one of the drive sets, but that's just because they're processing too much on the one drive set. We're adding some drives and moving processes around to bandage up that problem.
cluberti Posted April 27, 2006

SCSI clusters were the way of NT4.0, and with the advances in HBA/fiber tech, there's really no reason (other than cost) not to go with a fiber solution at this point, especially on busy clusters.

Good luck.
fizban2 Posted April 27, 2006

Win NT paged pool errors

Bah, I had looked at this yesterday about the paged pool errors. No point in asking about the switches now, as it seems to be figured out. We need more interesting questions like this. Makes me think more.
nmX.Memnoch (Author) Posted April 27, 2006

Any idea on the 537 and 560 Failure Audits then? I can tell you that it only happens when the DBAs/programmers have Enterprise Manager open on their workstations and connected to any of the SQL instances on the cluster.
fizban2 Posted April 27, 2006

Possibly when two DBAs are trying to work with EM on the cluster at once, either connecting or editing?

537: Failure Audit, Logon Failure: Reason: The NetLogon component is not active
560: Success Audit, Object Open

From your logs, the object audit happens first, trying to initiate something with the svchost, and 13 seconds later the logon error occurs. Could one user be trying to attach to a DB or node that is in use or locked for editing? Just throwing ideas out, trying to get the think tank started.
cluberti Posted April 27, 2006

The real question is, which svchost.exe process is running at that PID? It seems odd that the svchost would be doing any logon operations, although reading from that key wouldn't be entirely odd if it were the netsvcs or dcom svchost process. It'll likely be a permissions issue, but we need to know which svchost process is causing it first.
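One way to answer the "which svchost" question is the built-in tasklist command, for example `tasklist /svc /fi "PID eq 1396" /fo csv` run on the node itself. The Python sketch below only parses output of that shape. The sample text is invented (the service names are placeholders, not what actually runs at PID 1396), and the column headers are assumed to be the ones an English-language system prints. `services_for_pid` is a hypothetical helper.

```python
import csv
import io

# Invented sample of `tasklist /svc /fo csv` output. The service names are
# placeholders; the real services behind PID 1396 are not known from the thread.
SAMPLE_OUTPUT = '''"Image Name","PID","Services"
"svchost.exe","1020","ExampleSvcA"
"svchost.exe","1396","ExampleSvcB,ExampleSvcC"
'''


def services_for_pid(csv_text: str, pid: int) -> list:
    """Return the services hosted by the process with the given PID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row["PID"] == str(pid):
            return [name.strip() for name in row["Services"].split(",")]
    return []


print(services_for_pid(SAMPLE_OUTPUT, 1396))
```

Since the PID changes after every reboot, the lookup needs to be done against a fresh 560 event rather than the PID quoted earlier in the thread.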
fizban2 Posted April 27, 2006 (edited)

The errors also occurred after upgrading to Windows 2003 Service Pack 1. The error would be generated every second continuously on the SQL server whenever a user was connected to the server via SQL Enterprise Manager, SQL Analysis Services, or when users tried to connect remotely via the Computer Management console. After following the KB article M907460, the problem was solved.

http://eventid.net/display.asp?eventid=560&eventno=57&source=Security&phase=1

They have the same issue, but silly EventID wants you to pay them to look at the answer.

Wow, a slow day at work.

From a newsgroup post:

"The 537 event is common when Kerberos fails. The operation will not necessarily fail, as the Kerberos failure might be followed immediately by a successful NTLM logon (look up "SNEGO" on MSDN to see how we try Kerberos first, then NTLM, for many authentication operations).

There are two likely reasons why this occurred:

1) No explicit Kerberos trust between the domain containing the machine doing the accessing and the domain containing the machine being accessed; in other words only an external trust or no trust between the domains.

2) The SPN for the target machine was unavailable to the requesting machine, at the time of the request. This could be due to a lack of routing hints on the trust, or due to the absence of the SPN in the directory. The SETSPN utility in the Windows 2000 Resource Kit can be used to see if the SPN is in place, and to re-register it if not (SETSPN.EXE -L COMPUTERNAME)".

From a newsgroup post:

"If you are using protocol transition, this means you have to satisfy the following requirements:

1) The Domain must be in Windows 2003 native mode.

2) Act as part of operating system (TCB) privilege has to be granted to the process that calls “WindowsIdentity” on the front-end machine (where the code runs) and not on the domain controller. Please see the Kerberos protocol transition whitepaper for more details on these requirements".

Again, off the EventID.net webpage; good info there.

Edited April 27, 2006 by fizban2