
Cluster Administrator



We've been having an issue that I hope some of you can help me figure out...

We have two servers set up in a cluster running Windows Server 2003, Enterprise Edition SP1 and SQL Server 2000 SP4. These servers were originally running Windows 2000 Advanced Server SP4, but were upgraded to Windows Server 2003, Enterprise Edition using the rolling upgrade method.

Server Clusters: Rolling Upgrades. Upgrading to Windows Server 2003

http://www.microsoft.com/technet/prodtechn...g/rllupnet.mspx

http://technet2.microsoft.com/WindowsServe...3bea3a1033.mspx

Upgrading to Windows Server 2003, Enterprise Edition, and Windows Server 2003, Datacenter Edition on Cluster Nodes

http://technet2.microsoft.com/WindowsServe...a406de1033.mspx

We went through an "audit" of sorts last year where we basically had to certify and document that the security settings met the policy requirements. During this "audit", anything found non-compliant had to be either fixed or documented with a reason why it couldn't be fixed. We started this process on Windows 2000 Advanced Server, so I don't remember whether this issue appeared before or after the upgrade (although I'm pretty sure it was before). Anyway, on to the actual problem.

When I try to make a remote connection to the cluster using Cluster Administrator, I receive the following error:

An error occurred attempting to open cluster node "XXXXXXXX".

Not enough server storage is available to process this command.

Error ID: 1130 (0000046a)

If anyone has any insight please share it. If you require more information please let me know and I'll provide what I can.



:( No luck.

I only found reports involving older NT versions. I've seen others have this issue, but never saw a resolution for it... Have you tried the KB articles that apply to the error? I know they state NT and 2000, but they might help. I'll try to find the KB articles I saw again.

Is there a backup client being used on these machines? Backup Exec, etc.?

Also, what AV do you guys run? Norton? McAfee?


seen others have that issue, but never saw a resolution for it...

That's pretty much the same result I had...

have you tried the KB articles that apply to the error.

I don't think I ever found any.

is there a backup client being used on these machines? backup exec etc?

Nope...just Server 2003, Enterprise Edition SP1, SQL Server 2000 SP4, Office XP SP3 and a few other apps that predate the problem.

We are, however, also getting a ton of Event 537 and 560 Failure Audits.

Event Type:	Failure Audit
Event Source: Security
Event Category: Logon/Logoff
Event ID: 537
Date: 4/26/2006
Time: 10:06:48 AM
User: NT AUTHORITY\SYSTEM
Computer: XXXXXXXXXX
Description:
Logon Failure:
Reason: An error occurred during logon
User Name:
Domain:
Logon Type: 3
Logon Process: ÐùWD
Authentication Package: NTLM
Workstation Name:
Status code: 0x80090302
Substatus code: 0x0
Caller User Name: -
Caller Domain: -
Caller Logon ID: -
Caller Process ID: -
Transited Services: -
Source Network Address: -
Source Port: -

Event Type:	Failure Audit
Event Source: Security
Event Category: Object Access
Event ID: 560
Date: 4/26/2006
Time: 10:06:35 AM
User: NT AUTHORITY\SYSTEM
Computer: XXXXXXXXXXXX
Description:
Object Open:
Object Server: Security
Object Type: Key
Object Name: \REGISTRY\USER\.DEFAULT
Handle ID: -
Operation ID: {0,568296711}
Process ID: 1396
Image File Name: C:\WINNT\system32\svchost.exe
Primary User Name: XXXXXXXXXXXX$
Primary Domain: XXXXXXXX
Primary Logon ID: (0x0,0x3E7)
Client User Name: XXXXXXXXXXXX$
Client Domain: XXXXXXXX
Client Logon ID: (0x0,0x3E7)
Accesses: MAX_ALLOWED

Privileges: -
Restricted Sid Count: 0
Access Mask: 0x2000000


Usually, when you get an 1130 or 1450, you're out of kernel nonpaged pool or kernel paged pool memory on one of the nodes (or you have some, but not enough available to service the request). I've got a poolmon utility script you can run to get that information from these machines - PM me and I'll provide it for you (it's much better than poolmon.exe, before anyone complains that you can just run poolmon.exe :)).
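To make the two codes easier to recognize when they show up in dialogs or logs, here's a small lookup sketch. The symbolic names and message text are the standard Win32 ones for 1130 and 1450; the hex formatting mirrors how Cluster Administrator printed "1130 (0000046a)" above. The `describe` helper itself is just an illustration, not part of any Windows tool.

```python
# Win32 error codes that usually point at kernel pool exhaustion
# (per the rule of thumb in this thread). Names/messages are the standard Win32 ones.
POOL_RELATED = {
    1130: ("ERROR_NOT_ENOUGH_SERVER_MEMORY",
           "Not enough server storage is available to process this command."),
    1450: ("ERROR_NO_SYSTEM_RESOURCES",
           "Insufficient system resources exist to complete the requested service."),
}

def describe(code: int) -> str:
    """Format a code the way Cluster Administrator shows it, plus its name and message."""
    name, text = POOL_RELATED.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} ({code:08x}) {name}: {text}"

print(describe(1130))
print(describe(1450))
```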


PM sent. :)

After reading what you said about the kernel paged pool memory I have a feeling that the /3GB switch may be doing it. The servers have 8GB RAM each so I need the /PAE switch, but I keep forgetting to remove the /3GB switch. I honestly don't remember if the issue appeared before or after I added the /3GB for testing. It didn't "seem" to cause any problems so I wasn't in any real rush to remove it.


The use of /3GB halves the kernel's virtual address space (from 2GB to 1GB), so you start with much less kernel paged and nonpaged pool memory available, and if you do a lot of heavy lifting via SQL, you'll find yourself running out of one (or both) relatively quickly. Since SQL is pretty good at its own memory management, your servers haven't crashed or hung with Event ID 2019 or 2020 errors (nonpaged and paged pool exhaustion), but you're probably dangerously close all of the time :).
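The arithmetic behind the "half the kernel space" point is simple: 32-bit Windows has 4GB of virtual address space to split between user mode and kernel mode. This sketch only computes that split; the paged and nonpaged pool maximums are carved out of the kernel half, so they shrink along with it, though not necessarily by exactly 50% each.

```python
GB = 1024 ** 3
VA_32BIT = 4 * GB  # total virtual address space on 32-bit Windows

def split(user_gb: int) -> tuple[int, int]:
    """Return (user, kernel) bytes when user mode gets `user_gb` GB of the 4GB."""
    user = user_gb * GB
    return user, VA_32BIT - user

default_user, default_kernel = split(2)  # no switch: 2GB user / 2GB kernel
tuned_user, tuned_kernel = split(3)      # /3GB: 3GB user / 1GB kernel
reduction = 1 - tuned_kernel / default_kernel

print(f"kernel VA: {default_kernel // GB}GB -> {tuned_kernel // GB}GB ({reduction:.0%} less)")
# prints: kernel VA: 2GB -> 1GB (50% less)
```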


I'll edit the boot.ini tomorrow and reboot the servers this weekend.
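Editing boot.ini by hand is the normal route; if you'd rather script the change for both nodes, here's a minimal sketch that strips /3GB from a single [operating systems] entry while leaving /PAE and any other switches alone. The sample entry (ARC path and description) is hypothetical, the switch match is treated case-insensitively here, and `drop_3gb` is an illustrative helper. Keep a copy of the original file before changing anything.

```python
import re

def drop_3gb(entry: str) -> str:
    """Remove the /3GB switch from one boot.ini OS entry; other switches are untouched."""
    # Match whitespace + /3GB only when followed by whitespace or end of line,
    # so something like a hypothetical /3GBX switch would not be touched.
    return re.sub(r"\s+/3GB(?=\s|$)", "", entry, flags=re.IGNORECASE)

# Hypothetical entry for illustration only.
before = r'multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows Server 2003, Enterprise" /fastdetect /PAE /3GB'
print(drop_3gb(before))
# prints: multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows Server 2003, Enterprise" /fastdetect /PAE
```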

Do you think it'd be worthwhile to run the poolmon script before and after? I may do it anyway if for no other reason than my own education. :)


Yes, especially before and after - the error should go away when you remove /3GB, but it'd be wise to have the numbers just in case.
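For the before/after comparison, this sketch avoids guessing at poolmon's log format and assumes you've already collected each snapshot into a dictionary keyed by (pool tag, pool type) with bytes in use as the value. The tags and byte counts below are hypothetical placeholders, and `pool_growth`/`totals_by_type` are illustrative helpers, not part of poolmon.

```python
from collections import defaultdict

Snapshot = dict[tuple[str, str], int]  # (tag, pool type) -> bytes in use

def pool_growth(before: Snapshot, after: Snapshot, top: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Return the `top` (tag, type) entries whose usage grew most between snapshots.
    An entry missing from one snapshot counts as 0 bytes there."""
    keys = before.keys() | after.keys()
    deltas = [(k, after.get(k, 0) - before.get(k, 0)) for k in keys]
    deltas.sort(key=lambda kd: kd[1], reverse=True)
    return deltas[:top]

def totals_by_type(snapshot: Snapshot) -> dict[str, int]:
    """Sum bytes per pool type (e.g. paged vs. nonpaged)."""
    sums: dict[str, int] = defaultdict(int)
    for (_tag, pool_type), used in snapshot.items():
        sums[pool_type] += used
    return dict(sums)

# Hypothetical numbers; "Tag1".."Tag3" stand in for real pool tags.
before = {("Tag1", "Nonp"): 1_000_000, ("Tag2", "Paged"): 5_000_000}
after = {("Tag1", "Nonp"): 4_000_000, ("Tag2", "Paged"): 4_500_000, ("Tag3", "Nonp"): 200_000}

print(totals_by_type(before), totals_by_type(after))
print(pool_growth(before, after, top=2))
```

Running it against real snapshots from each node shows at a glance which tags account for any pool growth, and whether the paged or nonpaged totals move after /3GB is removed.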

I've heard so many differing opinions regarding /3GB and /PAE when running a SQL server, and I've heard some say you need /3GB below 16GB of RAM, some say you don't need /3GB at all, etc. It's true that you may need it if you do find your SQL processes running out of their default 2GB process space, but otherwise /3GB shouldn't be used on a SQL server (especially if no other applications on the server can take advantage of the additional 1GB of process space) due to the kernel memory restrictions, which can be exacerbated in a cluster environment which requires more kernel memory to run efficiently.

Ultimately, my stance is that I would not recommend using the /3GB switch with /PAE because of the kernel-mode memory limitation that /3GB imposes; /3GB with /PAE should only be used when absolutely necessary. With SQL 2005 64bit and Windows Server 2003 x64, it is now much more efficient to upgrade to a 64bit database product on a 64bit OS than to fight with the limitations of the 32bit architecture :). This will also apply to Exchange 12 when it's released - we've reached a wall with 32bit architectures and the way Windows is able to manage memory, and the only real option for memory-hungry databases and email servers (etc.) is to go to 64bit. Even SQL 2000 SP4 on a Server 2003 x64 box gets the benefit of all 4GB of process space, because it isn't split into 2GB/2GB or 3GB/1GB as with 32bit Windows.
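Here's a sketch of the user-mode address space a 32-bit process like sqlservr.exe sees under each configuration discussed above. It assumes the executable is marked large-address-aware, which the 4GB-on-x64 claim implies but which I haven't verified for every SQL 2000 edition; without that flag a 32-bit process stays at 2GB everywhere. `user_va_bytes` is an illustrative helper.

```python
GB = 1024 ** 3

def user_va_bytes(os_bits: int, large_address_aware: bool, three_gb: bool = False) -> int:
    """User-mode virtual address space available to a 32-bit process.

    os_bits: 32 for 32-bit Windows, 64 for an x64 OS running the process under WOW64.
    large_address_aware: whether the executable carries the large-address-aware flag.
    three_gb: whether boot.ini has /3GB (only meaningful when os_bits == 32).
    """
    if not large_address_aware:
        return 2 * GB  # without the flag, the 2GB limit applies on both OS types
    if os_bits == 64:
        return 4 * GB  # the 64-bit kernel lives outside the process's 4GB range
    return 3 * GB if three_gb else 2 * GB

for label, args in [
    ("32-bit, default", (32, True, False)),
    ("32-bit, /3GB", (32, True, True)),
    ("x64 (WOW64)", (64, True)),
]:
    print(f"{label:16} {user_va_bytes(*args) // GB}GB")
```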


Yes, especially before and after - the error should go away when you remove /3GB, but it'd be wise to have the numbers just in case.

Sounds like a plan. :)

I've heard so many differing opinions regarding /3GB and /PAE when running a SQL server, and I've heard some say you need /3GB below 16GB of RAM, some say you don't need /3GB at all, etc.

Same here...which is why I wanted to test it for myself. :)

We're about to upgrade both nodes to 16GB anyway.

Ultimately, my stance is that I would not recommend use of the /3GB switch with /PAE because of the limitation of kernel mode memory with the /3GB switch, and the use of /3GB with /PAE should only be used when absolutely necessary.

After learning more about /3GB over the last few months I'd have to say I agree with that completely.

With SQL 2005 64bit and Windows Server 2003 x64, it is now much more efficient to upgrade to a 64bit database product on a 64bit OS than to fight with the limitations of the 32bit architecture :).

We have plans to make that move about this time next year (no funding this year). We're doing a complete cluster replacement to either a pair of quad-socket dual-core 64-bit systems or quad-socket quad-core 64-bit systems...depending on what's out and costs. I've also talked them into going Fibre Channel with the drives. They're starting to see I/O errors on one of the drive sets, but that's just because they're processing too much on that one drive set. We're adding some drives and moving processes around to bandage up that problem.


SCSI clusters were the way of NT4.0, and with the advances in HBA/fiber tech, there's really no reason (other than cost) not to go with a fiber solution at this point, especially on busy clusters.

Good luck :).


Any idea on the 537 and 560 Failure Audits then? :)

I can tell you that it only happens when the DBAs/programmers have Enterprise Manager open on their workstations and connected to one/any of the SQL instances on the cluster.


Possibly when two DBAs are trying to work with EM on the cluster at once, either connecting or editing?

537 Failure Audit Logon Failure:

Reason: The NetLogon component is not active

560: Success Audit Object Open

From your logs, the object-access audit happens first, with svchost trying to initiate something, and the logon error occurs 13 seconds later. Could one user be trying to attach to a DB or node that is in use or locked for editing? Just throwing ideas out to get the think tank started :)


The real question is, which svchost.exe process is running at that PID? It seems odd that the svchost would be doing any logon operations, although reading from that key wouldn't be entirely odd if it were the netsvcs or dcom svchost process. It'll likely be a permissions issue, but we need to know which svchost process is causing it first.
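One way to answer that on the node itself is `tasklist /svc`, which ships with Windows Server 2003 and lists the services hosted in each process. The sketch below only builds the command and parses its CSV output; the sample output is hypothetical (the service names are made up for illustration), and `services_for_pid` is an illustrative helper. PID 1396 comes from the Event 560 entry above, and PIDs change across reboots, so check while the errors are occurring.

```python
import csv
import io

# Run this on the node that logged the event.
COMMAND = 'tasklist /svc /fo csv /fi "PID eq 1396"'

def services_for_pid(csv_text: str, pid: int) -> list[str]:
    """Parse `tasklist /svc /fo csv` output and return the services hosted by `pid`."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["PID"]) == pid:
            services = row["Services"].strip()
            if services == "N/A":  # process hosts no services
                return []
            return [s.strip() for s in services.split(",") if s.strip()]
    return []

# Hypothetical output, shaped like tasklist's CSV format.
sample = '"Image Name","PID","Services"\n"svchost.exe","1396","DcomLaunch,TermService"\n'
print(services_for_pid(sample, 1396))  # ['DcomLaunch', 'TermService']
```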


The errors also occurred after upgrading to Windows 2003 Service Pack 1. The error would be generated every second continuously on the SQL server whenever a user was connected to the server via SQL Enterprise Manager, SQL Analysis Services, or when users tried to connect remotely via the Computer Management console. After following the KB article M907460, the problem was solved.

http://eventid.net/display.asp?eventid=560&eventno=57&source=Security&phase=1

They have the same issue, but silly EventID.net wants you to pay them to see the answer :(

Wow, a slow day at work :)

From a newsgroup post: "The 537 event is common when Kerberos fails. The operation will not necessarily fail, as the Kerberos failure might be followed immediately by a successful NTLM logon (look up "SPNEGO" on MSDN to see how we try Kerberos first, then NTLM, for many authentication operations).

There are two likely reasons why this occurred:

1) No explicit Kerberos trust between the domain containing the machine doing the accessing and the domain containing the machine being accessed; in other words only an external trust or no trust between the domains.

2) The SPN for the target machine was unavailable to the requesting machine, at the time of the request. This could be due to a lack of routing hints on the trust, or due to the absence of the SPN in the directory. The SETSPN utility in the Windows 2000 Resource Kit can be used to see if the SPN is in place, and to re-register it if not (SETSPN.EXE -L COMPUTERNAME)".
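To act on the second cause, here's a sketch that checks the output of `SETSPN.EXE -L COMPUTERNAME` for the two HOST SPNs a domain-joined computer normally registers (short name and FQDN). The node name, domain, and sample output are hypothetical; the comparison is uppercased on the assumption that SPN matching is case-insensitive; `registered_spns` and `missing_host_spns` are illustrative helpers.

```python
def registered_spns(setspn_output: str) -> set[str]:
    """Collect SPNs from `setspn -L` output: the indented lines after the header."""
    return {
        line.strip().upper()
        for line in setspn_output.splitlines()
        if line.startswith((" ", "\t")) and line.strip()
    }

def missing_host_spns(setspn_output: str, short_name: str, fqdn: str) -> list[str]:
    """Return the expected HOST/ SPNs that are not registered for the computer."""
    expected = [f"HOST/{short_name}".upper(), f"HOST/{fqdn}".upper()]
    have = registered_spns(setspn_output)
    return [spn for spn in expected if spn not in have]

# Hypothetical output for a node whose FQDN-based SPN is missing.
sample = (
    "Registered ServicePrincipalNames for CN=SQLNODE1,CN=Computers,DC=example,DC=com:\n"
    "        HOST/SQLNODE1\n"
)
print(missing_host_spns(sample, "SQLNODE1", "sqlnode1.example.com"))
# prints: ['HOST/SQLNODE1.EXAMPLE.COM']
```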

From a newsgroup post: "If you are using protocol transition, this means you have to satisfy the following requirements:

1) The Domain must be in Windows 2003 native mode.

2) Act as part of operating system (TCB) privilege has to be granted to the process that calls “WindowsIdentity” on the front-end machine (where the code runs) and not on the domain controller. Please see the Kerberos protocol transition whitepaper for more details on these requirements".

Again, this is off the EventID.net webpage; good info there.

Edited by fizban2
