adamt Posted September 28, 2010

Dear all,

We have several Win2k3 servers which are accessing files from an OnStor Bobcat NAS appliance.

Sometimes, files seem to go missing. They appear in the directory listings, and I can use "type" on the command line, but not "start".

The NAS device is returning error 58 in response to a request for these files, eg:

Smb: R; Nt Create Andx - NT Status: System - Error, Code = (58) STATUS_OBJECT_PATH_NOT_FOUND

Rebooting the servers seems to resolve the issue.

Handle.exe isn't showing anything with a handle open to the files in question. And bizarrely, not all servers are affected at the same time.

Eg, one server will show:

C:\Control>start \\uk6nas03\share14\img\thumbs\shoe180.jpg
The system cannot find the file \\uk6nas03\share14\img\thumbs\shoe180.jpg

Whereas other servers will be able to open the file just fine.

What puzzles me is that even when an affected server shows the issue, you can still access the file by either using the FQDN, or the IP address. Eg:

\\uk6nas03\share14\img\thumbs\shoe180.jpg - will fail
\\uk6nas03.my.fqdn.example.com\share14\img\thumbs\shoe180.jpg - will work
\\10.79.3.25\share14\img\thumbs\shoe180.jpg - will also work.

In netmon, the SMB requests seem to have the same flags set for things like oplocks, signing, compression and longfiles. ProcMon/Filemon don't seem to be showing me requests made for files on the NAS (successful or otherwise).

Has anyone seen this sort of behaviour from an SMB/CIFS server before?

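For anyone reading the capture line above, here is a minimal sketch of how an NTSTATUS value splits into the fields netmon prints. The numeric value 0xC000003A for STATUS_OBJECT_PATH_NOT_FOUND comes from ntstatus.h and is not quoted in the thread; the function and dictionary names are only illustrative.

```python
# Value from ntstatus.h (an assumption relative to the thread, which only shows "Code = (58)").
STATUS_OBJECT_PATH_NOT_FOUND = 0xC000003A

# The top two bits of an NTSTATUS encode the severity.
SEVERITIES = {0: "Success", 1: "Informational", 2: "Warning", 3: "Error"}


def decode_ntstatus(status):
    """Split a 32-bit NTSTATUS into severity, customer flag, facility and code."""
    return {
        "severity": SEVERITIES[(status >> 30) & 0x3],  # bits 31-30
        "customer": bool((status >> 29) & 0x1),        # bit 29
        "facility": (status >> 16) & 0xFFF,            # bits 27-16
        "code": status & 0xFFFF,                       # bits 15-0
    }


if __name__ == "__main__":
    print(decode_ntstatus(STATUS_OBJECT_PATH_NOT_FOUND))
```

The low 16 bits are 0x3A, which is 58 in decimal, so the "error 58" in the capture and STATUS_OBJECT_PATH_NOT_FOUND appear to be the same status shown two ways.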
adamt Posted September 30, 2010

"In netmon, the SMB requests seem to have the same flags set for things like oplocks, signing, compression and longfiles. ProcMon/Filemon don't seem to be showing me requests made for files on the NAS (successful or otherwise). Has anyone seen this sort of behaviour from an SMB/CIFS server before?"

Just to reply to my own posting, in case anyone else runs into similar issues.

The issue appears to be DFS on the NAS device. Occasionally, in response to a request for a file, the NAS device responds with GET_DFS_REFERRAL.

When this happens, the request for the file fails (as the path is duplicated). When file requests are working, there is no DFS referral.

I still don't know the root cause of the issue, but running dfsutil.exe /pktflush gets things working again for a little while, at least.

adamt Posted October 4, 2010

Curiouser and curiouser...

When files are becoming unavailable in this manner, both Netmon and Wireshark show something very odd happening.

If I request the following file:

\\uk6nas03\share14\img\thumbs\DK83X.TTF

What I'm actually seeing in the packet capture is a request for:

\uk6nas03\share14\img\thumbs\DK83X.TTF\uk6nas03\share14\img\thumbs\DK83X.TTF

- The entire SMB path has been duplicated.

This is not happening for all files, nor even some files all of the time. It seems to crop up on a few files, from specific servers, every few hours or so.

It can't be MUP.SYS, since I don't need to purge the MUP cache. That leaves mrxsmb and rdbss. ...

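If you are sifting through a long capture for this symptom, a check along these lines may help. It is only a sketch: it assumes you have already exported the requested file names from Netmon or Wireshark as plain strings, and the function name is hypothetical.

```python
def split_duplicated_path(path):
    """Return the single path if `path` is one path repeated twice, else None.

    The comparison ignores case, since SMB path lookups on Windows are
    normally case-insensitive.
    """
    half, remainder = divmod(len(path), 2)
    if remainder == 0 and half > 0 and path[:half].lower() == path[half:].lower():
        return path[:half]
    return None


if __name__ == "__main__":
    requested = (r"\uk6nas03\share14\img\thumbs\DK83X.TTF"
                 r"\uk6nas03\share14\img\thumbs\DK83X.TTF")
    print(split_duplicated_path(requested))
```

Anything that comes back as non-None is a request whose whole path has been doubled, as in the example above.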
adamt Posted November 30, 2010

"If I request the following file: \\uk6nas03\share14\img\thumbs\DK83X.TTF What I'm actually seeing in the packet capture is a request for: \uk6nas03\share14\img\thumbs\DK83X.TTF\uk6nas03\share14\img\thumbs\DK83X.TTF - The entire SMB path has been duplicated. This is not happening for all files, nor even some files all of the time. It seems to crop up on a few files, from specific servers, every few hours or so. It can't be MUP.SYS, since I don't need to purge the MUP cache. That leaves mrxsmb and rdbss. ..."

Just to reply to my own thread... I found that running dfsutil.exe /pktflush would alleviate the issue temporarily (which is easier to do than reboot the machine).

For this specific issue, running dfsutil.exe /pktinfo will show you an entry with "State:0x09" for each file you are unable to access:

Entry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml
ShortEntry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml
Expires in 0 seconds
UseCount: 0 Type:0x81 ( REFERRAL_SVC DFS )
   0:[\uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml] State:0x09 ( )

The troublesome NAS devices were OnStor/LSI BobCat devices, and they had 'widelinks' enabled. Once this was disabled (we weren't using it), the issue never returned.

Still seems bizarre, and I've been unable to find out what state 0x09 maps to. At least everything's working now, though.

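To spot the affected files without reading the whole cache dump by eye, something like the following could be run over saved dfsutil.exe /pktinfo output. This is a sketch under assumptions: it expects each field on its own line as in the listing above, the function name and the SAMPLE text are illustrative, and it only reads text you have already captured (it does not call dfsutil itself).

```python
import re

# Matches the hexadecimal value in a "State:0x09" field.
STATE_RE = re.compile(r"State:\s*(0x[0-9A-Fa-f]+)")


def entries_with_state(pktinfo_text, wanted_state=0x09):
    """Return the Entry paths whose target State equals `wanted_state`."""
    matches = []
    current_entry = None
    for raw_line in pktinfo_text.splitlines():
        line = raw_line.strip()
        # "ShortEntry:" lines do not start with "Entry:", so they are skipped here.
        if line.startswith("Entry:"):
            current_entry = line[len("Entry:"):].strip()
            continue
        found = STATE_RE.search(line)
        if found and current_entry is not None:
            if int(found.group(1), 16) == wanted_state and current_entry not in matches:
                matches.append(current_entry)
    return matches


# Sample built from the entry quoted in the post; the line breaks are assumed.
SAMPLE = r"""Entry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml
ShortEntry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml
Expires in 0 seconds
UseCount: 0 Type:0x81 ( REFERRAL_SVC DFS )
   0:[\uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml] State:0x09 ( )
"""

if __name__ == "__main__":
    for entry in entries_with_state(SAMPLE):
        print(entry)
```

Redirecting the real output to a file (for example with `dfsutil.exe /pktinfo > pktinfo.txt`) and passing its contents to `entries_with_state` should list the paths that currently need a /pktflush.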
cluberti Posted November 30, 2010

Actually, that is likely this error from winerror.h:

ERROR_INVALID_BLOCK (winerror.h): The storage control block address is invalid.

This is due to non-optimal ordering in the DFS referral cache, usually caused by problems underneath the DFS layer. See KB905846 - the updated srv.sys is for fixes to sysvol and netlogon via DFS, so the error can exist in other places even with the fix (and I've seen it before, almost always with 3rd party NAS devices adding their own layer below the FS or network - Windows DFS *hates* this). Given that Wide Links (according to the OnStor documentation) allows for file symlinks to span multiple volumes or even devices, I have no doubt this will make DFS mad - someone's monkeying with file blocks underneath the FS, hence the storage block error you see.

Also, the documentation for Wide Links makes it sound like this is a *replacement* for DFS (dfs server without a windows server - done right on the NAS). After reading how it works, it would seem you really can't use both at the same time - the documentation doesn't say you cannot, but given how it is documented to work by OnStor and knowing how DFS works, having them both enabled at the same time seems like a recipe for failure.

adamt Posted November 30, 2010

"Actually, that is likely this error from winerror.h: ERROR_INVALID_BLOCK (winerror.h): The storage control block address is invalid."

I didn't consider it to be a win32 error code because other healthy cache entries appear with status 0x19, and "The drive cannot locate a specific area or track on the disk." does not make any sense in this context.

"Also, the documentation for Wide Links makes it sound like this is a *replacement* for DFS (dfs server without a windows server - done right on the NAS). After reading how it works, it would seem you really can't use both at the same time - the documentation doesn't say you cannot, but given how it is documented to work by OnStor and knowing how DFS works, having them both enabled at the same time seems like a recipe for failure."

Once the DFS element had been discovered, we disabled the WideLinks setting and all was resolved. Curiously, LSI/OnStor did not seem to think that WideLinks could possibly be the cause of such an issue.

cluberti Posted November 30, 2010

Given what Wide Links does, I'm guessing that state is actually a Win32 status (not error, but status) message. A status is also an "error" code, in that it is basically the return value from whatever check produces that state. Given how DFS works (and what fixes it), 0x09 most likely means an incorrect or invalid SCB in the SMB request made by DFS from the client.

adamt Posted December 2, 2010

"Given what Wide Links does, I'm guessing that state is actually a Win32 status (not error, but status) message."

Interesting... where would that be defined if not in winerror.h or ntstatus.h?

cluberti Posted December 2, 2010

I think I fail to understand your question?

adamt Posted December 8, 2010

"I think I fail to understand your question?"

Well... if it is a status code, or an error code - it must be defined somewhere, else all error messages would just generate a random number.

Since it isn't in winerror.h or ntstatus.h, I was wondering where it is defined.

cluberti Posted December 10, 2010

It is coming from winerror.h.

adamt Posted December 16, 2010

"It is coming from winerror.h."

It can't be coming from winerror.h, because healthy cache entries appear with status 0x19, and "The drive cannot locate a specific area or track on the disk." does not make any sense in this context.

Also, surely an error regarding tracks on disks would need to come from something like ntfs.sys, rather than mup.sys or rdr.sys?

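To make the two readings in this exchange concrete, here is a tiny lookup of what the two State values would mean if they were Win32 error codes. It is a sketch only: the table is a hand-copied two-entry subset of winerror.h, the symbolic name ERROR_SEEK is taken from that header rather than from the thread, and nothing here settles whether the State field really holds a Win32 code.

```python
# Hand-copied subset of winerror.h: decimal code -> (symbolic name, message text).
WINERROR_SUBSET = {
    9: ("ERROR_INVALID_BLOCK", "The storage control block address is invalid."),
    25: ("ERROR_SEEK", "The drive cannot locate a specific area or track on the disk."),
}


def describe_state_as_win32(state):
    """Return (name, message) if `state` is in the subset above, else None."""
    return WINERROR_SUBSET.get(state)


if __name__ == "__main__":
    # 0x09 is the State seen on broken entries, 0x19 on healthy ones.
    for state in (0x09, 0x19):
        print(hex(state), state, describe_state_as_win32(state))
```

Under the winerror.h reading, the broken entries (0x09, decimal 9) and the healthy ones (0x19, decimal 25) would both carry an error text, which is the objection raised in the post above.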
cluberti Posted December 20, 2010

The error, given how the NAS is working, does make sense. Remember, error codes are also return codes, and this one fits. If I had a debugger and a NAS I could prove it, but experience says this is where you end up.

adamt Posted December 29, 2010

"The error, given how the NAS is working, does make sense. Remember, error codes are also return codes, and this one fits. If I had a debugger and a NAS I could prove it, but experience says this is where you end up."

Interesting idea, and one I'd like to try.

Would you need a serial/kernel debug session, or should I just be able to attach to dfsutil and see what I'm looking for?

Thanks,
Adam.