CIFS NAS - intermittent "system cannot find the file specified"


adamt

Dear all,

We have several Win2k3 servers that access files on an OnStor Bobcat NAS appliance.

Sometimes files seem to go missing. They appear in directory listings, and I can read them with "type" on the command line, but "start" fails.

The NAS device returns error 58 in response to requests for these files, e.g.:

Smb: R; Nt Create Andx - NT Status: System - Error, Code = (58) STATUS_OBJECT_PATH_NOT_FOUND
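The "Code = (58)" in that trace is the low 16 bits of the full NTSTATUS value. A minimal Python sketch, assuming the standard NTSTATUS bit layout (severity in bits 30-31, customer flag in bit 29, facility in bits 16-27, code in bits 0-15), shows how STATUS_OBJECT_PATH_NOT_FOUND (0xC000003A) breaks down; the helper and type names are illustrative only.

```python
from typing import NamedTuple

# Names for the two high (severity) bits of an NTSTATUS value.
SEVERITIES = {0: "Success", 1: "Informational", 2: "Warning", 3: "Error"}


class NtStatusFields(NamedTuple):
    severity: str
    customer: bool
    facility: int
    code: int


def decode_ntstatus(status: int) -> NtStatusFields:
    """Split a 32-bit NTSTATUS value into severity, customer flag, facility and code."""
    status &= 0xFFFFFFFF
    return NtStatusFields(
        severity=SEVERITIES[status >> 30],
        customer=bool((status >> 29) & 0x1),
        facility=(status >> 16) & 0xFFF,
        code=status & 0xFFFF,
    )


STATUS_OBJECT_PATH_NOT_FOUND = 0xC000003A

if __name__ == "__main__":
    # NtStatusFields(severity='Error', customer=False, facility=0, code=58)
    print(decode_ntstatus(STATUS_OBJECT_PATH_NOT_FOUND))
```

The severity (Error) and the code (58, i.e. 0x3A) are the two fields Netmon shows in the line above.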

Rebooting the servers seems to resolve the issue.

Handle.exe isn't showing anything with a handle open to the files in question. And bizarrely, not all servers are affected at the same time.

For example, one server shows:

C:\Control>start \\uk6nas03\share14\img\thumbs\shoe180.jpg

The system cannot find the file \\uk6nas03\share14\img\thumbs\shoe180.jpg

Whereas other servers will be able to open the file just fine.

What puzzles me is that even on an affected server, the file can still be accessed using either the FQDN or the IP address.

For example:

\\uk6nas03\share14\img\thumbs\shoe180.jpg - will fail

\\uk6nas03.my.fqdn.example.com\share14\img\thumbs\shoe180.jpg - will work

\\10.79.3.25\share14\img\thumbs\shoe180.jpg - will also work.
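To compare the three name forms systematically rather than by hand, a minimal Python sketch can build one UNC path per host alias and try to open each one. The helper names are hypothetical, the alias list simply reuses the names above, and the script is meant to be run on an affected Windows client.

```python
from typing import Dict, Iterable, List


def unc_variants(share_path: str, host_aliases: Iterable[str]) -> List[str]:
    """Build one UNC path per host alias for the same share-relative path."""
    rel = share_path.lstrip("\\")
    return ["\\\\" + host + "\\" + rel for host in host_aliases]


def probe(paths: Iterable[str]) -> Dict[str, str]:
    """Try to open each path for reading; record 'ok' or the OS error text."""
    results: Dict[str, str] = {}
    for path in paths:
        try:
            with open(path, "rb"):
                results[path] = "ok"
        except OSError as exc:
            results[path] = f"failed: {exc.strerror or exc}"
    return results


if __name__ == "__main__":
    # Hypothetical alias list: NetBIOS name, FQDN and IP of the same NAS.
    aliases = ["uk6nas03", "uk6nas03.my.fqdn.example.com", "10.79.3.25"]
    paths = unc_variants(r"share14\img\thumbs\shoe180.jpg", aliases)
    for path, outcome in probe(paths).items():
        print(f"{outcome:>10}  {path}")
```

On an affected server, the NetBIOS-name entry would be expected to fail while the FQDN and IP entries succeed, matching the behaviour described above.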

In netmon, the SMB requests seem to have the same flags set for things like oplocks, signing, compression and longfiles. ProcMon/Filemon don't seem to be showing me requests made for files on the NAS (successful or otherwise).

Has anyone seen this sort of behaviour from an SMB/CIFS server before?

Just to reply to my own posting, in case anyone else runs into similar issues.

The issue appears to be DFS on the NAS device. Occasionally, in response to a request for a file, the NAS device responds with GET_DFS_REFERRAL.

When this happens, the request for the file fails (as the path is duplicated). When file requests are working, there is no DFS referral.

I still don't know the root cause of the issue, but running dfsutil.exe /pktflush gets things working again for a little while, at least.
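Since flushing the referral cache is the stop-gap here, it can be wrapped so a scheduled task or monitoring script can trigger it. A minimal Python sketch, assuming dfsutil.exe (the Windows Server 2003 Support Tools version that accepts /pktflush) is on PATH; the function name and the injectable `runner` parameter are hypothetical, the latter added only so the logic can be exercised without dfsutil.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def flush_dfs_referral_cache(runner: Runner = subprocess.run) -> bool:
    """Run 'dfsutil.exe /pktflush' and report whether it exited with status 0."""
    cmd = ["dfsutil.exe", "/pktflush"]
    try:
        result = runner(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # dfsutil.exe is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("flushed" if flush_dfs_referral_cache() else "flush failed")
```

This only automates the workaround; as noted above, the cache entries come back after a while.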

Curiouser and curiouser...

When files are becoming unavailable in this manner, both Netmon and Wireshark show something very odd happening.

If I request the following file:

\\uk6nas03\share14\img\thumbs\DK83X.TTF

What I'm actually seeing in the packet capture is a request for:

\uk6nas03\share14\img\thumbs\DK83X.TTF\uk6nas03\share14\img\thumbs\DK83X.TTF

- The entire SMB path has been duplicated.
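A quick way to spot this pattern in capture text exported from Netmon or Wireshark is to check whether the path's components split into two identical halves. A minimal Python sketch; the function name is hypothetical, the comparison is case-insensitive on the assumption that the server treats paths case-insensitively, and a legitimate path that genuinely repeats itself would also be flagged.

```python
from typing import Optional


def find_duplicated_path(wire_path: str) -> Optional[str]:
    """If wire_path is one path written twice back to back, return that path; else None."""
    parts = [p for p in wire_path.split("\\") if p]
    half, remainder = divmod(len(parts), 2)
    if half == 0 or remainder:
        return None
    first, second = parts[:half], parts[half:]
    # Compare component by component, ignoring case.
    if [p.lower() for p in first] != [p.lower() for p in second]:
        return None
    return "\\" + "\\".join(first)


if __name__ == "__main__":
    wire = r"\uk6nas03\share14\img\thumbs\DK83X.TTF\uk6nas03\share14\img\thumbs\DK83X.TTF"
    print(find_duplicated_path(wire))  # \uk6nas03\share14\img\thumbs\DK83X.TTF
```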

This is not happening for all files, nor even some files all of the time. It seems to crop up on a few files, from specific servers, every few hours or so.

It can't be MUP.SYS, since I don't need to purge the MUP cache. That leaves mrxsmb and rdbss. ...

  • 1 month later...

Just to reply to my own thread... I found that running dfsutil.exe /pktflush would alleviate the issue temporarily (which is easier to do than reboot the machine).

For this specific issue, running dfsutil.exe /pktinfo will show you an entry with "State:0x09" for each file you are unable to access:

Entry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml
ShortEntry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml
Expires in 0 seconds
UseCount: 0 Type:0x81 ( REFERRAL_SVC DFS )
0:[\uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml] State:0x09 ( )
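To list every affected entry without reading the whole table by eye, the pktinfo text can be scanned for targets in state 0x09. A minimal Python sketch, assuming the output layout matches the sample above (an "Entry:" line followed by numbered "N:[target] State:0x.." lines); other dfsutil versions may format this differently, and the function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# "Entry:" at line start (does not match "ShortEntry:").
ENTRY_RE = re.compile(r"^\s*Entry:\s*(\S.*?)\s*$")
# Numbered target lines such as "0:[\server\share\path] State:0x09 ( )".
TARGET_RE = re.compile(r"^\s*\d+:\[(?P<target>[^\]]*)\]\s*State:(?P<state>0x[0-9A-Fa-f]+)")


def targets_in_state(pktinfo_text: str, wanted: int = 0x09) -> List[Tuple[str, str]]:
    """Return (entry, target) pairs whose State field equals `wanted`."""
    hits: List[Tuple[str, str]] = []
    entry: Optional[str] = None
    for line in pktinfo_text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entry = m.group(1)
            continue
        m = TARGET_RE.match(line)
        if m and int(m.group("state"), 16) == wanted:
            hits.append((entry or "?", m.group("target")))
    return hits


# The sample entry quoted above, one output line per list item.
SAMPLE = "\n".join([
    r"Entry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml",
    r"ShortEntry: \uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml",
    "Expires in 0 seconds",
    "UseCount: 0 Type:0x81 ( REFERRAL_SVC DFS )",
    r"0:[\uk6nas03\nas-l4\mb2c\stage\ZZ_915939_IN\ruby\config.xml] State:0x09 ( )",
])

if __name__ == "__main__":
    for entry, target in targets_in_state(SAMPLE):
        print(entry, "->", target)
```

For real data, redirect the output first (dfsutil /pktinfo > pkt.txt) and pass the file's contents to targets_in_state.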

The troublesome NAS devices were OnStor/LSI BobCat devices, and they had 'widelinks' enabled. Once this was disabled (we weren't using it), the issue never returned.

Still seems bizarre, and I've been unable to find out what state 0x09 maps to. At least everything's working now, though.

Actually, that is likely this error from winerror.h:

  ERROR_INVALID_BLOCK                                           winerror.h
# The storage control block address is invalid.

This is due to non-optimal ordering in the DFS referral cache, usually caused by problems underneath the DFS layer. See KB905846: the updated srv.sys there fixes SYSVOL and NETLOGON access via DFS, so the error can still appear elsewhere even with the fix installed. I've seen it before, almost always with third-party NAS devices that add their own layer below the file system or network stack, and Windows DFS *hates* this. Given that Wide Links (according to the OnStor documentation) allows file symlinks to span multiple volumes or even devices, I have no doubt this will make DFS mad: someone's monkeying with file blocks underneath the file system, hence the storage control block error you see ;).

Also, the documentation for Wide Links makes it sound like this is a *replacement* for DFS (a DFS server without a Windows server, done right on the NAS). After reading how it works, it seems you really can't use both at the same time. The documentation doesn't say you cannot, but given how OnStor documents it to work and knowing how DFS works, having both enabled at the same time seems like a recipe for failure.

I didn't consider it to be a win32 error code because other healthy cache entries appear with status 0x19, and "The drive cannot locate a specific area or track on the disk." does not make any sense in this context.
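To make the objection concrete: read as Win32 error codes from winerror.h, the two State values decode as follows. A small Python sketch; the two-entry table is transcribed by hand from winerror.h, covers only the values seen in this thread, and the function name is hypothetical.

```python
# Two Win32 error codes from winerror.h, keyed by numeric value.
WINERROR = {
    9: ("ERROR_INVALID_BLOCK", "The storage control block address is invalid."),
    25: ("ERROR_SEEK", "The drive cannot locate a specific area or track on the disk."),
}


def as_win32_error(state_hex: str) -> str:
    """Interpret a pktinfo State field (e.g. '0x09') as a Win32 error code."""
    value = int(state_hex, 16)
    name, message = WINERROR.get(value, ("<not in table>", ""))
    return f"{state_hex} = {value} = {name}: {message}"


if __name__ == "__main__":
    for state in ("0x09", "0x19"):
        print(as_win32_error(state))
```

So the "unhealthy" 0x09 maps to ERROR_INVALID_BLOCK, while the "healthy" 0x19 would map to ERROR_SEEK, which is why reading the field as a Win32 error code looks doubtful. On a Windows host, ctypes.FormatError(code) returns the system's own message text for a code, which avoids transcribing the table by hand.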

Once the DFS element had been discovered, we disabled the WideLinks setting and all was resolved. Curiously, LSI/OnStor did not seem to think that WideLinks could possibly be the cause of such an issue.

Given what Wide Links does, I'm guessing that state is actually a Win32 status (not error, but status) message. A status is also an "error" code: basically the return value from whatever check produces that state. Given how DFS works (and what fixes it), 0x09 most likely means an incorrect or invalid SCB in the SMB request that DFS sends from the client.

Interesting... where would that be defined if not in winerror.h or ntstatus.h?

I'm not sure I understand your question.

Well... if it is a status code, or an error code - it must be defined somewhere, else all error messages would just generate a random number.

Since it isn't in winerror.h or ntstatus.h, I was wondering where it is defined.

It is coming from winerror.h.

It can't be coming from winerror.h, because healthy cache entries appear with status 0x19, and "The drive cannot locate a specific area or track on the disk." does not make any sense in this context.

Also, surely an error regarding tracks on disks would need to come from something like ntfs.sys, rather than mup.sys or rdr.sys?

The error, given how the NAS is working, does make sense. Remember, error codes are also return codes, and this one fits. If I had a debugger and a NAS I could prove it, but experience says this is where you end up.

  • 2 weeks later...

Interesting idea, and one I'd like to try.

Would you need a serial/kernel debug session, or should I just be able to attach to dfsutil and see what I'm looking for?

Thanks,

Adam.
