
Long File Name error with international characters


Multibooter


When I scan an external USB HDD (750GB, one partition 239GB, another partition 126GB) under Win98SE from my old Inspiron 7500 laptop with Norton Disk Doctor or MS ScanDisk, everything is Ok.

When I scan the same external USB HDD from my recent desktop computer under Win98SE, both Norton Disk Doctor and ScanDisk display phoney long file name error messages. The error is reported for files whose names contain international characters (e.g. characters with accents, like "María" instead of "Maria"). Windows Explorer works OK, and once I replace the international character in the file name with a plain US character (e.g. í -> i), NDD and ScanDisk no longer report the error.

NDD displays the following phoney error message:

"Long File Name Error on Drive x The xxx folder contains one or more long filenames that are no longer associated with files. NDD will correct these errors by deleting these entries." Bye-bye eMule downloads if I don't watch out.

ScanDisk displays the following phoney error message:

"The xxx folder contains incorrect information about the file or folder whose MS-DOS name is <DOS 8.3 filename>. The file or folders's long name <...> is either stored incorrectly on your disk or is incompletely associated with <DOS 8.3 filename>."

On the desktop, which does display LFN error messages, I have installed pure Internet Explorer 5.5 SP1 (v5.50.4522.1800), no updates, to avoid sluggish file deletes. On the laptop, which displays no phoney LFN error messages, I have installed Internet Explorer 6.0 (v6.00.2600.0000). On both computers I have ScanDisk v4.90.3000 (WinME) and MS Layer for Unicode v1.1.3790.0.

This phoney error message is possibly caused by a Win98 DLL, not by NDD or ScanDisk, since the LFN error msg occurs under both NDD and ScanDisk. My old laptop without this error message contains a lot of software with many updates, while the Desktop with the error messages has a relatively recent installation of Win98SE, with most DLLs in \Windows\System\ much closer to the condition after installing Win98SE.

I have already run NDD with both the ANSI and the Unicode versions of ATL.DLL, but this didn't help. I have updated \Windows\System\ with all the MS system files contained on the Norton System Works 2004 CD in \Support\Msredist\, it didn't help either. BTW, the NSW 2004 CD contains in \Msie\ MS Internet Explorer 6.00.2800.1106; I don't have the box of NSW 2004 anymore, but the NSW 2005 box lists as system requirement "MS Internet Explorer 5.5 or later" and Windows 98 (NOT Win95).

When I originally installed Internet Explorer 5.5 SP1, in section Multi-Language Support I selected Language Auto-Selection, but I did not select any specific language.

Any other ideas about how to get rid of these false and annoying error messages? With ScanDisk it is possible at least to de-select long file name checking, but not with NDD.

Edited by Multibooter


Here is the most likely cause of the file name error messages: they are not phoney messages, they are real errors as seen by Win98. Win98 cannot properly read all names of files created under WinXP on FAT32. In other words: WinXP FAT32 file names are NOT backward compatible with Win98 when they contain international characters.

I originally got into the mess because of the sluggish file delete problem under Win98 on my laptop, caused by the installation of Microsoft's Internet Explorer 6 on the laptop. As a workaround I did most copying/moving/deleting/renaming not under Win98 anymore, but under WinXP, which does not have this sluggish file delete problem. So when I process my eMule downloads, which contain a lot of files with international characters in their names, I boot from Win98 into WinXP, and move the eMule downloads under WinXP, from the internal HDD to an external USB HDD.

But WinXP uses UTF-16 encoding for international characters, while Win98 uses UCS-2 encoding. WinXP apparently can read both UCS-2 and UTF-16 encoded characters, but can only write file names with UTF-16 encoded characters. Win98 apparently can only read and write UCS-2 encoded characters. So when I move Win98-created files under WinXP to the USB HDD, the international characters in their file names get converted to UTF-16, and Win98 has problems with them.

Here is a reproducible example:

1) create a file named María.txt (with an accent over the i) under WinXP. WinXP creates a UTF-16 encoded file name.

2) boot into Win98. ScanDisk will display a file name error for María.txt, because Win98 wants UCS-2 encoded file names.

Here is some info:

"UTF-8 is, however, currently used primarily on AIX, HP-UX, Solaris, and Linux... UCS-2 encoding is a fixed two-byte encoding sequence and is a method for transforming Unicode values into byte sequences for Microsoft Windows platforms. It is the standard for Windows 95, Windows 98, Windows Me, and Windows NT... UTF-16 is a superset of UCS-2, with the addition of some special characters in surrogate pairs. UTF-16 is the standard encoding for Windows 2000, Windows XP, and Windows Server 2003" http://www.datadirect.com/developer/odbc/u...round/index.ssp
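To make the UCS-2/UTF-16 relationship in the quote above concrete, here is a minimal sketch in Python (chosen only for illustration; nothing in this thread uses it). It encodes a few sample characters as UTF-16 and reports which ones need a surrogate pair. The helper names `utf16_units` and `needs_surrogates` are made up for this example.

```python
def utf16_units(ch):
    """Return the UTF-16 code units (as ints) used to encode one character."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def needs_surrogates(ch):
    """True if the character needs two 16-bit units, i.e. does not fit in UCS-2."""
    return len(utf16_units(ch)) == 2


# Sample characters: plain ASCII, Western accented letters, a CJK character,
# and one character beyond U+FFFF (written as an escape).
for ch in ["i", "í", "ñ", "ß", "中", "\U0001F600"]:
    units = " ".join(f"{u:04X}" for u in utf16_units(ch))
    print(f"U+{ord(ch):04X} -> {units}  surrogate pair: {needs_surrogates(ch)}")
```

In this sketch an accented Latin letter such as í comes out as a single 16-bit unit, the same value UCS-2 would store, so the surrogate-pair difference would only come into play for characters beyond U+FFFF.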

"UTF-8 [e.g. of Linux]... is able to represent any character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages,[1] and other places where characters are stored or streamed" http://en.wikipedia.org/wiki/UTF-8

File name cleanup

When you manipulate files with international characters in their file names under WinXP, you create problems under Win98 (and probably also under Linux). The easiest workaround would be to use only ASCII characters in file names.

Are there any Win98/WinXP utilities which convert filenames with Western international characters to plain ASCII file names? (e.g. remove accents in filenames + language specific changes in file names, such as ß->ss, ä->ae, ë->e, ñ->n, etc)
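As a rough idea of what such a cleanup could look like, here is a minimal Python sketch, not an existing utility. The function name `to_ascii_name` and the `SPECIAL` table are hypothetical; ß->ss and ä->ae come from the list above, while the ö/ü entries are an assumption following the same German convention. Anything that has no ASCII base letter (e.g. ¿ or Chinese characters) is replaced by an underscore.

```python
import unicodedata

# Language-specific replacements, applied before generic accent stripping.
# ß and ä are from the post; the ö/ü entries are assumed by analogy.
SPECIAL = {
    "ß": "ss",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def to_ascii_name(name, placeholder="_"):
    """Return an ASCII-only version of a file name."""
    out = []
    for ch in name:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        if unicodedata.combining(ch):
            # A stand-alone combining accent: drop it.
            continue
        # NFKD splits "í" into "i" plus a combining acute accent.
        decomposed = unicodedata.normalize("NFKD", ch)
        kept = "".join(c for c in decomposed if ord(c) < 128)
        out.append(kept if kept else placeholder)
    return "".join(out)


for sample in ["María.txt", "Niñoß¿.txt", "Käse.txt"]:
    print(sample, "->", to_ascii_name(sample))
```

The function only computes the new name; an actual rename (for example with `os.rename`) would still need a check that the target name does not already exist.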

Edited by Multibooter

Part #2 of bug puzzle: Why did ScanDisk on computer #1 detect file name errors, but not ScanDisk on computer #2?

2nd reproducible example

a ) on computer #1, with the USB drive containing the file María.txt connected, run ScanDisk and select Repair in the small window displaying "... ScanDisk repairs the error...."

b ) rerun ScanDisk: no more error msg

c ) after "successfully" repairing the error, connect the USB drive containing María.txt to computer #2 and re-run ScanDisk.

Lo and behold: the file name error with María.txt is reported again!

Apparently repairing the file with ScanDisk didn't do anything to the USB HDD containing María.txt. But why did ScanDisk not report a file name error anymore on computer #1, after the repair? Possibly because ScanDisk wrote somewhere in \Windows\ or in \Program Files\ a list of "repaired" files for which ScanDisk/Windows should not report file name error messages, and María.txt was not yet on such a hypothetical list on computer #2.

And such a hypothetical file list, if it exists at all, where could it be? In the cloaked files named Index.dat, or in \Temporary Internet Files\Content.IE5, which cannot be readily accessed, or in the registry? A special feat of this "repair" is that a file name error, once it has been repaired by ScanDisk, is not reported anymore by Norton Disk Doctor either - except if you connect the USB HDD to another computer and run Norton Disk Doctor from that other computer, which results in the "repaired" error being displayed again by NDD.

Re: File name cleanup

The only utility for filename conversions I have seen under Google is convmv under Linux http://www.linux.com/feature/58689 but convmv seems to be made for converting to UTF-8/Linux. Again: Does anybody know of Win98/WinXP utilities which convert filenames with Western international characters to plain ASCII file names?

Edited by Multibooter

I recall bumping into this a few times while shuffling Hard Drives between XP and 9x boxes. If I'm not mistaken, isn't there an INI setting for Scandisk in Win9x that reduces its sensitivity to LFN errors?

Anyway, what I do is generate a complete filelist of the HDD on both systems, WinXP and Win9x (specifically 98se in my cases). The two filelists then get WinDIFF'ed to locate the incompatible filenames and then the HDD gets stuffed back into a WinXP box to manually correct them.

However, since the output of the internal DIR command differs vastly between Win9x Command.com and WinXP Cmd.exe, it wasn't a simple DIR C:\ /S /A >FileList.txt after all. I ended up using ATTRIB to get a path+name-only list and supplemented it with a couple of 3rd-party file list generators that create identical output under both OSes (you want WinDIFF to find only real LFN differences, not output peculiarities).
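The comparison step itself can be sketched in a few lines of Python once both lists exist as one-path-per-line text. This is only an illustration of the idea, not the WinDIFF workflow; the function name `compare_listings` and the sample paths are invented, and comparing case-insensitively is an assumption (FAT names are not case-sensitive, and the two tools may differ in letter case).

```python
def compare_listings(listing_a, listing_b):
    """Return (only_in_a, only_in_b) as sorted lists of lower-cased paths."""
    a = {line.strip().lower() for line in listing_a if line.strip()}
    b = {line.strip().lower() for line in listing_b if line.strip()}
    return sorted(a - b), sorted(b - a)


# Hypothetical sample data standing in for the two file lists.
xp_list = ["E:\\DL\\María.txt", "E:\\DL\\notes.txt"]
w98_list = ["E:\\DL\\MAR_A~1.TXT", "E:\\DL\\NOTES.TXT"]

only_xp, only_98 = compare_listings(xp_list, w98_list)
print("only in XP list: ", only_xp)
print("only in 98 list: ", only_98)
```

Names that show up on only one side are the candidates to correct by hand under WinXP.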

I cannot remember if on Win9x you can just copy the file called by its SFN and rename it to something Win9x legal. But this of course still leaves the un-deletable original and its problematic LFN.

Fixing illegal LFN's on Win9x can probably be painfully accomplished by directly editing the sectors containing the Directory entries using something like Svend's Findpart or maybe Acronis Disk Editor or even Briggs DirSnoop. I just never pursued it far enough to be sure. The method described above works if you don't mind disk juggling.

Anyway, I'm hoping you do locate some GUI tool that can natively edit/delete these filenames within Win9x!


Welcome to i18n - internationali{z|s}ation hell !

The differences you noticed might be related to the following general remarks.

Under WinDOS (Win9x/ME) what characters are allowed in filenames, how they are processed and stored will vary depending on the code page as set in :

- CONFIG.SYS : COUNTRY= ...

That should be consistent with the "codepage" which was selected [probably, by default] at the time Windows was installed (this cannot be changed later, at least not easily and reliably, certainly not in any way sanctioned by MS).

If you do *not* have a DOS 'COUNTRY=' statement, or no CONFIG.SYS at all, then defaults apply - generally language settings of the USA and CP 437, but really the defaults are built-in to IO.SYS and some regional versions of Windows might vary.

When booting to Windows 9x safe(?) mode or command-prompt-only-safe(?)-mode, as might be useful precisely when trying to repair this kind of problem with file names, your CONFIG.SYS, if it exists, is bypassed anyway : another possible source of puzzling problems ;=)
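To illustrate how much the code page matters, here is a small Python sketch that encodes the same names under three code pages. It relies on Python's built-in `cp437`, `cp850` and `cp1252` codecs as stand-ins for the DOS/Windows tables; the helper name `encodable_in` and the sample names are assumptions made for this example.

```python
def encodable_in(name, codepages=("cp437", "cp850", "cp1252")):
    """Map each code page to the encoded bytes of name, or None if not representable."""
    result = {}
    for cp in codepages:
        try:
            result[cp] = name.encode(cp)
        except UnicodeEncodeError:
            result[cp] = None
    return result


for sample in ["María", "Niñoß¿", "Álvaro"]:
    for cp, raw in encodable_in(sample).items():
        if raw is None:
            shown = "not representable"
        else:
            shown = " ".join(f"{b:02x}" for b in raw)
        print(f"{sample:8} {cp:7} {shown}")
```

The same letter can map to different byte values (í is A1 in CP 437 but ED in Windows-1252), and some letters, such as Á, exist in CP 850 but not in CP 437.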

In addition, if multibooting with an NT-derivative such as Win 2k, XP or later... the way those later systems store "legacy" 8.3 aliases on FAT(32) systems is subtly different from the way it is done in Win 98 and similar, which will lead to Scandisk complaints.
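For background on how a long name is tied to its 8.3 alias: in the FAT long file name scheme, each long-name directory entry carries a one-byte checksum computed over the 11 bytes of the short name it belongs to. The Python sketch below shows that checksum; the sample byte strings are purely illustrative and not read from a real disk, and whether this check is what actually trips ScanDisk in this case is not established here.

```python
def lfn_checksum(short_name_11):
    """Checksum over the 11-byte 8.3 name (8 name bytes + 3 extension bytes, space padded)."""
    if len(short_name_11) != 11:
        raise ValueError("short name must be exactly 11 bytes")
    total = 0
    for b in short_name_11:
        # Rotate the running sum right by one bit, then add the next byte (mod 256).
        total = (((total & 1) << 7) + (total >> 1) + b) & 0xFF
    return total


# Two illustrative aliases that differ in a single byte (0xA1 versus 0xED).
alias_a = b"MAR\xa1A   TXT"
alias_b = b"MAR\xedA   TXT"
print(hex(lfn_checksum(alias_a)), hex(lfn_checksum(alias_b)))
```

Because every step is a rotation followed by an addition, changing a single byte of the short name always changes the checksum, so a long name would no longer match an alias whose bytes differ from the ones it was written with.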

There are additional complications with some DOS and Windows versions that try to accommodate double-byte character sets. Microsoft never managed to deal properly with i18n; Unicode support in later OSes made it slightly better but created new conflicts with legacy systems.

HTH

--

Ninho


Welcome to i18n - internationali{z|s}ation hell ! ... Under WinDOS (Win9x/ME) what characters are allowed in filenames, how they are processed and stored will vary depending on the code page as set in : - CONFIG.SYS : COUNTRY= ...

That should be consistent with the "codepage" which was selected [probably, by default] at the time Windows was installed (this cannot be changed later, at least not easily and reliably, certainly not in any way sanctioned by MS).

Thanks Ninho. I don't remember under which code page I installed Win98SE originally on my laptops many years ago. I have been using several Inspiron 7500 laptops with different built-in national keyboards, with the corresponding code page set or remmed out in config.sys. I am using on my main laptop a US keyboard, with the US International keyboard set in Windows.

Is there a way to find out under which code page Win98SE was originally installed?

I always use ASCII characters when I enter the name of a file to be saved, or for renaming, which has kept me out of mischief, until I started to use eMule.

eMule downloads have created this file name problem on my computer, under Win98 at least, because many files downloaded with eMule have names containing non-ASCII characters:

a ) eMule downloads sometimes have finished downloading, but then are stuck at "completing" because their filenames contain e.g. Chinese characters. The solution is to select the file stuck at Completing in the Transfer window of eMule -> right-click Details -> Name tab -> click on Cleanup button ->Ok. This will remove the '?' (=Chinese, etc characters) from the filename. Then select in the Transfer window the file with the cleaned up name -> right-click Resume and the download will be completed ok

b ) eMule under Win98 sometimes creates download files which cannot be renamed etc under Win98, only under WinXP, probably because of something in the filename.

c) and now this ScanDisk/NDD problem, when I access eMule downloads having filenames with international characters, stored on an external USB HDD. Under ScanDisk I can select at least -> Advanced -> de-select "Invalid file names", but under Norton Disk Doctor I have to select "ignore" for each "erroneous" filename with international characters (the default selection of NDD is "delete"[the file!!], not "ignore").

Ninho, I guess you are using the French code page, which already contains 'í' with an accent, so the file name "María" might be OK for you, but do you get the same error as I do with María when you create a file named "Niñoß¿" (ñ with a tilde, German ß, Spanish punctuation mark ¿)?

Edited by Multibooter

Theoretically, this can easily be accomplished by any tool that supports Regular Expressions. For example, Total Commander has a multi-rename feature with RE capabilities.
Thanks Drugwash, your suggestion looks like it will produce the right results, if one knows Regular Expressions. But learning Regular Expressions looks quite time-consuming, I would only need it for multi-renames of eMule downloads.

I looked at Beyond Compare, which I have installed, and it also supports Regular Expressions.
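For what it is worth, the regular expression needed for this particular job is short. Here is a minimal Python sketch (the names `NON_ASCII`, `names_needing_cleanup` and `blank_out` are invented for this example) that uses one character class to spot file names containing anything outside plain ASCII and to replace those characters with an underscore.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def names_needing_cleanup(names):
    """Return only the names that contain at least one non-ASCII character."""
    return [n for n in names if NON_ASCII.search(n)]


def blank_out(name, replacement="_"):
    """Replace every non-ASCII character with the replacement string."""
    return NON_ASCII.sub(replacement, name)


samples = ["Maria.txt", "María.txt", "Niñoß¿.avi"]
print(names_needing_cleanup(samples))
print([blank_out(n) for n in samples])
```

The same character class, `[^\x00-\x7F]`, should be usable as the search pattern in tools with regular expression support, although the exact syntax each tool accepts may differ.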


Is there a way to find out under which code page Win98SE was originally installed?

A quick and dirty way would be to look inside %windir%\system.ini (text file) : scan the [boot] section for a line similar to :

oemfonts.fon=vga850.fon

This example indicates that Windows was installed using OEM CP 850.

CP437 (EN/US) would use vgaoem.fon, others use vgaxxx.fon where xxx is code page number.

This is a good guess, not 100% foolproof, for the oemfonts could have been changed by someone or something without affecting the CP used for DOS filenames...
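The lookup can be written down as a small Python sketch. It only encodes the rule of thumb given above (vgaoem.fon means CP 437, vgaxxx.fon means CP xxx) and returns None for anything else; the function name `guess_oem_codepage` is hypothetical, and the result is a guess with the same caveats.

```python
import re


def guess_oem_codepage(system_ini_text):
    """Guess the OEM code page from oemfonts.fon in the [boot] section, or return None."""
    in_boot = False
    for raw in system_ini_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Track whether we are inside the [boot] section.
            in_boot = line.lower() == "[boot]"
            continue
        if not in_boot or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip().lower() != "oemfonts.fon":
            continue
        value = value.strip().lower()
        if value == "vgaoem.fon":
            return 437
        match = re.fullmatch(r"vga(\d+)\.fon", value)
        return int(match.group(1)) if match else None
    return None


sample = "[boot]\noemfonts.fon=vga850.fon\n"
print(guess_oem_codepage(sample))
```

On a real system the text would come from reading %windir%\system.ini; any value that does not follow the vgaxxx.fon pattern is reported as unknown rather than guessed.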

Should you find discrepancies, there is (used to be) a utility on the Win 95 installation CD which supposedly could change CPs after the fact, unsupported and with all the caveats. The version from Win 95 does NOT work properly on Windows 98, and I don't know if there is an updated version on MS Windows 98 CDs - it is definitely not on my Windows 98 OEM preinstalled :=(

That whole internationalisation affair has been a total fiasco ever since the days of MS DOS 3.2/3.3, when MS started to take over the lucrative business of selling DOS to final users; previously they had been licensing DOS to OEMs who would do the necessary local adaptations themselves.

Ninho, I guess you are using the French code page, which already contains 'í' with an accent, so the file name "María" might be OK for you, but do you get the same error as I do with María when you create a file named "Niñoß¿" (ñ with a tilde, German ß, Spanish punctuation mark ¿)?

I'll do some testing later...

Edited by Ninho

This example indicates that Windows was installed using OEM CP 850.

CP437 (EN/US) would use vgaoem.fon, others use vgaxxx.fon where xxx is code page number.

This is a good guess, not 100% foolproof, for the oemfonts could have been changed by someone or something without affecting the CP used for DOS filenames...

System.ini on my desktop, which has a relatively recent installation of US Win98SE, confirms that vgaoem.fon is an indicator of the US code page. Here are some lines from system.ini of this desktop computer:

oemfonts.fon=vgaoem.fon

oemansi.bin=

aspect=100,96,96

woafont=dosapp.fon

On my laptop, however, which reflects a Win98SE installation originally made maybe 9 years ago, I have the following lines in System.ini:

oemfonts.fon=8514oem.fon

oemansi.bin=xlat850.bin

aspect=100,120,120

woafont=app850.fon

On the laptop I have selected the Windows setting of large screen fonts [large (120 DPI) display], which seems to have replaced whatever original setting there was with 8514oem.fon http://www.psc-consulting.ca/fenske/publish.htm

The other entries in system.ini look like I have code page 850 installed on the laptop http://en.wikipedia.org/wiki/Code_page_850 but I have no idea whether this would indicate the code page used during the original installation of Win98SE.


I usually clean up file names right after adding to download list. Strange thing is, when adding a file to download, in eMule's statusbar (bottom-left) any Unicode characters are displayed properly! I tested by adding files with Chinese, Japanese, Vietnamese, Korean characters in their names mixed with English characters and they were all correctly displayed.

Same with the tooltips, although I suspect those are HTML-based (in XP there's a file icon displayed along with the text, which is missing in 9x).


Ninho, I guess you are using the French code page, which already contains 'í' with an accent, so the file name "María" might be OK for you, but do you get the same error as I do with María when you create a file named "Niñoß¿" (ñ with a tilde, German ß, Spanish punctuation mark ¿)?
It works for me. No error detected by scandisk.

In fact, I use both XP and 98 on a daily basis and I have never faced this problem, unless I use Asian characters.
