NLS vs. invalid chars for LFN

**Joseph_sw** · February 21, 2009

From various posts, it seems that swapping .nls files and changing the NLS\Codepage\ACP key in the registry may allow files with 'invalid characters' in their names to be created and accessed.

I tried changing the ACP value from 1252 to 1251, then created a file with the Alt-0129, Alt-0141 to Alt-0144, Alt-1057 and Alt-0128 characters in its name. Windows didn't complain as it usually does.

Since that seemed to work, I tried installing a Japanese game that normally refuses to install because it creates folders and filenames containing the rejected characters (those Alt-####). It was a success; I can even play it on my English Win98SE now.

...

However, if I keep the ACP value at 1252 but copy cp_1251.nls over cp_1252.nls (after backing up the original cp_1252.nls, of course), English Win98SE again refuses such invalid characters in filenames.

On the flip side, I also tried the other way around: set ACP to 1251 and copy the original cp_1252.nls from the Win98SE cabs over cp_1251.nls in the SYSTEM directory. As I guessed, I can then create files using those 'invalid characters'.

How come?

There must be something else that dictates which (ASCII) characters are considered legal in LFN entries...

Edited February 21, 2009 by Joseph_sw

**Joseph_sw** · February 24, 2009

It seems setting ACP to 1251 is not really a perfect solution: with it, the Alt-0152 character can't be used in filenames (it can be used while ACP is 1252).

Based on a very informative post by SlugFiller, I wrote a simple program to check whether the codepage -> Unicode -> codepage conversion is correct in cp_1252.nls and cp_1251.nls.

The program is roughly like this:

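typedef unsigned short u16;

/* tables read from the .nls file (loading not shown) */
extern u16 codepage_to_unicode_datastruct[];
extern u16 unicode_to_codepage_datastruct[];

u16 cp2u(u16 cp)
{
  return codepage_to_unicode_datastruct[cp];
}

/* three-level lookup: high byte, middle nibble, low nibble */
u16 u2cp(u16 uni)
{
  u16 pass_1st = uni >> 8;
  u16 pass_2nd = (uni >> 4) & 0xF;
  u16 pass_3rd = uni & 0xF;

  pass_2nd += unicode_to_codepage_datastruct[pass_1st];
  pass_3rd += unicode_to_codepage_datastruct[pass_2nd];
  return unicode_to_codepage_datastruct[pass_3rd];
}

void check(void)
{
  int a;
  for (a = 0; a < 256; a++)
  {
    u16 u = cp2u(a);
    u16 c = u2cp(u);
    if (a != c)
    {
      // show pop-up & stuff
    }
  }
}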

I run the codepage index {a} from 0 to 255 to check whether the codepage value {c}, converted back from Unicode {u}, differs from it.

For both cp_1252.nls and cp_1251.nls, {c} equals {a} for all 256 values.

So, if not from the cp_nnnn.nls files, from where or how does Win98SE decide which characters are illegal in filenames for a given codepage setting?

I want all characters from Alt-0127 to Alt-0255 to be usable in filenames... (if possible, while keeping ACP at 1252)
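For anyone who wants to repeat the round-trip check described above without the real .nls data, here is a minimal self-contained sketch in C. It assumes only the layout the pseudocode implies (a 256-word codepage-to-Unicode table plus a three-level Unicode-to-codepage table indexed by high byte, middle nibble and low nibble). The builder `build_u2cp`, the shared default blocks, the '?' fallback and the sample mapping in `fill_sample` are my own illustration, not the actual .nls format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    L1_WORDS   = 256,                /* one entry per Unicode high byte */
    BLOCK      = 16,                 /* entries per level-2/level-3 block */
    DEFAULT_L2 = L1_WORDS,           /* shared blocks for unmapped ranges */
    DEFAULT_L3 = L1_WORDS + BLOCK,
    MAX_WORDS  = L1_WORDS + 2 * BLOCK + 256 * 2 * BLOCK, /* worst case */
    FALLBACK   = 0x3F                /* '?' for unmapped code points (assumed) */
};

typedef struct {
    uint16_t cp2u[256];        /* codepage byte -> Unicode */
    uint16_t u2cp[MAX_WORDS];  /* three-level Unicode -> codepage table */
    size_t   used;             /* words of u2cp in use */
} Tables;

/* Append one 16-word block filled with `fill`; return its offset. */
static uint16_t new_block(Tables *t, uint16_t fill)
{
    assert(t->used + BLOCK <= MAX_WORDS);
    uint16_t off = (uint16_t)t->used;
    for (int i = 0; i < BLOCK; i++)
        t->u2cp[t->used++] = fill;
    return off;
}

/* Build the reverse table from t->cp2u. Bytes are processed from 255
 * down to 0, so the lowest byte wins when two map to one code point. */
void build_u2cp(Tables *t)
{
    t->used = 0;
    for (int i = 0; i < L1_WORDS; i++)
        t->u2cp[t->used++] = DEFAULT_L2;
    new_block(t, DEFAULT_L3);   /* lands at offset DEFAULT_L2 */
    new_block(t, FALLBACK);     /* lands at offset DEFAULT_L3 */

    for (int b = 255; b >= 0; b--) {
        uint16_t u  = t->cp2u[b];
        uint16_t hi = u >> 8, mid = (u >> 4) & 0xF, lo = u & 0xF;
        if (t->u2cp[hi] == DEFAULT_L2)
            t->u2cp[hi] = new_block(t, DEFAULT_L3);
        uint16_t l2 = t->u2cp[hi];
        if (t->u2cp[l2 + mid] == DEFAULT_L3)
            t->u2cp[l2 + mid] = new_block(t, FALLBACK);
        t->u2cp[t->u2cp[l2 + mid] + lo] = (uint16_t)b;
    }
}

/* The three-level lookup, same shape as u2cp() in the post. */
uint16_t u2cp(const Tables *t, uint16_t uni)
{
    uint16_t l2 = t->u2cp[uni >> 8];
    uint16_t l3 = t->u2cp[l2 + ((uni >> 4) & 0xF)];
    return t->u2cp[l3 + (uni & 0xF)];
}

/* Count bytes that do not survive codepage -> Unicode -> codepage. */
int count_mismatches(const Tables *t)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        if (u2cp(t, t->cp2u[a]) != a)
            bad++;
    return bad;
}

/* Hypothetical sample mapping: identity, except 0x80 -> U+20AC. */
void fill_sample(Tables *t)
{
    for (int a = 0; a < 256; a++)
        t->cp2u[a] = (uint16_t)a;
    t->cp2u[0x80] = 0x20AC;
    build_u2cp(t);
}
```

Calling `fill_sample(&t)` and then `count_mismatches(&t)` reports no mismatches for the sample mapping. Pointing a second byte at a code point that is already taken (for example `t.cp2u[0x81] = 0x003F` followed by `build_u2cp(&t)`) makes that byte fail the round trip, which is the kind of mismatch the check is meant to catch.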

**Tihiy** · February 24, 2009

I'm pretty sure it's in nls files.

**Joseph_sw** · February 25, 2009

I think there must be more to it than just the .nls files.

I tried something like this:

Copy cp_1251.nls (size = 6,926 bytes) to cp_1299.nls in the %windir%\system directory.

Apply this registry file:

REGEDIT4

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Nls\Codepage]
"1299"="cp_1299.nls"
"ACP"="1299"

Then restart.

Try to rename / create a file with Alt-0129, Alt-0141, Alt-0142, Alt-0143, Alt-0144, Alt-0152, Alt-0157, Alt-0158.
Result: only Alt-0152 can be used in a filename.

For the second part of the test, I applied this registry change:

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Nls\Codepage]
"ACP"="1251"

Then restart.

Try to rename / create a file with Alt-0129, Alt-0141, Alt-0142, Alt-0143, Alt-0144, Alt-0152, Alt-0157, Alt-0158.
The result was DIFFERENT: this time Alt-0152 is the only one that can NOT be used in a filename.

Despite using .nls files that are binary identical, the result differs depending on the ACP value in the registry.

Edited February 25, 2009 by Joseph_sw

**Chozo4** · April 18, 2009

I know it's a slightly older topic, but I didn't see the point in making a whole new one as I thought this was somewhat relevant.

I've tried the methods listed here but couldn't seem to copy certain files named in the Japanese character set. Changing the Active Code Page (ACP) to 1251 did not rectify the issue, nor did changing the ACP to 932 (Japanese NLS). However, after a little fiddling around, I noticed the key 'OEMCP' and matched the ACP to that (CP_432.nls).

Needless to say, I was then able to copy over these files using the Japanese subset, even though most of the characters were shown by Explorer as other random characters. The only side effect I noticed offhand was that items named using the ALT+0160 character (for 'label-less' icons) had their name changed to an 'a' or were simply inaccessible. Changing the names using the old character set rectified the issue. Then again, ALT+0160 shouldn't have been used anyway, as some disk scanners do not like it due to the lack of DOS accessibility. A better solution would be to make real label-less icons using the CLSID and Desktop Namespace sections of the registry.

**Drugwash** · April 19, 2009

I renamed 'My Computer' to ALT+0160 years ago and the system's still up and running.

Haven't been fiddling with changing codepages though.

**Joseph_sw** · April 26, 2009

I'm just adding some information that I found recently.

It seems the ACP value also dictates how Windows handles strings.

I have installed some Japanese games. With a normal single-byte ACP (1252, 1252, 437), the game complains that there is a syntax error in its internal scripts.

Suspiciously, that part of the script is in Shift-JIS...

To confirm my suspicion, I changed the ACP value to 932, restarted the computer and ran the game again. This time the game works.

But even with ACP 932, Windows 98SE (English) still refuses any file operation if the directory or file name contains any of ALT+0129, ALT+0141, ALT+0142, ALT+0143, ALT+0144, ALT+0157, ALT+0158.

That is quite bad, because those characters (the ALT-#### ones) are used as either the first byte or the second byte in CJK encodings.

For example, if you want to use the character 女 in a filename, Windows (98SE) transforms it into 2 bytes: ALT+0143 followed by ALT+0151. (Therefore such a character occupies 4 bytes in the directory's LFN entries.)

Depending on the ACP, the file operation can either fail or succeed.
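To make the first-byte / second-byte point concrete, here is a small C sketch using the commonly documented Shift-JIS byte ranges (lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC). The function names are made up for illustration, and everything that is not a lead byte is simply treated as a single-byte character, which is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Shift-JIS lead bytes: 0x81-0x9F and 0xE0-0xFC. */
bool sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Shift-JIS trail bytes: 0x40-0x7E and 0x80-0xFC. */
bool sjis_is_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Count the characters in a Shift-JIS byte string.
 * Returns -1 if a lead byte is not followed by a valid trail byte. */
int sjis_char_count(const unsigned char *s, size_t len)
{
    int count = 0;
    for (size_t i = 0; i < len; i++, count++) {
        if (!sjis_is_lead(s[i]))
            continue;              /* treated as a single-byte character */
        if (i + 1 >= len || !sjis_is_trail(s[i + 1]))
            return -1;             /* lead byte without a valid trail byte */
        i++;                       /* skip the trail byte */
    }
    return count;
}
```

All seven rejected bytes (0x81, 0x8D, 0x8E, 0x8F, 0x90, 0x9D, 0x9E, i.e. ALT+0129 ... ALT+0158) fall in both the lead and the trail range, so a filename check that rejects them blocks a large share of double-byte characters. The bytes 0x8F 0x97 (ALT+0143, ALT+0151) from the 女 example count as one character.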

**jaclaz** · April 26, 2009

Completely unrelated (but not much) and JFYI:

http://www.msfn.org/board/index.php?showtopic=131103

http://www.msfn.org/board/index.php?showto...31103&st=11

jaclaz

**Joseph_sw** · December 27, 2009

I just found another piece of information that might be the true reason why Win98SE (English) refuses to do any file operation with such characters.

In KERNEL32.DLL I found data structures very similar to those in the cp_####.nls files.

The structure is CodePage_to_UTF (a 100h-word lookup) followed by UTF_to_CodePage data (it even uses the same 3-layered lookup).

After checking, I also found that the conversion from CodePage to UTF and back to CodePage, using the data in kernel32.dll, doesn't match for the following CodePage indices:

80, 81, 8D, 8E, 8F, 90, 9D, 9E

With the exception of 80, these are exactly the characters that cannot be used in filenames (with ACP 1252)!

The UTF_to_CodePage data is 2F0 words in size.

I intend to patch that data to prove the suspicion.

However, I'm not quite sure how to do that other than by manual hex editing,

and I'm still having trouble forging a proper UTF_to_CodePage table that will fit the size restriction.
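A rough way to reason about the size restriction, as a sketch only: assuming the 2F0-word table is a 100h-word first level followed by 16-word blocks (which is what the 3-layered lookup suggests, but is not confirmed for kernel32.dll), there is room for (2F0h - 100h) / 10h = 31 blocks. The hypothetical helper below counts how many blocks a candidate mapping would need. It ignores whatever the real table does for unmapped code points, so treat the result as a lower bound.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TABLE_WORDS = 0x2F0, L1_WORDS = 0x100, BLOCK_WORDS = 16 };

/* Blocks left after the 256-word first level: (0x2F0 - 0x100) / 16. */
int block_budget(void)
{
    return (TABLE_WORDS - L1_WORDS) / BLOCK_WORDS;
}

/* Count the 16-word blocks a mapping needs: one second-level block per
 * distinct high byte, plus one third-level block per distinct
 * (high byte, middle nibble) pair. Blocks for unmapped code points
 * are NOT counted (assumption of this sketch). */
int blocks_needed(const uint16_t cp2u[256])
{
    bool seen_hi[256]    = { false };
    bool seen_pair[4096] = { false };  /* indexed by top 12 bits */
    int blocks = 0;
    for (int b = 0; b < 256; b++) {
        unsigned hi = cp2u[b] >> 8, pair = cp2u[b] >> 4;
        if (!seen_hi[hi])     { seen_hi[hi] = true;     blocks++; }
        if (!seen_pair[pair]) { seen_pair[pair] = true; blocks++; }
    }
    return blocks;
}

/* True if the mapping fits in the assumed block budget. */
bool fits(const uint16_t cp2u[256])
{
    return blocks_needed(cp2u) <= block_budget();
}
```

With an identity mapping (every byte maps to the same-numbered code point) this gives 17 blocks: 1 second-level block plus 16 third-level blocks. Every code point moved into a new high byte costs up to two more blocks, so a patched table has to keep its non-Latin-1 code points clustered to stay within 31.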

edit:

I'll try to fix 81 & 8F, 8E & 9E and report the result.

UPDATE:

That turned out to fail too; I still can't use those characters in filenames.

Edited December 27, 2009 by Joseph_sw
