NLS vs. invalid chars for LFN


Joseph_sw

According to various posts, meddling with the .nls files and changing the NLS\Codepage\ACP key in the registry may allow files with 'invalid characters' in their names to be created and accessed.

I tried changing the ACP value from 1252 to 1251, then tried to create a file with the Alt-0129, Alt-0141 to Alt-0144, Alt-0157 and Alt-0128 characters in its name. Windows doesn't complain as it usually does.

Since that seemed to work, I tried installing a Japanese game that normally refuses to install because it creates folders and filenames containing the rejected characters (those Alt-####). It was a success; I can even play it on my English Win98SE now.

...

However, if I keep the ACP value at 1252 but copy cp_1251.nls over cp_1252.nls instead (I backed up the original cp_1252.nls, of course),

English Win98SE again refuses such invalid characters in filenames.

On the flip side, I also tried the other way around: set ACP to 1251 and copy the original cp_1252.nls from the Win98SE cabs over cp_1251.nls in the SYSTEM directory.

As I guessed, I can create such files using those 'invalid characters'.

How come?

There must be something else that dictates which (ASCII) characters are considered legal in LFN entries...

Edited by Joseph_sw


It seems setting ACP to 1251 is not really a perfect solution: with it, the Alt-0152 character can't be used in filenames (that character can be used while ACP is 1252).

Based on a very informative post by SlugFiller, I wrote a simple program to check whether the CodePage -> Unicode -> CodePage conversion is correct for cp_1252.nls and cp_1251.nls.

The program is roughly like this:

/* code page byte -> Unicode: direct 256-entry lookup */
u16 cp2u(u8 cp)
{
    return codepage_to_unicode_datastruct[cp];
}

/* Unicode -> code page: 3-layered lookup (high byte, then two nibbles) */
u16 u2cp(u16 uni)
{
    u16 pass_1st = uni >> 8;
    u16 pass_2nd = (uni >> 4) & 0xF;
    u16 pass_3rd = uni & 0xF;

    pass_2nd += unicode_to_codepage_datastruct[pass_1st];
    pass_3rd += unicode_to_codepage_datastruct[pass_2nd];
    return unicode_to_codepage_datastruct[pass_3rd];
}

void check(void)
{
    int a;
    for (a = 0; a < 256; a++)
    {
        u16 u = cp2u(a);
        u16 c = u2cp(u);
        if (a != c)
        {
            // show pop-up & stuff
        }
    }
}

I ran the code page indices {a} from 0 to 255 to check whether the code page value {c}, obtained by converting to Unicode {u} and back, differs from the original.

For both cp_1252.nls and cp_1251.nls, {c} equals {a} for all 256 values.
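
For anyone who wants to experiment without a real .nls file, here is a minimal self-contained sketch of the same round-trip check. The table layout is only an assumption taken from the pseudocode above (256 first-layer words, then 16-word blocks indexed by nibble); the builder, the shared default blocks and the '?' fallback are hypothetical choices for illustration, not the actual file format.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORDS 4096                 /* arbitrary cap for this sketch */

static uint16_t cp_to_uni[256];        /* code page byte -> Unicode */
static uint16_t uni_to_cp[MAX_WORDS];  /* 3-layered Unicode -> code page */
static unsigned words_used;

/* Append one 16-word block filled with `fill` and return its offset. */
static uint16_t new_block(uint16_t fill)
{
    assert(words_used + 16 <= MAX_WORDS);
    uint16_t off = (uint16_t)words_used;
    for (int i = 0; i < 16; i++)
        uni_to_cp[off + i] = fill;
    words_used += 16;
    return off;
}

/* Build the layered table from cp_to_uni. Unmapped code points fall
   through shared default blocks and come back as '?' (an assumption). */
static void build_uni_to_cp(void)
{
    words_used = 256;                  /* layer 1: one word per high byte */
    uint16_t def3 = new_block('?');    /* shared default layer-3 block */
    uint16_t def2 = new_block(def3);   /* shared default layer-2 block */
    for (int i = 0; i < 256; i++)
        uni_to_cp[i] = def2;

    /* Walk downwards so the lowest byte wins when two bytes share a code point. */
    for (int cp = 255; cp >= 0; cp--) {
        uint16_t u = cp_to_uni[cp];
        uint16_t *l1 = &uni_to_cp[u >> 8];
        if (*l1 == def2)
            *l1 = new_block(def3);
        uint16_t *l2 = &uni_to_cp[*l1 + ((u >> 4) & 0xF)];
        if (*l2 == def3)
            *l2 = new_block('?');
        uni_to_cp[*l2 + (u & 0xF)] = (uint16_t)cp;
    }
}

static uint16_t cp2u(uint8_t cp) { return cp_to_uni[cp]; }

static uint16_t u2cp(uint16_t uni)
{
    uint16_t l2 = uni_to_cp[uni >> 8];
    uint16_t l3 = uni_to_cp[l2 + ((uni >> 4) & 0xF)];
    return uni_to_cp[l3 + (uni & 0xF)];
}

/* Count code page bytes that do not survive byte -> Unicode -> byte. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        if (u2cp(cp2u((uint8_t)a)) != a)
            bad++;
    return bad;
}
```

Fill cp_to_uni with sample data, call build_uni_to_cp(), then count_mismatches(). With an identity mapping nothing mismatches; if two bytes are pointed at the same code point, one of them can no longer round-trip, which is the kind of failure the check is meant to catch.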

So, if not from the cp_nnnn.nls files, where or how does Win98SE decide which characters are illegal in filenames for a given code page setting?

I want all characters from Alt-0127 to Alt-0255 to be usable in filenames... (if possible, while keeping ACP at 1252)


I think there must be more to it than just the .nls files.

I tried something like this:

  1. Copy cp_1251.nls (size = 6,926 bytes) to cp_1299.nls in the %windir%\system directory.
  2. Apply this registry file:
    REGEDIT4

    [HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Nls\Codepage]
    "1299"="cp_1299.nls"
    "ACP"="1299"

    Then restart.

  3. Try to rename / create files with alt-0129, alt-0141, alt-0142, alt-0143, alt-0144, alt-0152, alt-0157, alt-0158 in the name.
    Result: only alt-0152 can be used in a filename.
  4. For the second part of the test, apply this registry change:
    [HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Nls\Codepage]
    "ACP"="1251"

    Then restart.

  5. Try to rename / create files with alt-0129, alt-0141, alt-0142, alt-0143, alt-0144, alt-0152, alt-0157, alt-0158 in the name again.
    The result was DIFFERENT: this time alt-0152 is the only one that cannot be used in a filename.

Despite using binary-identical .nls files, the result differs depending on the ACP value in the registry.

Edited by Joseph_sw

  • 1 month later...

I know it's a slightly older topic, but I didn't see the point in making a whole new one as I thought this was somewhat relevant.

I've tried the methods listed here but couldn't copy certain files named in the Japanese character set. Changing the Active Code Page (ACP) to 1251 did not rectify the issue, nor did changing the ACP to 932 (Japanese NLS). However, after a little fiddling around, I noticed the 'OEMCP' key and matched the ACP to it (cp_437.nls).

Needless to say, I was then able to copy these files over using the Japanese subset, even though Explorer showed most of the characters as other random characters. The only side effect I noticed offhand was that items named using the ALT+0160 character (for 'label-less' icons) had their names changed to an 'a' or were simply inaccessible. Changing the names using the old character set rectified the issue. Then again, ALT+0160 shouldn't have been used anyway, as some disk scanners do not like it due to lack of DOS accessibility. A better solution would be to make real label-less icons using the CLSID and Desktop Namespace sections of the registry.


I'm just adding some information that I found recently.

It seems the ACP value also dictates how Windows handles strings.

I have installed some Japanese games. With a normal single-byte ACP (1252, 1251, 437) the game complains that there is a syntax error in its internal scripts.

Suspiciously, the offending part of the script is in Shift-JIS...

To confirm my suspicion, I changed the ACP value to 932, restarted the computer and ran the game again. This time the game works.

But even with ACP 932, Windows 98SE (English) still refuses any file operation if the directory or file name contains any of ALT+0129, ALT+0141, ALT+0142, ALT+0143, ALT+0144, ALT+0157, ALT+0158.

That is quite bad, because the characters above (those ALT-####) are used as either the first byte or the second byte in CJK encodings.

For example, if such a character is used in a filename, Windows (98SE) transforms it into 2 bytes: ALT+0143 followed by ALT+0151 (so the character occupies 4 bytes in the directory's LFN entries).
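
To illustrate why those bytes matter, here is a small sketch of how a Shift-JIS (code page 932) byte string splits into characters. The lead-byte and trail-byte ranges are the usual CP932 ones; the function names are made up for this example and are not any Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* CP932 (Shift-JIS) lead bytes: 0x81-0x9F and 0xE0-0xFC. */
static int is_sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* CP932 trail bytes: 0x40-0x7E and 0x80-0xFC. */
static int is_sjis_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Count characters in a byte string under CP932 rules. A lead byte
   followed by a valid trail byte is one character; everything else
   is counted one byte at a time. */
static size_t sjis_char_count(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t n = 0, i = 0;
    while (i < len) {
        if (is_sjis_lead(s[i]) && i + 1 < len && is_sjis_trail(s[i + 1]))
            i += 2;     /* double-byte character */
        else
            i += 1;     /* single byte (ASCII, half-width kana, or stray) */
        n++;
    }
    return n;
}
```

The pair 0x8F 0x97 (ALT+0143, ALT+0151) counts as one character here, while a single-byte ACP sees two separate characters, which would match the 4 bytes in the LFN entry mentioned above. All of the rejected bytes (0x81, 0x8D, 0x8E, 0x8F, 0x90, 0x9D, 0x9E) fall inside the lead-byte range.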

Depending on the ACP, the file operation can either fail or succeed.


  • 8 months later...

I just found more information that might be the true cause of why Win98SE (English) refuses to do any file operations with such characters.

I found data structures in KERNEL32.DLL that are very similar to those in the cp_####.nls files.

The structure is CodePage_to_UTF (a 100h-word lookup) followed by UTF_to_CodePage data (this even uses the same 3-layered lookup).

After checking, I also found that the conversion from CodePage to UTF and back to CodePage using the data in kernel32.dll doesn't match for the following CodePage indices:

80, 81, 8D, 8E, 8F, 90, 9D, 9E

With the exception of 80, these are exactly the characters that cannot be used in filenames! (in ACP 1252)
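
The overlap is easy to verify mechanically. The sketch below only cross-checks the two lists quoted in this thread (the hex indices above and the Alt codes rejected under ACP 1252), assuming Alt+0nnn enters byte value nnn of the ANSI code page. The array and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Indices whose round trip failed in the kernel32.dll data (from the post). */
static const unsigned char mismatched[] =
    {0x80, 0x81, 0x8D, 0x8E, 0x8F, 0x90, 0x9D, 0x9E};

/* Alt codes reported as unusable in filenames under ACP 1252. */
static const unsigned rejected_alt[] = {129, 141, 142, 143, 144, 157, 158};

/* Assumption: Alt+0nnn produces byte value nnn in the ANSI code page. */
static unsigned char alt_to_byte(unsigned alt)
{
    assert(alt <= 255);
    return (unsigned char)alt;
}

static int is_mismatched(unsigned char b)
{
    for (size_t i = 0; i < sizeof mismatched; i++)
        if (mismatched[i] == b)
            return 1;
    return 0;
}

static int is_rejected(unsigned char b)
{
    for (size_t i = 0; i < sizeof rejected_alt / sizeof rejected_alt[0]; i++)
        if (alt_to_byte(rejected_alt[i]) == b)
            return 1;
    return 0;
}

/* Count mismatched indices that are NOT in the rejected list. */
static size_t unexplained_mismatches(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof mismatched; i++)
        if (!is_rejected(mismatched[i]))
            n++;
    return n;
}
```

Every rejected Alt code lands on a mismatched index, and the only mismatched index left over is 0x80.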

The UTF_to_CodePage data is 2F0 words in size.

I intend to patch that data to test the suspicion.

However, I'm not quite sure how to do that other than by manual hex editing,

and I'm still having trouble forging a proper UTF_to_CodePage table that fits the size restriction.
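
A rough way to see whether a candidate mapping can fit is to count blocks before building anything. This sketch assumes the layout implied by the 3-layered lookup (100h first-layer words, then 10h-word blocks for the second and third layers) and takes the number of shared default blocks as a parameter, since that detail is a guess. None of the names come from kernel32.dll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LAYER1_WORDS 0x100u   /* one word per Unicode high byte (assumed) */
#define BLOCK_WORDS  0x10u    /* one word per nibble value (assumed) */
#define TABLE_LIMIT  0x2F0u   /* size reported above, in words */

/* Blocks left over once the first layer is accounted for. */
static unsigned blocks_available(void)
{
    return (TABLE_LIMIT - LAYER1_WORDS) / BLOCK_WORDS;
}

/* Blocks needed for a set of Unicode code points: one layer-2 block per
   distinct high byte, one layer-3 block per distinct top 12 bits, plus
   however many shared default blocks the layout uses. */
static unsigned blocks_needed(const uint16_t *cps, size_t n,
                              unsigned shared_defaults)
{
    unsigned char seen_hi[256];
    unsigned char seen_pair[4096];
    unsigned blocks = shared_defaults;

    assert(cps != NULL || n == 0);
    memset(seen_hi, 0, sizeof seen_hi);
    memset(seen_pair, 0, sizeof seen_pair);

    for (size_t i = 0; i < n; i++) {
        unsigned hi = cps[i] >> 8;
        unsigned pair = cps[i] >> 4;
        if (!seen_hi[hi])     { seen_hi[hi] = 1;     blocks++; }
        if (!seen_pair[pair]) { seen_pair[pair] = 1; blocks++; }
    }
    return blocks;
}

static unsigned table_words(unsigned blocks)
{
    return LAYER1_WORDS + blocks * BLOCK_WORDS;
}

static int fits(unsigned blocks)
{
    return table_words(blocks) <= TABLE_LIMIT;
}
```

Under these assumptions 2F0 words leave room for 1F0h / 10h = 31 blocks after the first layer, so every extra Unicode row or 16-code-point range the patched mapping touches eats into that budget.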

edit:

I'll try to fix 81, 8E, 8F & 9E and report the result.

UPDATE:

That failed too; I still can't use those characters in filenames.

Edited by Joseph_sw
