Bizarre Language Encoding Problem


Cyker

Hi all,

As per the title, have a bizarre problem with a Word document.

The images explain the problem better than I can, but in short: we have a .DOC which works just fine on the original author's computer, but when loaded on anyone else's (I've tried mine, my place of work, friends') it replaces most of the characters with the equivalent letter from the Greek alphabet. If I try to open it in Open/Libre Office I just get the Unknown Character Squares of Doom.

The original author has tried saving it as an RTF and we *still* get the same problem. Oddly, I can open the RTF in Notepad and find the original English text, but when it's loaded into Word or LibreOffice I get the Greek characters or squares!

It seems like some kind of freaky character set encoding issue, but I don't know how to fix it. I've tried a few things:

  • Setting the language from Tools -> Language -> Set Language does nothing at all.
  • Setting the Confirm Encoding option in Tools -> Options -> General doesn't do anything for the .DOC, and for the .RTF it just spits out the raw text instead of the formatted text.
  • Changing fonts in LibreOffice does nothing, and in Word it changes all the text from the (literally!) Greek'd characters to Unknown Character Squares of Doom!

Seriously, does anyone have any ideas?!

I haven't come across such weird problems since my Windows 3.1 days (where you HAD to specify character sets and encoding for non-English text!)

I bet it's something stupidly trivial too...!

[Three attached images]

Edited by Cyker


Maybe it's a missing font.

For Office 2003: (menu)Tools>(item)Options>(tab)Compatibility>(button)Font Substitution. It will tell you which font your friend has that you don't.

Word sometimes makes stupid choices for font substitution....

GL


I had tried changing the font, even to Arial Unicode MS (which usually covers everything!), but to no avail: they all show the Unknown Character Squares of Doom.

I did what you said anyway, and curiously it says the missing font is "SanSerif" ?!?

I don't know if that is an actual font or just a generic name for an unknown sans-serif font; I've not seen a font literally called that before...

I did find that the default substitution is to "Symbol", which is why the characters are appearing as Greek alphabet instead of Latin, but it's bizarre that that is the only font where any characters are being displayed; all others just display the square boxes!

Another interesting thing is any font which has a 'script' of Symbol (As opposed to OEM, Western, Cyrillic etc.; Word >97 won't let you pick that any more, but you can see it in Notepad or Wordpad when you select font preferences) WILL display some characters - For instance I have a bitmap font called WST_ENGL which *does* display the text correctly, but as soon as I switch away it's back to the boxes.

I have also found I can load it into Wordpad, change the font to Arial or something, and the text is now readable! :D

However that buggers up all the formatting and pictures and I'm still no closer to understanding why I can't open the document in Word 2003 or LibreOffice...!
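One way to check the Symbol-script theory is to look at the font table in the RTF version, since that file is plain text. In RTF, each font table entry carries a `\fcharsetN` control word, and a value of 2 means the Symbol character set. The following is only a rough sketch: the regular expression handles simple font table entries rather than full RTF, and the `sample` string is a made-up fragment (it assumes "SanSerif" is declared with `\fcharset2`, which has not been confirmed for the real document).

```python
import re

# Matches simple font table entries such as {\f1\fnil\fcharset2 SanSerif;}
# and tolerates one nested group (e.g. a \panose block) before the name.
FONT_ENTRY = re.compile(
    r"\\f(\d+)(?:\\[a-z]+\d*)*?\\fcharset(\d+)(?:\\[a-z]+\d*)*\s?"
    r"(?:\{[^{}]*\})?([^;{}\\]+);"
)

# A few Windows character set numbers; anything else is shown as a number.
CHARSET_NAMES = {0: "ANSI", 1: "Default", 2: "Symbol", 255: "OEM"}


def list_rtf_fonts(rtf_text):
    """Return (font number, font name, charset label) for each entry found."""
    fonts = []
    for match in FONT_ENTRY.finditer(rtf_text):
        number, charset, name = match.groups()
        label = CHARSET_NAMES.get(int(charset), charset)
        fonts.append((int(number), name.strip(), label))
    return fonts


# Hypothetical RTF fragment for illustration only, not taken from the document.
sample = (
    r"{\rtf1\ansi{\fonttbl{\f0\fswiss\fcharset0 Arial;}"
    r"{\f1\fnil\fcharset2 SanSerif;}}\f1 TO}"
)

for entry in list_rtf_fonts(sample):
    print(entry)
```

If the real file lists the missing font with the Symbol charset, that would fit with the text only showing up in Symbol-script fonts.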


Seems like a weird font your friend uses. The question is, how did he enter the chars through the keyboard so they ended in the symbol range... Or did Windows/Office do the translation... Oh, well.

You can confirm they are in that range through the "Insert Symbol" map in Word or the Character Map. For an individual character, you can see its character code (the Unicode code point, in hex) if you press Alt-X right after it in the main area of Word (where you type).

GL

Edited by GrofLuigi

Neat, didn't know about the Alt+X thing :)

Right, we're getting somewhere. The characters in the doc are coming back with F-codes, e.g. the first two chars are T and O, which should be 0054 and 004F respectively, but in the broken document they are F054 and F04F.

If this was Linux I'd say it feels like a Unicode locale problem, but I don't know how this could come to be or how to fix it in Windows/Word! :wacko:
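The F-codes sit in Unicode's Private Use Area (U+E000 to U+F8FF), and the pattern above is a constant offset of 0xF000 from the expected values. As a minimal sketch, assuming every affected character follows that same offset and that the text has already been extracted from the document into a plain string, the mapping could be undone like this (the function name is made up for illustration):

```python
def unmap_symbol_range(text):
    """Shift characters in U+F020..U+F0FF back down by 0xF000.

    Assumes the F0xx codes are ordinary characters offset by 0xF000,
    as with F054 -> 0054 (T) and F04F -> 004F (O). Everything outside
    that range is left untouched.
    """
    fixed = []
    for ch in text:
        code_point = ord(ch)
        if 0xF020 <= code_point <= 0xF0FF:
            fixed.append(chr(code_point - 0xF000))
        else:
            fixed.append(ch)
    return "".join(fixed)


# The two code points reported above.
sample = "\uf054\uf04f"
print([f"{ord(ch):04X}" for ch in sample])
print(unmap_symbol_range(sample))
```

This only repairs plain text; it does nothing for the formatting or pictures in the original .DOC.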

( Thanks for your help so far! :thumbup )

Edited by Cyker
