Viewing unicode characters

joby_toss
Posts: 2971
Joined: Sat Feb 09, 2008 9:57 am
Location: Romania

#1 Post by joby_toss »

[Moderator note: this thread was split from the Universal Extractor Update (Gora version) thread.]

---

I see the archive password in ANSI characters and can't extract this one...

billon
Posts: 843
Joined: Sat Jun 23, 2012 4:28 pm

Re: Universal Extractor Update (Gora version)

#2 Post by billon »

@joby_toss:

LonerD - павлуша дергунов - ВОР.

joby_toss
Posts: 2971
Joined: Sat Feb 09, 2008 9:57 am
Location: Romania

Re: Universal Extractor Update (Gora version)

#3 Post by joby_toss »

Thank you very much!

OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that? What should I have done to get it to display right? Is there any tool to identify the correct encoding of a text file? I'm asking because this is a problem I'm hoping not to have in the future.
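
For anyone wondering what the garbled display actually was: a minimal Python sketch of the same effect, using the password from post #2. It assumes the viewer fell back to Windows-1252 as its "ANSI" codepage; the thread does not say which codepage was really in use, so treat that choice as illustrative only.

```python
# The password from post #2, as the archive author typed it.
original = "LonerD - павлуша дергунов - ВОР."

# The bytes as they would be stored under the Cyrillic codepage.
raw = original.encode("cp1251")

# What a viewer shows if it assumes Windows-1252 instead (assumption:
# the real "ANSI" codepage on the affected machine is not known).
garbled = raw.decode("cp1252", errors="replace")

# Decoding the very same bytes with the right codepage restores the text.
recovered = raw.decode("cp1251")

print(garbled)
print(recovered)
```

The bytes never change; only the table used to interpret them does, which is why picking the right encoding in the viewer is enough to fix the display.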

Userfriendly
Posts: 430
Joined: Tue Nov 27, 2012 11:41 pm

Re: Universal Extractor Update (Gora version)

#4 Post by Userfriendly »

Open foreign-language text files in Notepad++. By default, it auto-detects the character encoding. If it doesn't, the option is under Misc.

joby_toss
Posts: 2971
Joined: Sat Feb 09, 2008 9:57 am
Location: Romania

Re: Universal Extractor Update (Gora version)

#5 Post by joby_toss »

You're absolutely right! Notepad++ is able to display the text correctly, using the default settings. Thank you!

I'm quite frustrated with my beloved Notepad2 at the moment. I'm hoping that my settings are at fault here and not the app itself... :(

OK, I'll stop here with the off-topic, sorry about that!

Nh
Posts: 35
Joined: Tue Jan 22, 2008 5:01 am
Location: Georgia

Re: Universal Extractor Update (Gora version)

#6 Post by Nh »

MS Word can also be used to detect the correct encoding.

deathcubek
Posts: 221
Joined: Thu Jul 14, 2011 9:42 am
Location: Island of Lost Minds

Re: Universal Extractor Update (Gora version)

#7 Post by deathcubek »

joby_toss wrote: OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that?

You can't! That's the dilemma with the whole messed up "codepage" system :roll:

Without additional signaling - and you clearly do not have that with plain text files - the receiver simply cannot know which codepage the sender was using or intending.

The best you can do (and that's what Notepad++ probably does) is use some kind of heuristic to determine which codepage is probably right. But that's far from being a good/reliable solution.
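
To give an idea of what such a heuristic might look like, here is a deliberately tiny Python sketch. It is not what Notepad++ does (that is not known from this thread); the function name `guess_encoding`, the three candidate encodings, and the 50% threshold are all assumptions made up for illustration. The idea: valid UTF-8 is a strong signal on its own, and otherwise text whose high bytes come in long runs looks more like Cyrillic words than like Western text with the occasional accented letter.

```python
import re

def guess_encoding(raw: bytes) -> str:
    """Toy guess among 'ascii', 'utf-8', 'cp1251' and 'cp1252' (hypothetical helper)."""
    # Pure 7-bit data reads the same in all of these encodings.
    if all(b < 0x80 for b in raw):
        return "ascii"

    # Random 8-bit codepage text is very unlikely to be valid UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Count high bytes, and how many of them sit in runs of 3 or more
    # letter-range bytes (0xC0-0xFF holds the letters in both codepages).
    high_total = sum(b >= 0x80 for b in raw)
    in_runs = sum(len(run) for run in re.findall(rb"[\xC0-\xFF]{3,}", raw))

    # Assumed threshold: whole words made of high bytes suggest Cyrillic.
    return "cp1251" if in_runs / high_total >= 0.5 else "cp1252"


print(guess_encoding("павлуша дергунов".encode("cp1251")))
print(guess_encoding("café crème".encode("cp1252")))
```

A guess like this fails easily on short or unusual input, which is exactly the reliability problem described above.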

And, regardless of how the codepage is signaled (or detected), you won't be able to "mix" characters from different codepages.
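
A quick way to see the "no mixing" limitation, sketched in Python. The sample string and the helper name `can_encode` are made up for the example:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Cyrillic and an accented Latin letter in the same string.
mixed = "Привет, café"

for name in ("cp1251", "cp1252", "utf-8"):
    print(name, can_encode(mixed, name))
```

Windows-1251 has no "é" and Windows-1252 has no Cyrillic letters, so neither codepage can hold the whole string, while UTF-8 can.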

That's why smart people invented Unicode 20+ years ago: one character set meant to contain all characters mankind has ever invented, plus well-defined and unambiguous encoding rules.

So, there's really no reason not to use Unicode, preferably with UTF-8 encoding, nowadays. Also, UTF-8 is straightforward to detect, provided the sender has included a proper BOM :wink:
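
Checking for the BOM really is a one-liner. A small Python sketch, where the helper names `has_utf8_bom` and `decode_utf8` are made up for the example and only standard-library calls are used:

```python
import codecs

def has_utf8_bom(raw: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


def decode_utf8(raw: bytes) -> str:
    """Decode UTF-8 data, dropping a leading BOM if one is present."""
    # "utf-8-sig" strips the BOM when it is there and otherwise acts like "utf-8".
    return raw.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "ВОР".encode("utf-8")
print(has_utf8_bom(with_bom))
print(decode_utf8(with_bom))
```

Many UTF-8 files are written without a BOM, so a missing BOM does not prove the file is not UTF-8; in that case a strict decode attempt is the usual fallback check.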
