Viewing unicode characters

joby_toss · #1 Post by **joby_toss** » Tue Dec 22, 2015 11:42 am

[Moderator note: this thread was split from the Universal Extractor Update (Gora version) thread.]

---

I see the archive password in ANSI characters and can't extract this one...

billon · #2 Post by **billon** » Tue Dec 22, 2015 1:07 pm

@joby_toss:

LonerD - павлуша дергунов - ВОР.

joby_toss · #3 Post by **joby_toss** » Tue Dec 22, 2015 1:33 pm

Thank you very much!

OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that? What should I have done to get it to display right? Is there any tool to identify the correct encoding of a text file? I'm asking because this is a problem I'm hoping not to have in the future.

Userfriendly · #4 Post by **Userfriendly** » Tue Dec 22, 2015 3:46 pm

Open foreign-language text files in Notepad++. By default, it auto-detects the character encoding. If it doesn't, the option is under Misc.

joby_toss · #5 Post by **joby_toss** » Tue Dec 22, 2015 11:56 pm

You're absolutely right! Notepad++ is able to display the text correctly, using the default settings. Thank you!

I'm quite frustrated with my beloved Notepad2 at the moment. I'm hoping that my settings are at fault here and not the app itself...

OK, I'll stop here with the off-topic, sorry about that!

Nh · #6 Post by Nh » Wed Dec 23, 2015 1:54 am

MS Word can also be used to detect the correct encoding.

deathcubek · #7 Post by **deathcubek** » Wed Dec 23, 2015 7:23 am

joby_toss wrote:OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that?

You can't! That's the dilemma with the whole messed-up "codepage" system.

Without additional signaling - and you clearly do not have that with plain text files - the receiver simply cannot know which codepage the sender was using/intending.
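To make that concrete, here is a minimal Python sketch (my own illustration, not something from the archive or from any tool mentioned in this thread). It takes part of the password from post #2, encodes it as Windows-1251, and decodes the same bytes under a few single-byte codepages. The helper name `decode_all` and the list of codepages are just assumptions for the demo.

```python
def decode_all(raw, names=("cp1251", "cp1252", "cp1250", "koi8_r")):
    """Decode the same bytes under several codepages (hypothetical helper)."""
    # errors="replace" keeps the demo from raising on bytes a codepage
    # leaves undefined; such bytes show up as U+FFFD instead.
    return {name: raw.decode(name, errors="replace") for name in names}

# Bytes as the sender wrote them: Cyrillic text in Windows-1251.
raw = "павлуша дергунов".encode("cp1251")

for name, text in decode_all(raw).items():
    print(name, "->", text)
```

Every codepage produces *some* text of the same length, and nothing in the bytes themselves says which result is the intended one.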

The best you can do (and that's probably what Notepad++ does) is use some kind of heuristic to guess which codepage is most likely right. But that's far from a good or reliable solution.
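A toy version of such a heuristic, sketched in Python. To be clear, this is not how Notepad++ or any real detector works; it just assumes the text is supposed to be Russian and picks the codepage whose decoding contains the most very common lowercase Russian letters. The names `COMMON_RUSSIAN`, `score`, and `guess_codepage`, and the candidate list, are all made up for this example.

```python
# Letters assumed to be very frequent in Russian text. A real detector
# would use proper per-language statistics instead of this short list.
COMMON_RUSSIAN = set("оеаинтсрвл")

def score(text):
    """Fraction of characters that are common lowercase Russian letters."""
    if not text:
        return 0.0
    return sum(ch in COMMON_RUSSIAN for ch in text) / len(text)

def guess_codepage(raw, candidates=("cp1251", "koi8_r", "cp866", "cp1252")):
    """Return the candidate whose decoding looks most Russian, or None."""
    best, best_score = None, 0.0
    for codec in candidates:
        try:
            text = raw.decode(codec)  # strict: undefined bytes rule it out
        except UnicodeDecodeError:
            continue
        s = score(text)
        if s > best_score:
            best, best_score = codec, s
    return best

raw = "павлуша дергунов".encode("cp1251")
print(guess_codepage(raw))
```

For the password bytes this should pick cp1251, but it only works because the language was guessed in advance. Short strings, mixed languages, or all-caps text can easily fool it, which is exactly why I call heuristics unreliable.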

And, regardless of how the codepage is signaled (or detected), you won't be able to "mix" characters from different codepages.

That's why smart people invented Unicode more than 20 years ago: one character set that is meant to contain all characters mankind has ever invented, plus well-defined and unambiguous encoding rules.

So, there's really no reason not to use Unicode, preferably with UTF-8 encoding, nowadays. Also, UTF-8 is straightforward to detect, provided the sender has included a proper BOM.
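Checking for a BOM takes only a few lines. Here is a minimal Python sketch using the BOM constants from the standard `codecs` module; the function names `sniff_bom` and `is_valid_utf8` are my own. The second function is an extra assumption on my part: when there is no BOM, a strict UTF-8 decode is still a fairly strong hint, because text in a legacy codepage rarely happens to form valid UTF-8 byte sequences.

```python
import codecs

# Check the 4-byte UTF-32 marks first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

def sniff_bom(data):
    """Return a Python codec name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def is_valid_utf8(data):
    """True if the bytes decode as strict UTF-8 (a hint, not proof)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

with_bom = codecs.BOM_UTF8 + "павлуша".encode("utf-8")
print(sniff_bom(with_bom))
print(is_valid_utf8("павлуша".encode("cp1251")))
```

The returned names can be passed straight to `bytes.decode`, and those codecs strip the BOM while decoding.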

The Portable Freeware Collection Forums
