Viewing unicode characters

joby_toss
Posts: 2902
Joined: Sat Feb 09, 2008 9:57 am
Location: Romania

Viewing unicode characters

#1 Post by joby_toss » Tue Dec 22, 2015 11:42 am

[Moderator note: this thread was split from the Universal Extractor Update (Gora version) thread.]

---

I see the archive password in ANSI characters and can't extract this one...

billon
Posts: 785
Joined: Sat Jun 23, 2012 4:28 pm

Re: Universal Extractor Update (Gora version)

#2 Post by billon » Tue Dec 22, 2015 1:07 pm

@joby_toss:

LonerD - павлуша дергунов - ВОР.

joby_toss

Re: Universal Extractor Update (Gora version)

#3 Post by joby_toss » Tue Dec 22, 2015 1:33 pm

Thank you very much!

OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that? What should I have done to get it to display right? Is there any tool to identify the correct encoding of a text file? I'm asking because this is a problem I'm hoping not to have in the future.
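The effect described here can be reproduced in a few lines of Python. This is only a sketch: it assumes the text was saved as Windows-1251 and then displayed under Western European Windows-1252, which may not be the code page the viewing system actually used. The sample string is taken from post #2.

```python
# Sample text from post #2, used purely as test data.
original = "павлуша дергунов"

# The bytes an author would get when saving the text as Windows-1251.
raw = original.encode("cp1251")

# The same bytes interpreted under two different "ANSI" code pages.
# cp1252 is an assumption standing in for "whatever the viewer's system uses".
as_western = raw.decode("cp1252")   # accented Latin letters, i.e. mojibake
as_cyrillic = raw.decode("cp1251")  # the intended Cyrillic text

print(as_western)
print(as_cyrillic)
```

The bytes never change; only the table used to turn them into characters does, which is why the file itself cannot tell you which table is right.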

Userfriendly
Posts: 415
Joined: Tue Nov 27, 2012 11:41 pm

Re: Universal Extractor Update (Gora version)

#4 Post by Userfriendly » Tue Dec 22, 2015 3:46 pm

Open foreign-language text files in Notepad++. By default, it auto-detects the character encoding. If it doesn't, the option is under Misc.

joby_toss

Re: Universal Extractor Update (Gora version)

#5 Post by joby_toss » Tue Dec 22, 2015 11:56 pm

You're absolutely right! Notepad++ is able to display the text correctly, using the default settings. Thank you!

I'm quite frustrated with my beloved Notepad2 at the moment. I'm hoping that my settings are at fault here and not the app itself... :(

OK, I'll stop here with the off-topic, sorry about that!

Nh
Posts: 34
Joined: Tue Jan 22, 2008 5:01 am
Location: Georgia

Re: Universal Extractor Update (Gora version)

#6 Post by Nh » Wed Dec 23, 2015 1:54 am

MS Word can also be used to detect the correct encoding.

deathcubek
Posts: 205
Joined: Thu Jul 14, 2011 9:42 am
Location: Island of Lost Minds

Re: Universal Extractor Update (Gora version)

#7 Post by deathcubek » Wed Dec 23, 2015 7:23 am

joby_toss wrote: OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that?
You can't! That's the dilemma with the whole messed up "codepage" system :roll:

Without additional signaling - and you clearly do not have that with plain text files - the receiver simply cannot know which codepage the sender was using/intending.

The best you can do (and that's what Notepad++ probably does) is use some kind of heuristic to determine which codepage is probably right. But that's far from being a good/reliable solution.
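To make "some kind of heuristic" concrete, here is a minimal sketch of one possible approach. It is not what Notepad++ actually does (I don't know its internals); the candidate table and the function name `guess_codepage` are made up for illustration. It decodes the bytes under each candidate codepage and picks the one whose result contains the largest share of letters that are common in the matching language.

```python
# Hypothetical candidate table: codepage -> letters that are very common in
# the language(s) usually written with it. Illustrative, far from exhaustive.
CANDIDATES = {
    "cp1251": "оеаинтсрвл",  # Russian
    "cp1252": "etaoinshrl",  # English and other Western European languages
}

def guess_codepage(raw: bytes) -> str:
    """Return the candidate whose decoding has the highest share of common letters.

    Returns an empty string if no candidate can decode the bytes at all.
    """
    best_name, best_score = "", -1.0
    for name, common in CANDIDATES.items():
        try:
            text = raw.decode(name).lower()
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this codepage
        letters = [ch for ch in text if ch.isalpha()]
        if not letters:
            continue
        score = sum(ch in common for ch in letters) / len(letters)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

print(guess_codepage("павлуша дергунов".encode("cp1251")))
print(guess_codepage("the archive password".encode("cp1252")))
```

A guess like this gets shakier the shorter the text is and the more similar the candidate languages are, which is exactly why it is not a reliable solution.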

And, regardless of how the codepage is signaled (or detected), you won't be able to "mix" characters from different codepages.

That's why smart people invented Unicode more than 20 years ago - one character set that aims to contain all characters mankind has ever invented. That plus well-defined and unambiguous encoding rules.

So, there's really no reason not to use Unicode, preferably with UTF-8 encoding, nowadays. Also, UTF-8 is straightforward to detect, provided the sender has included a proper BOM :wink:
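As a rough illustration of the BOM check, here is a small Python sketch. The function name `classify` and its return labels are made up for this example. It first looks for the UTF-8 BOM and, if there is none, falls back to a strict UTF-8 decode, which legacy codepage text usually fails.

```python
import codecs

# Labels are arbitrary strings chosen for this sketch.
UTF8_BOM = "utf-8 with BOM"
UTF8_PLAIN = "valid utf-8 without BOM (or plain ASCII)"
NOT_UTF8 = "not utf-8"

def classify(raw: bytes) -> str:
    """Classify raw bytes by BOM first, then by strict UTF-8 validity."""
    if raw.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return UTF8_BOM
    try:
        raw.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return NOT_UTF8
    return UTF8_PLAIN

# One string mixing Cyrillic and Romanian letters, fine in Unicode.
sample = "павлуша țară"

print(classify(sample.encode("utf-8-sig")))  # "utf-8-sig" writes the BOM
print(classify(sample.encode("utf-8")))
print(classify("павлуша".encode("cp1251")))
```

Note that `sample` could not be saved in Windows-1251 at all, because that codepage has no "ț"; with UTF-8 the mix is not a problem.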
„One of my most productive days was throwing away 1,000 lines of code“ – Ken Thompson

Dreamatorium | In Search Of The Disembodied Sounds | Best Regards!
