[Moderator note: this thread was split from the Universal Extractor Update (Gora version) thread.]
---
I see the archive password in ANSI characters and can't extract this one...
Re: Universal Extractor Update (Gora version)
@joby_toss:
LonerD - павлуша дергунов - ВОР.
Re: Universal Extractor Update (Gora version)
Thank you very much!
OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that? What should I have done to get it to display right? Is there any tool to identify the correct encoding of a text file? I'm asking because this is a problem I'm hoping not to have in the future.
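To see what went wrong, here is a small Python sketch. It assumes the password was stored as Windows-1251 bytes, and that "ANSI" on the reader's machine means the Western codepage Windows-1252 (the usual default on Western-language Windows). Decoding the same bytes with each codepage gives garbage in one case and the real password in the other.

```python
# The password as it should read (from the post above).
password = "LonerD - павлуша дергунов - ВОР."

# Assumption: the file stored the password as Windows-1251 bytes.
raw = password.encode("cp1251")

# Reading those bytes with the Western "ANSI" codepage (Windows-1252)
# turns every Cyrillic letter into an accented Latin one.
as_ansi = raw.decode("cp1252")

# Decoding with the codepage that was actually used recovers the text.
as_cyrillic = raw.decode("cp1251")

print(as_ansi)                   # LonerD - ïàâëóøà äåðãóíîâ - ÂÎÐ.
print(as_cyrillic == password)   # True
```

The bytes never change; only the table used to interpret them does.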
- Userfriendly
- Posts: 430
- Joined: Tue Nov 27, 2012 11:41 pm
Re: Universal Extractor Update (Gora version)
Open foreign-language text files in Notepad++. By default, it auto-detects the correct character encoding. If it doesn't, the auto-detect option is under Settings > Preferences > MISC.
Re: Universal Extractor Update (Gora version)
You're absolutely right! Notepad++ is able to display the text correctly, using the default settings. Thank you!
I'm quite frustrated with my beloved Notepad2 at the moment. I'm hoping that my settings are at fault here and not the app itself...
OK, I'll stop here with the off-topic, sorry about that!
Re: Universal Extractor Update (Gora version)
MS Word can also be used to detect the correct encoding.
- deathcubek
- Posts: 221
- Joined: Thu Jul 14, 2011 9:42 am
- Location: Island of Lost Minds
Re: Universal Extractor Update (Gora version)
joby_toss wrote: "OK, so the encoding used was Cyrillic (Windows-1251), but how was I supposed to know that?"
You can't! That's the dilemma with the whole messed-up "codepage" system.
Without additional signaling, which plain text files clearly don't have, the receiver simply cannot know which codepage the sender was using or intended.
The best you can do (and that's probably what Notepad++ does) is use some kind of heuristic to guess which codepage is most likely right. But that's far from a good or reliable solution.
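A toy version of such a heuristic, as a Python sketch only: decode the bytes with each candidate codepage and keep the one whose words look most plausible. The rules here (no word mixing Latin and Cyrillic letters, no Latin word made mostly of accented letters) and the function names are hypothetical; real detectors typically rely on statistical character-frequency models instead.

```python
import re
import unicodedata
from typing import Optional

def script_of(ch: str) -> str:
    # First word of the Unicode character name, e.g. "LATIN" or "CYRILLIC".
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def plausible(word: str) -> bool:
    scripts = {script_of(ch) for ch in word}
    if len(scripts) > 1:
        return False  # e.g. "GrьЯe": Latin and Cyrillic mixed in one word
    if scripts == {"LATIN"}:
        # Real Western words are mostly plain ASCII letters.
        accented = sum(ord(ch) > 127 for ch in word)
        return accented * 2 <= len(word)
    return True

def score(text: str) -> float:
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    if not words:
        return 0.0
    return sum(plausible(w) for w in words) / len(words)

def guess_codepage(raw: bytes, candidates=("cp1252", "cp1251")) -> Optional[str]:
    best, best_score = None, -1.0
    for codepage in candidates:
        try:
            text = raw.decode(codepage)
        except UnicodeDecodeError:
            continue  # bytes are not even valid in this codepage
        s = score(text)
        if s > best_score:
            best, best_score = codepage, s
    return best

print(guess_codepage("LonerD - павлуша дергунов - ВОР.".encode("cp1251")))  # cp1251
print(guess_codepage("Grüße".encode("cp1252")))                            # cp1252
```

It is easy to fool: short strings, pure-ASCII text, or a language the rules don't anticipate give little or misleading evidence, which is exactly the unreliability described above.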
And, regardless of how the codepage is signaled (or detected), you won't be able to "mix" characters from different codepages.
That's why smart people invented Unicode more than 20 years ago: one character set that contains all the characters mankind has ever invented, plus well-defined, unambiguous encoding rules.
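A short Python sketch of the "no mixing" point (the sample string and the `encodable` helper are made up for illustration): a text with both Cyrillic and an accented Latin letter fits in neither single-byte codepage, while UTF-8 stores it without loss.

```python
text = "Привет café"  # Cyrillic and accented Latin in one string

def encodable(text: str, codepage: str) -> bool:
    # Strict encoding raises UnicodeEncodeError for any character
    # the target encoding has no slot for.
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

# Windows-1251 has no "é"; Windows-1252 has no Cyrillic letters.
for codepage in ("cp1251", "cp1252", "utf-8"):
    print(codepage, encodable(text, codepage))

# UTF-8 can encode every Unicode code point, so the round trip is lossless.
assert text.encode("utf-8").decode("utf-8") == text
```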
So there's really no reason not to use Unicode, preferably with UTF-8 encoding, nowadays. UTF-8 is also straightforward to detect, provided the sender has included a proper BOM (the bytes EF BB BF at the very start of the file).
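A minimal Python sketch of that check (the `sniff_encoding` name is hypothetical). The BOM test is the signal described above; the strict UTF-8 validation fallback is an extra assumption added here: legacy-codepage text containing non-ASCII bytes rarely happens to form valid UTF-8, so it is a strong hint, though not proof.

```python
import codecs
from typing import Optional

def sniff_encoding(raw: bytes) -> Optional[str]:
    # A UTF-8 BOM (EF BB BF) at the start is an explicit signal.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec that drops the BOM when decoding
    # Assumption: without a BOM, treat strictly valid UTF-8 as UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None  # not UTF-8; fall back to guessing a codepage

print(sniff_encoding(codecs.BOM_UTF8 + "ВОР".encode("utf-8")))  # utf-8-sig
print(sniff_encoding("ВОР".encode("cp1251")))                   # None
```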