
Latin-1 is region specific. Windows code pages and ISO-8859 (depending on the era) would be better terms to use in your example.


What do you mean it's region specific? I don't think that is right. Here is the list of Latin-1 characters (as part of Unicode):

https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_bl...

There's nothing region specific there.

Also ISO-8859-1 is Latin-1:

> ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script.

Now there are a few code points in ISO 8859-1 that are undefined, whereas they are defined in Unicode Latin-1, and Windows-1252, but they're mostly the same. The major difference is € and the TM symbol.
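
For anyone who wants to check the € and TM point, here is a rough Python sketch. It assumes Python's built-in "latin-1" and "cp1252" codecs are fair stand-ins for ISO 8859-1 and Windows-1252. Note that Python's "latin-1" codec fills 0x80-0x9F with C1 control characters rather than leaving them undefined, so this only approximates the strict ISO text. The helper name differing_bytes is mine, purely for illustration.

```python
# Bytes 0x80 and 0x99 are where the two encodings visibly part ways.
euro_byte = b"\x80"
tm_byte = b"\x99"

cp1252_euro = euro_byte.decode("cp1252")   # Windows-1252 puts the euro sign here
cp1252_tm = tm_byte.decode("cp1252")       # ...and the trademark sign here
latin1_euro = euro_byte.decode("latin-1")  # Python's latin-1 yields a C1 control
latin1_tm = tm_byte.decode("latin-1")


def differing_bytes():
    """Byte values that cp1252 decodes differently from latin-1, or rejects."""
    out = []
    for value in range(256):
        raw = bytes([value])
        try:
            same = raw.decode("cp1252") == raw.decode("latin-1")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a handful of bytes unassigned.
            same = False
        if not same:
            out.append(value)
    return out


diff = differing_bytes()
print(repr(cp1252_euro), repr(latin1_euro))
print(hex(min(diff)), hex(max(diff)))
```

If that holds up, every difference sits in the 0x80-0x9F range and everything from 0xA0 upward is identical in both.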


> What do you mean it's region specific?

Exactly as it sounds. The characters encoded in Latin-1 are specific for Western Europe and thus may not appear in other ISO-8859 character sets.
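
A small Python sketch of what I mean, assuming Python's built-in codec names ("iso-8859-1", "iso-8859-5", "iso-8859-7") match the standards closely enough for illustration. The byte 0xE9 is just an example I picked, and can_encode is a hypothetical helper, not a standard function.

```python
# The same byte means a different character in each ISO-8859 part.
raw = b"\xe9"

western = raw.decode("iso-8859-1")   # Latin-1: e with acute accent
cyrillic = raw.decode("iso-8859-5")  # Cyrillic part: small letter shcha
greek = raw.decode("iso-8859-7")     # Greek part: small letter iota


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(western, cyrillic, greek)
print(can_encode(western, "iso-8859-1"))  # the accented e is in Latin-1
print(can_encode(western, "iso-8859-5"))  # but not in the Cyrillic part
```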

> I don't think that is right. Here is the list of Latin-1 characters (as part of Unicode)...There's nothing region specific there.

Unicode is a different set of character sets yet again (note: Unicode isn't even one specific character set!). Latin-1 is not Unicode. In fact, the point of Unicode was to address the problems that arose with region specific character sets like Latin-1, which is why Unicode includes the Latin-1 characters as well as characters from other locales. What you're referencing is the Latin-1 block within the UTF-8 character set.

> Also ISO-8859-1 is Latin-1

It is. But I was referencing ISO-8859 (without the -1) which covers Latin-1 as well as a bunch of other locales.

> Now there are a few code points in ISO 8859-1 that are undefined, whereas they are defined in Unicode Latin-1, and Windows-1252, but they're mostly the same. The major difference is € and the TM symbol.

You're drifting all over the place there:

1. There's no such thing as "Unicode Latin-1". They're different character sets, albeit Unicode has a Latin-1 block (much like Latin-1 has an ASCII block).

2. With regards to your point about the € and TM differences: that is precisely the reason I suggested using ISO-8859 (without the -1) as a reference rather than a region specific character set.
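
On point 1, the "block within a block" relationship can be poked at with a short Python sketch. This assumes Python's built-in "ascii", "latin-1" and "utf-8" codecs and is only meant as an illustration, not a statement about what any standard calls these things.

```python
# Every Latin-1 byte value decodes to the Unicode code point with the same number.
all_bytes = bytes(range(256))
latin1_code_points = [ord(ch) for ch in all_bytes.decode("latin-1")]

# The lower half of Latin-1 is plain ASCII.
ascii_bytes = bytes(range(128))
ascii_matches_latin1 = ascii_bytes.decode("ascii") == ascii_bytes.decode("latin-1")

# Same character, same code point, but different bytes depending on the encoding.
e_acute = "\u00e9"
as_latin1 = e_acute.encode("latin-1")  # one byte
as_utf8 = e_acute.encode("utf-8")      # two bytes

print(latin1_code_points == list(range(256)))
print(ascii_matches_latin1)
print(as_latin1, as_utf8)
```

So the character repertoire overlaps, but the byte representation only matches if you actually encode as Latin-1.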



