What is the relationship between the ASCII Latin-1 character set and its Unicode equivalent that makes conversion between the two simple?
The relationship between the American Standard Code for Information Interchange (ASCII), the Latin-1 character set, and their Unicode equivalents that makes conversion between them simple is as follows:
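An encoding does not have to handle every possible Unicode character, and in fact most encodings do not. The rules for converting a Unicode string into the ASCII encoding are simple: each code point below 128 becomes a single byte with the same value, and a string containing any code point of 128 or above cannot be encoded as ASCII.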
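As a small illustration of an encoding that covers only part of Unicode, the Python sketch below uses the built-in `ascii` codec. The helper name `can_encode_ascii` is made up for this example and is not part of any library.

```python
def can_encode_ascii(text: str) -> bool:
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised when a code point falls outside the ASCII range.
        return False


# Each ASCII character becomes one byte whose value equals its code point.
plain = "ABC".encode("ascii")
plain_values = list(plain)

# U+00E9 (e with acute accent) is a valid Unicode character but not ASCII.
accented_ok = can_encode_ascii("caf\u00e9")
```

Here `plain_values` holds the byte values of the encoded string, and `accented_ok` is `False` because the last character lies outside the ASCII range.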
In character data, Unicode extends the one-byte ISO Latin-1 character set, which in turn is an eight-bit superset of the seven-bit ASCII character set. Unicode was originally designed as a two-byte (16-bit) code, although it now defines code points up to U+10FFFF.
The Latin-1 character set, also known as ISO-8859-1, is an encoding similar to ASCII. Unicode code points 0 to 255 are identical to the Latin-1 values, so converting to this encoding simply requires converting each code point to a byte with the same value. If a code point larger than 255 is encountered, the string cannot be encoded into Latin-1. The conversion is simple because it is a one-to-one mapping between code points and byte values.
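The one-to-one mapping can be sketched by hand in Python and compared with the built-in `latin-1` codec. The function names `encode_latin1` and `decode_latin1` are hypothetical helpers written for this illustration only.

```python
def encode_latin1(text: str) -> bytes:
    """Encode text as Latin-1 by using each code point as the byte value."""
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        if code_point > 255:
            # No Latin-1 byte exists for this character.
            raise ValueError(f"U+{code_point:04X} cannot be encoded in Latin-1")
        out.append(code_point)
    return bytes(out)


def decode_latin1(data: bytes) -> str:
    """Decode Latin-1 bytes by using each byte value as the code point."""
    return "".join(chr(b) for b in data)


sample = "caf\u00e9"                 # U+00E9 is within the range 0 to 255
encoded = encode_latin1(sample)
matches_builtin = encoded == sample.encode("latin-1")
round_trip = decode_latin1(encoded)
```

A character such as the euro sign (U+20AC) lies above 255, so `encode_latin1("\u20ac")` raises `ValueError` in this sketch, just as the built-in codec raises `UnicodeEncodeError`.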
The first 256 characters of Unicode, that is, the characters whose high-order byte is zero in a 16-bit representation, are identical to the characters of the ISO Latin-1 character set. Thus, 65 is both ASCII A and Unicode A, 66 is both ASCII B and Unicode B, and so on.
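This identity can be checked directly. The short Python sketch below decodes every possible byte value with the built-in `latin-1` codec and compares the resulting code points with the byte values. The variable names are chosen for this example.

```python
# Every byte value from 0 to 255, decoded as Latin-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each decoded character's code point should equal the original byte value.
code_points = [ord(ch) for ch in decoded]
identical = code_points == list(range(256))

# The first 128 values also agree with ASCII.
ascii_subset = bytes(range(128)).decode("ascii") == decoded[:128]

# The specific cases mentioned above.
letter_a = ord("A")
letter_b = ord("B")
```

Both `identical` and `ascii_subset` are `True`, and `letter_a` and `letter_b` are 65 and 66, matching the ASCII values.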