Celebrazio Net




Canonical Names of CharSets Chart

January, 2006

Ever get your character set tags and labels mixed up when switching between syntaxes in a web or XML production system? Refer to the handy chart on this page to keep yourself in order and your users happy. What Perl knows by one name, Java may recognize by another, and XML (or HTML) may use yet another name for the same thing.

Java          Perl          XML           HTML
ISO8859_1     iso-8859-1    ISO-8859-1    ISO-8859-1
ASCII         ascii         US-ASCII      US-ASCII
UTF8          utf8          UTF-8         UTF-8
UTF-16        UTF-16        UTF-16        UTF-16
EUC_JP        euc-jp        EUC-JP        EUC-JP
SJIS          shiftjis      Shift_JIS     Shift_JIS
GBK           cp936         GBK           GBK
EUC_KR        euc-kr        EUC-KR        EUC-KR
Big5          big5-eten     Big5          Big5
Big5_HKSCS    big5-hkscs    Big5-HKSCS    Big5-HKSCS
EUC_CN        euc-cn        GB2312        GB2312

References:
HTML and XML: IANA MIME types list

Java: Java Encodings Doc

Perl: easily found using Encode::resolve_alias($alias);

Perl internationalization requires 5.6.*, but 5.8 is highly recommended. Perl automatically translates a recognized encoding alias to its canonical name: requesting encoding "sjis" in Perl will automatically use "shiftjis". It's very flexible. Java, on the other hand, is less flexible: specifying the exact name of the target encoding is recommended. HTML and XML accept either upper- or lowercase letters, but the spelling must otherwise match the canonical name shown here exactly.
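As a rough Java counterpart to Perl's Encode::resolve_alias, the sketch below asks the JVM for the canonical name of a label. It assumes a JVM with the java.nio.charset API (Java 1.4 or later); the class name CharsetNames and the helper canonical are made up for this illustration. Only a handful of standard charsets (ISO-8859-1, US-ASCII, UTF-8, UTF-16 and its variants) are guaranteed on every JVM, so labels such as SJIS or EUC_CN may come back as null on a given installation.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class, for illustration only.
public class CharsetNames {

    // Returns the canonical java.nio name for a label,
    // or null if this JVM does not recognize the label.
    static String canonical(String label) {
        try {
            return Charset.forName(label).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] labels = {"ISO8859_1", "ASCII", "UTF8", "utf-8", "no-such-charset"};
        for (String label : labels) {
            System.out.println(label + " -> " + canonical(label));
        }
    }
}
```

On the standard charsets the names reported this way match the XML and HTML column of the chart, which can be a convenient cross-check when a label's origin is unclear.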
Use the HTML and XML syntax shown here in your page's charset META HTTP-EQUIV declaration. You may view the source of this page for an example.
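For readers generating pages from Java, here is a minimal sketch of how such a declaration might be assembled. The class name CharsetDeclarations and both helper methods are hypothetical, and the XML line assumes the usual encoding pseudo-attribute of the XML declaration, which this page does not itself show. StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class CharsetDeclarations {

    // Builds an HTML META HTTP-EQUIV declaration for the given charset.
    static String htmlMeta(Charset cs) {
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + cs.name() + "\">";
    }

    // Builds an XML declaration naming the same charset.
    static String xmlDeclaration(Charset cs) {
        return "<?xml version=\"1.0\" encoding=\"" + cs.name() + "\"?>";
    }

    public static void main(String[] args) {
        // prints <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
        System.out.println(htmlMeta(StandardCharsets.ISO_8859_1));
        // prints <?xml version="1.0" encoding="UTF-8"?>
        System.out.println(xmlDeclaration(StandardCharsets.UTF_8));
    }
}
```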





1998-2017 Celebrazio.net