Java and Unicode
Revision as of 19:03, 26 June 2018
External
- https://docs.oracle.com/javase/10/docs/api/java/lang/Character.html
- http://www.oracle.com/us/technologies/java/supplementary-142654.html
Internal
Overview
Character information is maintained in Java by the primitive type char. It was designed around the original Unicode 1.0 specification, which allowed only 2^16 (65,536) code points, so char was defined as a fixed-width 16-bit (2-byte) entity. Since then, the Unicode standard has evolved to allow for characters whose representation requires more than 16 bits. For details on how characters are represented internally by Java, see the "Character Representation" section.
U+n Notation Support
U+n notation is supported in Java as follows:
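A minimal sketch of how a U+n code point might be written and printed in Java. It relies on the standard \uXXXX source escape, Character.toChars and String.format. The class name UPlusNotation and the helpers toUPlus and fromUPlus are hypothetical names introduced only for this illustration, and these are not necessarily the only forms of support.

```java
public class UPlusNotation {

    // Hypothetical helper: renders a code point in U+n notation,
    // padded to at least four hexadecimal digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // Hypothetical helper: parses a "U+n" string back into a code point.
    static int fromUPlus(String notation) {
        if (!notation.startsWith("U+")) {
            throw new IllegalArgumentException("Expected U+n notation: " + notation);
        }
        int codePoint = Integer.parseInt(notation.substring(2), 16);
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + notation);
        }
        return codePoint;
    }

    public static void main(String[] args) {
        // U+00E9 fits in one char, so a single Unicode escape is enough.
        char eAcute = '\u00E9';

        // U+1F600 lies outside the BMP; in source it is written as two escapes
        // (a surrogate pair) or built from the code point at run time.
        String escaped = "\uD83D\uDE00";
        String built = new String(Character.toChars(0x1F600));

        System.out.println(toUPlus(eAcute));                 // U+00E9
        System.out.println(toUPlus(escaped.codePointAt(0))); // U+1F600
        System.out.println(escaped.equals(built));           // true
    }
}
```

The escape form only takes four hexadecimal digits, which is why a code point above U+FFFF has to be spelled as two escapes or constructed from its int value.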
Character Representation
The Java platform uses the UTF-16 representation in char arrays and in the String, StringBuffer and StringBuilder classes.
The Basic Multilingual Plane characters are represented as char instances, as the char data type provides sufficient storage capacity for the entire BMP range.
The supplementary characters are represented as a pair of char values. Java 5, which supports Unicode 4.0, introduced enhancements to correctly handle Unicode supplementary characters.
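The difference between BMP and supplementary characters can be observed directly through the code point methods added in Java 5. The following is a sketch only: the class name SupplementaryDemo and the helper fromCodePoint are hypothetical, and the two sample code points (U+20AC and U+1D11E) were chosen arbitrarily for illustration.

```java
public class SupplementaryDemo {

    // Hypothetical helper: builds a String holding exactly one code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String euro = fromCodePoint(0x20AC);   // BMP character
        String clef = fromCodePoint(0x1D11E);  // supplementary character

        // length() counts char values (UTF-16 code units), not characters.
        System.out.println(euro.length()); // 1
        System.out.println(clef.length()); // 2

        // codePointCount() counts Unicode code points.
        System.out.println(clef.codePointCount(0, clef.length())); // 1

        // The two char values of a supplementary character form a surrogate pair.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true

        // codePointAt() reassembles the pair into the original code point.
        System.out.println(Integer.toHexString(clef.codePointAt(0))); // 1d11e

        // Iterate by code point rather than by char to avoid splitting pairs.
        String text = euro + clef;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            System.out.println(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
    }
}
```

Code that indexes a String with charAt or relies on length() treats each half of a surrogate pair as a separate unit, so text that may contain supplementary characters is usually safer to process with the code point methods.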