UTF-16

The first 2^16 Unicode code points. The white stripe near the bottom marks the surrogate halves used by UTF-16.
Language(s): International
Standard: Unicode Standard
Classification: Unicode Transformation Format, variable-width encoding
Extends: UCS-2
Transforms / Encodes: ISO/IEC 10646 (Unicode)

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (this number is in fact dictated by the design of UTF-16). The encoding is variable-length: each code point is encoded with one or two 16-bit code units. UTF-16 arose from an earlier, now obsolete, fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set),[1][2] once it became clear that more than 2^16 (65,536) code points were needed,[3] including most emoji and important CJK characters such as those used in personal and place names.[4]
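
The one-or-two code unit scheme, and the way it fixes the number of valid code points, can be illustrated with a minimal Python sketch. The names encode_utf16 and VALID_CODE_POINTS are hypothetical, chosen for this illustration only. The sketch assumes the standard surrogate ranges (0xD800 to 0xDBFF for the first unit of a pair, 0xDC00 to 0xDFFF for the second) and is not meant as a production encoder.

```python
def encode_utf16(code_point: int) -> list[int]:
    """Return the UTF-16 code units (as integers) for one code point.

    Illustrative helper only; the name is hypothetical.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid characters")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single 16-bit code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # first unit of the surrogate pair
    low = 0xDC00 + (offset & 0x3FF)   # second unit of the surrogate pair
    return [high, low]


# 17 planes of 2**16 code points each, minus the 2,048 surrogate values
# that are reserved for pairs and therefore encode no character themselves.
VALID_CODE_POINTS = 17 * 2**16 - 2048

print([hex(u) for u in encode_utf16(0x20AC)])   # euro sign: one code unit
print([hex(u) for u in encode_utf16(0x1F600)])  # an emoji: two code units
print(VALID_CODE_POINTS)
```

Two 10-bit halves give 2^20 supplementary code points, which together with the 2^16 positions of the first plane accounts for the 17 planes used in the count.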

UTF-16 is used by systems such as the Microsoft Windows API, the Java programming language and JavaScript/ECMAScript. It is also sometimes used for plain text and word-processing data files on Microsoft Windows. It is also used by SMS (the SMS standard specifies UCS-2, but almost all implementations actually use UTF-16 so that emoji work).[citation needed]

UTF-16 is the only encoding still allowed on the web that is incompatible with ASCII.[5][nb 1] It never gained popularity there: it is declared by under 0.004% of web pages[7] (and many of these are actually UTF-8 but incorrectly marked[citation needed]). UTF-8, by comparison, accounts for over 98% of all web pages.[8] The Web Hypertext Application Technology Working Group (WHATWG) considers UTF-8 "the mandatory encoding for all [text]" and holds that, for security reasons, browser applications should not use UTF-16.[9]
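
What incompatibility with ASCII means in practice can be seen by comparing encoded bytes. The following is a small sketch using Python's built-in codecs; the variable names are illustrative, and the little-endian form without a byte order mark ("utf-16-le") is chosen here only to keep the output short.

```python
text = "Hi"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark

# UTF-8 reproduces ASCII byte for byte; UTF-16 spends two bytes per
# character, so every ASCII letter is accompanied by a zero byte.
print(ascii_bytes)
print(utf8_bytes)
print(utf16_bytes)
```

Software that expects ASCII-compatible input will therefore see embedded zero bytes in UTF-16 text, whereas pure-ASCII text is already valid UTF-8.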

  1. ^ "C.2 Encoding Forms in ISO/IEC 10646" (PDF). The Unicode Standard, version 6.0. Mountain View, CA: Unicode Consortium. February 2011. p. 573. ISBN 978-1-936213-01-6. [...] the term UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard.
  2. ^ "FAQ: What is the difference between UCS-2 and UTF-16?". unicode.org. Archived from the original on 2003-08-18. Retrieved 2024-03-19. UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1 [...]
  3. ^ "What is UTF-16?". The Unicode Consortium. Unicode, Inc. Retrieved 7 January 2023. UTF-16 uses a single 16-bit code unit to encode over 60,000 of the most common characters in Unicode
  4. ^ Lunde, Ken (2022-01-09). "2022 Top Ten List: Why Support Beyond-BMP Code Points?". Medium. Retrieved 2024-01-07. I first came up with the idea for this Top Ten List over 10 years ago, which was prompted by some environments that still supported only BMP code points. The idea, of course, was to motivate the developers of such environments to support code points beyond the BMP by providing an enumerated list of reasons to do so. And yes, there are still some environments that support only BMP code points, such as the VivaDesigner app.
  5. ^ "HTML Living Standard". w3.org. 2020-06-10. Archived from the original on 2020-09-08. Retrieved 2020-06-15. UTF-16 encodings are the only encodings that this specification needs to treat as not being ASCII-compatible encodings.
  6. ^ "Encoding Standard". encoding.spec.whatwg.org. Retrieved 2023-04-22.
  7. ^ "Usage Statistics of UTF-16 for Websites, January 2024". w3techs.com. Retrieved 2024-01-07.
  8. ^ "Usage Statistics of UTF-8 for Websites, December 2023". w3techs.com. Retrieved 2023-12-01.
  9. ^ "Encoding Standard". encoding.spec.whatwg.org. Retrieved 2018-10-22. The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the universal coded character set. Therefore for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the UTF-8 encoding. [..] The problems outlined here go away when exclusively using UTF-8, which is one of the many reasons that UTF-8 is now the mandatory encoding for all text things on the Web.

