Binary Ordered Compression for Unicode

Binary Ordered Compression for Unicode (BOCU) is a MIME-compatible Unicode compression scheme. BOCU-1 combines the wide applicability of UTF-8 with the compactness of the Standard Compression Scheme for Unicode (SCSU). The encoding is designed for compressing short strings and preserves code point order. BOCU-1 is specified in a Unicode Technical Note.[1]
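
To illustrate what preserving code point order means, the following minimal Python sketch checks whether sorting strings by their encoded bytes gives the same result as sorting them by code point. It does not implement BOCU-1; it uses UTF-8 (which has this property) and UTF-16 (which does not) as stand-ins, and the helper name preserves_code_point_order and the sample strings are illustrative assumptions.

```python
def preserves_code_point_order(strings, encoding):
    """Return True if byte-wise order of the encoded strings matches code point order."""
    # Python compares str objects code point by code point.
    by_code_point = sorted(strings)
    # Sort the same strings by the raw bytes of their encoded form.
    by_bytes = sorted(strings, key=lambda s: s.encode(encoding))
    return by_code_point == by_bytes


# Hypothetical sample: two BMP characters near the ends of the range,
# one ASCII letter, and one supplementary-plane character (U+10000).
samples = ["\uff5e", "\U00010000", "a", "\u00e9"]

utf8_ok = preserves_code_point_order(samples, "utf-8")
utf16_ok = preserves_code_point_order(samples, "utf-16-be")
print("UTF-8 preserves order: ", utf8_ok)
print("UTF-16 preserves order:", utf16_ok)
```

In UTF-16, U+10000 is encoded with surrogates starting at 0xD800, so it sorts before U+FF5E in binary order even though its code point is larger. An encoding that maintains code point order avoids this kind of mismatch.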

For comparison, SCSU was adopted as a standard Unicode compression scheme, with a byte-per-code-point ratio similar to that of language-specific code pages. However, SCSU has not been widely adopted, because it is not suitable for MIME "text" media types; for example, it cannot be used directly in email and similar protocols. SCSU also requires a complicated encoder design for good performance. For larger amounts of Unicode text, zip, bzip2, and other industry-standard algorithms usually compress more efficiently.[2]
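
The difference between short strings and larger texts can be seen with a rough Python sketch that compares UTF-8 byte counts against the output of the standard-library zlib and bz2 modules. The helper name sizes and the sample texts are assumptions for illustration, and the long sample is artificially repetitive, so the exact ratios say little about real-world text.

```python
import bz2
import zlib


def sizes(text):
    """Return byte counts for the UTF-8 form and its zlib/bz2-compressed forms."""
    raw = text.encode("utf-8")
    return {
        "utf-8": len(raw),
        "zlib": len(zlib.compress(raw)),
        "bz2": len(bz2.compress(raw)),
    }


# A short string: container overhead outweighs any savings.
short = sizes("Привет")
# A much longer (and deliberately repetitive) text: general-purpose
# compressors have enough input to pay off.
long_text = sizes("Привет, мир! " * 1000)

print("short:", short)
print("long: ", long_text)
```

For the short string, both general-purpose compressors produce more bytes than the plain UTF-8 input, which is the niche that schemes aimed at short strings target.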

Both SCSU[3] and BOCU-1[4] are IANA-registered charsets.

  1. Scherer, Markus; Davis, Mark (2006-02-04). "UTN #6: BOCU-1". Retrieved 2008-05-18.
  2. Ewell, Doug (2004-01-30). "UTN #14: A survey of Unicode compression" (PDF). Retrieved 2008-06-13.
  3. IANA registration record for SCSU
  4. IANA registration record for BOCU-1
