Arithmetic coding

Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding, frequently used characters will be stored with fewer bits and not-so-frequently occurring characters will be stored with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, an arbitrary-precision fraction q, where 0.0 ≤ q < 1.0. It represents the current information as a range, defined by two numbers.[1] A recent family of entropy coders called asymmetric numeral systems allows for faster implementations thanks to directly operating on a single natural number representing the current information.[2]

An arithmetic coding example assuming a fixed probability distribution of three symbols "A", "B", and "C". Probability of "A" is 50%, probability of "B" is 33% and probability of "C" is 17%. Furthermore, we assume that the recursion depth is known in each step. In step one we code "B" which is inside the interval [0.5, 0.83): The binary number "0.10x" is the shortest code that represents an interval that is entirely inside [0.5, 0.83). "x" means an arbitrary bit sequence. There are two extreme cases: the smallest x stands for zero which represents the left side of the represented interval. Then the left side of the interval is dec(0.10) = 0.5. At the other extreme, x stands for a finite sequence of ones which has the upper limit dec(0.11) = 0.75. Therefore, "0.10x" represents the interval [0.5, 0.75) which is inside [0.5, 0.83). Now we can leave out the "0." part since all intervals begin with "0." and we can ignore the "x" part because no matter what bit-sequence it represents, we will stay inside [0.5, 0.75).
  1. ^ Ze-Nian Li; Mark S. Drew; Jiangchuan Liu (9 April 2014). Fundamentals of Multimedia. Springer Science & Business Media. ISBN 978-3-319-05290-8.
  2. ^ J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp, The use of asymmetric numeral systems as an accurate replacement for Huffman coding, Picture Coding Symposium, 2015.

© MMXXIII Rich X Search. We shall prevail. All rights reserved. Rich X Search