In other words, if one were to print out the compressed data as
a sequence of bytes, starting with the first byte at the
*right* margin and proceeding to the *left*, with the most-
significant bit of each byte on the left as usual, one would be
able to parse the result from right to left, with fixed-width
elements in the correct MSB-to-LSB order and Huffman codes in
bit-reversed order (i.e., with the first bit of the code in the
relative LSB position).
3.2. Compressed block format
3.2.1. Synopsis of prefix and Huffman coding
Prefix coding represents symbols from an a priori known
alphabet by bit sequences (codes), one code for each symbol, in
a manner such that different symbols may be represented by bit
sequences of different lengths, but a parser can always parse
an encoded string unambiguously symbol-by-symbol.
We define a prefix code in terms of a binary tree in which the
two edges descending from each non-leaf node are labeled 0 and
1 and in which the leaf nodes correspond one-for-one with (are
labeled with) the symbols of the alphabet; then the code for a
symbol is the sequence of 0's and 1's on the edges leading from
the root to the leaf labeled with that symbol. For example:
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
/\ Symbol Code
0 1 ------ ----
/ \ A 00
/\ B B 1
0 1 C 011
/ \ D 010
A /\
0 1
/ \
D C
A parser can decode the next symbol from an encoded input
stream by walking down the tree from the root, at each step
choosing the edge corresponding to the next input bit.
Given an alphabet with known symbol frequencies, the Huffman
algorithm allows the construction of an optimal prefix code
(one which represents strings with those symbol frequencies
using the fewest bits of any possible prefix codes for that
alphabet). Such a code is called a Huffman code. (See
reference [1] in Chapter 5, references for additional
information on Huffman codes.)
Note that in the "deflate" format, the Huffman codes for the
various alphabets must not exceed certain maximum code lengths.
This constraint complicates the algorithm for computing code
lengths from symbol frequencies. Again, see Chapter 5,
references for details.
3.2.2. Use of Huffman coding in the "deflate" format
The Huffman codes used for each alphabet in the "deflate"
format have two additional rules:
* All codes of a given bit length have lexicographically
consecutive values, in the same order as the symbols
they represent;
* Shorter codes lexicographically precede longer codes.
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
We could recode the example above to follow this rule as
follows, assuming that the order of the alphabet is ABCD:
=4= |