From mmixal.w (or really, the generated mmixal.tex) in http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz): “Symbols are stored and retrieved by means of a ternary search trie, following ideas of Bentley and Sedgewick. (See ACM–SIAM Symp. on Discrete Algorithms 8 (1997), 360–369; R.Sedgewick, Algorithms in C (Reading, Mass. Addison–Wesley, 1998), 15.4.) Each trie node stores a character, and there are branches to subtries for the cases where a given character is less than, equal to, or greater than the character in the trie. There also is a pointer to a symbol table entry if a symbol ends at the current node.”
So it's a tree encoded as a stream of bytes. The stream of bytes acts on a single virtual global symbol, adding and removing characters and signalling complete symbol points. Here, we read the stream and create symbols at the completion points.
First, there's a control byte m
. If any of the listed bits
in m
is nonzero, we execute what stands at the right, in
the listed order:
(MMO3_LEFT)
0x40 - Traverse left trie.
(Read a new command byte and recurse.)
(MMO3_SYMBITS)
0x2f - Read the next byte as a character and store it in the
current character position; increment character position.
Test the bits of m
:
(MMO3_WCHAR)
0x80 - The character is 16-bit (so read another byte,
merge into current character.
(MMO3_TYPEBITS)
0xf - We have a complete symbol; parse the type, value
and serial number and do what should be done
with a symbol. The type and length information
is in j = (m & 0xf).
(MMO3_REGQUAL_BITS)
j == 0xf: A register variable. The following
byte tells which register.
j <= 8: An absolute symbol. Read j bytes as the
big-endian number the symbol equals.
A j = 2 with two zero bytes denotes an
unknown symbol.
j > 8: As with j <= 8, but add (0x20 << 56)
to the value in the following j - 8
bytes.
Then comes the serial number, as a variant of
uleb128, but better named ubeb128:
Read bytes and shift the previous value left 7
(multiply by 128). Add in the new byte, repeat
until a byte has bit 7 set. The serial number
is the computed value minus 128.
(MMO3_MIDDLE)
0x20 - Traverse middle trie. (Read a new command byte
and recurse.) Decrement character position.
(MMO3_RIGHT)
0x10 - Traverse right trie. (Read a new command byte and
recurse.)
Let's look again at the lop_stab
for the trivial file
(see File layout).
0x980b0000 - lop_stab for ":Main" = 0, serial 1. 0x203a4040 0x10404020 0x4d206120 0x69016e00 0x81000000
This forms the trivial trie (note that the path between “:” and “M” is redundant):
203a ":" 40 / 40 / 10 \ 40 / 40 / 204d "M" 2061 "a" 2069 "i" 016e "n" is the last character in a full symbol, and with a value represented in one byte. 00 The value is 0. 81 The serial number is 1.