Level 0: Tree Structure
The lowest level of the protocol defines the basic tree structure using two primitive types:
- Binary blob (sequence of bytes)
- Non-negative integer of unlimited, variable length
UBNumber Encoding
UBNumber is an encoding used to represent a single element of a well-ordered, countably infinite set. The value is stored as one or more bytes (similar to UTF-8 encoding).
The leading one-bits of the first byte are counted to determine the length of the code (k leading one-bits mean a code of k + 1 bytes), and the remaining bits carry the value. The value is also offset by the count of all numbers representable by shorter codes, so that each number has exactly one code.
The most direct encoding built on UBNumber is UBNatural, which represents a non-negative integer.
Examples of the UBNatural codes (sequence of bits = represented value):
00000000 = 0
00000001 = 1
00000010 = 2
00000011 = 3
...
01111111 = 7Fh = 127
10000000 00000000 = 80h = 128
10000000 00000001 = 81h = 129
...
10111111 11111111 = 407Fh = 16511
11000000 00000000 00000000 = 4080h = 16512
...
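The rules above can be sketched as a small encoder and decoder. This is a minimal sketch, assuming the scheme inferred from the examples: k leading one-bits in the first byte mean a code of k + 1 bytes, and the payload is offset by the count of all shorter codes. The names `encode_ubnatural` and `decode_ubnatural` are hypothetical, and codes longer than 8 bytes (first byte FFh) are deliberately left out because they are not described here.

```python
from typing import Tuple

MAX_LENGTH = 8  # assumption: only codes of 1..8 bytes are handled in this sketch


def encode_ubnatural(value: int) -> bytes:
    """Encode a non-negative integer as a UBNatural code."""
    if value < 0:
        raise ValueError("UBNatural values are non-negative")
    offset = 0  # how many values all shorter codes can represent
    for length in range(1, MAX_LENGTH + 1):
        capacity = 1 << (7 * length)  # a code of `length` bytes has 7*length payload bits
        if value < offset + capacity:
            payload = value - offset
            # (length - 1) leading one-bits followed by a zero bit in the first byte
            prefix = (0xFF << (9 - length)) & 0xFF
            code = (prefix << (8 * (length - 1))) | payload
            return code.to_bytes(length, "big")
        offset += capacity
    raise ValueError("value needs more than 8 bytes; not covered by this sketch")


def decode_ubnatural(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode the code starting at data[pos]; return (value, bytes_used)."""
    first = data[pos]
    ones = 0
    while ones < 8 and first & (0x80 >> ones):  # count leading one-bits
        ones += 1
    if ones >= MAX_LENGTH:
        raise ValueError("codes longer than 8 bytes are not covered by this sketch")
    length = ones + 1
    if pos + length > len(data):
        raise ValueError("truncated UBNumber code")
    code = int.from_bytes(data[pos:pos + length], "big")
    payload = code & ((1 << (7 * length)) - 1)  # strip the length prefix
    offset = sum(1 << (7 * k) for k in range(1, length))
    return offset + payload, length
```

With this scheme, 128 encodes as `80 00`, 16511 as `BF FF` and 16512 as `C0 00 00`, matching the examples.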
Various interpretations can be mapped onto the UBNumber encoding. Two are defined for this level:
- UBNatural, which uses the UBNumber value directly
- UBENatural, where the value 7Fh is reserved for the infinity constant and all higher values are shifted by one (the number 127 is encoded as 80h, 128 as 81h, and so on)
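A minimal sketch of the UBENatural mapping, working on the raw integer already decoded from UBNumber. Representing infinity as Python's `None` and the helper names `ubenatural_from_raw` and `ubenatural_to_raw` are assumptions for illustration only.

```python
from typing import Optional

INFINITY_RAW = 0x7F  # raw UBNumber value reserved for infinity


def ubenatural_from_raw(raw: int) -> Optional[int]:
    """Interpret a decoded UBNumber value as UBENatural (None means infinity)."""
    if raw < INFINITY_RAW:
        return raw
    if raw == INFINITY_RAW:
        return None
    return raw - 1  # values above 7Fh are shifted down by one


def ubenatural_to_raw(value: Optional[int]) -> int:
    """Inverse mapping from a UBENatural value to the raw UBNumber value."""
    if value is None:
        return INFINITY_RAW
    if value < 0:
        raise ValueError("UBENatural values are non-negative")
    return value if value < INFINITY_RAW else value + 1
```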
Document
A document starts with a 6-byte blob called “Document Header”, followed by a single block called “Root Block” and an optional blob called “Extended Area”.
The header for the current version of the protocol is (hexadecimal):
FE 00 58 42 00 02
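To illustrate the overall layout, the hypothetical helper `minimal_document` below assembles a document whose root block is a small data block (following the Block rules below), optionally followed by an extended area. It assumes the payload is shorter than 127 bytes, so that both size fields fit into single-byte UBNumber codes.

```python
DOCUMENT_HEADER = bytes.fromhex("FE0058420002")


def minimal_document(payload: bytes, extended_area: bytes = b"") -> bytes:
    """Header + root data block holding `payload` + optional extended area."""
    if len(payload) >= 0x7F:
        # 7Fh would mean infinity in UBENatural, and larger sizes need longer codes
        raise ValueError("this sketch only handles payloads shorter than 127 bytes")
    data_part_size = bytes([len(payload)])               # UBENatural, one byte
    attribute_part_size = bytes([len(data_part_size)])   # equals its length -> data block
    return DOCUMENT_HEADER + attribute_part_size + data_part_size + payload + extended_area
```

For example, `minimal_document(b"hi")` yields the bytes `FE 00 58 42 00 02 01 02 68 69`.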
Block
Each block starts with a single value:
UBNatural attributePartSize
If attributePartSize = 0, the block is called a “Terminator” and the block ends. Otherwise, it is followed by the value:
UBENatural dataPartSize
If attributePartSize equals the number of bytes used by dataPartSize, the block is called a “Data Block”: a binary blob of dataPartSize bytes follows and the block ends.
If attributePartSize is greater than the number of bytes used by dataPartSize, the block is called a “Node Block” and a sequence of attributes follows, until the total number of bytes used by the attributes equals attributePartSize minus the number of bytes used by dataPartSize. Each attribute is encoded as:
UBNumber attribute
After the attributes, a sequence of child blocks follows until the sum of their sizes equals dataPartSize, and the block ends.
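A minimal recursive parser for blocks with finite sizes, sketched in Python; terminators and infinite sizes are rejected rather than handled. The names `read_ub`, `Block` and `parse_block` are hypothetical, and treating an attributePartSize smaller than the dataPartSize code as an error is an assumption, since that case is not covered above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def read_ub(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a UBNumber (up to 8 bytes) at buf[pos]; return (value, bytes_used)."""
    if pos >= len(buf):
        raise ValueError("unexpected end of input")
    ones = 0
    while ones < 8 and buf[pos] & (0x80 >> ones):
        ones += 1
    if ones == 8:
        raise ValueError("codes longer than 8 bytes are not handled here")
    length = ones + 1
    if pos + length > len(buf):
        raise ValueError("unexpected end of input")
    code = int.from_bytes(buf[pos:pos + length], "big")
    offset = sum(1 << (7 * k) for k in range(1, length))
    return offset + (code & ((1 << (7 * length)) - 1)), length


@dataclass
class Block:
    attributes: List[int] = field(default_factory=list)
    children: List["Block"] = field(default_factory=list)
    data: Optional[bytes] = None  # set only for data blocks


def parse_block(buf: bytes, pos: int = 0) -> Tuple[Block, int]:
    """Parse one finite-size block at buf[pos]; return (block, end position)."""
    attr_size, n = read_ub(buf, pos)
    pos += n
    if attr_size == 0:
        raise ValueError("terminator: not handled by this finite-size sketch")
    raw_size, size_len = read_ub(buf, pos)
    pos += size_len
    if raw_size == 0x7F:
        raise ValueError("infinite dataPartSize: not handled by this sketch")
    data_size = raw_size - 1 if raw_size > 0x7F else raw_size  # UBENatural shift
    if attr_size == size_len:  # data block: blob of data_size bytes
        if pos + data_size > len(buf):
            raise ValueError("data part runs past the input")
        return Block(data=buf[pos:pos + data_size]), pos + data_size
    if attr_size < size_len:  # assumption: not a valid block
        raise ValueError("attributePartSize shorter than the dataPartSize code")
    block = Block()  # node block: attributes, then child blocks
    attr_end = pos + attr_size - size_len
    while pos < attr_end:
        value, n = read_ub(buf, pos)
        block.attributes.append(value)
        pos += n
    if pos != attr_end:
        raise ValueError("attribute overflow")
    data_end = pos + data_size
    while pos < data_end:
        child, pos = parse_block(buf, pos)
        block.children.append(child)
    if pos != data_end:
        raise ValueError("block overflow")
    return block, pos
```

For example, the bytes `03 04 01 02 01 02 68 69` parse into a node block with attributes 1 and 2 and a single child data block containing "hi".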
If dataPartSize = infinity for a data block, the binary blob is terminated by a pair of zero bytes (00 00). A pair of bytes consisting of a zero byte followed by a non-zero byte does not terminate the blob; it stands for a run of zero bytes whose length is given by the non-zero byte.
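One reading of this escape scheme, sketched in Python: a zero byte followed by a non-zero byte n decodes to n zero bytes, and 00 00 ends the blob. This interpretation of the run length and the helper names `decode_unbounded_blob` and `encode_unbounded_blob` are assumptions; the encoder caps each run at 255 because the length is a single byte.

```python
from typing import Tuple


def decode_unbounded_blob(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Decode an infinite-size data part at buf[pos]; return (data, position after 00 00)."""
    out = bytearray()
    while True:
        if pos >= len(buf):
            raise ValueError("unterminated blob")
        byte = buf[pos]
        pos += 1
        if byte != 0:
            out.append(byte)
            continue
        if pos >= len(buf):
            raise ValueError("unterminated blob")
        count = buf[pos]
        pos += 1
        if count == 0:
            return bytes(out), pos  # 00 00 ends the blob
        out.extend(b"\x00" * count)  # 00 n stands for n zero bytes


def encode_unbounded_blob(data: bytes) -> bytes:
    """Escape zero runs as 00 n (n <= 255) and append the 00 00 terminator."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != 0:
            out.append(data[i])
            i += 1
            continue
        run = 1
        while i + run < len(data) and data[i + run] == 0 and run < 255:
            run += 1
        out += bytes([0, run])
        i += run
    out += b"\x00\x00"
    return bytes(out)
```

Under this reading, the data `41 00 00 42` is stored as `41 00 02 42 00 00`.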
If dataPartSize = infinity for a node block, the sequence of blocks in its data part is ended by a terminator.
See the following picture for clarification:
Document Parsing Grammar
When infinite data part sizes and terminators are ignored, the grammar can be simplified to the following rules:
Document ::= header + Block + data
Block ::= begin + Attributes + Blocks + end | begin + data + end
Blocks ::= Block + Blocks | epsilon
Attributes ::= attribute + Attributes | epsilon
The following chart shows the basic graph of events that occur during sequential document parsing.
Explanation:
a - block attribute (blockAttribute)
b - begin of the block (blockBegin)
d - data part of block (blockData)
e - end of block (blockEnd)
Graph source file graph-1.graphml
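The grammar and the event letters above can be tied together with a small pull-style generator that walks a document and yields a, b, d and e events in parsing order. It is a sketch assuming finite sizes and no terminators (matching the simplified grammar), and it does not validate the input; `document_events` and `block_events` are hypothetical names.

```python
from typing import Generator, Iterator, Tuple, Union

HEADER = bytes.fromhex("FE0058420002")
Event = Tuple[str, Union[int, bytes, None]]


def read_ub(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a UBNumber (up to 8 bytes) at buf[pos]; return (value, bytes_used)."""
    ones = 0
    while ones < 8 and buf[pos] & (0x80 >> ones):
        ones += 1
    if ones == 8:
        raise ValueError("codes longer than 8 bytes are not handled here")
    length = ones + 1
    if pos + length > len(buf):
        raise ValueError("unexpected end of input")
    code = int.from_bytes(buf[pos:pos + length], "big")
    offset = sum(1 << (7 * k) for k in range(1, length))
    return offset + (code & ((1 << (7 * length)) - 1)), length


def block_events(buf: bytes, pos: int) -> Generator[Event, None, int]:
    """Yield the events of one block; the generator's return value is its end position."""
    attr_size, n = read_ub(buf, pos)
    pos += n
    if attr_size == 0:
        raise ValueError("terminators are outside the simplified grammar")
    raw_size, size_len = read_ub(buf, pos)
    pos += size_len
    if raw_size == 0x7F:
        raise ValueError("infinite sizes are outside the simplified grammar")
    data_size = raw_size - 1 if raw_size > 0x7F else raw_size  # UBENatural shift
    yield ("b", None)                          # blockBegin
    if attr_size == size_len:                  # Block ::= begin + data + end
        yield ("d", buf[pos:pos + data_size])  # blockData
        pos += data_size
    else:                                      # Block ::= begin + Attributes + Blocks + end
        attr_end = pos + attr_size - size_len
        while pos < attr_end:
            value, n = read_ub(buf, pos)
            pos += n
            yield ("a", value)                 # blockAttribute
        data_end = pos + data_size
        while pos < data_end:
            pos = yield from block_events(buf, pos)
    yield ("e", None)                          # blockEnd
    return pos


def document_events(doc: bytes) -> Iterator[Event]:
    """Document ::= header + Block + data; the trailing data (extended area) is not emitted."""
    if doc[:6] != HEADER:
        raise ValueError("missing or unknown header")
    yield from block_events(doc, 6)
```

For the bytes `03 04 01 02 01 02 68 69` following the header, the generator yields the letters b a a b d e e, one path through the chart above.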
Correct Document
A binary stream is a correctly structured (well-formed) XBUP document if the following conditions are met. The invalid state reported when a condition is violated is given in parentheses.
- Optional: Stream header must be present (Corrupted or missing header)
- Optional: Header version must be in supported range (Unsupported header)
- In each block, the end of the last attribute coincides with the end of the attribute part (Attribute Overflow)
- In each block, the end of the last subblock coincides with the end of the data part (Block Overflow)
- A terminator block appears only in blocks where it belongs (Unexpected Terminator)
- The end of the file comes after the end of the root block (Unexpected End)
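The conditions above can be checked in a single recursive pass. This sketch returns the name of the first violated condition, or `None` for a well-formed stream; bytes after the root block are accepted as the extended area. It assumes that a terminator is only allowed where it ends a node block with infinite dataPartSize, that infinite data parts use the zero-escape reading (00 n is a zero run, 00 00 ends the blob), and that any header mismatch is reported as a corrupted or missing header, since the layout of the version field is not specified here and "Unsupported header" is therefore not distinguished. All names are hypothetical.

```python
from typing import Optional, Tuple

HEADER = bytes.fromhex("FE0058420002")


class Malformed(Exception):
    """Raised with one of the error-state names listed above."""


def read_ub(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a UBNumber (up to 8 bytes) at buf[pos]; return (value, bytes_used)."""
    if pos >= len(buf):
        raise Malformed("Unexpected End")
    ones = 0
    while ones < 8 and buf[pos] & (0x80 >> ones):
        ones += 1
    if ones == 8:
        raise NotImplementedError("codes longer than 8 bytes are not handled here")
    length = ones + 1
    if pos + length > len(buf):
        raise Malformed("Unexpected End")
    code = int.from_bytes(buf[pos:pos + length], "big")
    offset = sum(1 << (7 * k) for k in range(1, length))
    return offset + (code & ((1 << (7 * length)) - 1)), length


def skip_unbounded_blob(buf: bytes, pos: int) -> int:
    """Skip an infinite data part: 00 00 ends it, 00 n (n > 0) is a zero run."""
    while True:
        if pos >= len(buf):
            raise Malformed("Unexpected End")
        if buf[pos] != 0:
            pos += 1
            continue
        if pos + 1 >= len(buf):
            raise Malformed("Unexpected End")
        pos += 2
        if buf[pos - 1] == 0:
            return pos


def check_block(buf: bytes, pos: int) -> Tuple[int, bool]:
    """Validate one block; return (end position, True if it was a terminator)."""
    attr_size, n = read_ub(buf, pos)
    pos += n
    if attr_size == 0:
        return pos, True
    raw_size, size_len = read_ub(buf, pos)
    pos += size_len
    infinite = raw_size == 0x7F
    data_size = raw_size - 1 if raw_size > 0x7F else raw_size
    if attr_size < size_len:  # assumption: reported as an attribute overflow
        raise Malformed("Attribute Overflow")
    if attr_size == size_len:  # data block
        if infinite:
            return skip_unbounded_blob(buf, pos), False
        if pos + data_size > len(buf):
            raise Malformed("Unexpected End")
        return pos + data_size, False
    attr_end = pos + attr_size - size_len  # node block: attributes first
    while pos < attr_end:
        pos += read_ub(buf, pos)[1]
    if pos != attr_end:
        raise Malformed("Attribute Overflow")
    if infinite:  # child blocks until a terminator
        while True:
            pos, was_terminator = check_block(buf, pos)
            if was_terminator:
                return pos, False
    data_end = pos + data_size
    while pos < data_end:
        pos, was_terminator = check_block(buf, pos)
        if was_terminator:
            raise Malformed("Unexpected Terminator")
    if pos != data_end:
        raise Malformed("Block Overflow")
    return pos, False


def validate(stream: bytes) -> Optional[str]:
    """Return None if `stream` is well-formed, else the violated condition's name."""
    if stream[:6] != HEADER:
        return "Corrupted or missing header"
    try:
        _, was_terminator = check_block(stream, 6)
    except Malformed as err:
        return str(err)
    return "Unexpected Terminator" if was_terminator else None
```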