[XBUP] XBUP - Extensible Binary Universal Protocol

ยป Concepts

Concept: Data Structure

Protocol data structure concept describes how to store data in binary stream. Technically block-tree organization structure is used.

Declared Conditions

Requiremens on the structure are derived from project's requirements with additional focus on compatibility and scalability of the implementation.

Requirements:

Block Structure

Target method should describe how to structure various data types to be stored as potentially infinite sequence of bits. Following concept only describes basic approach using block as a basic structure entity, which can include one or more blocks to estabilish actual tree structure.

Each block consist of two parts:

First value in attribute part is:

UBNatural - AttribPartLength (APL)

This value limits the length of the remaining attribute part area and therefore size of UBNumber values sequence. AttribPartLength value = 0 means that there are no more UBNumber values in the block and that this is a termination block.

That the value is of the UBNatural type means that the data stream has to known not only the count but also space used by attributes onward. This can cause problems for large sequences of attributes. However, it is expected, that huge data values should be stored as data blocks or as a sequence or structure of smaller blocks.

Next value in the attribute part is:

UBENatural - DataPartLength (DPL)

This value specifies the length of data part as a sequence of bit clusters (size in bytes), which follows after attribute part and also includes infinity constant for unknown size.

Remaining values in attribute part are called attributes. If there is still remaining space for attributes, block is called node block and data part is then processed as a sequence of blocks. If there is no more space for attributes, block is called data block and data part is considered as blob of binary data.

Meaning of other attributes is not defined here and should not affect basic block-tree structure.

Node Block

Node block must have at least one attribute. Data part is processed as sequence of sub blocks, where this sequence must fit exactly to available space.

If node block have data part size set to infinity, sequence of sub blocks must be ended by termination block. If node block have data part size set to 0, there are no subblocks and block is called leaf block.

Examples (hexadecimal codes used):

APL DPL Attribute
02 00 00
APL DPL Attribute Termination Block
02 7F 00 00
APL DPL Attribute Child
02 03 00 02 00 00

Data Block

Data block is block which doesn't have any attributes.

If data block have data part size set to 0, it is called empty block. If data block have data part size set to infinity, data part is processed using following algorithm:

Data are processed by bytes. When next byte has value 0, additional byte is read to be used for length of empty space. If empty space length is 0, data block ends. Otherwise these two bytes are passed as data as a sequence of zero bytes of given empty space length.

Examples (hexadecimal codes used):

APL DPL
01 00
APL DPL Data
01 7F 00 00
APL DPL Data
01 01 00

File or Stream Structure

The structure of the document encoded using XBUP is structured as follows:

Document Header
Attribute Part
Data Part
Subblock's Attribute Part
Subblock's Data Part
...
...
Extended Area

Document Header

In addition to blocks there should be at the beginning of the stream a sequence of bits that would indicate the used method of coding. As mentioned in concept of numerical encoding, it is useful to at determine the size of the used cluster.

ClusterSize = FEh

This value is using unary encoding and allows to determine the ClusterSize value, but to match cluster, this value is interpretted as incremented by one, so that unary encoded value FEh = 7 means actually 8 bits long clustering (bytes). It must be introduced at the beginning of the file, since the encoding on of the following values depends on it.

For development purposes the header of the document includes several other data:

UBNatural - ProtocolVersion = 00h
DWord(4xUBNatural) - ProtocolSignature = 58 42 00 XXh ("XB" + development version)

Protocol version 0 is reserved for the protocol development stage. The development version then specify particular structure of the document and any incompatible changes shall be reflected in a change to this value.

If the stream contains no other data, then it's called an empty document and is technically not valid. Otherwise, the data is processed as a single block, and data after that block are called extended area. This should allow the use of protocol to be usable as a header for generally infinite bitstream.

One reason to use header is that in some operating systems file name extension is not used to distinguish the type of file, but just the first bytes of the content are usually accessed. The header can be interpreted as a 32-bit identification number and 16-bit version number. It is assumed that the final version will have different document header.

Document Validity

Stream is created correctly (well-formed) if the following conditions holds. Names of exceptional invalid states are also listed for the case given condition is not met.