[XBUP] XBUP - Extensible Binary Universal Protocol

ยป Concepts

Concept: Numerical Encoding

This concept describes specific encoding used as a fundamental basis of the entire protocol design, allowing efficient representation of any integer number or any instance of accountable set. It's based on hybrid unary-binary encoding with some recursive applicability.

This encodings should not be considered as the only way how to represent values in protocol. Although it is enforced in attributes sequence, it's still possible to store values in blobs using any proper encoding form for storage needs, including typical binary number encoding to fixed length with options to choose endianity or various types of encodings, like for example IEEE-754 for real numbers.

Declared Requirements

The following requirements or nice-to-have features are specified for the proposed encoding.

Declared Requirements:

UBNumber Encoding Family

Proposed UBNumber encoding consist of three components/parts: unary encoded value, optional recursive UBNumber-encoded part and binary encoded part.

Encoding uses bit clusters of fixed length, typically bytes. In basic form, where recursive part is not present, value of unary encoded part is followed by binary encoded part filling up the rest of sequence of clusters which count is based on value represented by unary part. This is similar to Unicode UTF-8 encoding.

When unary part would be longer than single cluster, recursive part is added instead and value of this part is used further extend cluster length of number. See examples below, where green color marks unary encoded part and red color marks recursive part. Optionally some encodings uses magenta color for value sign bit.

Native mapping of codes to values is demonstrated on UBNatural encoding variant, which can represent any non-negative integer value. Encoding can be also used to store other types of numbers, like for example real numbers.

UBNatural Encoding

For encoding of natural numbers with zero (non-negative integers or the Omega set), direct translation of UBNumber codes to incrementing integer values is used.

Examples of the type UBNatural:

 00000000                              = 0
 00000001                              = 1
 ...
 01111111                              = 7Fh
 10000000 00000000                     = 80h
 10000000 00000001                     = 81h
 ...
 10111111 11111111                     = 407Fh
 11000000 00000000 00000000            = 4080h
 ...
 11111110 11111111 .. 11111111           = 10204081020407Fh
          \_____ 7 times ____/
 ...
 11111111 00000000 00000000 .. 00000000  = 102040810204080h
                   \_____ 8 times ____/
 ...
 11111111 00000001 00000000 .. 00000000  = 10102040810204080h
                   \_____ 9 times ____/
 ...

Value encoding (value >> data stream)

  Value := Input
  Length := 0
  while Length <= ByteSize+1 and Value>=2^(ByteSize*(Length+1)) do {
    Length := Length + 1
    Value := Value - 2^(ByteSize*(Length+1))
  }
  if Length = 8 the {
    ExtLength := 0
    while Value>=2^((ByteSize+1)*(ExtLength+ByteSize+1)) do {
      ExtLength := ExtLength + 1
      Value := Value - 2^((ByteSize+1)*(ExtLength+ByteSize+1))
    }
    write prefix
    write ExtLength
  } else write unary Length
  write binary Value

Value decoding (data stream >> value)

  read prefix or Length
  if prefix then {
    Length := ByteSize + 1
    read ExtLength
  }
  read Value
  Result := Value
  if prefix then {
    for each I from interval <1..ExtLength> Result := Result + (2^((I+ByteSize) * (ByteSize+1)))
  }
  for each I from interval <1..Length> Result := Result + (2^(I * ByteSize))

UBENatural Encoding

This encoding is designed for natural numbers with a special constant for the infinity. Infinity constant is placed on the highest value in the shortest form. Another option is to put infinity as a value 0, which would incline to have infinity as default value (see default zero principle on protocol transformation concept).

This type is used in the block structure to represent the length of data part and infinity constant expresses the undefined, potentially infinite length.

Examples of the type UBENatural

 ...
0111110                              = 7Eh
0111111                              = ∞
1000000 00000000                     = 7Fh
1000000 00000001                     = 80h
..

Conversion of the UBNatural value

  if Value>(2^ByteSize) then Value := Value - 1 else
  if Value=(2^ByteSize)-1 then Value := ∞

UBInteger Encoding

This encoding is used to store integer numbers. Binary part is interpreted as a binary code in the two-complementary code with the highest bit as the sign. The sign flag is included as first occurring bit, because it might theoretically bring additional information how to processing the rest of the value.

Examples of the UBInteger type values

 11011111 11111111 11111111            = -2041h
 10100000 00000000                     = -2040h
 ...
 10111111 11111110                     = -42h
 10111111 11111111                     = -41h
 01000000                              = -40h
 ...
 01111110                              = -2
 01111111                              = -1
 00000000                              = 0
 00000001                              = 1
 ...
 00111111                              = 3Fh
 10000000 00000000                     = 40h
 10000000 00000001                     = 41h
 ...
 10011111 11111111                     = 203Fh
 11000000 00000000 00000000            = 2040h
 ...

Value encoding

  Value := Abs(Input)
  if Input<0 then Value := Value + 1
  Length := 0

  while Length <= ByteSize+1 and Value>=2^(ByteSize*(Length+1)) do {
    Length := Length + 1
    Value := Value - 2^(ByteSize*Length-1)
  }
  if Length = 8 then {
    ExtLength := 0
    while Value>=2^((ByteSize+1)*(ExtLength+ByteSize+1)) do {
      ExtLength := ExtLength + 1
      Value := Value - 2^((ByteSize+1)*(ExtLength+ByteSize+1))
    }
    write prefix
    write ExtLength
  } else write unary Length
  if Input<0 then Value := Neg(Value)
  write binary Value

Where the Neg operation is the inversion of the lowest ByteSize * (Length +1) bits of the value Value (including sign) and Abs is a function of absolute value.

Value decoding

  read prefix or Length
  if prefix then {
    Length := ByteSize + 1
    read ExtLength
  }
  read Value
  Result := Value
  if prefix then BitLength := ByteSize*(ExtLength+8)+7 else BitLength := Length*ByteSize+(7-Length)
  if (Value and 2^(BitLength))>0 then { // Test on negative sign
    Result := ( - Neg(Result) ) - 1
    For each integer I from interval <1..Length> Result := Result - (2^(I*ByteSize-1))
    if prefix then {
      for each I from interval <1..ExtLength> Result := Result - (2^((I+ByteSize) * (ByteSize+1)))
    }
  } else {
    For each I integer from interval <1..Length> Result := Result + (2^(I*ByteSize-1))
    if prefix then {
      for each I from interval <1..ExtLength> Result := Result + (2^((I+ByteSize) * (ByteSize+1)))
    }
  }

UBEInteger Encoding

Like in the case UBENatural this is an extension of the integer number set adding constants for infinity, this time it includes both positive and negative one. Placed the similar way.

Examples of the type UBEInteger

 ...
 10111111 11111111                     = -40h
 01000000                              = -∞
 01000001                              = -3fh
 ...
 00111110                              = 3Eh
 00111111                              = ∞
 10000000 00000000                     = 3Fh
 ...

Conversion of the UBInteger value

  if Value>(2^(ByteSize-1)) then Value := Value - 1 else
  if Value<(2^(ByteSize-1))+1 then Value := Value + 1 else
  if Value=(2^(ByteSize-1))-1 then Value := ∞ else
  if Value=-(2^(ByteSize-1)) then Value := -∞

UBReal Encoding

This encoding enables the representation of limited subset of real numbers. Because the real numbers are uncountable, it was necessary to limit to numbers with final development of the binary base form, because the endless numbers cannot be fitted into finite memory. Solution is based on the use of a pair of UBInteger values. The first one defines the base and the other mantissa, which determines how much the value is shifted on the base-2 scale.

To eliminate redundancy, method of adding invisible bit 1 before point is used. Only exception is in basic form, where value is decremented to support zero constant. This approach uses following algorithm:

  if (Base = 0 and Mantissa = 0) the Value := 0 else {
    Value := (Base*2+1)*(2^Mantissa)
    if (Base > 0 and Mantissa = 0) then Value := Value - 2
  }

Encoding numbers using this formula is a bit more complicated. It requires the division by the number two, until the remaining part is 1, or multiplication until number's fraction is less or equal equal to 1/2. The resulting whole number is then placed in the base and count of the performed division or negation of the count of performed multiplication is then placed as mantissa.

This encoding has some useful properties, for example, the mantissa rate determines whether there is a whole number, or number, which has nonzero fraction. Likewise, it is possible to determine the parity of the number from the parity of the base. So, if instead of the both numbers only one part is included, it can still provide some useful value. If, for example, only base is included, the mantissa is zero and odd numbers are encoded plus code for zero, which is not so useful as a second case when there is only mantissa so binary exponents with the exception of 1 replaced by value 0 are encoded.

Some numbers with endless development will be possible to store using higher mathematical algorithms and blocks, such as using recursion. Binary form of numbers is simply the best for the bit stream data.

Examples of the type UBReal

- numbers with zero mantissa:

    ...
  10111111 11111111  00000000          = -81h
  01000000  00000000                   = -7Fh
  01000001  00000000                   = -7Dh
  ...
  01111110  00000000                   = -3
  01111111  00000000                   = -1
  00000000  00000000                   = 0 (1)
  00000001  00000000                   = 1 (3)
  00000010  00000000                   = 3 (5)
  ...
  00111111  00000000                   = 7Dh (7Fh)
  10000000 00000000  00000000          = 7Fh (81h)
  ...

- other numbers with different mantisses:

  01111111  00000001                   = -2
  00000000  00000001                   = 2
  00000001  00000001                   = 6
  00000010  00000001                   = 10
  00000000  00000010                   = 4
  00000000  00000011                   = 8
  00000000  01111111                   = 0.5
  00000001  01111111                   = 1.5

Value encoding

  Base := 0
  Mantissa := 0
  if not Value=0 then {
    if Frac(Value)=0 then {
     while not Frac(Value)=0.5 do {
        Mantissa := Mantissa + 1
        Base := Base / 2
      }
    } else {
      while not Frac(Value)=0.5 do (
        Mantissa := Mantissa - 1
        Base := Base * 2
      }
    }
  }
  write UBInteger Base
  write UBInteger Mantissa

Value decoding

  Base := read UBInteger
  Mantissa := read UBInteger
  if (Base=0 and Mantissa=0) then Result := 0 else {
    Result := (Base*2+1)*(2^Mantissa)
    if (Base>0 and Mantissa=0) then Result := Result - 2
  }

UBReal as a Single Number

Real number with the final decimal development is likely to be possible to encode into a single UBNumber value, as it is equally possible to effectively encode any final sequence of integer numbers into a single natural number, but combining two numbers into single value is a bit problematic.

One possible solution is the diagonal method, but it is a bit computational-demanding:

(0,0)  (0,1)  (0,2)  (0,3) ...
      /      /      /
     /      /      /
(1,0)  (1,1)  (1,2)  (1,3)
      /      /      /
     /      /      /
(2,0)  (2,1)  (2,2)  (2,3)
      /      /      /
     /      /      /
(3,0)  (3,1)  (3,2)  (3,3)
...

Resulted code sequence is build following diagonal lines:

(0,0) = 0, (0,1) = 1, (1,0) = 2, (0,2) = 3, (1,1) = 4, (2,0) = 5, (0,3) = 6, ...

Other possible method is bit interleaving, where in final value, odd bits are used to construct first value and even bits are used to construct second value. Value conversion is slightly easier to compute using binary shifting.

UBEReal Encoding

This is another extension adding constants for the +-infinity to real numbers. Basically, the mantissa with a base value of 0 is interpreted as UBEInteger type. This corresponds to constants:

   00111111 00000000                     = +infinity
   01000000 00000000                     = -infinity

Examples of the UBEReal type

 10111111 11111111  00000000           = -7Fh
 01000000  00000000                    = -∞
 01000001  00000000                    = -7Dh
 ...
 00111110  00000000                    = 7Bh
 00111111  00000000                    = ∞
 10000000 00000000  00000000           = 7Dh
 ...

UBReal value conversion

  if (Base=0 and Mantissa=0) then Value := 0 else {
    if (Base=(2^ByteSize)-1 and Mantissa=0) then Value := ∞ else
    if (Base=-(2^ByteSize) and Mantissa=0) then Value := -∞ else {
      if (Mantissa=0) then {
        if (Base>0) then Value := Value - 2
        if (Base<(2^ByteSize)-1) then Value := Value - 2 else
        if (Base>-(2^ByteSize)) then Value := Value - 2
      }
    }
  }

UBRatio Encoding

Another considered type is encoding of real numbers in the fixed interval <0,1> with the finite binary development. This set is already countable large, and it is possible to encode it as a UBNumber value. Value is essentially stored as a count of divided of half cut to a depth intervals. This type can be used to store real numbers for any constant intervals of equal spread. Likewise, for saving intensity and other physical quantities. Encoding is based on point shifting.

Example conversion

0010111 =>(-1) 0010110 =>(*2+1)
00101101 =>(shift)001.01101 =>(-1) 0.01101 =>(calculate)
0.25 + 0.125 + 0,03125 = 0,40625 = 13/32

Example values of the UBRatio type

  00000000  0                          = 0
  00000001  1                          = 1
  00000010  0.1                        = 1/2
  00000011  0.01                       = 1/4
  00000100  0.11                       = 3/4
  00000101  0.001                      = 1/8
  00000110  0.011                      = 3/8
  00000111  0.101                      = 5/8
  00001000  0.111                      = 7/8
  00001001  0.0001                     = 1/16
  00001010  0.0011                     = 3/16
  00001011  0.0101                     = 5/16
  ...

Value decoding

  Value := Input
  if not (Value=0 or Value=1) then (
    Value := Value + 1
    while (Value = Trunc(Value)) do ( Value := Value * 2)
    Value := Trunc(Value/2) + 1
  )
  write UBNatural Value

Value encoding

  Value := read UBNatural
  if (Value<2) then ( Result := Value ) else (
    Result := Value*2 - 1
    while (Result>2) do ( Result := Result / 2 )
    Result := Result - 1
    )
  )

Other Considered Encodings

In addition to the above there were considered some additional encoding of some frequently used types.

UBNReal - nonnegative real number - the base is interpreted as UBENatural/UBNatural
UBCReal - integer number with development - mantissa is interpreted as UBNatural
UBComplex - complex number - stored as two real numbers
UBKvat - quaternion - four real numbers, but no practical use
UBFrag - fractions
UBBits - array of logical values - the number is interpreted as a field of 0/1, and it might make sense to define it if the format will be redundant. Thus it will be realized as data block.

Any additional encodings might be described in other parts of the documentation.

IEEE - IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985) http://en.wikipedia.org/wiki/IEEE754
Two's Complement Form http://en.wikipedia.org/wiki/Two%27s_complement
Alpha Coding - Alpha Unary Number Coding http://en.wikipedia.org/wiki/Unary_coding
Beta Coding - Beta Binary Coding http://en.wikipedia.org/wiki/Binary_number
Gama Coding - Elias Gamma Number Coding http://en.wikipedia.org/wiki/Elias_Gamma_coding