[XBUP] XBUP - Extensible Binary Universal Protocol

» Concepts » Progress

Concept: Number's Encoding

This document is part of the eXtensible Binary Universal Protocol project documentation. Provides description of used number's encoding types and their evolution.

Number's Encoding

Specific encoding of numbers is a fundamental basis of the entire protocol design, which makes it different from most of the current binary formats. It is a variant of gamma encoding, so far called a recursive unary-binary encoding. It is highly likely that this encoding has been previously described, and someone named it already.

Basic procedure is a combination of three techniques: Encoding on a fixed count of bits, the application of the indentation (recursion) and predefined value with its length.

To meet the basic requirement to allow any coding numbers, probably already the simplest unary encoding alpha would be sufficient. But this coding is totally unsatisfactory due to its wasting on space. On the other side as regards to the space-usage is the binary encoding beta which is coding individual bits as a value in exponent of two. This code, however, requires the definition of length and can not be directly used. The resulting code was therefore gradually constructed as a combinations of these two variants, and exclusion of inappropriate or impractical codes. From the number of suitable varieties there were selected those which seemed to be the most appropriate for the protocol needs.

Declared Conditions

The following requirements have been established which should comply with the proposed coding.

Declared Requirements:

Unary Encoding

Unary encoding seems to be as old as the human king itself (how many fingers, the value of many bars on the wall of the cave, so many killed Mammoths :-)). The most known variant of the encoding in an infinite bit stream is the alpha encoding, which bears the same number of bits ending with the opposite bit. This encoding does not include natural numbers only without 0, which is needed, therefore similar encoding with the initial value of 0 is used. This sequence can be interpreted in other ways as well as, for example: The numbers belong to the language (1*0), providing an algebra of whole numbers with a constant (0) and successor operations (1), or as a binary code of single values stop with block of value 0. Bits are read from the input stream and when the bit 1 is received, the successor is used, otherwise the reading will stop and return the number of bits of value 1. The numbers in this format should be for their own processing data files transferred to the internal computer memory, implemented in the form, which can be processed more efficiently, for example, in binary encoding, which is described in the next paragraph.

Binary Encoding

The most well-known binary encoding the beta encoding which is intended for encoding data on the final sequence of bits only and not for encoding into a potentially infinite binary sequences, as it require additional value for the length. That can be realized, for example, as the third symbol, just like in Morse alphabet for the sequence of dots and dashes separated with a space. Each additional bit is another digit, and brings the increase in the value of the exponent based on 2 to the final result. For these reasons, binary encoding could not be used directly and had to be some way extended up to the encoding of length. So far, the the most appropriate solution seemed to be the combination of the unary and binary encoding.

Growing Binary Codes

One option is to use a gradually increasing binary codes stop by block indentation. Example with a linear increase in the length:

 0           = 0
 100         = 1
 101         = 2
 110         = 3
 111000      = 4
 111001      = 5
 111010      = 6
 111011      = 7
 111100      = 8
 111101      = 9
 111110      = 10
 1111110000  = 11
 ...

The length of the code can grow even faster, for example, exponentially, and may be used for other basic length and, where appropriate, it does not have to happen to increase the length.

Interlaced Encoding

This variant of interlaced unary-binary encoding is an example of inadequate implementation. It works so that if the highest bit of the byte is 1, code is extending the value of the next byte. The basic form is a single number, while the highest value (first) bit is 0. If the highest bit is 1, code is extending the total length of the number of additional byte, while with every new byte the same rule re-applies. If the binary encoding expand up one bit, this encoding known as the gamma coding.

The sequence of values is as follows (binary form = range of possible values):

0xxxxxxx                                = 0..7Fh
1xxxxxxx 0xxxxxxx                       = 0..3FFFh
1xxxxxxx 1xxxxxxx 0xxxxxxx              = 0..1FFFFFh
1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx     = 0..0FFFFFFFh
...

Binary encoded value can be either redundantly interpreted or non-redundantly and with zero width of binary code this corresponds to unary encoding. Although this method may seem appropriate at first glance, it has several disadvantages in fact:

Continuous Encoding

This encoding is divided into two parts compact, unary codified length and binary codified own value. Length must be introduced first, so the code is then prefix code and thus decodable in the same order as it is read. This encoding has the same basic single-byte form, where the highest bit is 0, and other bits specify the number. If the highest bit is 1, the number is expanding of additional bit of the unarycode and a group of binary coded bits. The number of these bits may be in a constant and every step and we can introduce constant for it named ClusterSize. Usually the ClusterSize = 7 will be used, as we have done in the previous case. Perhaps it will better to understand this principle from the example of a sequence.

Example (binary form = range of possible values):

 0xxxxxxx                                = 0..7Fh
 10xxxxxx xxxxxxxx                       = 0..3FFFh
 110xxxxx xxxxxxxx xxxxxxxx              = 0..1FFFFFh
 1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx     = 0..0FFFFFFFh
 ...
 11111110 xxxxxxxx .. xxxxxxxx           = 0..1FFFFFFFFFFFFh
 11111111 0xxxxxxx .. xxxxxxxx           = 0..0FFFFFFFFFFFFFFh
 ...

This variant appeared to be much more acceptable, both due to numbers skip operation, and also to the recognizing the space needed for the allocation of the value. Also there is no need to carry out as many arithmetic shifts to the number value. Just to remove all the bits until the first zero the numbers is needed. This method of encoding is used, for example, in UTF-8, which is used to store characters in the Unicode Standard, or even in various projects introducing binary encoding of XML. The requirement on processibility/efficiency essentially enforces the order of bytes from the low post, the so-called Big Endian, because the amount of arithmetic shifts falls off.

That reserving bits for a record of the length, therefore this coding is of course, less effective than beta coding, but it is really binary. This loss can be expressed as, for example, the number of unused bits per byte. In both variants is the loss 1 bit per byte, that is, for better understanding, from each interval which could be implemented 1/(2^length in bytes/words) is lost. Specifically, it is in this case 1/2, which means that instead of the interval 0..0FFh which is possible to store into single byte, only half of it would be possible, therefore 0..7Fh.

Another possibility is to choose a higher increase ration than of the binary expansion of the number of constant length, such as exponential. However, this should lead to increased waste.

Redundant and Non-redundant Variants

In both previous variants, the binary encoding could be interpreted as a number directly as binary code without further adjustments, which, however, creates a certain ambiguity - redundancy. For example, we can store the value 1 as infinitely many forms of different lengths.

Forms of number 1:

 00000001                                = 1
 10000000 00000001                       = 1
 11000000 00000000 00000001              = 1
 ...

This redundancy has probably also some advantages. Especially with such way encoded numbers it works much easier, and you can use this redundancy in the file in advance to reserve space for a greater range of values, which may be subject of the restrictions on access to the file when enlarging or reducing the value. For example, when storing number 200 instead of other bigger, it does not necessarily result in a change in the size of the entire file. On the other hand, this redundancy increase the file size without increasing the total value of the information.

When the redundancy is allowed, it could be coded as other types of values, such as the integer numbers, real numbers and other, as follows:

Different types are interpreted as follows (type - description):

Natural - Is the natural number, or 0 (non-negative whole number) at a designated number of bits

Integer - The most left bit (highest) is a sign. 0 - positive, 1 - negative and because of the redundacy it is not important, whether they are negative numbers in normal, inverse, or in complementary binary code

Real - Real numbers are represent as two numbers of the Integer type. The first is the base and the other is mantissa, which indicates the index of the binary, or another shift

EReal - The real number with specific constants for +- ∞ infinity:

   00111111  00000000                    = +infinity
   01000000  00000000                    = -infinity

Since the value corresponding to these configurations may be stored in the different form with different length, these constants can be fixed without the problem.

MultiBit - Array of bits indexed from right from 0

The requirement for workability rather implies the use of non-redundant form of the encoding. Value of each type will then have a clear expression and if it gets off the value of restricted interval the entire file must be rebuild, which can lead to substantial overhead and requires additional techniques. Another disadvantage is more complex conversion of numbers.

Large loss rates of the number's capacity, which is due to the usege of the unary representation of the length of main values, can be partly to reduced using the recursion.

Recursive Encoding

The most recent proposed variant is a simple change to recursive non-interleaved unary binary encoding. The number is encoded pair of Length and Value, where the value of Length is a encoded almost same way as a whole number. The easiest way it to use directly the nonredundant deterministic variant, which is uniquely decodable encoding. There is again the constant ClusterSize which affects the encoding and determines the way how the Value capacity increase depending on the length.

First is the value Length, which is a natural number, including 0. It is coded as a single bit indicating recursion, and if the value of that bit is 0 then the value is 0 or in the opposite case that this bit is 1, other bits are interpreted recursively as additional value in the same recursive encoding and the resulting value is increased by the 1.

Value of Value is binary encoded on a bit sequences depending on the Length as follows:

If Length<ClusterSize + 1 then length of Value is (Length+1)*ClusterSize else Value has length Length*(ClusterSize+1)

As it is now possible to see if it is constant ClusterSize = -1, the entire encoding becomes the standard unary coding, which can be seen from the context of recursion and algebra with the successor referred earlier. In other cases it determines the size of the clusters as ClusterSize + 1 that extended the length of the value. This is useful for the alignment of the bits n-tuples, such as byte octets (eight bits), with a corresponding value ClusterSize = 7.

Best to show it on another example.

Example:

 0xxxxxxx                                = 0..7Fh
 10xxxxxx xxxxxxxx                       = 0..3FFFh
 110xxxxx xxxxxxxx xxxxxxxx              = 0..1FFFFFh
 1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx     = 0..0FFFFFFFh
 ...
 11111110 xxxxxxxx .. xxxxxxxx           = 0..0FFFFFFFFFFFFFFh
          \_____ 7 times ____/
 11111111 00000000 xxxxxxxx .. xxxxxxxx  = 0..0FFFFFFFFFFFFFFFFh
          \_more_/ \_____ 8 times ____/
 11111111 00000001 xxxxxxxx .. xxxxxxxx  = 0..0FFFFFFFFFFFFFFFFFFh
          \_more_/ \_____ 9 times ____/
 ...

Generally, the prefix is 0FFh:

 11111111 N xxxxxxxx .. xxxxxxxx  = 0..2^( 8*(N + 8))
            \____ n+8 times ___/

where N is the number of additional recursive type.

It would seem that this variant is only unnecessary complication compare to unary binary encoding and benefits are only for very large numbers, however from other view, it makes sense, especially for small size of ClusterSize, or for storing large numbers for reasons of accuracy.

The following table shows the loss of information in one cluster compared to a pure binary encoding for storing of the same number (1 - (Divide of the max, the value of a binary recursive form to pure binary on given bits)/ the count of clusters), for example for ClusterSize = 7.

Losses list (number of bytes expressing main number = amount of lost information on the byte):

 1..7          = 0,500000
 8             = 0,670123
 9             = 0,635129
 10            = 0,603149
 ..
 14            = 0,500000
 15            = 0,478801
 ..
 127+8         = 0,077761
 127+9         = 0,112795
 127+10        = 0,112037
 ...

Loss gradually converge to zero, unlike to unary-binary encoding, where is constantly 0.5, the convergence of the binary code is therefore higher. The highest loss (Limes superior) reaches around 0.67.

For the use of recursion refers the efficiency only, although there are more faster converging encodings, this can in some sense be regarded as the basic recursion. To use this variant speaks requirement of effectiveness. Recursive call themselves generally bear the higher overhead of storing variables, but in this case, it is possible to use a simple iteration. Against speaks the higher complexity of implementation and nonlinearity of the increase of the length of the code.

This encoding was set for the XBUP protocol as a basic. The structure of the encoding requires determinizm and enforce the order of bit interpretation of binary encoding from the highest down to the unit. Similarly, as in previous variants, there is also enforced the endianity of the encoding by requirement for effectiveness. This encoding will remain in the document referred to as coding UBNumber.

A table of recursive codes is included for illustration for ClusterSize = 0, at a minimum “nonunary” option.

Code and the corresponding value

 0                                       = 0
 10 0                                    = 1
 10 1                                    = 2
 110 0 00                                = 3
 110 0 01                                = 4
 110 0 10                                = 5
 110 0 11                                = 6
 110 1 000                               = 7
 110 1 001                               = 8
 ...
 110 1 111                               = 14
 1110 0 00 0000                          = 15
 ...
 1110 0 00 1111                          = 30
 1110 0 01 00000                         = 31
 ...

Non-linear Recursive Encoding

One possible adjustment of the recursive encoding is possible change of the growth rate of the length of their own values. In the above scenario the value length is increasing linearly. One of the possible disadvantages is that in the case of the recursion transition while increasing, the total length of the value will be much larger than for numbers using base unit (ClusterSize + 1). Growth steps by unit could be achieved with limiting the growth, but on the other hand, it would lead to more complex calculating, but primarily to the failure of coding variant for ClusterSize = 0.

Another possible adjustment of the acceleration of growth for example on quadratic expansion, or exponentially. However, this could lead to more waste of space without any visible benefits.

Modified Recursive Encoding LRUB

The following modification (Linear-prolonging Recursive Unary-Binary encoding) is offered as a potential replacement for simple recursive encoding in the following versions. This is a slight modification at which in some specific cases, growth rate of binary numbers is reduced to achieve a purely linear increase in the length of the entire code. Encoding thus maintains both recursivity and determinizm while at the same time it solves some of the problems which may occur with the use of purely recursive code.

Here is an example:

 00                                      = 0
 01                                      = 1
 10 00                                   = 2
 10 01                                   = 3
 10 10                                   = 4
 10 11                                   = 5
 11 00 00                                = 6 (limited growth)
 11 00 01                                = 7
 11 00 10                                = 8
 11 00 11                                = 9
 11 01 00 00                             = 10
 ...
 11 01 11 11                             = 25
 11 10 00 00 00                          = 26
 11 10 00 00 01                          = 27
 ...

Corrected Recursive Encoding SLRUB

SLRUB Encoding (Size-corrected Linear-prolonging Unary-Binary Encoding) is the adjustment of prior encoding, so that when we want to define the area through its length, we can define the size of any large area. In this case the interpretation of the value of the encoding is becoming redundant. This variant is more likely to be introduced as a type of LRUB number (for purely recursive coding is the implementation not appropriate).

What will be the interpretation of the same example:

 00                                      = 0
 01                                      = 1
 10 00                                   = 1
 10 01                                   = 2
 10 10                                   = 3
 10 11                                   = 4
 11 00 00                                = 4
 11 00 01                                = 5
 11 00 10                                = 6
 11 00 11                                = 7
 11 01 00 00                             = 7
 ...
 11 01 11 11                             = 23
 11 10 00 00 00                          = 23
 11 10 00 00 01                          = 24
 ...

UBNumber Encoding Forms

Unlike the definition of the number's encoding the encoding of other types of numbers is more or less automatic matter. Like in the previous case, also now we want to encode the whole, real, and other numbers. Additionally the condition that the ClusterSize>= 0 was set, sequence of bits 0 must always reflect the true value of 0. More precise specifications and examples of conversion algorithms can be found in the protocol specification. The following encodings for different types of numbers are derived from UBNumber encoding.

Note: Conversion algorithms for recursive encoding variant without linear limitation will be fixed later.

UBNatural Encoding

For encoding of natural numbers with zero, or the Omega Set, there is the coding, which is direct derivation of UBNumber. It allows to encode a natural number of any value.

Examples of the type UBNatural:

 00000000                              = 0
 00000001                              = 1
 ...
 01111111                              = 7Fh
 10000000 00000000                     = 80h
 10000000 00000001                     = 81h
 ...
 10111111 11111111                     = 407Fh
 11000000 00000000 00000000            = 4080h
 ...

Value encoding (value >> data stream)

  Value := Hodnota
  Length := 0
  while Length <= ByteSize+1 and Value>=2^(ByteSize*(Length+1)) do (
    Length := Length + 1
    Value := Value - 2^(ByteSize*(Length+1))
  )
  if Length = 8 the (
    ExtLength := 0
    while Value>=2^((ByteSize+1)*(ExtLength+ByteSize+1)) do (
      ExtLength := ExtLength + 1
      Value := Value - 2^((ByteSize+1)*(ExtLength+ByteSize+1))
    )
    write prefix
    write ExtLength
  ) else write unary Length
  write binary Value

Value decoding (data stream >> value)

  read prefix or Length
  if prefix then (
    Length := ByteSize + 1
    read ExtLength
  )
  read Value
  Result := Value
  if prefix then (
    for each I from interval <1..ExtLength> Result := Result + (2^((I+ByteSize) * (ByteSize+1)))
  )
  for each I from interval <1..Length> Result := Result + (2^(I * ByteSize))

UBENatural Encoding

This encoding is designed for natural numbers with a special constant for the infinity. It will be used primarily later in the block structure to represent the length and infinity there will express the unknown, potentially infinite length. For the realization is the value of infinity placed on the highest value in the basic form. Another option was to put infinity to 0, but that would conflict with the previously established rule of static zero.

Examples of the type UBENatural

 ...
0111110                              = 7Eh
0111111                              = ∞
1000000 00000000                     = 7Fh
1000000 00000001                     = 80h
..

Conversion of the UBNatural value

  if Value>(2^ByteSize) then Value := Value - 1 else
  if Value=(2^ByteSize)-1 then Value := ∞

UBInteger Encoding

This encoding is used to store integer numbers. Binary part is interpreted as a binary code in the two-complementary code with the highest bit as the sign. Probably another way of coding can not be used because of efficiency, moreover, the sign must be shown first, because it is more important than the value (affects processing).

Examples of the UBInteger type values

 11011111 11111111 11111111            = -2041h
 10100000 00000000                     = -2040h
 ...
 10111111 11111110                     = -42h
 10111111 11111111                     = -41h
 01000000                              = -40h
 ...
 01111110                              = -2
 01111111                              = -1
 00000000                              = 0
 00000001                              = 1
 ...
 00111111                              = 3Fh
 10000000 00000000                     = 40h
 10000000 00000001                     = 41h
 ...
 10011111 11111111                     = 203Fh
 11000000 00000000 00000000            = 2040h
 ...

Value encoding

  Value := Abs(Input)
  if Input<0 then Value := Value + 1
  Length := 0

  while Length <= ByteSize+1 and Value>=2^(ByteSize*(Length+1)) do (
    Length := Length + 1
    Value := Value - 2^(ByteSize*Length-1)
  )
  if Length = 8 then (
    ExtLength := 0
    while Value>=2^((ByteSize+1)*(ExtLength+ByteSize+1)) do (
      ExtLength := ExtLength + 1
      Value := Value - 2^((ByteSize+1)*(ExtLength+ByteSize+1))
    )
    write prefix
    write ExtLength
  ) else write unary Length
  if Input<0 then Value := Neg(Value)
  write binary Value

Where the Neg operation is the inversion of the lowests ByteSize * (Length +1) bits of the value Value (including sign) and Abs is a function of absolute value.

Value decoding

  read prefix or Length
  if prefix then (
    Length := ByteSize + 1
    read ExtLength
  )
  read Value
  Result := Value
  if prefix then BitLength := ByteSize*(ExtLength+8)+7 jinak BitLength := Length*ByteSize+(7-Length)
  if (Value and 2^(BitLength))>0 then ( // Test on negative sign
    Result := ( - Neg(Result) ) - 1
    For each integer I from interval <1..Length> Result := Result - (2^(I*ByteSize-1))
    if prefix then (
      for each I from interval <1..ExtLength> Result := Result - (2^((I+ByteSize) * (ByteSize+1)))
    )
  ) else (
    For each I integer from interval <1..Length> Result := Result + (2^(I*ByteSize-1))
    if prefix then (
      for each I from interval <1..ExtLength> Result := Result + (2^((I+ByteSize) * (ByteSize+1)))
  )

UBEInteger Encoding

Like in the case UBENatural this is an extension of the number of constants for infinity, this time it includes negative one. Placed the similar way.

Examples of the type UBEInteger

 ...
 10111111 11111111                     = -40h
 01000000                              = -∞
 01000001                              = -3fh
 ...
 00111110                              = 3Eh
 00111111                              = ∞
 10000000 00000000                     = 3Fh
 ...

Conversion of the UBInteger value

  if Value>(2^(ByteSize-1)) then Value := Value - 1 else
  if Value<(2^(ByteSize-1))+1 then Value := Value + 1 else
  if Value=(2^(ByteSize-1))-1 then Value := ∞ else
  if Value=-(2^(ByteSize-1)) then Value := -∞

UBReal Encoding

The aim of this encoding is to enable the storage of real numbers. Because the real numbers are uncountable, it was necessary to limit on a subset. Chosen limitation was the final development of the binary form, because the endless numbers can not be fitted into finite memory. As a suitable solution seems to be a encoding as a pair of UBInteger type numbers. The first one defines the base and the other mantis. Base bears main value and mantissa determines how much the value is shifted. For example, for the Intel processor architecture are real numbers realized by the IEEE 754 standard, which is used to eliminate redundancy using the invisible one method. Also in this encoding that it is not possible to allow redundancy, in order not to violate the rule of determinism and it is necessary to find a way to solve such type of ambiguity. For example, one way is to try to use the similar method of the invisible one. It is possible to choose between placing the one before the highest, or at the lowest bit of the Value. There is also the possibility of multiple interpretations of the mantis. So first the technique of placing “invisible” one before the highest bit of the base should be described. When interpreting the number it is essential for any negation of the negative number, to add before valid bits also one bit with value 1. Already at this stage it is possible to recognize some problems with the point skipping. Another problem is the right interpretation of the decimal point shift. Usual decimal exponent will not be proper solution here so instead will be better to choose “half” (binary) point. Lets list three ways of placing point depending on the location of the invisible one before the highest bit:

    00000000  00000000                   = 64              [(1)000000.]
    00000000  00000000                   = 1               [(1).000000]
    00000000  00000000                   = 0.5             [.(1)000000]

Most preferable way seems to be to put the point to the highest bit, but there related problems with the mantis conversion, which is depending on the length of the base value. However, those cannot be avoid even with the placement of the point at the end of the code, which would force to somehow shift the mantis. The third method is basically similar unsuitable. Fortunately, we can still use less unusual option to add the invisible one at the end of the number. Opět si uvedeme způsoby interpretace tečky: Again, lets include different ways of point interpretation:

    00000000  00000000                   = 0.015625        [.000000(1)]
    00000000  00000000                   = 0.5             [000000.(1)]
    00000000  00000000                   = 1               [000000(1).]

Finally, the third option was selected, since it appeared to be the best. Only the problem with the interpretation of the value 0 remains to resolve, with easiest way to shift it out. That approach brings together the following algorithm:

  if (X=0 and Y=0) the Value := 0 else (
    Value := (X*2+1)*(2^Y)
    if (X>0 and Y=0) then Value := Value - 2
  )

Encoding numbers seem to be more technically difficult. It requires the division by the number two, until the remaining part is 1, or multiplication until number's fraction is less or equal equal to 1/2. The resulting whole number is then placed in the base and count of the performed division or negation of the count of performed multiplication is then placed as mantis. This operation should be optimized in the future as much as possible (could be added to as a processor integrated instruction, or two :-)).

The advantages of the selected form of encoding include, for example, the mantissa rate determines whether there is a whole number, or number, which has nonzero fraction. Likewise, it is possible to determine the parity of the number from the parity of the base. So, if instead of the both numbers only one part is included, it can still encode some useful value. If, for example, only base is included, the mantissa is zero and odd numbers are encoded plus code for zero, which is not so useful as a second case when there is only mantissa so binary exponents with the exception of 1 replaced by value 0 are encoded.

Some numbers with endless development will be possible to store using higher mathematical algorithms and blocks, such as using recursion. Binary form of numbers is simply the best for the bit stream data.

Examples of the type UBReal

- numbers with zero mantissa:

    ...
  10111111 11111111  00000000          = -81h
  01000000  00000000                   = -7Fh
  01000001  00000000                   = -7Dh
  ...
  01111110  00000000                   = -3
  01111111  00000000                   = -1
  00000000  00000000                   = 0 (1)
  00000001  00000000                   = 1 (3)
  00000010  00000000                   = 3 (5)
  ...
  00111111  00000000                   = 7Dh (7Fh)
  10000000 00000000  00000000          = 7Fh (81h)
  ...

- other numbers with different mantisses:

  01111111  00000001                   = -2
  00000000  00000001                   = 2
  00000001  00000001                   = 6
  00000010  00000001                   = 10
  00000000  00000010                   = 4
  00000000  00000011                   = 8
  00000000  01111111                   = 0.5
  00000001  01111111                   = 1.5

Value encoding

  Base := 0
  Mantissa := 0
  if not Value=0 then (
    if Frac(Value)=0 then (
     while not Frac(Value)=0.5 do (
       Mantissa := Mantissa + 1
       Base := Base / 2
     )
    ) else (
     while not Frac(Value)=0.5 do (
       Mantissa := Mantissa - 1
       Base := Base * 2
     )
    )
  )
  write UBInteger Base
  write UBInteger Mantissa

Value decoding

  Base := read UBInteger
  Mantissa := read UBInteger
  if (Base=0 and Mantissa=0) then Result := 0 else (
    Result := (Base*2+1)*(2^Mantissa)
    if (Base>0 and Mantissa=0) then Result := Result - 2
  )

UBReal as a Single Number

Real number with the final decimal development is likely to be possible to encode into a single UBNumber value, as it is equally possible to effectively encode any final final seqeunce of integer numbers into a single natural number. For or against the use of encoding there has not be found the appropriate arguments.

One possible solution is the diagonal method:

(0,0)  (0,1)  (0,2)  (0,3)
      /      /      /
     /      /      /
(1,0)  (1,1)  (1,2)  (1,3)
      /      /      /
     /      /      /
(2,0)  (2,1)  (2,2)  (2,3)
      /      /      /
     /      /      /
(3,0)  (3,1)  (3,2)  (3,3)
...
(0,0) - (1,0) - (0,1) - (2,0) - (1,1) - (0,2) - ...

UBEReal Encoding

Again, another extension is this time to add constants for the +-infinity to real number. Basically, the mantis with a base value of 0 is interpreted as UBEInteger type. This corresponds to constants:

   00111111 00000000                     = +infinity
   01000000 00000000                     = -infinity

Examples of the UBEReal type

 10111111 11111111  00000000           = -7Fh
 01000000  00000000                    = -∞
 01000001  00000000                    = -7Dh
 ...
 00111110  00000000                    = 7Bh
 00111111  00000000                    = ∞
 10000000 00000000  00000000           = 7Dh
 ...

UBReal value conversion

  if (Base=0 and Mantissa=0) then Value := 0 else (
    if (Base=(2^ByteSize)-1 and Mantissa=0) then Value := ∞ else
    if (Base=-(2^ByteSize) and Mantissa=0) then Value := -∞ else (
      if (Mantissa=0) then (
        if (Base>0) then Value := Value - 2
        if (Base<(2^ByteSize)-1) then Value := Value - 2 else
        if (Base>-(2^ByteSize)) then Value := Value - 2
      )
    )
  )

UBRatio Encoding

So far the last approved type is the type of encoding real numbers in the interval <0,1> with the finite binary development. This set is already countable large, and it is possible to encode it as a UBNumber value. Value is essentially stored as a count of divided of half cut to a depth intervals. This type can be used to store real numbers for any constant intervals of equal spread. Likewise, for saving intensity and other physical quantities.

Encoding with Reversion

Základní úvaha vycházela z toho, že bude nutné kódovat posloupnost čísel s postupně prostoucím rozvojem. První hodnoty musí být samozřejmě 0 a 1, dále je předpokládáno pořadí postupně na poloviny dělených zlomků bez opakování, tedy 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, 1/16, 3/16 atd. Nakonec však bylo požadováno pouze to, aby byly skupiny zlomků se stejným zalomením v pořadí za sebou a bez opakování a použito bylo přidání tečky za celé číslo a otočení pořadí bitů, takže místo do vyšších hodnot porostou do vyšší přesnosti. Tento algoritmus vyžaduje použití netriviální operace reverze pořadí bitů.

Příklad převodu

0010111 =>(-1) 0010110 =>(reverse order) 0110100 =>("0." add) 0.0110100 =>(calculate) 0.25 + 0.125 + 0,03125 = 0,40625

Uveďme si příklady několika prvních hodnot ve formátu UBRatio:

kód, binární tvar skutečné hodnoty, zlomkový tvar

  00000000  0                          = 0
  00000001  1                          = 1
  00000010  0.1                        = 1/2
  00000011  0.01                       = 1/4
  00000100  0.11                       = 3/4
  00000101  0.001                      = 1/8
  00000110  0.101                      = 5/8
  00000111  0.011                      = 3/8
  00001000  0.111                      = 7/8
  00001001  0.0001                     = 1/16
  00001010  0.1001                     = 9/16
  00001011  0.0101                     = 5/16
  ...

Dekódování hodnoty

  Přečti UBNatural Value
  Je-li (Value<2) potom ( Hodnota := Value ) jinak (
    Hodnota := 0
    Value := Value - 1
    Pomocná := 1
    Dokud platí (Value>0) proveď (
      pokud platí (Value and 1) potom Hodnota := Hodnota + 1/Pomocná
      Pomocná := Pomocná shl 1
      Value := Value shr 1
    )
  )

Kódování hodnoty

  Je-li (Hodnota=0 nebo Hodnota=1) potom ( Value := Hodnota ) jinak (
    Value := 0
    Pomocná := 0
    Dokud platí (Hodnota>0) proveď (
      Hodnota := Hodnota * 2
      Je-li (Hodnota > 1) potom (
        Value := Value + Pomocná
        Hodnota := Hodnota - 1
      )
      Pomocná := Pomocná shl 1
    Value := Value + 1
    )
  )
  Zapiš UBNatural Value

Direct Encoding

There is, fortunately, available a variant of the encoding without the reversing the order of bits which uses a simple shift. It also uses more natural order of fractions.

Example conversion

0010111 =>(-1) 0010110 =>(*2+1)
00101101 =>(shift)001.01101 =>(-1) 0.01101 =>(calculate)
0.25 + 0.125 + 0,03125 = 0,40625 = 13/32

Example values of the UBRatio type

  00000000  0                          = 0
  00000001  1                          = 1
  00000010  0.1                        = 1/2
  00000011  0.01                       = 1/4
  00000100  0.11                       = 3/4
  00000101  0.001                      = 1/8
  00000110  0.011                      = 3/8
  00000111  0.101                      = 5/8
  00001000  0.111                      = 7/8
  00001001  0.0001                     = 1/16
  00001010  0.0011                     = 3/16
  00001011  0.0101                     = 5/16
  ...

Value decoding

  Value := Input
  if not (Value=0 or Value=1) then (
    Value := Value + 1
    while (Value = Trunc(Value)) do ( Value := Value * 2)
    Value := Trunc(Value/2) + 1
  )
  write UBNatural Value

Value encoding

  Value := read UBNatural
  if (Value<2) then ( Result := Value ) else (
    Result := Value*2 - 1
    while (Result>2) do ( Result := Result / 2 )
    Result := Result - 1
    )
  )

Other Considered Encodings

In addition to the above there were considered some additional encoding of some frequently used types.

UBNReal - nonnegative real number - the base is interpreted as UBENatural/UBNatural
UBCReal - integer number with development - mantissa is interpreted as UBNatural
UBComplex - compex numberk - two real numbers
UBKvat - quaternion - four real numbers, but no practical use
UBFrag - fractions
UBBits - array of logical values - the number is interpreted as a field of 0/1, and it might make sense to define it if the format will be redundant. Thus it will be realized as data block.

Any additional encodings might be described in other parts of the documentation.

Argumentation of the Correctness

Lets try to summarize why it is the chosen encoding UBNumber the most appropriate (perhaps I will be possile to reformulate these arguments to some form of proof later).

Since it is possible to encode a sequence of integers as a single whole number, it will be useful to prove that such operations is inappropriate.

Open Questions

Especially the following questions have not been sufficiently argumented so far:

The list of resources, literature and relevant links:

Inverse Form http://en.wikipedia.org/wiki/Ones%27_complement
Two's Complement Form http://en.wikipedia.org/wiki/Two%27s_complement
RFC - Request For Comment http://www.ietf.org/rfc.html
MIME - Multipurpose Internet Mail Extensions http://en.wikipedia.org/wiki/MIME
UNICODE - The character encoding system http://www.unicode.org
UCS - http://www.unicode.org
UTF - Unicode Transformation Format http://www.unicode.org
INTEL - Intel Corporation (C) http://www.intel.com
IEEE - Institute of Electrical and Electronics Engineers, Inc. http://www.ieee.com
IEEE - IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985) http://en.wikipedia.org/wiki/IEEE754
ISO - International Standartization Organization http://www.iso.ch
ANSI - American National Standartization Organization http://www.ansi.org
Alpha Coding - Alpha Unary Number Coding http://en.wikipedia.org/wiki/Unary_coding
Beta Coding - Beta Binary Coding http://en.wikipedia.org/wiki/Binary_number
Gama Coding - Elias Gamma Number Coding http://en.wikipedia.org/wiki/Elias_Gamma_coding
Convergence - Mathematical Convergence http://en.wikipedia.org/wiki/Convergence
Limes Superior - Mathematical Limes Superior http://www.math.uni-sb.de/~ag-wittstock/lehre/WS00/analysis1/Vorlesung/node62.html
Binary XML - XML Binary Characterization http://www.w3.org/XML/Binary/