Concept: Numerical Encoding
This concept describes specific encoding used as a fundamental basis of the entire protocol design, allowing efficient representation of any integer number or any instance of accountable set. It's based on hybrid unary-binary encoding with some recursive applicability.
This encodings should not be considered as the only way how to represent values in protocol. Although it is enforced in attributes sequence, it's still possible to store values in blobs using any proper encoding form for storage needs, including typical binary number encoding to fixed length with options to choose endianity or various types of encodings, like for example IEEE-754 for real numbers.
Declared Requirements
The following requirements or nice-to-have features are specified for the proposed encoding.
Declared Requirements:
- Encoding numbers of any size (Universality) - The basic requirement is to allow the expression of any finite number. Can not be considered only big enough margin, since the format would lose its universality
- Encoding into sequence of bits - The data are coded into a potentially infinite sequence of bits. The sequence of two-state information mediators is selected because of its minimalism
- Compatibility with unary encoding - Unary encoding is the most simple solution, and should be therefore considered
- Use binary encoding where possible (Efficiency) - The aim is to get as close as possible to the binary representation of numbers, namely unary coding itself is very inefficient. Likewise, other encoding types, such as pure unary-binary encoding (gamma) or Fibonachi's coding, are effective from some perspectives, but there are some issues
- Determinism (Efficiency) - Although a unified expression of numbers is not a necessity, it is appropriate with regard to algorithms, to enforce the only one possible variant of the representation of the value. It is also necessary with regard to the above-mentioned efficiency of the encoding
- Processibility (Efficiency) - This requirement seeks to improve data processing efficiency. For example to process this encoding, as few as possible CPU (for example binary shift) operations should needed
- Organization of data in bit clusters (Efficiency) - This requirement is introduced because of the the organization of bits as the bytes in the processing on current computers. The numbers should be encrypted preferably with a length of exponent with the base of 2 instead of the sequence of bits into bit groups
UBNumber Encoding Family
Proposed UBNumber encoding consist of three components/parts: unary encoded value, optional recursive UBNumber-encoded part and binary encoded part.
Encoding uses bit clusters of fixed length, typically bytes. In basic form, where recursive part is not present, value of unary encoded part is followed by binary encoded part filling up the rest of sequence of clusters which count is based on value represented by unary part. This is similar to Unicode UTF-8 encoding.
When unary part would be longer than single cluster, recursive part is added instead and value of this part is used further extend cluster length of number. See examples below, where green color marks unary encoded part and red color marks recursive part. Optionally some encodings uses magenta color for value sign bit.
Native mapping of codes to values is demonstrated on UBNatural encoding variant, which can represent any non-negative integer value. Encoding can be also used to store other types of numbers, like for example real numbers.
UBNatural Encoding
For encoding of natural numbers with zero (non-negative integers or the Omega set), direct translation of UBNumber codes to incrementing integer values is used.
Examples of the type UBNatural:
00000000 = 0 00000001 = 1 ... 01111111 = 7Fh 10000000 00000000 = 80h 10000000 00000001 = 81h ... 10111111 11111111 = 407Fh 11000000 00000000 00000000 = 4080h ... 11111110 11111111 .. 11111111 = 10204081020407Fh \_____ 7 times ____/ ... 11111111 00000000 00000000 .. 00000000 = 102040810204080h \_____ 8 times ____/ ... 11111111 00000001 00000000 .. 00000000 = 10102040810204080h \_____ 9 times ____/ ...
Value encoding (value >> data stream)
Value := Input Length := 0 while Length <= ByteSize+1 and Value>=2^(ByteSize*(Length+1)) do { Length := Length + 1 Value := Value - 2^(ByteSize*(Length+1)) } if Length = 8 the { ExtLength := 0 while Value>=2^((ByteSize+1)*(ExtLength+ByteSize+1)) do { ExtLength := ExtLength + 1 Value := Value - 2^((ByteSize+1)*(ExtLength+ByteSize+1)) } write prefix write ExtLength } else write unary Length write binary Value
Value decoding (data stream >> value)
read prefix or Length if prefix then { Length := ByteSize + 1 read ExtLength } read Value Result := Value if prefix then { for each I from interval <1..ExtLength> Result := Result + (2^((I+ByteSize) * (ByteSize+1))) } for each I from interval <1..Length> Result := Result + (2^(I * ByteSize))
UBENatural Encoding
This encoding is designed for natural numbers with a special constant for the infinity. Infinity constant is placed on the highest value in the shortest form. Another option is to put infinity as a value 0, which would incline to have infinity as default value (see default zero principle on protocol transformation concept).
This type is used in the block structure to represent the length of data part and infinity constant expresses the undefined, potentially infinite length.
Examples of the type UBENatural
...0111110 = 7Eh 0111111 = ∞ 1000000 00000000 = 7Fh 1000000 00000001 = 80h ..
Conversion of the UBNatural value
if Value>(2^ByteSize) then Value := Value - 1 else if Value=(2^ByteSize)-1 then Value := ∞
UBInteger Encoding
This encoding is used to store integer numbers. Binary part is interpreted as a binary code in the two-complementary code with the highest bit as the sign. The sign flag is included as first occurring bit, because it might theoretically bring additional information how to processing the rest of the value.
Examples of the UBInteger type values
11011111 11111111 11111111 = -2041h 10100000 00000000 = -2040h ... 10111111 11111110 = -42h 10111111 11111111 = -41h 01000000 = -40h ... 01111110 = -2 01111111 = -1 00000000 = 0 00000001 = 1 ... 00111111 = 3Fh 10000000 00000000 = 40h 10000000 00000001 = 41h ... 10011111 11111111 = 203Fh 11000000 00000000 00000000 = 2040h ...
Value encoding
Value := Abs(Input) if Input<0 then Value := Value + 1 Length := 0 while Length <= ByteSize+1 and Value>=2^(ByteSize*(Length+1)) do { Length := Length + 1 Value := Value - 2^(ByteSize*Length-1) } if Length = 8 then { ExtLength := 0 while Value>=2^((ByteSize+1)*(ExtLength+ByteSize+1)) do { ExtLength := ExtLength + 1 Value := Value - 2^((ByteSize+1)*(ExtLength+ByteSize+1)) } write prefix write ExtLength } else write unary Length if Input<0 then Value := Neg(Value) write binary Value
Where the Neg operation is the inversion of the lowest ByteSize (Length +1) bits of the value Value (including sign) and Abs is a function of absolute value.
Value decoding
read prefix or Length
if prefix then {
Length := ByteSize + 1
read ExtLength
}
read Value
Result := Value
if prefix then BitLength := ByteSize*(ExtLength+8)+7 else BitLength := Length*ByteSize+(7-Length)
if (Value and 2^(BitLength))>0 then { // Test on negative sign
Result := ( - Neg(Result) ) - 1
For each integer I from interval <1..Length> Result := Result - (2^(I*ByteSize-1))
if prefix then {
for each I from interval <1..ExtLength> Result := Result - (2^((I+ByteSize) * (ByteSize+1)))
}
} else {
For each I integer from interval <1..Length> Result := Result + (2^(I*ByteSize-1))
if prefix then {
for each I from interval <1..ExtLength> Result := Result + (2^((I+ByteSize) * (ByteSize+1)))
}
}
UBEInteger Encoding
Like in the case UBENatural this is an extension of the integer number set adding constants for infinity, this time it includes both positive and negative one. Placed the similar way.
Examples of the type UBEInteger
... 10111111 11111111 = -40h 01000000 = -∞ 01000001 = -3fh ... 00111110 = 3Eh 00111111 = ∞ 10000000 00000000 = 3Fh ...
Conversion of the UBInteger value
if Value>(2^(ByteSize-1)) then Value := Value - 1 else if Value<(2^(ByteSize-1))+1 then Value := Value + 1 else if Value=(2^(ByteSize-1))-1 then Value := ∞ else if Value=-(2^(ByteSize-1)) then Value := -∞
UBReal Encoding
This encoding enables the representation of limited subset of real numbers. Because the real numbers are uncountable, it was necessary to limit to numbers with final development of the binary base form, because the endless numbers cannot be fitted into finite memory. Solution is based on the use of a pair of UBInteger values. The first one defines the base and the other mantissa, which determines how much the value is shifted on the base-2 scale.
To eliminate redundancy, method of adding invisible bit 1 before point is used. Only exception is in basic form, where value is decremented to support zero constant. This approach uses following algorithm:
if (Base = 0 and Mantissa = 0) the Value := 0 else { Value := (Base*2+1)*(2^Mantissa) if (Base > 0 and Mantissa = 0) then Value := Value - 2 }
Encoding numbers using this formula is a bit more complicated. It requires the division by the number two, until the remaining part is 1, or multiplication until number's fraction is less or equal equal to 1/2. The resulting whole number is then placed in the base and count of the performed division or negation of the count of performed multiplication is then placed as mantissa.
This encoding has some useful properties, for example, the mantissa rate determines whether there is a whole number, or number, which has nonzero fraction. Likewise, it is possible to determine the parity of the number from the parity of the base. So, if instead of the both numbers only one part is included, it can still provide some useful value. If, for example, only base is included, the mantissa is zero and odd numbers are encoded plus code for zero, which is not so useful as a second case when there is only mantissa so binary exponents with the exception of 1 replaced by value 0 are encoded.
Some numbers with endless development will be possible to store using higher mathematical algorithms and blocks, such as using recursion. Binary form of numbers is simply the best for the bit stream data.
Examples of the type UBReal
- numbers with zero mantissa:
... 10111111 11111111 00000000 = -81h 01000000 00000000 = -7Fh 01000001 00000000 = -7Dh ... 01111110 00000000 = -3 01111111 00000000 = -1 00000000 00000000 = 0 (1) 00000001 00000000 = 1 (3) 00000010 00000000 = 3 (5) ... 00111111 00000000 = 7Dh (7Fh) 10000000 00000000 00000000 = 7Fh (81h) ...
- other numbers with different mantisses:<br/>
<span style="color:green">0</span><span style="color:magenta">1</span>111111 <span style="color:green">0</span><span style="color:magenta">0</span>000001 = -2 00000000 00000001 = 2 00000001 00000001 = 6 00000010 00000001 = 10 00000000 00000010 = 4 00000000 00000011 = 8 00000000 01111111 = 0.5 00000001 01111111 = 1.5
Value encoding
Base := 0 Mantissa := 0 if not Value=0 then { if Frac(Value)=0 then { while not Frac(Value)=0.5 do { Mantissa := Mantissa + 1 Base := Base / 2 } } else { while not Frac(Value)=0.5 do ( Mantissa := Mantissa - 1 Base := Base * 2 } } } write UBInteger Base write UBInteger Mantissa
Value decoding
Base := read UBInteger Mantissa := read UBInteger if (Base=0 and Mantissa=0) then Result := 0 else { Result := (Base*2+1)*(2^Mantissa) if (Base>0 and Mantissa=0) then Result := Result - 2 }
UBReal as a Single Number
Real number with the final decimal development is likely to be possible to encode into a single UBNumber value, as it is equally possible to effectively encode any final sequence of integer numbers into a single natural number, but combining two numbers into single value is a bit problematic.
One possible solution is the diagonal method, but it is a bit computational-demanding:
0,0) (0,1) (0,2) (0,3) ... / / / / / / (1,0) (1,1) (1,2) (1,3) / / / / / / (2,0) (2,1) (2,2) (2,3) / / / / / / (3,0) (3,1) (3,2) (3,3) ...``` Resulted code sequence is build following diagonal lines:
Other possible method is bit interleaving, where in final value, odd bits are used to construct first value and even bits are used to construct second value. Value conversion is slightly easier to compute using binary shifting.
UBEReal Encoding
This is another extension adding constants for the +-infinity to real numbers. Basically, the mantissa with a base value of 0 is interpreted as UBEInteger type. This corresponds to constants:
00111111 00000000 = +infinity 01000000 00000000 = -infinity
Examples of the UBEReal type
10111111 11111111 00000000 = -7Fh 01000000 00000000 = -∞ 01000001 00000000 = -7Dh ... 00111110 00000000 = 7Bh 00111111 00000000 = ∞ 10000000 00000000 00000000 = 7Dh ...
UBReal value conversion
if (Base=0 and Mantissa=0) then Value := 0 else { if (Base=(2^ByteSize)-1 and Mantissa=0) then Value := ∞ else if (Base=-(2^ByteSize) and Mantissa=0) then Value := -∞ else { if (Mantissa=0) then { if (Base>0) then Value := Value - 2 if (Base<(2^ByteSize)-1) then Value := Value - 2 else if (Base>-(2^ByteSize)) then Value := Value - 2 } } }
UBRatio Encoding
Another considered type is encoding of real numbers in the fixed interval <0,1> with the finite binary development. This set is already countable large, and it is possible to encode it as a UBNumber value. Value is essentially stored as a count of divided of half cut to a depth intervals. This type can be used to store real numbers for any constant intervals of equal spread. Likewise, for saving intensity and other physical quantities. Encoding is based on point shifting.
Example conversion
0010111 =>(-1) 0010110 =>(*2+1) 00101101 =>(shift)001.01101 =>(-1) 0.01101 =>(calculate) 0.25 + 0.125 + 0,03125 = 0,40625 = 13/32
Example values of the UBRatio type
00000000 0 = 0 00000001 1 = 1 00000010 0.1 = 1/2 00000011 0.01 = 1/4 00000100 0.11 = 3/4 00000101 0.001 = 1/8 00000110 0.011 = 3/8 00000111 0.101 = 5/8 00001000 0.111 = 7/8 00001001 0.0001 = 1/16 00001010 0.0011 = 3/16 00001011 0.0101 = 5/16 ...
Value decoding
Value := Input if not (Value=0 or Value=1) then ( Value := Value + 1 while (Value = Trunc(Value)) do ( Value := Value * 2) Value := Trunc(Value/2) + 1 ) write UBNatural Value
Value encoding
Value := read UBNatural if (Value<2) then ( Result := Value ) else ( Result := Value*2 - 1 while (Result>2) do ( Result := Result / 2 ) Result := Result - 1 ) )
Other Considered Encodings
In addition to the above there were considered some additional encoding of some frequently used types.
UBNReal - nonnegative real number - the base is interpreted as UBENatural/UBNatural
UBCReal - integer number with development - mantissa is interpreted as UBNatural
UBComplex - complex number - stored as two real numbers
UBKvat - quaternion - four real numbers, but no practical use
UBFrag - fractions
UBBits - array of logical values - the number is interpreted as a field of 0/1, and it might make sense to define it if the format will be redundant. Thus it will be realized as data block
Any additional encodings might be described in other parts of the documentation.
Links
IEEE - IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985) https://en.wikipedia.org/wiki/IEEE754
Two's Complement Form https://en.wikipedia.org/wiki/Two%27s_complement
Alpha Coding - Alpha Unary Number Coding https://en.wikipedia.org/wiki/Unary_coding
Beta Coding - Beta Binary Coding https://en.wikipedia.org/wiki/Binary_number
Gama Coding - Elias Gamma Number Coding https://en.wikipedia.org/wiki/Elias_Gamma_coding
Page Source