XBUP - Extensible Binary Universal Protocol

Concept: Introduction of the Types

This document is part of the eXtensible Binary Universal Protocol project documentation. It describes the declaration of block and attribute types and the way to use them.

Type Introduction

In the previous parts of the documentation, the encoding of numbers and the block tree structure were described, where blocks carry sequences of attributes. A certain duality exists between attributes and subblocks, since attributes can be expressed as subblocks, but attributes are limited to finite values. In order to process the data, it is necessary to define the meaning of individual attributes and subblocks. Since the definition must be finite, it is necessary to limit it to a finite number of items of both kinds, yet still allow arbitrarily long enumerable sequences of attributes. It is also necessary to consider exactly how relationships such as generalization / specialization will be expressed. It seems appropriate to consider for example:

In the first case, these blocks can be considered different because they use a different unit of measurement, although many programming languages and databases still use only bare values without mentioning any specific units. In the second and third cases it will probably be more appropriate to express the relationship as a link to a subobject, because such differentiation would lead to a huge number of types.

Block Type

The following part describes a way to recognize the meaning of the data contained in individual blocks. Again, a few alternatives should be considered:

Perhaps the best way is to identify the type of data that the block represents using its attributes. The individual block types should be divided into groups by importance.

Obviously these attributes should be placed first in the block. Therefore, if the block has at least two attributes, the first two values are known as UBBlockType and are as follows:

1 UBNatural - BlockGroup
2 UBNatural - BlockType

Blocks are organized by type into groups, and the BlockGroup value determines which group the block belongs to. The value 0 means that it is a basic block, that is, a block that programs can process natively using built-in support. The BlockType value determines the specific type of block within the group. The allowed ranges of values, and thus the meaning of the groups of blocks, are determined by the definition block.
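The reading of UBBlockType described above can be illustrated with a minimal Python sketch. It assumes the attributes of a block have already been decoded into a list of non-negative integers; the names BlockTypeId, read_block_type and is_basic_block are hypothetical and not part of the specification. Blocks with fewer than two attributes are rejected here only because their handling is left open in the text.

```python
from typing import NamedTuple, Sequence


class BlockTypeId(NamedTuple):
    """Hypothetical container for the two UBBlockType values."""
    block_group: int  # first attribute: BlockGroup
    block_type: int   # second attribute: BlockType


def read_block_type(attributes: Sequence[int]) -> BlockTypeId:
    """Read UBBlockType from the first two decoded attributes of a block."""
    if len(attributes) < 2:
        # Handling of incomplete blocks is an open question in the text,
        # so this sketch simply refuses them.
        raise ValueError("block has fewer than two attributes")
    return BlockTypeId(attributes[0], attributes[1])


def is_basic_block(type_id: BlockTypeId) -> bool:
    """BlockGroup = 0 marks a basic block."""
    return type_id.block_group == 0
```

For example, read_block_type([0, 4, 7]) yields group 0 and type 4, which is_basic_block reports as a basic block.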

If the block has fewer than two attributes, there are several variants for how to use such incomplete blocks:

The special case of a single-attribute block can be used in several ways:

Document Type

For recognizing the document as a whole, or the type of the stream, several options can also be considered:

In addition to the blocks themselves, there should be a sequence of bits at the beginning of the stream that indicates the coding method used. As mentioned in the part on coding, it is necessary to determine at least the size of the cluster used.

Byte - ClusterSize = FEh

The ClusterSize value, which may be any value, must be given at the beginning of the file, since the encoding of the following values depends on it. For universality, this value is encoded in unary. The advantage is that its code has the same length as the cluster, usually for ClusterSize = 7. Using this value also excludes the use of purely unary coding.

For development purposes the file header is extended by several other characters. Text characters are placed in the header for compatibility with existing operating systems and are readable only for ClusterSize = 8x + 7. The existence of these values is purely technical and they may be removed in later releases.

UBNatural - ProtocolVersion = 00h
DWord(4xUBNatural) - ProtocolSignature = 58 42 00 XXh (“XB” + development version)

Protocol version 0 is reserved for the protocol development stage. The development version then specifies the particular structure of the file, and any incompatible change in it shall be reflected in a change to this value.

If the file contains no other data, it is called an empty file. Otherwise, the data is processed as a single block, and the data after that block is called the extended area. This should allow the protocol to be used for specifying XBUP bitstreams of generally infinite length. One reason for the header is that some operating systems do not use the file name extension to distinguish the type of a file, but usually examine the first characters of its content. The header can be interpreted as a 32-bit identification number and a 16-bit version number. It is assumed that the final version will have different file headers.
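As a rough illustration of the development header described above, the following Python sketch checks the six header bytes FEh, 00h, 58h, 42h, 00h and the development version. It assumes that each header value occupies exactly one byte, which holds for the common case ClusterSize = 7 and these small values; the function name parse_header is hypothetical.

```python
def parse_header(data: bytes) -> int:
    """Check the development header and return the development version byte.

    Assumption: every header value is stored in a single byte.
    """
    if len(data) < 6:
        raise ValueError("header too short")
    if data[0] != 0xFE:
        # FEh is the unary code announcing ClusterSize = 7
        raise ValueError("unsupported ClusterSize byte")
    if data[1] != 0x00:
        # ProtocolVersion 0 is the development stage
        raise ValueError("unsupported ProtocolVersion")
    if data[2:5] != b"XB\x00":
        # ProtocolSignature starts with 58h 42h 00h ("XB" and a zero byte)
        raise ValueError("missing ProtocolSignature")
    return data[5]  # development version (the XX byte)
```

Any data following these six bytes would then be processed as the single root block, with whatever comes after it forming the extended area.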

The principle of Default Zero

One of the interesting features of block attribute interpretation is the possibility of using the principle of implied zero, which says that if an attribute is not present in the attribute part, this is equivalent to the attribute being present with a zero value. This principle can be used to shorten the whole block: attributes that are usually 0 can be placed at the end of the attribute sequence and, in practice, omitted. This principle can also be used as an argument when declaring the order of block attributes.

This technique also helps with implementing compatibility, where a new version extends the previous one by providing new attributes with special meaning. It would also be appropriate to use this principle when defining the rules for constructing the attribute order.

When this technique is used, it is possible to establish a canonical record with the minimum number of attributes: only the attributes up to the last non-zero value are present, and the zero values that follow it are not stored in the block.
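The principle can be sketched in a few lines of Python, assuming attributes are held as a list of non-negative integers. The helper names trim_trailing_zeros and get_attribute are hypothetical.

```python
from typing import List, Sequence


def trim_trailing_zeros(attributes: Sequence[int]) -> List[int]:
    """Return the shortest record: drop every zero after the last non-zero value."""
    end = len(attributes)
    while end > 0 and attributes[end - 1] == 0:
        end -= 1
    return list(attributes[:end])


def get_attribute(attributes: Sequence[int], index: int) -> int:
    """Read an attribute by position; a missing attribute reads as zero."""
    return attributes[index] if index < len(attributes) else 0
```

With these helpers the record [3, 0, 5, 0, 0] is stored as [3, 0, 5], and reading position 4 of the shortened record still gives 0. Note that the zero in the middle must stay, because only trailing zeros can be implied.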

Groups of Blocks

Another consideration is how to organize the groups.

The number of blocks in a group may hypothetically be infinite. However, it is appropriate to keep to a finite number, so that values are not stored in an endless sequence of definitions.

Relations Between Blocks

The basic parent-child relation is defined between blocks in the tree, following the tree of definitions, but this may not fully cover the needs of data representation. An important aspect here is, for example, dynamic context, which allows various blocks to be substituted for each other. Relations of a block to other individual data items (parameters) can be addressed in several ways:

Since the protocol is designed to be dynamic, the static variant requires the dynamics to exist on the definitions side, which is not viable. The variant using block type recognition would need to scan the data and therefore possibly cache it, as it might be necessary to return to it. Full or direct referencing raises the demand for data capacity, but also contradicts the concept of a blob of data and requires the existence of links which are not necessary. Direct references could provide the necessary dynamics for the reasonable price of a single attribute per parameter.

The static option was chosen as the solution, since it best suits the concept of a data block as a blob. To allow lists of blocks to be created, however, it is extended with the possibility of using an attribute to determine the number of items of the same type in a sequence of subblocks. Reordering can be implemented using links. The types of parameters can be handled in a number of ways, such as:

Revision

To ensure backward / forward compatibility, it is useful to support adding new definitions to the specification while maintaining the existing ones. This can be provided either by a higher-level protocol or by defining revisions already at this level. The revision technique defines how the document is processed so that the application can handle newer / older revisions.

Possible approaches for the definition of a block type:

Attribute Types

The next step is the introduction of the types of attributes.

In the case of multivalued attributes the question is how to deal with an unlimitedly long sequence. Here, too, the size of the used area could be specified, but using the number of attributes seems more appropriate, also because of a possible conflict with the principle of implied zero.

There is also the possibility of introducing a connection between attributes and blocks that represent just one value. In this comparison an attribute would represent a block without parameters, and types of attributes could be presented as a sequence of such blocks. It is also appropriate to consider whether a tree hierarchy can be applied to attributes just as it is to blocks.

Attribute Type Examples

Simple attributes include sequences of UBNumber values with a fixed number of elements. The basic and already mentioned types are UBNatural, UBInteger, their variants extended with infinity constants, and the UBRatio type. These types can then be extended with the meaning given by a defined specification, for example using units or any other specific meaning.

Pointer

This type (UBPointer) is the basis for solving the problem of linking within documents. Unlike in XML, it is not appropriate here to use a subnode search for internal links in the document, because, especially with regard to possible transformations, it could be a problem to identify a specific node. Several solutions are possible:

The chosen solution for the UBPointer attribute type is realized as the following value:

UBNatural - SubBlockIndex

The value refers to one of the block's own subblocks by its index in the order of subblocks. If the referenced block is not present, a corresponding WrongPointer error is raised. Blocks are indexed from 1 and the value 0 means an empty pointer.

An alternative approach is the UBAccPointer, which is similar to the previous option except that a zero value means the position following the last position referenced by a UBAccPointer in this node.
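A minimal Python sketch of resolving such a pointer follows. It assumes the subblocks of the current block are available as a list; the exception class merely borrows the WrongPointer name from the text, and resolve_pointer is a hypothetical helper.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


class WrongPointer(Exception):
    """Raised when a pointer refers to a subblock that is not present."""


def resolve_pointer(subblocks: List[T], sub_block_index: int) -> Optional[T]:
    """Resolve a SubBlockIndex value against the subblocks of the current block."""
    if sub_block_index == 0:
        return None  # value 0 is the empty pointer
    if sub_block_index < 0 or sub_block_index > len(subblocks):
        raise WrongPointer(f"no subblock with index {sub_block_index}")
    return subblocks[sub_block_index - 1]  # subblocks are indexed from 1
```

For a block with subblocks ["a", "b", "c"], pointer 2 resolves to "b", pointer 0 resolves to nothing, and pointer 4 raises WrongPointer.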

Boolean Type

A UBNatural value restricted to the values 0 and 1 was established as the UBBoolean type for storing logical values.

Alternatively, the UBBitField value can be used, which provides an array of bits from the bits of a UBNatural value.
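The bit field idea can be shown with a short Python sketch. The bit order (least significant bit first) and the explicit width parameter are assumptions made for illustration only, since the text does not fix them; bit_field is a hypothetical name.

```python
from typing import List


def bit_field(value: int, width: int) -> List[bool]:
    """Expand a non-negative integer into `width` flags, lowest bit first (assumed order)."""
    if value < 0:
        raise ValueError("UBNatural values are non-negative")
    return [bool((value >> position) & 1) for position in range(width)]
```

Under these assumptions the value 5 with width 4 expands to the flags true, false, true, false.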

Fractions

The goal here is to enable the implementation of fractional values. These values are determined by calculation and therefore should not be included among the basic types.

The UBFraction type is used to implement fractions in the interval <0,1> with non-negative integer values and without division by zero. From the perspective of the corresponding real values, some values repeat. Values are stored in the following sequence:

1/1, 1/2, 2/2, 1/3, 2/3, 3/3, 1/4, 2/4, 3/4, 4/4, …
Sequence[n=1..][m=1..n](m/n)
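The enumeration above can be sketched in Python as a mapping between a position in the sequence and the pair (m, n). The text does not say which index the first item 1/1 receives, so the sketch assumes it has index 0; fraction_at and index_of are hypothetical names.

```python
from typing import Tuple


def fraction_at(index: int) -> Tuple[int, int]:
    """Return (m, n) for the item at `index`, assuming 1/1 has index 0."""
    if index < 0:
        raise ValueError("index must be non-negative")
    denominator = 1
    # Denominator n contributes exactly n items: 1/n, 2/n, ..., n/n.
    while index >= denominator:
        index -= denominator
        denominator += 1
    return index + 1, denominator


def index_of(numerator: int, denominator: int) -> int:
    """Inverse mapping: position of m/n in the sequence, for 1 <= m <= n."""
    if not 1 <= numerator <= denominator:
        raise ValueError("expected 1 <= m <= n")
    # n*(n-1)/2 items precede the first fraction with denominator n.
    return denominator * (denominator - 1) // 2 + (numerator - 1)
```

The two functions are inverses of each other, and the repeated real values mentioned above show up as distinct positions, for example 1/2 and 2/4.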

The UBIntFraction type is an extension that adds a whole-number part.

Attribute Sequences

Because complex blocks are needed, it was necessary to define specific sequences of attributes that carry compound information. Individual items have their own names and form a certain hierarchy.

There are several possible ways to deal with such attribute groups:

Whether this encoding should introduce a new level, or possibly merge some characteristics into one level, is not yet entirely clear and will be decided later. In the meantime, it is possible to continue without a solution to this problem.

Examples of some types of multivalued attributes follow. Some of them were already mentioned in the part about encoding, or in the part of this document about BlockType.

Real and Complex Numbers

The real number UBReal is already described in the section dealing with encoding:

UBInteger - Base
UBInteger - Mantis

There are also complex numbers available:

UBReal - RealPart
UBReal - IrationalPart

It is also possible to use extensions of those types that include constants for infinity, namely UBEReal and UBEComplex. Alternatively, these types can be restricted to positive or integer variants, such as UBPositiveReal (UBCutInteger / UBTruncate).

Version of Block

Blocks for which compatibility is required are declared using the following UBVersion type, a sequence of two attributes that determines the version of the block:

UBNatural - MajorVersion
UBNatural - MinorVersion

If both values are zero, it is assumed that the block has no version. The value MajorVersion = 0 denotes a test version. In the extended version, UBVersionExt, the following attribute usually follows:

UBPointer - AlternativeBlock

It is a reference to another block of the same type but with a different version. Two values are needed to implement versioning: the first value determines backward compatibility and the second forward compatibility. For the same MajorVersion value, an increasing MinorVersion value must guarantee that the sequence of attributes is only extended to include new items.

List

The list UBList is a structure defining a finite list of attributes:

UBNatural - ItemsCount
UBNumber - Value 1
..
UBNumber - Value n

Alternatively, allow UBENatural ItemsCount?
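Reading a UBList from a decoded attribute sequence might look like the following Python sketch. It assumes all values are already decoded into integers and that the list starts at a known offset; read_list is a hypothetical helper, and it deliberately ignores the interaction with the principle of default zero by rejecting truncated lists.

```python
from typing import List, Sequence, Tuple


def read_list(attributes: Sequence[int], offset: int = 0) -> Tuple[List[int], int]:
    """Read ItemsCount followed by that many values.

    Returns the values and the offset of the first attribute after the list.
    """
    if offset >= len(attributes):
        raise ValueError("ItemsCount attribute is missing")
    items_count = attributes[offset]
    end = offset + 1 + items_count
    if end > len(attributes):
        # Sketch-level choice: implied zeros are not reconstructed here.
        raise ValueError("list is longer than the available attributes")
    return list(attributes[offset + 1:end]), end
```

For the attributes [3, 10, 20, 30, 99] the list holds 10, 20 and 30, and reading continues at offset 4.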

Dynamic Sequences of Attributes

It seems appropriate to allow the creation of items represented by a variable number of attributes. Implementing these sequences is somewhat problematic:

Path

This type is called UBPath. It is defined as a sequence of UBNatural values and is intended primarily for representing a path in the tree.

UBNatural - PathCount
UBPointer - Path0Node
UBPointer - Path1Node
UBPointer - Path2Node

Using the previous type, UBLink can be constructed as a reference to another block in the document.

UBNatural - UpCount
UBPath - LinkPath
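One possible reading of UBLink is: go UpCount levels up from the current block, then follow LinkPath downwards, treating each path node as a UBPointer (indexed from 1). The Python sketch below implements this reading on a small in-memory tree; the Block class and resolve_link function are hypothetical, and the exact semantics are an assumption rather than something the text confirms.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass(eq=False)
class Block:
    """Hypothetical in-memory block with a parent link and ordered subblocks."""
    name: str
    children: List["Block"] = field(default_factory=list)
    parent: Optional["Block"] = None

    def add(self, child: "Block") -> "Block":
        child.parent = self
        self.children.append(child)
        return child


def resolve_link(start: Block, up_count: int, link_path: Sequence[int]) -> Block:
    """Climb up_count ancestors, then descend along 1-based subblock indexes."""
    node = start
    for _ in range(up_count):
        if node.parent is None:
            raise ValueError("UpCount leaves the document tree")
        node = node.parent
    for pointer in link_path:
        if not 1 <= pointer <= len(node.children):
            raise ValueError(f"no subblock with index {pointer}")
        node = node.children[pointer - 1]
    return node
```

In a tree where the root has subblocks a and b, and a has a subblock c, the link with UpCount = 2 and path [2] resolved from c leads to b.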

List of Linked Items

The following UBPointerList type is similar to the UBList type, except that the items of the list are referenced using UBPointer values, which allows them to be put in a different order than the one defined by the index. It is also possible to insert additional blocks between the individual items.

UBNatural - ItemsCount
UBPointer - Item0
UBPointer - Item1
UBPointer - Item2

Attribute Types Hierarchy

The specification of a block from the previous level of the document can define a list referring to the blocks that represent the various attributes. A block representing an attribute should allow the attribute type to be specified as follows.

Like blocks, attributes are defined in a tree structure. The root attribute type is UBNumber. The currently proposed structure of attributes is as follows:

Type System

The currently selected variant defines a type construction list: a list of items, each of which is one of two kinds of operation, either join (Join) or add (Consist). An addition appends another item to the end of the subtype, whereas a join adds all the items referenced by a definition of the same type. These lists link together the three kinds of items that define the format, group, and block of the document. Each of these definitions is defined by a list of revisions defining the number of operations. For the block specification, operations for finite and infinite lists are also defined.
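The difference between the two operations can be illustrated with a small Python sketch. It models a definition as a list of (operation, reference) pairs and a catalog as a dictionary of named definitions; this representation, the lowercase operation names and the function expand_definition are assumptions made only for illustration, and revisions and list operations are left out.

```python
from typing import Dict, List, Tuple

Operation = Tuple[str, str]  # ("consist" | "join", referenced name)


def expand_definition(definition: List[Operation],
                      catalog: Dict[str, List[Operation]]) -> List[str]:
    """Flatten a type construction list into its resulting sequence of items."""
    items: List[str] = []
    for operation, reference in definition:
        if operation == "consist":
            # Consist appends the referenced item itself as one new item.
            items.append(reference)
        elif operation == "join":
            # Join appends all items of the referenced definition.
            items.extend(expand_definition(catalog[reference], catalog))
        else:
            raise ValueError(f"unknown operation {operation!r}")
    return items
```

With a catalog entry "Point" made of Consist x and Consist y, the definition Consist id, Join Point, Consist label expands to the items id, x, y, label.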

In addition, the design of the block allows other exceptions:

The block definition allows both attributes and parameters to be defined. This makes it possible to partially address the duality between attributes and subblocks: a single definition serves both as a list of attributes and as a block that uses these attributes.

The definitions of the type system are stored in the catalog of types, where it is possible to use your own definitions built on the built-in basic blocks; later it should be possible to add definitions from any source.

The following chart shows the ER diagram of the type definition in the catalog of types, including the tree hierarchy of categories of the definitions:

Type definition's ER diagram

Diagram source file diagram4.dia

As an alternative, defining two separate lists and expressing the connection in another way should be considered. …

So, there are special blocks for which we need to distinguish which block type means what. It was also noted that the document type can be determined from the contents of the root block. There are again several ways to interpret the block type:

For the document specification, it is probably necessary to have fixed blocks that would at least allow the meaning of the other blocks in the document to be defined. For these blocks, the range of values with BlockGroup = 0 is reserved, and full support of these blocks is required of all applications that support level 1 and higher levels of the XBUP protocol.

Similar requirements are set on the layout of the basic blocks as on the structure:

In addition, it is necessary to introduce a list declaration for both the attribute list and the parameter list. The following aspects need to be considered:

Basic Blocks

Basic blocks should primarily allow the creation of type definitions and the basic constraints for addressing them. Since the types of blocks are determined dynamically, it is necessary to allow groups and blocks to be defined in the document. For this purpose it is appropriate to define a group of blocks that allows the meaning of the other groups and block types of the document to be specified and, optionally, the built-in definitions or the catalog to be used. In addition, there should be a root block of the document specifying what type of data the document contains. A viable solution is to use the root block to specify the format and the main block of the document. The specification can be external, internal (contained in the document), or both at the same time; the internal definition takes precedence over the catalog.

Basic blocks should therefore cover the definition of the type system and of the catalog. They are defined in Basic (0) / Basic (0) and are always implicitly defined for group 0, while a block of type (0,0) is restricted due to the possible use of the principle of default zero for data blocks. The group values used in blocks are therefore increased by one.

Declaration

Block: Basic (0) / DocumentDeclaration (1)

The declaration block determines the allowed range of groups. This block should be located at the beginning of each file, unless the application assigns the file a special static meaning.

Definition:

Join GroupsReserved (UBNatural) - The number of reserved groups
Join PreserveCount (UBNatural) - The number of groups to keep from previous definitions
Consist FormatDeclaration - Declaration of format
Any DocumentRoot - Root node of document

For subblocks of this block, the permitted range of group values is the interval PreserveCount + 1 .. PreserveCount + GroupsReserved + 1. If PreserveCount = 0, the highest not yet reserved group in the current or parent blocks + 1 is taken. If all values are zero, then after applying the rule of cutting trailing zeros the block coincides with the data block.
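The interval of permitted groups can be computed as in the following Python sketch. It reads the interval literally as inclusive at both ends and does not cover the special case PreserveCount = 0, which needs knowledge of the groups reserved in the current and parent blocks; allowed_groups is a hypothetical name.

```python
def allowed_groups(preserve_count: int, groups_reserved: int) -> range:
    """Groups permitted for subblocks of a DocumentDeclaration block.

    Literal reading of the interval
    PreserveCount + 1 .. PreserveCount + GroupsReserved + 1 (both ends included).
    """
    first = preserve_count + 1
    last = preserve_count + groups_reserved + 1
    return range(first, last + 1)
```

For PreserveCount = 2 and GroupsReserved = 3 this gives the groups 3, 4, 5 and 6.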

Format Declaration

Block: Basic (0) / FormatDeclaration (2)

This block allows the basic structure, an equivalent of a format specification, to be specified. It specifies the sequence of groups and their definition.

Definition:

Join GroupsLimit (UBNatural) - Maximum allowed value of group for those types of blocks
Join FormatSpecCatalogPath (UBPath) - Specification of format defined as path in catalog
Join Revision (UBNatural) - Specification's revision number
List GroupDeclaration - Declaration of group
List FormatDefinition - Format definition
List Revision - Specification's revision

The list GroupDeclaration defines the sequence of groups of the format, while FormatDefinition defines a sequence of Join / Consist operations. Together with the list of revisions it defines the specification of the format.

Group Declaration

Block: Basic (0) / GroupDeclaration (3)

This basic block represents the declaration of a group. It specifies the sequence of block specifications and their definition.

Definition:

Join BlocksLimit (UBNatural) - Maximum allowed value for block for those types of blocks
Join GroupSpecCatalogPath (UBPath) - Specification of group defined as path in catalog
Join Revision (UBNatural) - Specification's revision number
List BlockDeclaration - Declaration of block
List GroupDefinition - Group definition
List Revision - Specification's revision

The list BlockDeclaration determines the sequence of blocks in the group, while the sequence of Join / Consist operations is defined by GroupDefinition. Along with the list of revisions it defines the specification of the group.

Block Declaration

Block: Basic (0) / BlockDeclaration (4)

The definition of blocks has two levels, since it is necessary to define both attributes and subblocks.

Definition:

Join AttributesLimit (UBNatural) - Maximum allowed number of attributes for block (includes lists)
Join ParametersLimit (UBNatural) - Maximum allowed number of parameters (includes lists)
Join BlockSpecCatalogPath (UBPath) - Specification of block defined as path in catalog
Join Revision (UBNatural) - Specification's revision number
List ListDeclaration
List BlockDeclaration
List BlockDefinition
List Revision - Specification's revision

The list BlockDeclaration determines the sequence of blocks in the group, while the sequence of Join / Consist operations, or alternatively ListJoin / ListConsist operations, is defined by BlockDefinition. Along with the list of revisions it defines the specification of the block.

Format Definition

Block: Basic (0) / FormatDefinition (5)

Definition of format as a sequence of values to merge.

Definition:

Join ConsistSkip (UBNatural) - Number of items before the merge
Join JoinCount (UBNatural) - Number of merged items
Consist FormatDeclaration - Declaration of format

Group Definition

Block: Basic (0) / GroupDefinition (6)

Definition of group as a sequence of values to merge.

Definition:

Join ConsistSkip (UBNatural) - Number of items before the merge
Join JoinCount (UBNatural) - Number of merged items
Consist GroupDeclaration - Declaration of group

Block Definition

Block: Basic (0) / BlockDefinition (7)

Definition of block as a sequence of values to merge.

Definition:

Join ConsistSkip (UBNatural) - Number of items before the merge
List ListSpecification - List specification
Join JoinCount (UBNatural) - Number of merged items
Join IsList (UBBoolean) - Indication of list merging
Consist BlockDeclaration - Declaration of block

List Declaration

Block: Basic (0) / ListDeclaration (8)

This specification block defines potentially endless lists of parameters.

Definition:

Join ConsistSkip (UBNatural) - Number of items before the merge

Revision Definition

Block: Basic (0) / RevisionDefinition (9)

A separate list is needed for the definition of revisions.

Definition:

Join RevisionCount (UBNatural) - Number of revision items

Todo: Missing argumentation for the order of basic blocks and their attributes, etc.

Attribute Type Specification

As an extension of the first level, it is possible to introduce attribute typing. In the initial phase the meaning of attributes will be defined using a text description; later it will be extended with an algorithmic definition, possibly based on mathematical principles.

Basic Types

Basic types correspond to the above-mentioned types of attributes.

Compound Block Types

This group of blocks is needed for constructing more complex blocks that consist of simpler parts. These are essentially sequences and collections. Examples of their use can be found in the already defined document specification for lists of blocks and groups.

Document Specification

Some of the possible ways to define the types of blocks in a document are described here. (obsolete)

Document Definition

The definition of a document is a separate document determining the permitted ranges of groups and block types. In the case of a specification, the values of GroupListPointer and DocumentRootPointer point to the same block.

Examples of Definition

Definitions may vary mainly in which part is externally available.

  Groups Reservation
    List
      Group Specification
        List
          Block Specification
           ...
          Block Specification
        List (...)
         ...
        List (...)
      Group Specification (...)
       ...
      Group Specification (...)
  Groups Reservation
    Link
      The Root of the Internet Catalog
  Groups Reservation
    List
      Link
        The Root of the Internet Catalog
       ...
      Link
        The Root of the Internet Catalog

It is possible to combine specifications or declare them at lower levels as needed.

TODO: Specification with alternative shape and with the reference to the catalog.

Document Processing

The following text describes how to deal with document specifications. It mainly concerns techniques for performing control checks and connecting specifications into a sequence.

Specification's Processing

Defined specifications should be processed using an appropriate method. Although it is possible to store a table for each block, that would be very inefficient. An outline of the proposed usable methods follows.

Active Specification

The current specification maintains the values of the indexes into the catalog for the currently processed element and keeps a list of the existing ranges of groups down to lower levels. When another block of the document is to be handled, it is possible to travel up the tree as far as necessary and delete group definitions using the table, and then travel down through the blocks to the desired node, processing block specifications along the way.

Preprocessed Specifications

Walk through the document depth-first and prepare a specification table for each specification block. For the current block, a copy of the specification can be obtained. When processing another block, walk through its ancestors until a specification block is reached whose table can be used.
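The lookup of a prepared table can be sketched in Python as a walk towards the root. The sketch assumes each block keeps a reference to its parent and that specification blocks carry an already prepared table; the Node class, its spec_table field and find_spec_table are hypothetical, and starting the search at the block itself rather than at its parent is an assumption.

```python
from typing import Dict, Optional


class Node:
    """Hypothetical document block; spec_table is set only on specification blocks."""

    def __init__(self, name: str, parent: Optional["Node"] = None,
                 spec_table: Optional[Dict[int, str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.spec_table = spec_table


def find_spec_table(node: Node) -> Optional[Dict[int, str]]:
    """Return the table of the nearest specification block on the way to the root."""
    current: Optional[Node] = node
    while current is not None:
        if current.spec_table is not None:
            return current.spec_table
        current = current.parent
    return None  # no specification block above this block
```

Tables are prepared once during the depth-first pass, so a lookup costs at most one step per level of nesting instead of storing a table for each block.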

Document Validation

The rules for each level should be checked for compliance with the required limits. Corruption might be caused by an application error or by file damage. Checking a document is split into rules for determining validity and rules for determining document compatibility. While validity determines whether the file is properly written and is therefore processable for real work, compatibility determines whether the document can be used in a specific application. In the XBUP protocol, validation methods form a hierarchy similar to the levels.

Document Validity

The document is valid if it is properly created and all types of blocks and their attributes are properly defined. Precisely, this means:

Document Compatibility

Compatibility is a property of a document saying that the document is processable by a given application. The application is compatible if:

Todo: