(TBD) M. Hajda
Internet-Draft ExBin Project
Intended status: Standards Track June 17, 2020
Expires: December 19, 2020

Extensible Binary Universal Protocol (XBUP)


The Extensible Binary Universal Protocol (XBUP) is a prototype of a general-purpose binary data protocol and file format with a primary focus on data abstraction and transformation.

This proposal specifies version 0.2 of the bottom levels of the protocol, a set of basic data types, and the recommended API.


This document is being developed by the ExBin Project and is published here to gather comments and to raise interest in the project.

To participate in the development of this project, visit https://xbup.exbin.org/?participate.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 19, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

The Extensible Binary Universal Protocol (XBUP) is a prototype of a general-purpose, multi-layer binary data protocol and file format with a primary focus on abstraction and data transformation.

Key features:

Secondary features include capabilities inspired by markup languages like SGML/XML [XML], data representation languages like YAML [YAML] and JSON [RFC4627], and similar binary formats like ASN.1 [ASN.1], HDF5 [HDF5], Efficient XML [EfficientXML] or Protocol Buffers [ProtoBuf].

The primary focus on abstraction makes this protocol somewhat different from other similar binary formats, which focus on efficiency, serialization or binary representation of a specific markup language. See Section 3.4 (Format Comparison) for more information.

This protocol technically overlaps in functionality with many widely used protocols and formats, including those defined by various RFCs, and is very different in nature from currently used Internet protocols. It should therefore be considered somewhat disruptive, and it should be examined whether the potential advantages of this protocol outweigh the complexity and complications it brings. See [RFC3117] for design considerations.

1.1. Goals

The primary goal of this project is to create a communication protocol / data format with the following characteristics, ordered by priority:

1.2. Motivation

The project should provide a universal protocol as a more feature-rich alternative to currently used binary protocols. It should provide general methods for handling data of various kinds and types, including:

From the user's point of view, the protocol should provide new capabilities or enable new development in various areas:

1.3. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The term "byte" is used in its now-customary sense as a synonym for "octet", a sequence of 8 bits.

2. XBUP Specification

XBUP is a multi-layer protocol for the representation of data in a bit/byte stream, where each layer is built on top of the previous layer and provides new capabilities, such as new constraints and/or features. Typically, higher levels also provide declarations for entities used in lower levels.

Applications can support the XBUP protocol only up to a specific level when full support is not necessary.

Layers are indexed as levels by depth starting with level 0.

Layers of the protocol

  Level  Layer
  -----  ---------------
  0      Tree Structure
  1      Type System
  2      Transformations
  3      Relations

2.1. Level 0: Tree Structure

The lowest level of the protocol defines a basic tree structure using two primitive types.

Nodes are represented as one or more blocks of bits, with child blocks representing child nodes of the tree.

2.1.1. UBNumber Encoding

UBNumber is an encoding that combines unary and binary encoding to represent values of dynamic length. It typically represents a non-negative integer (or a value of any other type with a deterministic mapping to a well-ordered, countably infinite set).

The encoding is similar to UTF-8, except that UBNumber is applied recursively when the unary part fills all bits of the first byte.

To decode a value, the leading one bits are counted (up to 8 bits) to determine the length, and the rest of the bits are used as the value. The value is also shifted so that there is only one code for each number.

Examples of UBNatural codes, written as sequence of bits = represented value as a non-negative integer:

  0 0000000                                 = 0
  0 0000001                                 = 1
  0 0000010                                 = 2
  0 0000011                                 = 3
  0 1111111                                 = 7Fh = 127
  10 000000 | 00000000                      = 80h = 128
  10 000000 | 00000001                      = 81h = 129
  10 111111 | 11111111                      = 407Fh = 16511
  110 00000 | 00000000 | 00000000           = 4080h = 16512

UBNumber codes and values
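As an illustration of the encoding above, the following Python sketch encodes and decodes UBNatural values so that it reproduces the listed examples. It is not part of the specification: the function names are hypothetical, the offset rule is inferred from the examples, and the recursive form used when the first byte is FFh is deliberately left out.

```python
from typing import Tuple


def _offset(extra_bytes: int) -> int:
    """Smallest value whose code uses the given number of extra bytes."""
    return sum(1 << (7 * k) for k in range(1, extra_bytes + 1))


def encode_ubnatural(value: int) -> bytes:
    """Encode a non-negative integer; covers codes of 1 to 8 bytes only."""
    if value < 0:
        raise ValueError("UBNatural values are non-negative")
    for extra in range(8):
        payload_bits = 7 * (extra + 1)
        rest = value - _offset(extra)  # shift so each number has one code
        if rest < (1 << payload_bits):
            prefix = ((1 << extra) - 1) << 1  # 'extra' one bits, then a zero
            code = (prefix << payload_bits) | rest
            return code.to_bytes(extra + 1, "big")
    raise ValueError("value needs the recursive form, not covered here")


def decode_ubnatural(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the code at the start of data."""
    first = data[0]
    extra = 0
    while extra < 8 and first & (0x80 >> extra):
        extra += 1  # count leading one bits of the first byte
    if extra == 8:
        raise ValueError("recursive form (first byte FFh) not covered here")
    if len(data) < extra + 1:
        raise ValueError("truncated code")
    raw = int.from_bytes(data[: extra + 1], "big")
    payload_bits = 7 * (extra + 1)
    value = (raw & ((1 << payload_bits) - 1)) + _offset(extra)
    return value, extra + 1
```

For instance, under these assumptions encode_ubnatural(16511) yields the two bytes BFh FFh, matching the "10 111111 | 11111111" row above.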

Various interpretations can be mapped onto the UBNumber encoding. For level 0, the following two mappings are used:

2.1.2. Document

A single document is typically represented as a single block, but additional optional tail data can be present.

To store a document as a file in a file system, or to use it in an undeclared data stream, an additional "Document header" should be present.

The document header contains information about the protocol version. For version 0.2 of the protocol, it is a 6-byte data blob. The explanation of each value is non-normative; the primary use of the header is padding that helps systems which use the beginning of a file to identify the file type.

Document header with hexadecimal values:

  FE 00 58 42 00 02

Structure of file header

Document Header Bytes

  Byte  Content
  ----  ------------------------------------
  FE    Unary encoded size of cluster (byte)
  00    Reserved for future versions
  58    ASCII constant 'X'
  42    ASCII constant 'B'
  00    UBNatural encoded major version
  02    UBNatural encoded minor version
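A reader might recognize the document header roughly as follows. This is a minimal Python sketch under stated assumptions: the function and constant names are hypothetical, and the version numbers are only handled in their one-byte UBNatural form (values below 80h).

```python
# First four header bytes: cluster size, reserved byte, ASCII 'X', ASCII 'B'.
XBUP_HEADER_PREFIX = bytes([0xFE, 0x00, 0x58, 0x42])


def parse_document_header(data: bytes):
    """Return (major, minor) if data starts with a document header, else None."""
    if len(data) < 6 or data[:4] != XBUP_HEADER_PREFIX:
        return None
    major, minor = data[4], data[5]
    # Simplification: versions are UBNatural values, but only single-byte
    # codes are accepted in this sketch.
    if major >= 0x80 or minor >= 0x80:
        raise ValueError("multi-byte version numbers are not handled here")
    return major, minor
```

For the header bytes listed above, this sketch returns (0, 2), corresponding to version 0.2.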

The primary block, called the "Root Block", follows. Any data after the root block form an optional blob called "Tail Data".

2.1.3. Block

Each block starts with a single value:

  UBNatural attributePartSize

The value attributePartSize is not allowed to equal 0 for a block, because a zero value is used for "Terminator" handling (see below).

The block continues with the attribute part, which is a blob of the length in bytes specified by attributePartSize.

The first value in the attribute part represents:

  UBENatural dataPartSize

The rest of the data in the attribute part, if any, is interpreted as a sequence of attribute values, each encoded in any UBNumber encoding. A binary blob called the data part follows the attribute part; it is optional and can be empty.

If the dataPartSize value fills exactly the whole attribute part (that is, there are no attributes in the attribute part), the block is called a "Data Block"; otherwise it is called a "Node Block".

The block ends after the data part.

  | == Block ========================== |
  |                                     |
  | UBNatural attributePartSize         |
  | == Attribute part ================= |
  |                                     |
  | UBENatural dataPartSize             |
  | UBNumber attribute 1                |
  | ...                                 |
  | UBNumber attribute n                |
  | == Data part (optional) =========== |
  |                                     |
  | Single data blob or child blocks    |

Effectively, the transferred data are represented by a sequence of attributes and either child blocks or a data blob, while the attributePartSize and dataPartSize values are present for structural purposes only.

See Section 3.1 for examples of blocks.
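The block layout described in this section can be sketched as a small Python parser. It is illustrative only: the names are hypothetical, every UBNumber is assumed to fit its one-byte form, and the use of 7Fh as the "infinity" value of dataPartSize is an assumption taken from the examples in Section 3.1. Terminated blocks are rejected rather than parsed.

```python
from dataclasses import dataclass
from typing import List, Tuple

INFINITE_SIZE = 0x7F  # assumption: Section 3.1 uses 7Fh for terminated blocks


@dataclass
class Block:
    attributes: List[int]
    data: bytes  # raw data part: a blob, or the encoded child blocks

    @property
    def is_data_block(self) -> bool:
        return not self.attributes


def _read_byte_number(data: bytes, pos: int) -> int:
    """Read a UBNumber restricted to its one-byte form (simplification)."""
    if data[pos] >= 0x80:
        raise ValueError("multi-byte UBNumber codes are not handled here")
    return data[pos]


def parse_fixed_block(data: bytes, pos: int = 0) -> Tuple[Block, int]:
    """Parse one fixed-size block at pos; return it and the next position."""
    attr_size = _read_byte_number(data, pos)
    if attr_size == 0:
        raise ValueError("attributePartSize 0 is the terminator, not a block")
    attr_part = data[pos + 1 : pos + 1 + attr_size]
    if len(attr_part) != attr_size:
        raise ValueError("truncated attribute part")
    data_size = _read_byte_number(attr_part, 0)
    if data_size == INFINITE_SIZE:
        raise ValueError("terminated blocks are not handled here")
    attributes = [_read_byte_number(attr_part, i) for i in range(1, attr_size)]
    start = pos + 1 + attr_size
    body = data[start : start + data_size]
    if len(body) != data_size:
        raise ValueError("truncated data part")
    return Block(attributes, body), start + data_size
```

A block with an empty attribute list is a data block; otherwise the data field holds the encoded child blocks, which can be parsed by calling the same function on it.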

2.1.4. Node Block

When there is at least one attribute value in the attribute part, the block is called a node block. Data in the data part are interpreted as a sequence of (child) blocks.

The data part has the length in bytes specified by the dataPartSize value. If dataPartSize equals infinity, the sequence of child blocks must be either infinite or terminated by a terminator (a single zero byte).

If there are no child blocks, the block is called a leaf block.
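The two ways of delimiting child blocks can be sketched in Python as follows. This is an illustration under assumptions, not a reference parser: parse_node is a hypothetical name, all UBNumbers are assumed to be single bytes, 7Fh is assumed to encode infinity (as in the examples of Section 3.1), and data blocks are out of scope.

```python
from typing import List, Tuple

INFINITY = 0x7F    # assumption taken from the examples in Section 3.1
TERMINATOR = 0x00  # a single zero byte ends a terminated child sequence


def parse_node(data: bytes, pos: int = 0) -> Tuple[dict, int]:
    """Parse a node block and its children; return (node, next position)."""
    attr_size = data[pos]
    if attr_size == 0:
        raise ValueError("terminator found where a block was expected")
    attr_part = data[pos + 1 : pos + 1 + attr_size]
    if len(attr_part) != attr_size or any(b >= 0x80 for b in attr_part):
        raise ValueError("truncated or multi-byte attribute part")
    data_size = attr_part[0]
    attributes = list(attr_part[1:])
    if not attributes:
        raise ValueError("data blocks are outside the scope of this sketch")
    pos += 1 + attr_size
    children: List[dict] = []
    if data_size == INFINITY:
        # Terminated form: read child blocks until a single zero byte.
        while data[pos] != TERMINATOR:
            child, pos = parse_node(data, pos)
            children.append(child)
        pos += 1  # skip the terminator
    else:
        # Fixed form: child blocks must fill exactly dataPartSize bytes.
        end = pos + data_size
        while pos < end:
            child, pos = parse_node(data, pos)
            children.append(child)
        if pos != end:
            raise ValueError("child blocks overrun dataPartSize")
    return {"attributes": attributes, "children": children}, pos
```

Because attributePartSize is never 0 for a real block, a zero byte at the position where a child block would start can be read unambiguously as the terminator.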

2.1.5. Data Block

When there is no attribute value in the attribute part, the block is called a data block. Data in the data part are interpreted as a binary blob.

The data part has the length in bytes specified by the dataPartSize value. If dataPartSize equals infinity, the data part is processed byte by byte and each zero value is used as an escape code, where the directly following byte means:

If there are no data in the data part, the block is called an empty block.

2.1.6. Validity

A binary stream is correctly structured as an XBUP document (well-formed) if the following conditions are met. A description of the invalid state is also included for each condition.

2.2. Level 1: Block Types

Level 1 introduces block types and a catalog of types. From this level on, if an attribute is defined but not present, its value is considered to be zero, as represented in the UBNumber encoding.

2.2.1. Block Type

First two attributes in node block are interpreted as follows (somewhat similar to XML Namespaces):

  UBNatural - TypeGroup
  UBNatural - BlockType

Block type attributes

These two values determine the block type. Block types are organized into groups, where the TypeGroup value specifies the group to which the block type belongs and the BlockType value specifies the particular block type within that group.

The TypeGroup with value 0 is the basic built-in group and cannot be overridden. Basic blocks provide the ability to specify the meaning of other groups via block type declarations, definitions, or links to the catalog or an external source.
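Reading the block type from a node block's attributes can be sketched in a few lines of Python. The names below are hypothetical; the only rules used are the ones stated in this section, namely that the first two attributes are TypeGroup and BlockType and that an attribute which is not present counts as zero.

```python
from typing import NamedTuple, Sequence


class BlockType(NamedTuple):
    type_group: int
    block_type: int

    @property
    def is_basic(self) -> bool:
        # Group 0 is the built-in group of basic blocks.
        return self.type_group == 0


def block_type_of(attributes: Sequence[int]) -> BlockType:
    """Read TypeGroup and BlockType; absent attributes count as zero."""
    group = attributes[0] if len(attributes) > 0 else 0
    block = attributes[1] if len(attributes) > 1 else 0
    return BlockType(group, block)
```

With this sketch, a node block whose only attribute is 1 has block type 0 in group 1, and any further attributes after the first two are left to the block type's own definition.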

2.2.2. Type Context

For each block, there is a type context which provides a mapping from a particular block type (as defined above) to a particular declaration/definition (similar to the XML Namespaces context). The context is the same for a block and all its children, except for the "Document Declaration" block, which is used to change the context.

The range of groups and the range of blocks for each group are specified.

2.2.3. Block Type Definition

A block type is defined as a finite sequence of operations, where each operation defines one or more attributes and/or child blocks. An operation can define an unspecified item or refer to built-in or previously defined types. There are variants for a singular item and for a list of items, 8 operations in total:

The following syntax is used in this document (no final syntax has been decided yet):

  any - Single block
  attribute - Single attribute
  block_type_name - Consist of definition
  +block_type_name - Append definition

Block type attributes

List variants of the operations end with brackets "[]" after the type name.

From the abstraction perspective (see Section 3.2), a type definition as a data definition is by default done using the consist operation (as child blocks). This allows a data type to be defined as a set of singular child types or sets of child types, including an infinite number of them, with the additional requirement that members must be well ordered (a list). From this point of view, the join operation does not change much, except that it is restricted to finite sets.

2.2.4. Basic Blocks Definition

The following blocks are defined as built-in group 0, but are also defined in the catalog.

Unspecified (0)

This block is used for unspecified block values or data padding. It can be used to represent nil / null values.

Document Declaration (1)

The declaration block determines the allowed range of groups. This block should be located at the beginning of each file if the application does not provide any static/special meaning, but it can be used anywhere inside the document as well.

  +Natural groupsCount - The number of allocated groups
  +Natural preserveGroups - The number of groups to keep from
    previous declarations
  FormatDeclaration formatDeclaration - Declaration of format
  Any documentRoot - Root node of document

Document Declaration

For sub-blocks of this block, the permitted range of group values is the interval preserveGroups + 1 .. preserveGroups + groupsCount + 1. If preserveGroups = 0, the range starts at the highest not yet reserved group in the current or parent blocks + 1. If all values are zero, applying the rule of cutting off trailing zeros makes the block coincide with a data block.

Format Declaration (2)

A format declaration allows the use of a declaration from the catalog, a local format definition, or both.

  +CatalogFormatSpecPath catalogFormatSpecPath - Specification
    of format defined as path in catalog
  +Natural formatSpecRevision - Specification's revision number
  FormatDefinition formatDefinition

Format Declaration

Format Definition (3)

This block specifies the basic structure of a format specification as a sequence of parameters, each using either the join or the consist operation.

  Any[] formatParameters - Join or Consist format parameters
  +RevisionDefinition[] revisions

Format Definition

Format Join Parameter (4)

Join parameter for format definition.

  +FormatDeclaration formatDeclaration

Format Join Parameter

Format Consist Parameter (5)

Consist parameter for format definition.

  +GroupDeclaration groupDeclaration

Format Consist Parameter

Group Declaration (6)

A group declaration allows the use of a declaration from the catalog, a local group definition, or both.

  +CatalogGroupSpecPath catalogGroupSpecPath - Specification
    of group defined as path in catalog
  +Natural groupSpecRevision - Specification's revision number
  GroupDefinition groupDefinition

Group Declaration

Group Definition (7)

This block specifies the basic structure of a group specification as a sequence of parameters, each using either the join or the consist operation.

  Any[] groupParameters - Join or Consist group parameters
  +RevisionDefinition[] revisions

Group Definition

Group Join Parameter (8)

Join parameter for group definition.

  +GroupDeclaration groupDeclaration

Group Join Parameter

Group Consist Parameter (9)

Consist parameter for group definition.

  +BlockDeclaration blockDeclaration

Group Consist Parameter

Block Declaration (10)

A block declaration allows the use of a declaration from the catalog, a local block definition, or both.

  +CatalogBlockSpecPath catalogBlockSpecPath - Specification
    of block defined as path in catalog
  +Natural blockSpecRevision - Specification's revision number
  BlockDefinition blockDefinition

Block Declaration

Block Definition (11)

This block specifies the basic structure of a block specification as a sequence of parameters, each using the join, consist, list join or list consist operation.

  Any[] blockParameters - Join or Consist or List Join or List
    Consist block parameters
  +RevisionDefinition[] revisions

Block Definition

Block Join Parameter (12)

Join parameter for block definition.

  +BlockDeclaration blockDeclaration

Block Join Parameter

Block Consist Parameter (13)

Consist parameter for block definition.

  +BlockDeclaration blockDeclaration

Block Consist Parameter

Block List Join Parameter (14)

List join parameter for block definition.

  +BlockDeclaration blockDeclaration

Block List Join Parameter

Block List Consist Parameter (15)

List consist parameter for block definition.

  +BlockDeclaration blockDeclaration

Block List Consist Parameter

Revision Definition (16)

A revision defines the parameter count for a particular specification definition.

  +Natural parametersCount

Revision Definition

2.2.5. Main Catalog

To specify basic data types, a catalog of block type definitions is established.

The catalog is structured as a tree of definitions, where each block type has a unique identifier (a sequence of natural numbers). Tree nodes are organized by ownership and are supposed to follow a pattern similar to Internet domain names.

In addition to block, group and format specifications, the catalog can contain basically any other data, which will be properly specified at further protocol levels, for example:

For basic access, the catalog should be accessible as a single document stored in XBUP format.

2.2.6. Additional Catalog

Additional catalogs can be addressed from external sources.

2.3. Level 2: Transformations

In general, a block transformation is a data flow from one block type to another block type (see Section 3.2). Transformations can be used for multiple tasks and cover various operations on data.

This level introduces the capability to define transformations in the catalog and to automatically perform conversion between blocks.

Transformations can be also used for:


2.4. Level 3: Ontologies

This level can additionally specify more about the meaning of the data:

2.5. Data Types

This section defines various data types considered for specification in the catalog.

2.5.1. Natural

The Natural type uses the non-negative integer mapping of the UBNumber encoding.

2.5.2. Integer

The Integer type is stored in the UBNumber encoding using two's complement form, similar to how it is used in computing.

3. Appendixes

3.1. Examples of Blocks

Fixed size node block with one attribute

  Stream Data

  Byte  Value
  ----  -------------------------
  02    AttributePartSize
  00    DataPartSize
  77    Attribute 1 of value 0x77

Terminated node block with one attribute

  Stream Data

  Byte  Value
  ----  -----------------
  02    AttributePartSize
  7F    DataPartSize
  05    Attribute 1
  00    Terminator

Fixed size data block

  Stream Data

  Byte  Value
  ----  ---------------------
  01    AttributePartSize
  01    DataPartSize
  BB    One byte of data 0xBB

Terminated empty data block

  Stream Data

  Byte  Value
  ----  -----------------
  01    AttributePartSize
  7F    DataPartSize
  00    Data block escape
  00    Termination value

Fixed block with one child

  Stream Data

  Byte  Value
  ----  -------------------------
  02    AttributePartSize
  03    DataPartSize
  66    Attribute 1 of value 0x66
  02    AttributePartSize
  00    DataPartSize
  77    Attribute 1 of value 0x77
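The fixed-size examples above can be reproduced with a short Python sketch that serializes a block from its attributes and data part. The function names are hypothetical, and the sketch only handles values small enough for the one-byte UBNumber form; 7Fh is excluded because the terminated examples use it as the dataPartSize of a block without a fixed length.

```python
from typing import Sequence


def _byte_number(value: int) -> bytes:
    """Encode a number in the one-byte form used by the examples."""
    if not 0 <= value < 0x7F:
        raise ValueError("only small values are handled in this sketch")
    return bytes([value])


def build_block(attributes: Sequence[int], data_part: bytes = b"") -> bytes:
    """Serialize a fixed-size block from its attributes and raw data part."""
    # The attribute part starts with dataPartSize, followed by the attributes.
    attribute_part = _byte_number(len(data_part)) + b"".join(
        _byte_number(a) for a in attributes
    )
    # attributePartSize counts the bytes of the attribute part.
    return _byte_number(len(attribute_part)) + attribute_part + data_part
```

Nesting works by passing the serialized child as the parent's data part, for example build_block([0x66], build_block([0x77])) for the last example.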

3.2. Appendix 1: Abstraction

The concept of data used in this protocol is based on set theory: a set must be countable, and for each member of the data there must exist a mapping to a well-ordered countable set.

At the same time, data definitions are similar to the table column definitions used in relational databases, except that an infinite number of items is also supported.

Protocol processing is based on a broad concept of the dataflow paradigm, which typically states that there are input data, a process and output data.

An additional requirement here is that the process must be deterministic (the same input returns the same output). Other than that, it can be run in any manner, from a local function in memory up to a remote process in the cloud.

3.3. Appendix 2: Parsing

As with the parsing of textual formats, it is possible to provide a parsing capability for the binary protocol.

3.3.1. Level 0 Parsing

To process the level 0 protocol, the following 4 types of tokens are used:

The following simplified grammar can be used for token processing.

  Document ::= header + Block + data
  Block ::= begin + Attributes + Blocks + end | begin + data + end
  Blocks ::= Block + Blocks | epsilon
  Attributes ::= attribute + Attributes | epsilon

Simplified grammar
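The Block production of the simplified grammar can be checked with a small recursive-descent recognizer, sketched here in Python. The token names "begin", "attribute", "data" and "end" are taken from the grammar itself; representing tokens as plain strings and the function names are assumptions made for illustration only.

```python
from typing import List


def match_block(tokens: List[str], pos: int = 0) -> int:
    """Return the position after one Block starting at pos, or raise."""
    if pos >= len(tokens) or tokens[pos] != "begin":
        raise ValueError(f"expected 'begin' at token {pos}")
    pos += 1
    if pos < len(tokens) and tokens[pos] == "data":
        pos += 1  # Block ::= begin + data + end
    else:
        while pos < len(tokens) and tokens[pos] == "attribute":
            pos += 1  # Attributes ::= attribute + Attributes | epsilon
        while pos < len(tokens) and tokens[pos] == "begin":
            pos = match_block(tokens, pos)  # Blocks ::= Block + Blocks | epsilon
    if pos >= len(tokens) or tokens[pos] != "end":
        raise ValueError(f"expected 'end' at token {pos}")
    return pos + 1


def is_single_block(tokens: List[str]) -> bool:
    """True if the whole token list forms exactly one Block."""
    try:
        return match_block(tokens) == len(tokens)
    except ValueError:
        return False
```

Note that the simplified grammar, and therefore this recognizer, accepts a block with neither attributes nor data, although Section 2.1.3 requires the attribute part of a real block to be non-empty.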

3.4. Appendix 3: Format Comparison

While there are various binary formats and markup languages available, this project aims to take a somewhat different approach to data representation.

4. IANA Considerations

In the current early state of the development of the protocol, only a basic media type for general files is defined: application/x-xbup

5. Security Considerations

Security has not been considered at the current stage of development.

6. Acknowledgements


7. References

7.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

7.2. Informative References

[ASN.1] International Telecommunication Union, "Information Technology -- ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER)", ITU-T Recommendation X.690, 1994.
[EBML] Lhomme, S., Rice, D. and M. Bunkus, "Extensible Binary Meta Language", Work in Progress, draft-ietf-cellar-ebml, 2020.
[EfficientXML] Schneider, J., Kamiya, T., Peintner, D. and R. Kyusakov, "Efficient XML Interchange (EXI) Format 1.0 (Second Edition)", February 2014.
[HDF5] The HDF Group, "HDF5 File Format Specification Version 3.0", April 2016.
[IEEE.754.1985] Institute of Electrical and Electronics Engineers, "Standard for Binary Floating-Point Arithmetic", August 1985.
[ProtoBuf] Google, "Protocol Buffers", 2020.
[RFC3117] Rose, M., "On the Design of Application Protocols", RFC 3117, DOI 10.17487/RFC3117, November 2001.
[RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, DOI 10.17487/RFC4627, July 2006.
[RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, October 2013.
[XML] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth Edition)", W3C Recommendation REC-xml-20081126, November 2008.
[YAML] Ben-Kiki, O., Evans, C. and I. Net, "YAML Ain't Markup Language (YAML[TM]) Version 1.2, 3rd Edition", October 2009.



Author's Address

Miroslav Hajda
ExBin Project
EMail: exbinproject@gmail.com
URI: https://exbin.org/
