(NONE)                                                          M. Hajda
Internet-Draft                                             ExBin Project
Intended status: Experimental                          22 September 2023
Expires: 25 March 2024


              Extensible Binary Universal Protocol (XBUP)
                     draft-ietf-exbin-xbup-core-00

Abstract

   The Extensible Binary Universal Protocol (XBUP) is a general-purpose
   binary data protocol and file format with a primary focus on data
   abstraction and data transformation.

   This proposal describes the specification of the currently developed
   prototype version, an example set of basic data types and the
   recommended API.

   The protocol is part of the ExBin Project (https://exbin.org), which
   aims to provide a proof-of-concept implementation and support for a
   wider set of functionality.

   NOTICE: This is not an official or finished document and is not yet
   enrolled for any official track to be registered as an IETF RFC.

Contributing

   This document is being worked on by the ExBin Project
   (https://exbin.org) and is published here in order to gather
   comments and to raise interest in this project.

   To participate in the development of this project, visit
   https://xbup.exbin.org/?participate.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Hajda                     Expires 25 March 2024                 [Page 1]

Internet-Draft                    XBUP                    September 2023

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 March 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1.  Introduction
     1.1.  Goals
     1.2.  Terminology
   2.  XBUP Specification
     2.1.  Level 0: Tree Structure
       2.1.1.  UBNumber Encoding
       2.1.2.  Document
       2.1.3.  Block
       2.1.4.  Node Block
       2.1.5.  Data Block
       2.1.6.  Validity
       2.1.7.  Summary
     2.2.  Level 1: Block Types
       2.2.1.  Block Type
       2.2.2.  Type Context
       2.2.3.  Block Type Definition
       2.2.4.  Basic Blocks Definition
       2.2.5.  Main Catalog
       2.2.6.  Additional Catalogs
     2.3.  Level 2: Transformations
       2.3.1.  Automatic Conversion
       2.3.2.  Paging
     2.4.  Level 3: Ontologies
     2.5.  Data Types
       2.5.1.  Boolean
       2.5.2.  Natural Number
       2.5.3.  Integer Number
       2.5.4.  Real Number
       2.5.5.  String
       2.5.6.  Time
       2.5.7.  URL - Uniform Resource Locator
       2.5.8.  Coordinates
     2.6.  Algorithms
   3.  Appendixes
     3.1.  Appendix: Motivation
     3.2.  Appendix: Examples of Blocks
     3.3.  Appendix: Abstraction
     3.4.  Appendix: Parsing
       3.4.1.  Level 0 Parsing
       3.4.2.  Level 1 Parsing
       3.4.3.  Level 2 Parsing
     3.5.  Appendix: Comparison to Other Formats
     3.6.  Appendix: User Interface
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Index
   Author's Address

1.  Introduction

   The Extensible Binary Universal Protocol (XBUP) is a prototype of a
   general-purpose, multi-layer binary data protocol and file format
   with a primary focus on abstraction and data transformation.

   Key features:

   *  Unified block-tree structure - Minimalist tree structure based on
      integer and binary blob only

   *  Custom data types - Support for data type definitions and catalogs
      of types

   *  Transformation framework - Automatic and manual data conversions
      and compatibility handling

   Secondary features include some capabilities inspired by markup
   languages like SGML/XML [XML], data representation languages like
   YAML [YAML] and JSON [RFC4627], and similar binary formats like ASN.1
   [ASN.1], HDF5 [HDF5], Efficient XML [EfficientXML] or Protocol
   Buffers [ProtoBuf].

   *  Extensibility

   *  Unconstrained values

   *  Internal and external referencing

   *  Data life-cycle / definition evolution

1.1.  Goals

   The primary goal of this project is to create a communication
   protocol / data format with the following characteristics, ordered
   by priority:

   *  Universal - Capable of representing any type of data, suitable
      for a wide range of uses including streaming, long-term storage
      and parallel access

   *  Independent - Not tightly linked to a particular spoken language,
      product, company, processing architecture or programming language

   *  Declarative - Self-sufficient for data type definition and with
      the ability to build data types by combining existing ones

   *  Normative - Providing a reference form for data representation

   *  Flexible - Support for data transformations, compatibility and
      extensibility handling

   *  Efficient - Effective data compacting / compression support for
      plain binary and structured data

1.2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The term "byte" is used in its now-customary sense as a synonym for
   "octet" - a sequence of 8 bits.

2.  XBUP Specification

   XBUP is a multi-layer protocol for representing data in a bit/byte
   stream provided by other protocols, file data, etc.

   Each layer is built on top of the previous layer and provides new
   capabilities, such as new constraints and/or features.  Higher levels
   can also retrospectively declare entities used in lower levels.

   Applications can choose to support the XBUP protocol only up to a
   specific layer when full support is not necessary.  Layers are
   indexed as levels by depth, starting with level 0.

                        Layers of the protocol

                     +=======+=================+
                     | Level | Layer           |
                     +=======+=================+
                     | 0     | Tree Structure  |
                     +-------+-----------------+
                     | 1     | Type System     |
                     +-------+-----------------+
                     | 2     | Transformations |
                     +-------+-----------------+
                     | 3     | Relations       |
                     +-------+-----------------+

                           Table 1: Layers

2.1.  Level 0: Tree Structure

   The lowest level of the protocol defines a basic tree structure using
   two primitive types.

   *  UBNumber encoded value

   *  Blob - Sequence of bits (bytes) with unspecified length or length
      specified by some attribute

   A sequence of those primitive types forms a block.  A single block
   represents a node of the tree and can contain child blocks, which
   represent child nodes.

2.1.1.  UBNumber Encoding

   UBNumber is an encoding which combines unary and binary encoding with
   a varying length of units of bits (octets).  It typically represents
   a natural non-negative integer number (or a value of any other type
   with a deterministic mapping to a well-ordered / countably infinite
   set).

   The encoding is applied recursively when the unary part fills all
   bits of the first byte.  This is similar to other variable-length
   encodings, for example the one used in UTF-8 [RFC3629].

   To decode a value, the leading one bits of the first byte are counted
   to obtain the length n (up to 8 bits); the rest of the bits of the
   first byte, followed by an additional sequence of n bytes, is then
   used as the value.
   The value is also shifted so that there is only one code for each
   number.  For the byte value 0xFF, which corresponds to length 8, an
   additional UBNatural value follows.  This new value contains the
   additional length.

   Examples of UBNatural codes (sequence of bits = represented natural
   non-negative integer number):

   0 0000000                               = 0
   0 0000001                               = 1
   0 0000010                               = 2
   0 0000011                               = 3
   ...
   0 1111111                               = 7Fh = 127
   10 000000 | 00000000                    = 80h = 128
   10 000000 | 00000001                    = 81h = 129
   ...
   10 111111 | 11111111                    = 407Fh = 16511
   110 00000 | 00000000 | 00000000         = 4080h = 16512
   ...
   11111110 11111111 .. 11111111           = 10204081020407Fh
            \_____ 7 times ____/
   ...
   11111111 00000000 00000000 .. 00000000  = 102040810204080h
            \+len 0/ \_____ 8 times ____/
   ...
   11111111 00000001 00000000 .. 00000000  = 10102040810204080h
            \+len 1/ \_____ 9 times ____/

             Figure 1: UBNatural example codes and values

   Other mappings, representing values other than natural numbers, can
   also be used with the UBNumber encoding.  For level 0 the following
   two mappings are used:

   *  UBNatural - uses the value from the basic UBNumber mapping
      directly, as listed above

   *  UBENatural - the value 7Fh is reserved for the infinity constant
      and higher codes are shifted by one

2.1.2.  Document

   A single document is typically represented as a single block, but
   data after this block are also considered part of the document.

   To store a document in a file / file system, or in data streams where
   the protocol version was not negotiated beforehand, an additional
   "Document header" should be present.

   The document header contains information about the protocol version.
   For the current version 0.2 of the protocol, it is a 6 bytes long
   data blob.  The explanation of each value is non-normative; the
   primary use is padding to help systems which use the beginning of a
   file to identify the file type.
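
   As a non-normative illustration of the UBNatural mapping from
   Section 2.1.1 (which the version fields of the document header also
   use), the following Python sketch decodes and encodes values.  The
   function names are hypothetical and not part of any XBUP API.  The
   handling of the 0xFF extended form is only one reading of the last
   two rows of Figure 1 and should be treated as an assumption.

```python
def _offset(n: int) -> int:
    """Smallest value of the form with n leading one bits: 2^7 + ... + 2^(7n)."""
    return sum(1 << (7 * k) for k in range(1, n + 1))


def decode_ubnatural(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one UBNatural starting at data[pos]; return (value, next position)."""
    if pos >= len(data):
        raise ValueError("Unexpected end of data")
    first = data[pos]
    pos += 1
    if first == 0xFF:
        # Assumed extended form: a UBNatural "extra length" follows,
        # then 8 + extra payload bytes (inferred from Figure 1).
        extra, pos = decode_ubnatural(data, pos)
        size = 8 + extra
        base = _offset(8) + sum(1 << (8 * (8 + j)) for j in range(extra))
        high = 0
    else:
        size = 0
        while first & (0x80 >> size):      # count leading one bits (0..7)
            size += 1
        base = _offset(size)
        high = first & (0x7F >> size)      # value bits left in the first byte
    if pos + size > len(data):
        raise ValueError("Unexpected end of data")
    payload = (high << (8 * size)) | int.from_bytes(data[pos:pos + size], "big")
    return base + payload, pos + size      # shift so each number has one code


def encode_ubnatural(value: int) -> bytes:
    """Encode a non-negative integer as UBNatural (inverse of the decoder above)."""
    if value < 0:
        raise ValueError("UBNatural cannot represent negative numbers")
    for n in range(8):
        if value < _offset(n + 1):
            raw = (value - _offset(n)).to_bytes(n + 1, "big")
            prefix = (0xFF << (8 - n)) & 0xFF   # n one bits, then a zero bit
            return bytes([prefix | raw[0]]) + raw[1:]
    # Assumed extended form, mirroring the decoder.
    extra, base = 0, _offset(8)
    while value - base >= 1 << (8 * (8 + extra)):
        base += 1 << (8 * (8 + extra))
        extra += 1
    return b"\xff" + encode_ubnatural(extra) + (value - base).to_bytes(8 + extra, "big")
```

   For instance, encode_ubnatural(128) yields the two bytes 80 00, the
   "10 000000 | 00000000" row of Figure 1, and decoding those bytes
   returns 128 together with the position of the next unread byte.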

   Document header with hexadecimal values:

                       Structure of file header

            +======+======================================+
            | Byte | Content                              |
            +======+======================================+
            | FE   | Unary encoded size of cluster (byte) |
            +------+--------------------------------------+
            | 00   | Reserved for future versions         |
            +------+--------------------------------------+
            | 58   | ASCII constant 'X'                   |
            +------+--------------------------------------+
            | 42   | ASCII constant 'B'                   |
            +------+--------------------------------------+
            | 00   | UBNatural encoded major version      |
            +------+--------------------------------------+
            | 02   | UBNatural encoded minor version      |
            +------+--------------------------------------+

                    Table 2: Document Header Bytes

   The primary block, called the "Root Block", follows after the header.
   Any data after the root block form an optional data blob called "Tail
   Data".

2.1.3.  Block

   A block specifies the method for encoding / decoding a sequence of
   bytes into a sequence of blobs or child blocks and for deriving its
   own size.  Each block starts with the value:

   UBNatural attributePartSize

   The value attributePartSize is not allowed to equal 0 for a block, as
   the value 0 is used for termination handling (see below).

   The block continues with the attribute part, which is a blob of the
   length in bytes specified by attributePartSize.  The first value in
   the attribute part represents:

   UBENatural dataPartSize

   The rest of the data (if any) in the attribute part is interpreted as
   a nonempty sequence of attribute values encoded in any UBNumber
   encoding.

   A binary blob called the data part follows after the attribute part.
   It is optional and can be empty.

   If the dataPartSize value fills exactly the whole space of the
   attribute part (there are no more attributes in the attribute part),
   the block is called a "Data Block"; otherwise the block is called a
   "Node Block".

   After the data part, the block ends.

   +-------------------------------------+
   | == Block ========================== |
   |                                     |
   | UBNatural attributePartSize         |
   +-------------------------------------+
   | == Attribute part ================= |
   |                                     |
   | UBENatural dataPartSize             |
   | UBNumber attribute 1                |
   | ...                                 |
   | UBNumber attribute n                |
   +-------------------------------------+
   | == Data part (optional) =========== |
   |                                     |
   | Single data blob or child blocks    |
   +-------------------------------------+

   Effectively, transferred data are represented as a sequence of
   attributes and child blocks or a data blob, while the
   attributePartSize and dataPartSize values are present for structural
   purposes only.

   See examples of blocks (Section 3.2).

2.1.4.  Node Block

   When there is at least one attribute value in the attribute part, the
   block is called a node block.  Data in the data part are interpreted
   as a sequence of (child) blocks.

   The data part has the length in bytes specified by the dataPartSize
   value.  If dataPartSize equals infinity, the sequence of child blocks
   can be infinite or terminated by a terminator (a single zero byte).

   If there are no child blocks, the node block is also called a leaf
   block.

2.1.5.  Data Block

   When there is no attribute value in the attribute part, the block is
   called a data block.  Data in the data part are interpreted as a
   binary blob.

   The data part has the length in bytes specified by the dataPartSize
   value.  If dataPartSize equals infinity, the data part is processed
   byte by byte and each zero value is used as an escape code, where the
   directly following byte means:

   *  Value 0 denotes the end of the data part

   *  Value 1 to 255 denotes a sequence of zero bytes of the given count,
      and processing continues

   If there is no data in the data part, the data block is also called
   an empty block.

2.1.6.  Validity

   A binary stream is structured correctly as an XBUP document
   (well-formed) if the following conditions are met.  The description
   of the invalid state is also included for each condition.

   *  Optional: The stream header must be present (Corrupted or missing
      header)

   *  Optional: The header version must be in the supported range
      (Unsupported version)

   *  In each block, the end of the last attribute corresponds to the
      end of the attribute part (Attribute overflow)

   *  In each block, the end of the last subblock / child block
      corresponds to the end of the data part (Block overflow)

   *  The terminator is present only in blocks where it belongs
      (Unexpected terminator)

   *  The end of the file / data stream is not reached before the end of
      the root block (Unexpected end of data)

2.1.7.  Summary

   To sum it up, data in the protocol are structured as a tree of
   blocks.

   *  A block is either a data blob or a finite sequence of attributes
      and child blocks

   *  A block can have a specified size - this allows processing of the
      block to be skipped, but also requires the size of the block to be
      known in advance when encoding

   *  For a block with unknown size, it is possible to use infinity size
      with termination, or the sequence of child blocks could never end

   *  A data block has no attributes, so it either has to be wrapped or
      its meaning should be understandable from the content / context

   In theory, this should provide sufficient capability to represent any
   data when an encoding to a blob is available.  More complex types can
   be either constructed using a deeper tree structure or compacted into
   a binary blob, but it should be possible to derive the data type via
   transformation to basic data elements when needed.

2.2.  Level 1: Block Types

   Level 1 introduces block types, the way to specify the type of a
   block, and the catalog of types.  The approach is somewhat similar to
   XML Namespaces [XMLNamespaces].

   From this level on, if an attribute is defined but not present, its
   value is considered to be the zero code of the UBNumber encoding.

2.2.1.  Block Type

   The first two attributes in a node block are interpreted as follows:

   UBNatural - TypeGroup
   UBNatural - BlockType

                   Figure 2: Block type attributes

   These two values determine block type.
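
   The level 0 block layout (Sections 2.1.3 to 2.1.6) and the reading of
   the two type attributes can be sketched in Python as follows.  This
   is a minimal, non-normative sketch: the names Block, parse_block,
   read_ubnatural, read_ubenatural and read_escaped_blob are
   hypothetical, attributes are decoded with the basic UBNatural
   mapping only, the document header is not handled, and the 0xFF
   extended UBNumber form is left out for brevity.

```python
from dataclasses import dataclass, field

INFINITY = None  # stand-in used by this sketch for the UBENatural infinity


@dataclass
class Block:
    attributes: list                  # empty list means "data block"
    data: bytes = b""                 # blob of a data block
    children: list = field(default_factory=list)

    @property
    def block_type(self):
        """(TypeGroup, BlockType); attributes not present read as zero."""
        padded = self.attributes[:2] + [0, 0]
        return padded[0], padded[1]


def read_ubnatural(buf, pos):
    """Decode a UBNatural with 0..7 leading one bits; return (value, pos)."""
    if pos >= len(buf):
        raise ValueError("Unexpected end of data")
    first, n = buf[pos], 0
    if first == 0xFF:
        raise NotImplementedError("extended form omitted from this sketch")
    while first & (0x80 >> n):
        n += 1
    if pos + 1 + n > len(buf):
        raise ValueError("Unexpected end of data")
    high = (first & (0x7F >> n)) << (8 * n)
    payload = high | int.from_bytes(buf[pos + 1:pos + 1 + n], "big")
    return sum(1 << (7 * k) for k in range(1, n + 1)) + payload, pos + 1 + n


def read_ubenatural(buf, pos):
    """UBENatural: code 7Fh is infinity, higher codes are shifted by one."""
    value, pos = read_ubnatural(buf, pos)
    if value == 0x7F:
        return INFINITY, pos
    return (value - 1 if value > 0x7F else value), pos


def read_escaped_blob(buf, pos):
    """Infinite data part: 00 00 ends it, 00 n expands to n zero bytes."""
    out = bytearray()
    while True:
        if pos + 1 > len(buf):
            raise ValueError("Unexpected end of data")
        byte, pos = buf[pos], pos + 1
        if byte:
            out.append(byte)
            continue
        if pos + 1 > len(buf):
            raise ValueError("Unexpected end of data")
        count, pos = buf[pos], pos + 1
        if count == 0:
            return bytes(out), pos
        out.extend(bytes(count))


def parse_block(buf, pos=0):
    """Parse one block; return (Block, next pos) or (None, next pos) for a terminator."""
    attr_size, pos = read_ubnatural(buf, pos)
    if attr_size == 0:
        return None, pos                       # terminator: single zero byte
    attr_end = pos + attr_size
    data_size, pos = read_ubenatural(buf, pos)
    attributes = []
    while pos < attr_end:
        value, pos = read_ubnatural(buf, pos)
        attributes.append(value)
    if pos != attr_end:
        raise ValueError("Attribute overflow")
    if not attributes:                         # data block
        if data_size is INFINITY:
            data, pos = read_escaped_blob(buf, pos)
        else:
            if pos + data_size > len(buf):
                raise ValueError("Unexpected end of data")
            data, pos = buf[pos:pos + data_size], pos + data_size
        return Block(attributes, data=data), pos
    block = Block(attributes)                  # node block
    data_end = None if data_size is INFINITY else pos + data_size
    while data_end is None or pos < data_end:
        child, pos = parse_block(buf, pos)
        if child is None:
            if data_end is not None:
                raise ValueError("Unexpected terminator")
            break
        block.children.append(child)
    if data_end is not None and pos != data_end:
        raise ValueError("Block overflow")
    return block, pos
```

   With these assumptions, the bytes 02 04 05 01 02 AA BB parse as a
   node block with the single attribute 5 (so TypeGroup is 5 and the
   absent BlockType reads as 0) containing one data block with the blob
   AA BB.  The error messages reuse the invalid-state names from
   Section 2.1.6.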
   Block types are organized into groups, where the TypeGroup value
   specifies the group to which the block type belongs and the BlockType
   value specifies the particular block type in that group.

   TypeGroup with value 0 is always the basic built-in group (it cannot
   be overridden).  Basic blocks provide the ability to specify the
   meaning of other groups via block type declarations, definitions or
   links to a catalog or an external source.

2.2.2.  Type Context

   For each block, there is a type context which provides the mapping of
   a particular block type (as defined above) to a particular
   declaration / definition (similar to the XML Namespaces context).

   The context is the same for a block and all its children, except for
   the "Document Declaration" block, which is used to change the
   context.  The range of groups and the range of blocks for each group
   is specified.

2.2.3.  Block Type Definition

   A block type is defined as a finite sequence of operations, where
   each operation defines one or more attributes and/or child blocks.
   An operation can refer to a built-in or previously defined type or to
   no type (for attribute and any).  There are variants for a singular
   item and for a list of items, 8 operations in total:

   *  Single block - Single child block of any type.

   *  Single attribute - Single attribute of any type.

   *  Consist of definition - Single child block of the referred type
      (as a component/element).

   *  Append definition - Appends all attributes and all child blocks of
      the referred type.

   *  List of blocks - One attribute of type UBENatural to define the
      count of blocks of any type, and child blocks of that count.  When
      the count equals infinity, the list of blocks ends with an empty
      block.

   *  List of attributes - One attribute of type UBNatural to define the
      count of attributes of any type, and attributes of that count.

   *  List of consist of definitions - One attribute of type UBENatural
      to define the count of blocks of the referred type, and child
      blocks of that count.  When the count equals infinity, the list of
      blocks ends with an empty block.

   *  List of appended definitions - Appends one attribute of type
      UBNatural to define the count of blocks of the defined type and
      appends all attributes and all child blocks of the referred type
      of that count.

   The following syntax is used in this document (no final syntax is
   decided yet):

   any - Single block
   attribute - Single attribute
   Block_type_name - Consist of definition
   +Block_type_name - Append definition
   []any - List of blocks
   []attribute - List of attributes
   []Block_type_name - List of consist of definition
   +[]Block_type_name - List of append definition

                   Figure 3: Block type attributes

   From the abstract point of view (more about abstraction
   (Section 3.3)), a type definition is simply an ordered list of child
   singular types or sets of child types, including an infinite number
   of them.  At the same time, data definitions are similar to the table
   column definitions used in relational databases, except that an
   infinite number of items is also supported.

2.2.4.  Basic Blocks Definition

   The following blocks are defined as built-in group 0, but are also
   defined in the catalog.

2.2.4.1.  Unspecified (0)

   This block is used for unspecified block values or data padding.  It
   can be used to represent nil / null values.

2.2.4.2.  Document Declaration (1)

   The declaration block determines the allowed range of groups.  This
   block should be located at the beginning of each file if the
   application didn't provide any static/special meaning, but it might
   be used anywhere inside the document as well.

   +Natural groupsCount - The number of allocated groups
   +Natural preserveGroups - The number of groups to keep from
       previous declarations
   FormatDeclaration formatDeclaration - Declaration of format
   Any documentRoot - Root node of document

                    Figure 4: Document Declaration

   For subblocks of this block, the permitted range of group values is
   the interval preserveGroups + 1 .. preserveGroups + groupsCount + 1.
   If the value preserveGroups = 0, the range starts at the highest not
   yet reserved group in the current or parental blocks + 1.  If all
   values are zero, applying the rule of cutting off zeros makes the
   block coincide with a data block.

2.2.4.3.  Format Declaration (2)

   The format declaration allows the use of either a declaration from
   the catalog, a local format definition, or both.

   +CatalogFormatSpecPath catalogFormatSpecPath - Specification of
       format defined as path in catalog
   +Natural formatSpecRevision - Specification's revision number
   FormatDefinition formatDefinition

                     Figure 5: Format Declaration

2.2.4.4.  Format Definition (3)

   This block specifies the basic structure of a format specification.
   It specifies the sequence of parameters using either the join or the
   consist operation.

   Any[] formatParameters - Join or Consist format parameters
   +RevisionDefinition[] revisions

                     Figure 6: Format Definition

2.2.4.5.  Format Join Parameter (4)

   Join parameter for format definition.

   +FormatDeclaration formatDeclaration

                   Figure 7: Format Join Parameter

2.2.4.6.  Format Consist Parameter (5)

   Consist parameter for format definition.

   +GroupDeclaration groupDeclaration

                  Figure 8: Format Consist Parameter

2.2.4.7.  Group Declaration (6)

   The group declaration allows the use of either a declaration from the
   catalog, a local group definition, or both.

   +CatalogGroupSpecPath catalogGroupSpecPath - Specification of
       group defined as path in catalog
   +Natural groupSpecRevision - Specification's revision number
   GroupDefinition groupDefinition

                     Figure 9: Group Declaration

2.2.4.8.  Group Definition (7)

   This block specifies the basic structure of a group specification.
   It specifies the sequence of parameters using either the join or the
   consist operation.

   Any[] groupParameters - Join or Consist group parameters
   +RevisionDefinition[] revisions

                     Figure 10: Group Definition

2.2.4.9.  Group Join Parameter (8)

   Join parameter for group definition.

   +GroupDeclaration groupDeclaration

                   Figure 11: Group Join Parameter

2.2.4.10.  Group Consist Parameter (9)

   Consist parameter for group definition.

   +BlockDeclaration blockDeclaration

                  Figure 12: Group Consist Parameter

2.2.4.11.  Block Declaration (10)

   The block declaration allows the use of either a declaration from the
   catalog, a local block definition, or both.

   +CatalogBlockSpecPath catalogBlockSpecPath - Specification of
       block defined as path in catalog
   +Natural blockSpecRevision - Specification's revision number
   BlockDefinition blockDefinition

                     Figure 13: Block Declaration

2.2.4.12.  Block Definition (11)

   This block specifies the basic structure of a block specification.
   It specifies the sequence of parameters using either the join,
   consist, list join or list consist operation.

   Any[] blockParameters - Join or Consist or List Join or
       List Consist block parameters
   +RevisionDefinition[] revisions

                     Figure 14: Block Definition

2.2.4.13.  Block Join Parameter (12)

   Join parameter for block definition.

   +BlockDeclaration blockDeclaration

                   Figure 15: Block Join Parameter

2.2.4.14.  Block Consist Parameter (13)

   Consist parameter for block definition.

   +BlockDeclaration blockDeclaration

                  Figure 16: Block Consist Parameter

2.2.4.15.  Block List Join Parameter (14)

   List join parameter for block definition.

   +BlockDeclaration blockDeclaration

                 Figure 17: Block List Join Parameter

2.2.4.16.  Block List Consist Parameter (15)

   List consist parameter for block definition.

   +BlockDeclaration blockDeclaration

                Figure 18: Block List Consist Parameter

2.2.4.17.  Revision Definition (16)

   A revision defines the parameters count for a particular
   specification definition.

   +Natural parametersCount

                    Figure 19: Revision Definition

2.2.5.  Main Catalog

   To specify basic data types, a catalog of block type definitions is
   established.  The catalog is structured as a tree of definitions,
   where each block type has a unique identifier (a sequence of natural
   numbers).  Tree nodes are denoted by ownership base and are supposed
   to follow a pattern similar to internet domain names.

   In addition to block, group and format specifications, the catalog
   can contain basically any other data, which will be properly
   specified on further protocol levels, for example:

   *  Name of the type in multiple languages

   *  Documentation for given type

   *  Icon

   *  Author / ownership

   *  Custom viewer/editor

   For basic access, the catalog should be accessible as a single
   document stored in XBUP format.

2.2.6.  Additional Catalogs

   Additional catalogs can be addressed from external sources.

2.3.  Level 2: Transformations

   In general, a block transformation is a data flow from one block type
   to another block type (more about abstraction (Section 3.3)).  A
   transformation can be used for multiple tasks and can cover various
   operations with data.  This level introduces the capability to define
   transformations in the catalog and to automatically perform
   conversion between blocks.

   Protocol processing is based on the broad concept of the dataflow
   paradigm, which typically states that there are input data, an
   operation and output data.  An additional requirement here is that
   the operation must be deterministic (for the same input it returns
   the same output), but other than that, it can be run in any manner,
   from a local function in memory up to a remote process in the cloud.

   Transformations can be also used for:

   *  Paging

   *  Compression

   *  Encryption

   *  Specify operation between multiple blocks

   *  TODO

   Additional properties can be specified for the transformation, for
   example:

   *  Time complexity

   *  Space complexity

   *  ...

   TODO

2.3.1.  Automatic Conversion

   Support for transformations is used for automatic conversion of data
   when applications access this data with tools supporting this level
   of the protocol.

   Typically, an application requests data to be sent in a specific
   format which it can process, from a system service or a providing
   library, and the data are converted to the requested form.
   Depending on the accessing method, transformations can be provided
   unidirectionally or bidirectionally.  The processing service can also
   handle additional requirements for the combination of various
   conversions.

   The general policy is to allow any type of data to be included
   alongside the main required type, even when data are in a transformed
   state; it is therefore still possible to include data outside the
   current specialized form for universal storage.

2.3.2.  Paging

   Support for basic data paging is available in the basic catalog.
   Paging is solved using a single data blob which is split into pages
   of the same size.  Either the full block structure can be stored in
   such a way that each block starts in a new page, or specific behavior
   can be defined via an algorithm.

2.4.  Level 3: Ontologies

   The following level can additionally specify more about the meaning
   of the data:

   *  Restrict number of items in list

   *  Restrict type of any type

   *  Specify restricted document structure

   *  Restrict allowed transformations

   *  Specify relations between blocks

   This level introduces entities and relations to the catalog.

2.5.  Data Types

   The following section defines various data types considered for
   specification in the catalog.  Typically, an automatic transformation
   exists between the types in each group, either full or with some
   exceptions.

2.5.1.  Boolean

   For a boolean logical value, the typical entities for "True" and
   "False" are declared.  A boolean can also be stored as an attribute
   0/1 or as 0/1 in a blob value.

   TODO

2.5.1.1.  UBBoolean

   Basic variant using a single attribute to store 0 or 1 for
   false/true.

   +Natural value

                    Figure 20: UBBoolean Definition

2.5.1.2.  DataBoolean

   Variant using a data blob to store a single byte 0 or 1 for
   false/true.  When compacting, a single bit could actually be used.

   Blob value

                   Figure 21: DataBoolean Definition

2.5.2.  Natural Number

   Natural numbers represent non-negative integer values, also called
   unsigned integers.  The natural type is also used as the primary
   mapping for the UBNumber encoding.

   The value can be stored as a single attribute or as a blob value.
   Alternatively, the value can be limited to a specific maximum or blob
   length, typically specified in bits, for example a natural value in
   16, 24, 32 or 64 bits, possibly even with swapped parts (endian
   etc.).

   TODO

2.5.3.  Integer Number

   The integer value extends the range to all integer values, including
   negative values.  An integer can be stored using the UBNumber
   encoding in two's complement form.

   The value can be stored as a single attribute or as a blob value.
   Alternatively, the value can be limited to a specific minimum and
   maximum or blob length, typically specified in bits, for example an
   integer value in 16, 24, 32 or 64 bits, possibly even with swapped
   parts (endian etc.).

   TODO

2.5.4.  Real Number

   Real numbers have a fractional part.  They are also called float or
   double.

   The basic supported form is to use two integer attributes, one to
   represent the base and the other for the mantissa.  This allows
   storing any real number of finite precision.

   An alternative type uses [IEEE.754.1985] stored in a blob.

   TODO

2.5.4.1.  UBReal

   Basic variant using two UBInteger attributes to represent any real
   number with a finite binary fraction.

   +UBInteger base
   +UBInteger mantissa

                     Figure 22: UBReal Definition

   To eliminate redundancy, the method of adding an invisible bit before
   the decimal point is used, with an extra decrement for the zero
   value.
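
   A runnable rendering of this rule is sketched below in Python.  It
   follows the pseudocode of Figure 23 and was checked against the
   values listed in Figures 24 and 25.  The function name and the use of
   exact fractions are assumptions of this sketch, and the two
   UBInteger attributes are taken as already decoded into plain
   integers.

```python
from fractions import Fraction


def ubreal_value(base: int, mantissa: int) -> Fraction:
    """Value represented by a (base, mantissa) pair of decoded UBIntegers."""
    if base == 0 and mantissa == 0:
        return Fraction(0)
    # Invisible bit: the base is doubled and a one bit is appended.
    value = Fraction(2 * base + 1) * Fraction(2) ** mantissa
    # Extra decrement so that the code taken by zero is not lost.
    if base > 0 and mantissa == 0:
        value -= 2
    return value
```

   Exact fractions are used instead of floating point so that results
   such as 1/2 or 3/2 compare exactly; a floating-point variant would
   work equally well for illustration.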
if (Base = 0 and Mantissa = 0) the Value := 0 else { Value := (Base * 2 + 1) * (2 ^ Mantissa) if (Base > 0 and Mantissa = 0) then Value := Value - 2 } Figure 23: UBReal algorithm Hajda Expires 25 March 2024 [Page 20] Internet-Draft XBUP September 2023 ... (10)111111 11111111 (0)0000000 = -81h (0)1000000 (0)0000000 = -7Fh (0)1000001 (0)0000000 = -7Dh ... (0)1111110 (0)0000000 = -3 (0)1111111 (0)0000000 = -1 (0)0000000 (0)0000000 = 0 (1) (0)0000001 (0)0000000 = 1 (3) (0)0000010 (0)0000000 = 3 (5) ... (0)0111111 (0)0000000 = 7Dh (7Fh) (10)000000 00000000 (0)0000000 = 7Fh (81h) ... Figure 24: UBReal example codes and values Examples with non-zero mantissa: (0)1111111 (0)0000001 = -2 (0)0000000 (0)0000001 = 2 (0)0000001 (0)0000001 = 6 (0)0000010 (0)0000001 = 10 (0)0000000 (0)0000010 = 4 (0)0000000 (0)0000011 = 8 (0)0000000 (0)1111111 = 0.5 (0)0000001 (0)1111111 = 1.5 Figure 25: UBReal example codes and values 2.5.4.2. DataReal Variant using data blob to store real numbers. Blob value Figure 26: DataReal Definition 2.5.4.3. UBRatio Variant of real number with fixed range using single UBNatural attribute to represent any real number with finite binary fraction in range <0, 1>. +UBNatural value Figure 27: UBRatio Definition Hajda Expires 25 March 2024 [Page 21] Internet-Draft XBUP September 2023 Method of reverting value is used. Value := Input if not (Value=0 or Value=1) then ( Value := Value + 1 while (Value = Trunc(Value)) do ( Value := Value * 2) Value := Trunc(Value/2) + 1 ) Figure 28: UBRatio algorithm (0)0000000 0 = 0 = 0 (0)0000001 1 = 1 = 1 (0)0000010 0.1 = 1/2 = 0.5 (0)0000011 0.01 = 1/4 = 0.25 (0)0000100 0.11 = 3/4 = 0.75 (0)0000101 0.001 = 1/8 = 0.125 (0)0000110 0.011 = 3/8 = 0.375 (0)0000111 0.101 = 5/8 = 0.625 (0)0001000 0.111 = 7/8 = 0,875 (0)0001001 0.0001 = 1/16 = 0,0625 (0)0001010 0.0011 = 3/16 = 0,1875 (0)0001011 0.0101 = 5/16 = 0,3125 ... Figure 29: UBRatio example codes and values 2.5.4.4. 
UBFixedPoint

   Variant of a real number with fixed precision, stored simply as a
   UBInteger with a specific scaling.  There can also be a non-negative
   variant using a UBNatural attribute.

   +UBInteger value

   Figure 30: UBFixedPoint Definition

   Values are simply multiplied by the scale, for example by 0.01 for a
   ratio of 1/100.

   Value := Input * 0.01

   Figure 31: UBFixedPoint algorithm

   (0)0000000 = 0
   (0)0000001 = 0.01
   (0)0000010 = 0.02
   ...

   Figure 32: UBFixedPoint example codes and values

2.5.5.  String

   A text string can be represented using various encodings.  The basic
   string type uses UTF-8 encoding by default.  An alternative type
   allows specifying the encoding used, by either its IANA MIME name or
   its encoding MIB index.

   TODO

2.5.5.1.  String

   Basic UTF-8 encoded string stored as a binary blob.

   Blob value

   Figure 33: String Definition

2.5.5.2.  Utf16String

   Basic UTF-16 encoded string stored as a binary blob.

   Blob value

   Figure 34: UTF16String Definition

2.5.6.  Time

   Various types are defined to specify a concrete date, time,
   timezone...

   Types for time interval / range

   TODO

2.5.7.  URL - Uniform Resource Locator

   The basic URL type uses the string representation of the URL.

   A URL can be used to specify additional external catalogs.

   TODO

2.5.8.  Coordinates

   Types to represent coordinates, such as a position on a planet via
   latitude, longitude, altitude, elevation, rotation, GPS coordinates,
   distance.

2.6.  Algorithms

   Algorithms in the protocol are based on a data-flow concept similar
   to the one used for transformations.  This allows defining
   algorithms in a wide range of paradigms, including functional,
   logical and imperative.

3.  Appendixes

3.1.  Appendix: Motivation

   The project should provide a universal protocol as a more
   feature-rich alternative to currently used binary protocols.
   It should provide general methods for handling data of various forms
   and types, including:

   *  Multimedia files - Audio, video, animation, 3D

   *  Serialization protocol - Provide the ability to serialize
      non-structured data

   *  Application API - Remote or local method call execution,
      supporting parameter and result passing and error handling

   *  Filesystem structure - Allow representing data in the form of a
      filesystem or as a compressed archive

   *  Huge data - Use dynamic numeric values to support data in the
      terabyte range or greater

   *  Random access - Segmented, paged, fragmented data

   *  Parallel processing - Atomicity, structural data for database
      representation

   *  Indexes, error detection and data correction

   From the user's point of view, the protocol should provide new
   capabilities or enable new development in various areas:

   *  Browsable binary content - Provide the capability for viewing and
      editing data, including visual and graphical tools and textual
      tools with multiple available syntaxes and supported languages

   *  Flexible modular applications - With the ability to provide both
      an independent API and a data interchange format, and with
      automatic transformation between the two, it should be possible
      to use the protocol to enhance the approach to modular
      application design

   *  Comprehensive scientific protocol - With multiple levels of
      expressiveness and the capability to define an unlimited number
      of additional properties, it should be possible to use the
      protocol for the definition and storage of specialized scientific
      data

   *  Strong building blocks - Provide a well-specified data
      representation and the ability to construct even complex data
      structures by combining data type definitions from wide libraries

   *  Long-term storage - Provide a way to define data with an external
      or integrated specification

3.2.  Appendix: Examples of Blocks

   Examples of blocks and how they are encoded using the XBUP protocol.
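   The fixed-size examples in this appendix can be checked with a small
   parser.  The following Python sketch is not part of the
   specification.  The names Block and parse_block are hypothetical.
   It assumes that every UBNumber fits in a single byte below 0x80, and
   it infers from the example tables that AttributePartSize counts the
   DataPartSize byte plus the attribute bytes, and that a block without
   attributes is a data block.  Terminated blocks (DataPartSize 7F) are
   deliberately rejected rather than guessed at.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical in-memory form of one parsed block (not from the draft)."""
    attributes: list = field(default_factory=list)
    data: bytes = b""
    children: list = field(default_factory=list)


def parse_block(buf: bytes, pos: int = 0):
    """Parse one fixed-size block at buf[pos:]; return (block, next_pos)."""
    if pos + 2 > len(buf):
        raise ValueError("truncated block header")
    attr_part_size = buf[pos]
    data_part_size = buf[pos + 1]
    # Assumption: single-byte UBNumbers only; 0x7F (terminated mode)
    # and an AttributePartSize of 0 are outside this sketch.
    if not 1 <= attr_part_size < 0x80 or data_part_size >= 0x7F:
        raise ValueError("outside the subset handled by this sketch")
    # Assumption inferred from the tables: the attribute part starts
    # right after the AttributePartSize byte and includes DataPartSize.
    attr_end = pos + 1 + attr_part_size
    data_end = attr_end + data_part_size
    if data_end > len(buf):
        raise ValueError("truncated block body")
    attributes = list(buf[pos + 2:attr_end])
    if any(a >= 0x80 for a in attributes):
        raise ValueError("multi-byte attribute not handled")
    block = Block(attributes=attributes)
    if not attributes:
        # No attributes: treat the data part as raw bytes (data block).
        block.data = bytes(buf[attr_end:data_end])
    else:
        # Node block: the data part is a sequence of child blocks.
        child_pos = attr_end
        while child_pos < data_end:
            child, child_pos = parse_block(buf, child_pos)
            block.children.append(child)
        if child_pos != data_end:
            raise ValueError("child block overruns its parent")
    return block, data_end
```

   For instance, parse_block(bytes.fromhex("020366020077")) yields a
   block with attribute 0x66 and a single child with attribute 0x77,
   which corresponds to the "Fixed size block with one child" example.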
   Fixed size node block with one attribute

   +======+===========================+
   | Byte | Value                     |
   +======+===========================+
   | 02   | AttributePartSize         |
   +------+---------------------------+
   | 00   | DataPartSize              |
   +------+---------------------------+
   | 77   | Attribute 1 of value 0x77 |
   +------+---------------------------+

   Table 3: Stream Data

   Terminated node block with one attribute

   +======+===================+
   | Byte | Value             |
   +======+===================+
   | 02   | AttributePartSize |
   +------+-------------------+
   | 7F   | DataPartSize      |
   +------+-------------------+
   | 05   | Attribute 1       |
   +------+-------------------+
   | 00   | Terminator        |
   +------+-------------------+

   Table 4: Stream Data

   Fixed size data block

   +======+=======================+
   | Byte | Value                 |
   +======+=======================+
   | 01   | AttributePartSize     |
   +------+-----------------------+
   | 01   | DataPartSize          |
   +------+-----------------------+
   | BB   | One byte of data 0xBB |
   +------+-----------------------+

   Table 5: Stream Data

   Terminated empty data block

   +======+===================+
   | Byte | Value             |
   +======+===================+
   | 01   | AttributePartSize |
   +------+-------------------+
   | 7F   | DataPartSize      |
   +------+-------------------+
   | 00   | Data block escape |
   +------+-------------------+
   | 00   | Termination value |
   +------+-------------------+

   Table 6: Stream Data

   Fixed size block with one child

   +======+===========================+
   | Byte | Value                     |
   +======+===========================+
   | 02   | AttributePartSize         |
   +------+---------------------------+
   | 03   | DataPartSize              |
   +------+---------------------------+
   | 66   | Attribute 1 of value 0x66 |
   +------+---------------------------+
   | 02   | AttributePartSize         |
   +------+---------------------------+
   | 00   | DataPartSize              |
   +------+---------------------------+
   | 77   | Attribute 1 of value 0x77 |
   +------+---------------------------+
   Table 7: Stream Data

3.3.  Appendix: Abstraction

   The primary focus on abstraction makes this protocol somewhat
   different compared to other similar binary formats, which focus on
   efficiency, serialization or binary representation of a specific
   mark-up language.  See Formats comparison (Section 3.5) for more
   information.

   This protocol technically overlaps in functionality with many
   currently widely used protocols and formats, including those defined
   by various RFCs.  It also has a somewhat different nature compared
   to the currently used, typically text-based, internet protocols (on
   higher layers).  Therefore various aspects should be evaluated to
   determine whether the potential advantages this protocol could
   provide outweigh its complexity and other possible issues; see
   [RFC3117] for design considerations.

   With the primary focus on abstraction, data in the protocol are
   considered more as abstract entities than as a specific method of
   data representation.  The catalog is then viewed more as a set of
   general entities with unique identifiers; in set theory terminology,
   it is a well-ordered countable set of items.

   On level 1 of the protocol, some of the items have a specific
   meaning for the definition of a type and some are used to identify
   ownership and type definition.

   Level 2 introduces the transformation method item to define data
   conversion between two specific types (input and output), and
   various related items which allow specifying additional properties
   of types and transformations.

   Higher levels then define additional new meanings of categories of
   items for additional relations and also introduce dynamic processes
   to generate them.

   TODO

3.4.  Appendix: Parsing

   Similar to the parsing of textual formats, it is possible to provide
   parsing capability for a binary protocol.

   *  Object Model Parsing

   *  Pull Parsing

   *  Event Parsing

   *  Hybrid Approaches

3.4.1.
Level 0 Parsing

   To process the level 0 protocol, the following 4 types of tokens are
   used:

   *  begin (terminationMode flag)

   *  attribute (UBNumber value)

   *  data (Binary data)

   *  end

   The following simplified grammar can be used for token processing.

   Document ::= header + Block + data
   Block ::= begin + Attributes + Blocks + end | begin + data + end
   Blocks ::= Block + Blocks | epsilon
   Attributes ::= attribute + Attributes | epsilon

   Figure 35: Simplified grammar

3.4.2.  Level 1 Parsing

   To process the level 1 protocol, the following 5 types of tokens are
   used:

   *  begin (terminationMode flag)

   *  type (block type)

   *  attribute (UBNumber value)

   *  data (Binary data)

   *  end

   The newly added type token serves to identify the type of a block.
   There are a few ways to represent the type, and it is possible to
   convert between them:

   *  Two attributes for groupId and blockId

   *  Pointer to block type in current type context

   *  Pointer to block type in main catalog

3.4.3.  Level 2 Parsing

   With support for transformations, an additional interface to request
   a specific transformation is available.  Typical parsing on this
   level is performed in such a manner that specific block type ranges
   are requested for specific blocks and parsers provide automatically
   transformed data.

   TODO

3.5.  Appendix: Comparison to Other Formats

   While there are various binary formats and markup languages
   available, this project aims to take a somewhat different approach
   to data representation.

   *  While SGML, XML [XML] and related technologies were a huge
      inspiration for this project, it seems that it wouldn't be
      feasible to use them as a base for the binary variant due to
      attribute vs.
      child tag duality and the use of Unicode strings as a primitive
      data type, in contrast to the countable set used by this project

   *  Using a binary format is basically a necessity to make the
      protocol reasonably usable for universal data such as audio or
      video, even though text formats (for example JSON [RFC4627], YAML
      [YAML]) provide ease-of-use and readability advantages

   *  Compared to the wide range of existing binary formats with a
      fixed block structure (for example RIFF), this project aims to
      provide more unified access to all data structures and their
      definitions

   *  Compared to formats based on serialization of data primitives
      (for example Protocol Buffers [ProtoBuf], CBOR [RFC7049]), this
      project aims to provide the capability for data definitions which
      would make transmitting primitive types unnecessary

   *  The multi-level approach should simplify and improve use compared
      to other dynamic binary formats (for example HDF5 [HDF5], ASN.1
      [ASN.1] and EBML [EBML])

3.6.  Appendix: User Interface

   With a unified tree structure, it should be possible to provide a
   tool which can process a generic document encoded using the XBUP
   protocol.  The following capabilities should be implemented:

   *  Show document as visual tree

   *  Show document as text using various syntaxes (including editing)

   *  Support catalog including external definitions

   *  Support for transformations including working with data in
      transformed form

   The aim here is to provide a comprehensive tool to view and edit
   documents on different levels, similar to what text editors provide
   for binary files representing text using typical encodings.
   Additionally, support for multiple syntaxes should allow the syntax
   to evolve over time while the underlying abstract concepts remain
   the same, or it should be possible to adjust them via automatic
   transformations without being constrained by syntax compatibility.

   TODO

4.
IANA Considerations

   In the current early stage of the development of the protocol, only
   a basic media type for general files is defined:

   application/x-xbup

   TODO

5.  Security Considerations

   Security has not been considered at the current stage of the
   development.

6.  Acknowledgements

   TBD

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629,
              November 2003.

7.2.  Informative References

   [RFC3117]  Rose, M., "On the Design of Application Protocols",
              RFC 3117, DOI 10.17487/RFC3117, November 2001.

   [RFC4627]  Crockford, D., "The application/json Media Type for
              JavaScript Object Notation (JSON)", RFC 4627,
              DOI 10.17487/RFC4627, July 2006.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [ASN.1]    International Telecommunication Union, "Information
              Technology -- ASN.1 encoding rules: Specification of
              Basic Encoding Rules (BER), Canonical Encoding Rules
              (CER) and Distinguished Encoding Rules (DER)", ITU-T
              Recommendation X.690, 1994.

   [YAML]     Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup
              Language (YAML[TM]) Version 1.2, 3rd Edition",
              October 2009.

   [XML]      Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E.,
              and F. Yergeau, "Extensible Markup Language (XML) 1.0
              (Fifth Edition)", W3C Recommendation REC-xml-20081126,
              November 2008.

   [XMLNamespaces]
              Bray, T., Hollander, D., Layman, A., Tobin, R., and H. S.
              Thompson, "Namespaces in XML 1.0 (Third Edition)", W3C
              Recommendation REC-xml-names-20091208, December 2009.

   [EfficientXML]
              Schneider, J., Kamiya, T., Peintner, D., and R. Kyusakov,
              "Efficient XML Interchange (EXI) Format 1.0 (Second
              Edition)", February 2014.

   [HDF5]     Group, T.
              H., "HDF5 File Format Specification Version 3.0",
              April 2016.

   [EBML]     Lhomme, S., Rice, D., and M. Bunkus, "Extensible Binary
              Meta Language", Work in Progress, draft-ietf-cellar-ebml,
              2020.

   [IEEE.754.1985]
              Institute of Electrical and Electronics Engineers,
              "Standard for Binary Floating-Point Arithmetic",
              August 1985.

   [ProtoBuf] Google, "Protocol Buffers", 2020.

Index

   I

      Introduction verbiage  Section 6

Author's Address

   Miroslav Hajda
   ExBin Project
   Email: exbinproject@gmail.com
   URI:   https://exbin.org/