XBUP - Extensible Binary Universal Protocol

Project: XBUP-XML

This document is part of the eXtensible Binary Universal Protocol project documentation. It describes the subproject that defines rules for storing XML documents in the XBUP format.

Introduction

The aim of XBUP-XML is to create a set of standard rules for converting XML documents into the binary XBUP protocol form while preserving the semantic meaning of the document.

Motivation

XML is a general text format for representing arbitrary data, in which data blocks are described using words or abbreviations in a chosen language. The textual representation of the markup has some drawbacks, however, especially in terms of performance and size. There have therefore been attempts to create binary XML variants that bring in the advantages of a binary form while avoiding these drawbacks. Although the objectives of the XBUP protocol are somewhat different, it should be possible to use it to represent an XML document in a useful binary form.

Principles

The proposal describes how to represent the various textual XML items while preserving the necessary information.

XML Data Encoding

The following variant is only an indicative sketch of one possible solution. Whitespace characters outside of elements are not processed.

XML has the following types of items:

Document Header

The document starts with the specification block, followed by the XML document node, which has the following items:

XML/Document (0):

UBPointer PrologPointer
UBPointer ElementPointer
UBPointer MiscPointer

The first item, PrologPointer, links to the XML/Prolog block, which has the following attributes:

XML/Prolog (1):

UBPointer DeclarationPointer
UBPointer DocTypePointer
UBPointer MiscPointer
UBPointer MiscAfterDTPointer

All items are optional. MiscAfterDTPointer should be present only if DocTypePointer is not empty. DocTypePointer refers to a block of type ML/DocType. DeclarationPointer refers to a block of the following type:

XML/Declaration (2):

XBVersion XMLVersion
UBPointer EncodingPointer
UBPointer StandalonePointer

EncodingPointer refers to a block of type "text/Encoding Type", and StandalonePointer refers to a Boolean type.

Example: <?xml version="A.B" encoding="UTF-8"?>
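
The three block types described so far can be modelled as plain records. The following is only a minimal sketch in Python, not part of the XBUP specification: the class and field names mirror the items listed above, pointers are represented as optional references, and the helper parse_declaration (a hypothetical name) reads the textual XML declaration with a simple regular expression rather than a full XML parser.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Declaration:
    """XML/Declaration (2)."""
    xml_version: str                      # XMLVersion, kept as text in this sketch
    encoding: Optional[str] = None        # target of EncodingPointer
    standalone: Optional[bool] = None     # target of StandalonePointer


@dataclass
class Prolog:
    """XML/Prolog (1); every item is optional."""
    declaration: Optional[Declaration] = None
    doctype: Optional[object] = None      # target of DocTypePointer, left opaque
    misc: List[object] = field(default_factory=list)
    misc_after_dt: List[object] = field(default_factory=list)


@dataclass
class Document:
    """XML/Document (0)."""
    prolog: Optional[Prolog] = None
    element: Optional[object] = None      # target of ElementPointer
    misc: List[object] = field(default_factory=list)


_DECL_RE = re.compile(r"<\?xml\s+(.*?)\s*\?>", re.DOTALL)
_ATTR_RE = re.compile(r"(\w+)\s*=\s*([\"'])(.*?)\2")


def parse_declaration(text: str) -> Optional[Declaration]:
    """Read version, encoding and standalone from a textual XML declaration."""
    match = _DECL_RE.match(text)
    if match is None:
        return None
    attrs = {name: value for name, _quote, value in _ATTR_RE.findall(match.group(1))}
    standalone = attrs.get("standalone")
    return Declaration(
        xml_version=attrs["version"],
        encoding=attrs.get("encoding"),
        standalone=None if standalone is None else standalone == "yes",
    )


decl = parse_declaration('<?xml version="1.0" encoding="utf-8"?>')
doc = Document(prolog=Prolog(declaration=decl))
print(doc.prolog.declaration)
```

An absent pointer is modelled as None, which matches the statement that all prolog items are optional.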

The XML/Misc (3) structure is a List type. It may include items of type XML/Comment (4) or XML/Processing Instruction (5).

An item of type XML/Comment is a text string that may not contain two consecutive hyphens ("--"). A processing instruction has additional attributes:

XML/Processing Instruction (5)

UBPointer PITargetPointer
UBPointer PIStringPointer

PITargetPointer refers to a string of type XML/PITarget (6), which may not be equal to "XML" in any combination of upper and lower case. PIStringPointer refers to a string of type XML/PIString (7), which may not contain the character sequence "?>".
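
The string constraints for comments and processing instructions can be checked before a block is written. The sketch below is illustrative only; the function names are assumptions and are not defined by XBUP. It covers just the rules stated in this section: no two consecutive hyphens in a comment, a target that is not "xml" in any letter case, and no "?>" inside the instruction string.

```python
def is_valid_comment(text: str) -> bool:
    """XML/Comment (4): the text may not contain two consecutive hyphens."""
    return "--" not in text


def is_valid_pi_target(target: str) -> bool:
    """XML/PITarget (6): non-empty and not 'xml' in any letter case."""
    return target != "" and target.lower() != "xml"


def is_valid_pi_string(data: str) -> bool:
    """XML/PIString (7): the text may not contain the closing marker '?>'."""
    return "?>" not in data


for candidate in ("xml-stylesheet", "XmL"):
    print(candidate, is_valid_pi_target(candidate))
```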

Document Tag

There are two basic document elements. XML/Element is an extension of the List type with the following values:

XML/Tag (8)

UBPointer TagName
UBPointer AttributeListPointer
UBList Content

Items of the content list may be one of the following types:

XML/CData is a text string that may not include the character sequence "]]>". Text data is converted using XML references during translation.

If an empty element needs to be distinguished from a non-empty element without content, the following block can be used:
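
Under one reading of the rule above, references such as &lt; and &amp; are resolved when text is stored in the binary form and restored when the text is written back to XML. The following sketch illustrates that reading with the escape and unescape helpers from Python's standard xml.sax.saxutils module. The function names are hypothetical and the interpretation is an assumption, not a statement of how XBUP tools behave.

```python
from xml.sax.saxutils import escape, unescape

# &quot; and &apos; are not handled by unescape() by default, so add them.
_EXTRA_REFERENCES = {"&quot;": '"', "&apos;": "'"}


def is_valid_cdata(text: str) -> bool:
    """XML/CData may not include the section terminator ']]>'."""
    return "]]>" not in text


def text_from_xml(raw: str) -> str:
    """Resolve the predefined XML references in character data."""
    return unescape(raw, _EXTRA_REFERENCES)


def text_to_xml(text: str) -> str:
    """Escape '&', '<' and '>' so the text can be written back as XML."""
    return escape(text)


stored = text_from_xml("1 &lt; 2 &amp; 3")
print(stored)
print(text_to_xml(stored))
```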

XML/EmptyTag (9)

UBPointer AttributeName
UBPointer AttributesPointer

Tag's Attributes

Tag attributes can be expressed as an XML/Attribute List (11) containing individual XML/Attribute (12) blocks with the following values:

UBPointer AttributeName
UBPointer AttributeValue
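
To make the tag and attribute layout more concrete, the sketch below maps a parsed element onto a record with a name, an attribute list of name/value pairs, and a content list holding child tags and text. It is an illustration only: the Tag class and the to_tag function are assumptions, and parsing is delegated to Python's standard xml.etree.ElementTree module. Whitespace-only text between elements is dropped, and namespaces are left out of the example because ElementTree rewrites namespaced names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Tag:
    """Rough counterpart of XML/Tag (8) with its attribute list and content."""
    name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    content: List[Union["Tag", str]] = field(default_factory=list)


def to_tag(element: ET.Element) -> Tag:
    """Convert an ElementTree element into a Tag, skipping blank text."""
    tag = Tag(element.tag, list(element.attrib.items()))
    if element.text and element.text.strip():
        tag.content.append(element.text)
    for child in element:
        tag.content.append(to_tag(child))
        # Text following a child element belongs to the parent's content.
        if child.tail and child.tail.strip():
            tag.content.append(child.tail)
    return tag


root = ET.fromstring('<body lang="en" id="main"><h1>Welcome!</h1>\n</body>')
tree = to_tag(root)
print(tree)
```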

ML/DocType (1):

UBPointer NamePointer
UBPointer ExternalIDPointer
UBPointer InternalPointer

The Resulting Specification Format

Combining the groups and blocks listed above gives a test specification for the format.

An Example Document

Here is an example of converting a simple document into binary form.

Source XHTML document:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Web Page</title>
</head>
<body>
<h1>Welcome!</h1>
</body></html>

The total size: 317 bytes

Code | Description
FE 00 58 42 00 01 | File Header
07 80 C2 00 00 03 01 02 | Specification Tag
0A 00 00 08 05 63 00 00 00 00 01 | Link to format specification in catalog (example)
06 80 B0 01 00 01 02 | XML/Document: Root tag of the XML document
05 7A 01 01 01 02 | XML/Prolog
06 05 01 02 01 00 01 | XML/Declaration
04 00 02 01 00 | Text/Encoding: encoding value
06 67 03 01 01 02 03 | SGML/DocType
01 04 [68 74 6D 6C] | Data: "html"
01 26 [2D 2F 2F 57 33 43 2F 2F 44 54 44 20 58 48 54 4D 4C 20 31 2E 30 20 54 72 61 6E 73 69 74 69 6F 6E 61 6C 2F 2F 45 4E] | Data: "-//W3C//DTD XHTML 1.0 Transitional//EN"
01 37 [68 74 74 70 3A 2F 2F 77 77 77 2E 77 33 2E 6F 72 67 2F 54 52 2F 78 68 74 6D 6C 31 2F 44 54 44 2F 78 68 74 6D 6C 31 2D 74 72 61 6E 73 69 74 69 6F 6E 61 6C 2E 64 74 64] | Data: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
08 80 27 01 08 01 02 02 03 | XML/Tag: Root tag
01 04 [68 74 6D 6C] | Data: "html"
05 4F 01 0B 03 01 | XML/Attribute List
05 25 01 0C 01 02 | XML/Attribute
01 05 [78 6D 6C 6E 73] | Data: "xmlns"
01 1C [68 74 74 70 3A 2F 2F 77 77 77 2E 77 33 2E 6F 72 67 2F 31 39 39 39 2F 78 68 74 6D 6C] | Data: "http://www.w3.org/1999/xhtml"
05 0E 01 0C 01 02 | XML/Attribute
01 08 [78 6D 6C 3A 6C 61 6E 67] | Data: "xml:lang"
01 02 [65 6E] | Data: "en"
05 0A 01 0C 01 02 | XML/Attribute
01 04 [6C 61 6E 67] | Data: "lang"
01 02 [65 6E] | Data: "en"
07 17 01 08 01 00 01 02 | XML/Tag
01 04 [68 65 61 64] | Data: "head"
07 09 01 08 01 00 01 02 | XML/Tag
01 05 [74 69 74 6C 65] | Data: "title"
01 08 [57 65 62 20 50 61 67 65] | Data: "Web Page"
07 1C 01 08 01 00 01 02 | XML/Tag
01 04 [62 6F 64 79] | Data: "body"
07 0E 01 08 01 00 01 02 | XML/Tag
01 02 [68 31] | Data: "h1"
01 08 [57 65 6C 63 6F 6D 65 21] | Data: "Welcome!"

The total size: 335 bytes
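
The data rows of the listing can be checked by decoding the bracketed bytes back to text. The sketch below does this for a few rows copied from the listing; the function name decode_data_row is an assumption. In the rows tried here the second header byte equals the payload length. That is an observation about this example only, not a description of the XBUP block format.

```python
import re
from typing import Optional, Tuple

# Header bytes, then the payload bytes in square brackets.
_ROW_RE = re.compile(r"^([0-9A-Fa-f ]+?)\s*\[([0-9A-Fa-f ]+)\]")


def decode_data_row(row: str) -> Optional[Tuple[bytes, str]]:
    """Return (header bytes, payload as ASCII text) or None for non-data rows."""
    match = _ROW_RE.match(row)
    if match is None:
        return None
    header = bytes.fromhex(match.group(1))
    payload = bytes.fromhex(match.group(2))
    return header, payload.decode("ascii")


rows = [
    "01 04[68 74 6D 6C]",
    "01 05[78 6D 6C 6E 73]",
    "01 08[57 65 62 20 50 61 67 65]",
    "07 17 01 08 01 00 01 02",
]
for row in rows:
    print(row, "->", decode_data_row(row))
```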

Elements with Indexed Name

One possible optimization is to identify elements by numeric identifiers instead of by their text names.
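
A minimal sketch of this idea is shown below, assuming that identifiers are assigned in order of first appearance. The function name and the numbering scheme are illustrative choices and are not defined by XBUP.

```python
from typing import Dict, Iterable, List


def build_name_index(names: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct element name a number in order of first use."""
    index: Dict[str, int] = {}
    for name in names:
        if name not in index:
            index[name] = len(index)
    return index


def encode_names(names: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Replace element names with their numeric identifiers."""
    return [index[name] for name in names]


element_names = ["html", "head", "title", "body", "h1", "h1"]
index = build_name_index(element_names)
print(index)
print(encode_names(element_names, index))
```

With such a table, each repeated element name costs one number instead of a full text string, at the price of storing the table itself once.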