[XBUP] XBUP - Extensible Binary Universal Protocol

Format: Language

Language blocks represent data associated with spoken and written language, the symbolic expression of the meaning of words, and categorization. Later they could become part of a universal language and of the definition of its meaning and usage.

Language Data

Language data includes information related to the spoken word, such as text, language, text encoding, speaker identification, declaration of the meaning of words, translation between languages, and so on.

Catalog: XBUP_Protocol / Society / Language /


This block identifies the language of stored text or images, or of other language-dependent data such as a speech or sign language recording. The basic version uses the numbers defined for the individual world languages. Other versions and index blocks are reserved for future use.

A further block will make it possible to specify the conditions under which the language is used, such as the spoken word in a video stream.

Language Index - RFC

This block identifies the language by a standard language number ([RFC] LanguageNumber) listed on the Internet, for example on the [IANA] website.

Catalog: XBUP_Protocol / Society / Language / RFCLanguage

UBNatural - MajorLanguageIndex
UBNatural - MinorLanguageIndex

Language Name in ASCII

The second option is to use the RFC LanguageString and state the language name in ASCII encoding. The numeric index is the preferred option, because displaying the name is a task for the application level.

Codes are usually in the form of xx_YY, where xx is the language code and YY is the country code.

Catalog: XBUP_Protocol/Society/Language/RFCASCIILanguage

UBPointer - StringPointer
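
As a rough illustration of the xx_YY form described above, the following Python sketch splits such a code into its language and country parts. The names `LanguageCode` and `split_language_code`, and the decision to also accept a bare language code, are assumptions made for this example and are not part of XBUP.

```python
from typing import NamedTuple, Optional


class LanguageCode(NamedTuple):
    language: str            # lower-case language code, e.g. "en"
    country: Optional[str]   # upper-case country code, e.g. "US", or None


def split_language_code(code: str) -> LanguageCode:
    """Split a code of the form xx_YY into language and country parts.

    Hypothetical helper: a bare language code such as "cs" is accepted too.
    """
    if not code.isascii():
        raise ValueError("language code must be ASCII")
    language, sep, country = code.partition("_")
    # Both parts must be purely alphabetic; "en_" and "_US" are rejected.
    if not language.isalpha() or (sep and not country.isalpha()):
        raise ValueError(f"not in xx or xx_YY form: {code!r}")
    return LanguageCode(language.lower(), country.upper() if sep else None)
```

For example, `split_language_code("en_US")` yields the language `"en"` and the country `"US"`, while `split_language_code("cs")` yields `"cs"` with no country.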

Multilanguage Data

This block is a simple derivation: a list of language identifiers. It is suitable when the data apply to multiple languages simultaneously.

Catalog: XBUP_Protocol/Society/Language/MultiLanguage

UBList - Languages

Text Encoding

A text encoding is essentially mandatory for a general text string. A higher protocol level should later define the table of characters and their graphical representation, as well as the equivalence of characters across encodings. An encoding can be defined in one of the following ways.

Catalog: XBUP_Protocol/Society/Language/Encoding/

IANA Encoding Index

The following block determines the text encoding using the well-established IANA index numbers for encodings.

Catalog: XBUP_Protocol/Society/Language/Encoding/

UBNatural - IANAEncodingMajorNumber
UBNatural - IANAEncodingMinorNumber
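
A minimal Python sketch of how an application might turn such an index into a usable codec. It assumes, purely for illustration, that the major number carries the MIBenum value from the IANA character set registry and that the minor number is not needed for the lookup; the source does not define how the two numbers are split. The class name `IanaEncodingIndex` and the small lookup table are likewise hypothetical.

```python
from dataclasses import dataclass

# A few MIBenum values from the IANA character set registry,
# mapped to the corresponding Python codec names.
MIBENUM_TO_CODEC = {
    3: "ascii",      # US-ASCII
    4: "latin-1",    # ISO_8859-1:1987
    106: "utf-8",    # UTF-8
}


@dataclass(frozen=True)
class IanaEncodingIndex:
    major: int       # IANAEncodingMajorNumber
    minor: int = 0   # IANAEncodingMinorNumber

    def __post_init__(self):
        # UBNatural fields hold natural numbers, so negatives are rejected.
        if self.major < 0 or self.minor < 0:
            raise ValueError("UBNatural values cannot be negative")

    def codec(self) -> str:
        # ASSUMPTION: major is treated as the MIBenum; minor is ignored.
        try:
            return MIBENUM_TO_CODEC[self.major]
        except KeyError:
            raise LookupError(f"unknown MIBenum {self.major}") from None
```

The returned name can be passed directly to `bytes.decode`, for example `raw.decode(IanaEncodingIndex(106).codec())`.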

ASCII Encoding Name

The following block determines the text encoding by its well-established IANA name, stored as an ASCII string.

UBPointer - IANAEncodingStringPointer
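
The sketch below shows one way an application could resolve an encoding name stored as ASCII bytes. It assumes the bytes behind IANAEncodingStringPointer have already been fetched; the function name `resolve_encoding_name` is hypothetical. Python's standard `codecs.lookup` is used only as a stand-in for a real charset table and recognizes many, but not all, IANA names.

```python
import codecs


def resolve_encoding_name(raw_name: bytes) -> str:
    """Return the Python codec name for an ASCII-encoded charset name."""
    # The name itself is stored in ASCII; other bytes raise UnicodeDecodeError.
    name = raw_name.decode("ascii")
    # codecs.lookup raises LookupError when the name is not recognized.
    return codecs.lookup(name).name
```

For instance, `resolve_encoding_name(b"US-ASCII")` and `resolve_encoding_name(b"UTF-8")` both return names that `bytes.decode` accepts.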

Text String

A text string is the “meaning” of words encoded with an alphabet. Storing text should take into account support for any language, encoding, and other text attributes. Text is thus a form of compression of graphic symbols and meaning.

The basic block for general text is as follows (this is a transformation block):

Catalog: XBUP_Protocol/Society/Language/Text/String

UBPointer - StringDataPointer
UBPointer - EncodingPointer
UBPointer - LanguagePointer
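
To make the role of the three pointers concrete, here is a small Python sketch that follows them and decodes the text. How pointers are actually resolved is not described in this section, so a plain dictionary named `store` stands in for that mechanism; `StringBlock`, `read_text`, and the representation of encoding and language as names are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class StringBlock:
    string_data_pointer: int   # StringDataPointer
    encoding_pointer: int      # EncodingPointer
    language_pointer: int      # LanguagePointer


def read_text(block: StringBlock,
              store: Dict[int, Union[bytes, str]]) -> Tuple[str, str]:
    """Follow the three pointers and return (text, language).

    ASSUMPTION: `store` maps a pointer value to raw bytes (string data)
    or to a name (encoding, language); it is not an XBUP structure.
    """
    data = store[block.string_data_pointer]
    encoding = store[block.encoding_pointer]
    language = store[block.language_pointer]
    if not isinstance(data, bytes):
        raise TypeError("string data must be bytes")
    if not isinstance(encoding, str) or not isinstance(language, str):
        raise TypeError("encoding and language must be given as names")
    # The raw data only becomes text once the encoding is known.
    return data.decode(encoding), language
```

With `store = {1: "kůň".encode("utf-8"), 2: "utf-8", 3: "cs_CZ"}`, calling `read_text(StringBlock(1, 2, 3), store)` decodes the data with the referenced encoding and returns it together with the language code.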

The encoding value should probably not be stored directly in this block; instead an external block should be used, which might be defined as automatic or be given by reference elsewhere.

Another option is to create string blocks with a fixed encoding, where the encoding value is implied by the block type itself. ASCIIString and UTFString are examples of such blocks.

ASCII Text String

A text string in ASCII encoding, obtained by fixing the encoding value in the String block.

Catalog: XBUP_Protocol/Society/Language/Text/ASCIIString

UBPointer - StringDataPointer
UBPointer - LanguagePointer

UTF-8 Text String

As in the previous case, but this time the encoding value is fixed to UTF-8.

Catalog: XBUP_Protocol/Society/Language/Text/UTF8String

UBPointer - StringDataPointer
UBPointer - LanguagePointer
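
The idea of fixing the encoding in the block type can be sketched in Python as follows. The base class `FixedEncodingString` is hypothetical, and the fields hold already-resolved content rather than real pointers; only the names ASCIIString and UTF8String come from the catalog entries above.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class FixedEncodingString:
    data: bytes      # content behind StringDataPointer
    language: str    # content behind LanguagePointer

    # Not a stored field: each concrete block type fixes its own value.
    ENCODING: ClassVar[str] = ""

    def text(self) -> str:
        # No EncodingPointer is needed; the block type implies the encoding.
        return self.data.decode(self.ENCODING)


class ASCIIString(FixedEncodingString):
    ENCODING = "ascii"


class UTF8String(FixedEncodingString):
    ENCODING = "utf-8"
```

Compared with the general String block, each instance carries one field fewer, at the cost of needing a separate block type per supported encoding.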

Commentary Block

A direct application of the text block is a block for text comments. It can be inserted at any level, anywhere in the file. There will probably be several types of annotation blocks, distinguished by how they are presented visually.

[IANACharset] IANA MIBEnum Character Set Registry, URL: ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
[RFC] Request for Comments, URL: http://www.rfc.org
[ASCII] American Standard Code for Information Interchange
[UTF-8] UCS Transformation Format, URL: http://www.faqs.org/rfcs/rfc2279.html
[ISO 639.2] Codes for the Representation of Names of Languages, URL: http://www.loc.gov/standards/iso639-2/php/English_list.php
[IANA Root-Zone] Root-Zone Whois Information, URL: http://www.iana.org/root-whois/index.html