Rewrite iproto protocol description #3151

Merged
merged 42 commits into from
Nov 3, 2022
42 commits
0345685
Outline initial structure
patiencedaur Sep 16, 2022
84e76d5
Split most box_protocol content and add TODOs
patiencedaur Sep 16, 2022
aed915f
Correct the description of the greeting
patiencedaur Sep 19, 2022
1dc9d77
Add structure and basic content in the Keys section
patiencedaur Sep 20, 2022
31ca1e9
Elaborate on IPROTO_OPS and the different uses of IPROTO_TUPLE
patiencedaur Sep 20, 2022
6d669b0
Add initial structure in the Replication section
patiencedaur Sep 20, 2022
5d8391f
Format the Watchers part of the Keys section
patiencedaur Sep 21, 2022
7920de4
Fill in the blanks in the Keys section
patiencedaur Oct 4, 2022
540caeb
Change column widths
patiencedaur Oct 5, 2022
ad07890
Groom the Format section
patiencedaur Oct 5, 2022
91bdb8e
Adjust column widths for Keys
patiencedaur Oct 5, 2022
c08890c
Change some wordings
patiencedaur Oct 5, 2022
dfabeb3
Adjust column widths
patiencedaur Oct 5, 2022
37a24a5
Add tables for uniformity
patiencedaur Oct 5, 2022
9f9cd7b
Add a cross-link
patiencedaur Oct 5, 2022
b9d8352
Add some info on SQL_BIND
patiencedaur Oct 5, 2022
8bda487
Try out UML diagram illustrations
patiencedaur Oct 5, 2022
d1eaceb
Split table header to avoid ambiguity
patiencedaur Oct 5, 2022
b2fae4c
Express UML style through skinparams
patiencedaur Oct 6, 2022
1d7520b
Try adding a pre-generated SVG image
patiencedaur Oct 6, 2022
7c62d1f
Add SVG illustrations to packet format document
patiencedaur Oct 7, 2022
cb5227a
Add SVG illustrations for SELECT and INSERT
patiencedaur Oct 7, 2022
2482e06
Remove SQL response from SELECT and INSERT packet schemes
patiencedaur Oct 7, 2022
0df7cfe
Add diagrams for REPLACE, UPDATE, UPSERT, DELETE, CALL, EVAL
patiencedaur Oct 7, 2022
414721c
Refactor SQL-specific document and add diagrams
patiencedaur Oct 7, 2022
2b580ea
Add diagram for AUTH
patiencedaur Oct 7, 2022
81a5758
Add diagrams for ID and PING
patiencedaur Oct 7, 2022
cca91e9
Remove unneeded content from Symbols and terms
patiencedaur Oct 7, 2022
110013e
Make images in Format section clickable
patiencedaur Oct 7, 2022
99e5291
Make images in Client-server and SQL clickable
patiencedaur Oct 7, 2022
8f76115
Minor fixes
patiencedaur Oct 10, 2022
3f3cdce
Improve the Replication section
patiencedaur Oct 10, 2022
d66f5b9
Add PROMOTE and DEMOTE descriptions
patiencedaur Oct 10, 2022
b522629
Apply suggestions from Replication section review
patiencedaur Oct 11, 2022
e22b411
Replace remaining pseudo-code illustrations with SVGs
patiencedaur Oct 11, 2022
6c9d1a3
Adjust table widths
patiencedaur Oct 11, 2022
a743beb
Reorder toctree by relevance
patiencedaur Oct 11, 2022
4d3f712
Minor fixes
patiencedaur Oct 12, 2022
94452c8
Apply suggestions from Replication sections review
patiencedaur Oct 12, 2022
87706f4
Apply suggestions from technical writer's review
patiencedaur Oct 12, 2022
b232caa
Apply more suggestions from Replication review
patiencedaur Oct 13, 2022
d696c3c
Improve wording
patiencedaur Nov 2, 2022
2,127 changes: 29 additions & 2,098 deletions doc/dev_guide/internals/box_protocol.rst

Large diffs are not rendered by default.

49 changes: 43 additions & 6 deletions doc/dev_guide/internals/file_formats.rst
@@ -1,14 +1,12 @@
.. _internals-data_persistence:

File formats
============

.. _internals-wal:

Data persistence and the WAL file format
----------------------------------------

To maintain data persistence, Tarantool writes each data change request (insert,
update, delete, replace, upsert) into a write-ahead log (WAL) file in the
@@ -114,9 +112,8 @@ a secondary key, the record in the .xlog file will contain the primary key.

.. _internals-snapshot:

The snapshot file format
------------------------

The format of a snapshot .snap file is nearly the same as the format of a WAL .xlog file.
However, the snapshot header differs: it contains the instance's global unique identifier
@@ -131,3 +128,43 @@ and ``_cluster`` -- will be at the start of the .snap file, before the records o
any spaces that were created by users.

Secondarily, the .snap file's records are ordered by primary key within space id.

.. _box_protocol-xlog:

Example
-------

The header of a ``.snap`` or ``.xlog`` file looks like:

.. code-block:: none

    <type>\n                  SNAP\n or XLOG\n
    <version>\n               currently 0.13\n
    Server: <server_uuid>\n   where UUID is a 36-byte string
    VClock: <vclock_map>\n    e.g. {1: 0}\n
    \n

After the file header come the data tuples.
Tuples begin with a row marker ``0xd5ba0bab`` and
the last tuple may be followed by an EOF marker
``0xd510aded``.
Thus, between the file header and the EOF marker, there
may be data tuples that have this form:

.. code-block:: none

    0            3 4                                         17
    +-------------+========+============+===========+=========+
    |             |        |            |           |         |
    | 0xd5ba0bab  | LENGTH | CRC32 PREV | CRC32 CUR | PADDING |
    |             |        |            |           |         |
    +-------------+========+============+===========+=========+
      MP_FIXEXT2    MP_INT     MP_INT       MP_INT      ---

    +============+ +===================================+
    |            | |                                   |
    |   HEADER   | |                BODY               |
    |            | |                                   |
    +============+ +===================================+
        MP_MAP                     MP_MAP
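
As a rough illustration of the text header described above, the following Python sketch
splits a ``.snap`` or ``.xlog`` file into its header fields and the binary remainder.
The function name ``parse_file_header`` and the sample bytes are hypothetical, and the
sketch assumes that the header is plain ASCII terminated by an empty line, as shown in the
first code block of this example. It does not decode the MsgPack rows themselves.

.. code-block:: python

    ROW_MARKER = bytes.fromhex("d5ba0bab")  # row marker from the text above
    EOF_MARKER = bytes.fromhex("d510aded")  # EOF marker from the text above


    def parse_file_header(raw: bytes):
        """Split a .snap/.xlog image into (type, version, meta, rest)."""
        # The text header ends with an empty line, that is, two newlines in a row.
        head, sep, rest = raw.partition(b"\n\n")
        if not sep:
            raise ValueError("no empty line terminating the file header")
        lines = head.decode("ascii").split("\n")
        file_type, version = lines[0], lines[1]
        meta = {}
        for line in lines[2:]:
            # Remaining lines look like "Server: <uuid>" or "VClock: <map>".
            key, _, value = line.partition(": ")
            meta[key] = value
        return file_type, version, meta, rest


    # Hypothetical sample: a header followed by a single row marker.
    sample = (
        b"SNAP\n0.13\n"
        b"Server: 29b74bed-fdc5-454c-a828-1d4bf42c639a\n"
        b"VClock: {1: 0}\n\n" + ROW_MARKER
    )
    file_type, version, meta, rest = parse_file_header(sample)

After the call, ``rest`` starts at the first row marker, which is where a real reader
would begin decoding the MsgPack-encoded rows.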

64 changes: 64 additions & 0 deletions doc/dev_guide/internals/iproto/authentication.rst
@@ -0,0 +1,64 @@
.. _box_protocol-authentication:

Session start and authentication
================================

Every iproto session begins with a greeting and optional authentication.

Greeting message
----------------

When a client connects to the server instance, the instance responds with
a 128-byte text greeting message, not in MsgPack format:

.. code-block:: none

    Tarantool <version> (<protocol>) <instance-uuid>
    <salt>

For example:

.. code-block:: none

    Tarantool 2.10.0 (Binary) 29b74bed-fdc5-454c-a828-1d4bf42c639a
    QK2HoFZGXTXBq2vFj7soCsHqTo6PGTF575ssUBAJLAI=

The greeting contains two 64-byte lines of ASCII text.
Each line ends with a newline character (:code:`\n`). If the line content is shorter than 64 bytes,
the rest of the line is padded with characters whose ASCII code is 0, which aren't displayed in the console.

The first line contains
the instance version and protocol type. The second line contains the session salt --
a base64-encoded random string, which is usually 44 bytes long.
The salt is used in the authentication packet -- the :ref:`IPROTO_AUTH message <box_protocol-auth>`.
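
The layout above can be illustrated with a short Python sketch that splits a
128-byte greeting into its parts. The function name ``parse_greeting`` is hypothetical.
The sketch assumes that the first line always has the four space-separated fields shown
in the example, and it strips trailing newline, NUL, and space bytes without relying on
their exact order.

.. code-block:: python

    import base64

    GREETING_SIZE = 128
    LINE_SIZE = 64


    def parse_greeting(raw: bytes) -> dict:
        """Parse a greeting into version, protocol, instance UUID, and salt."""
        if len(raw) != GREETING_SIZE:
            raise ValueError("a greeting must be exactly 128 bytes long")
        padding = b"\n\x00 "  # assumption: anything trailing is padding
        first = raw[:LINE_SIZE].rstrip(padding).decode("ascii")
        second = raw[LINE_SIZE:].rstrip(padding).decode("ascii")
        # Expected shape: Tarantool <version> (<protocol>) <instance-uuid>
        _, version, protocol, uuid = first.split()
        return {
            "version": version,
            "protocol": protocol.strip("()"),
            "uuid": uuid,
            "salt": base64.b64decode(second),  # decoded salt, raw bytes
        }


    # Build a sample greeting from the example above.
    line1 = b"Tarantool 2.10.0 (Binary) 29b74bed-fdc5-454c-a828-1d4bf42c639a"
    line2 = b"QK2HoFZGXTXBq2vFj7soCsHqTo6PGTF575ssUBAJLAI="
    raw = line1.ljust(63, b"\x00") + b"\n" + line2.ljust(63, b"\x00") + b"\n"
    greeting = parse_greeting(raw)

The decoded salt is kept as bytes because the authentication step works on the raw
salt, not on its base64 text.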

.. _box_protocol-authentication_sequence:

Authentication
--------------

If authentication is skipped, then the session user is ``'guest'``
(the ``'guest'`` user does not need a password).

If authentication is not skipped, then at any time an :ref:`authentication packet <box_protocol-auth>`
can be prepared using the salt from the greeting, the user's name and password,
and `SHA-1 <https://en.wikipedia.org/wiki/SHA-1>`_ functions, as follows:

.. code-block:: none

    PREPARE SCRAMBLE:

        size_of_encoded_salt_in_greeting = 44;
        size_of_salt_after_base64_decode = 32;
        /* sha1() will only use the first 20 bytes */
        size_of_any_sha1_digest = 20;
        size_of_scramble = 20;

    prepare 'chap-sha1' scramble:

        salt = base64_decode(encoded_salt);
        step_1 = sha1(password);
        step_2 = sha1(step_1);
        step_3 = sha1(first_20_bytes_of_salt, step_2);
        scramble = xor(step_1, step_3);
        return scramble;
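
The pseudo-code above could be written in Python roughly as follows. This is a
minimal sketch, not Tarantool's own implementation. The function name
``chap_sha1_scramble`` is hypothetical, the password is assumed to be UTF-8 encoded,
and ``sha1(a, b)`` is read as hashing the concatenation of ``a`` and ``b``.

.. code-block:: python

    import base64
    import hashlib


    def chap_sha1_scramble(encoded_salt: str, password: str) -> bytes:
        """Return the 20-byte 'chap-sha1' scramble for the given salt and password."""
        salt = base64.b64decode(encoded_salt)
        step_1 = hashlib.sha1(password.encode("utf-8")).digest()
        step_2 = hashlib.sha1(step_1).digest()
        # Only the first 20 bytes of the decoded salt take part in the hash.
        step_3 = hashlib.sha1(salt[:20] + step_2).digest()
        # Byte-wise XOR of two 20-byte digests.
        return bytes(a ^ b for a, b in zip(step_1, step_3))


    # Salt taken from the greeting example; the password is a made-up value.
    scramble = chap_sha1_scramble(
        "QK2HoFZGXTXBq2vFj7soCsHqTo6PGTF575ssUBAJLAI=", "secret"
    )

The scramble has the same length as a SHA-1 digest, which matches
``size_of_scramble = 20`` in the pseudo-code.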
71 changes: 71 additions & 0 deletions doc/dev_guide/internals/iproto/events.rst
@@ -0,0 +1,71 @@
.. _internals-events:
.. _box-protocol-watchers:

Events and subscriptions
========================

The commands below support asynchronous server-client notifications signalled
with :ref:`box.broadcast() <box-broadcast>`.
Servers that support this feature set the ``IPROTO_FEATURE_WATCHERS`` feature in reply to the ``IPROTO_ID`` command.
When the connection is closed, all watchers registered for it are unregistered.

The remote watcher (event subscription) protocol works in the following way:

#. The client sends an ``IPROTO_WATCH`` packet to subscribe to the updates of a specified key defined on the server.

#. The server sends an ``IPROTO_EVENT`` packet to the subscribed client after registration.
The packet contains the key name and its current value.
After that, the packet is sent every time the key value is updated with
``box.broadcast()``, provided that the last notification was acknowledged (see below).

#. After receiving the notification, the client sends an ``IPROTO_WATCH`` packet to acknowledge the notification.

#. If the client doesn't want to receive any more notifications, it unsubscribes by sending
an ``IPROTO_UNWATCH`` packet.

All three request types are asynchronous -- the receiving end doesn't send a packet in reply to any of them.
Therefore, none of them has a sync number.
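
The subscribe, notify, and acknowledge cycle described above can be modelled with a
toy Python class. This is only an in-memory illustration of the rules, not networking
code. The class ``WatchedKey`` and its attributes are hypothetical, and the sketch
assumes that updates arriving before an acknowledgement are collapsed into a single
notification carrying the latest value.

.. code-block:: python

    class WatchedKey:
        """Server-side view of one key watched by one connection (toy model)."""

        def __init__(self, value=None):
            self.value = value
            self.watching = False  # is the client subscribed?
            self.acked = False     # has the last notification been acknowledged?
            self.dirty = False     # is there an update not yet sent?
            self.sent = []         # values "sent" as IPROTO_EVENT

        def watch(self):
            """IPROTO_WATCH: subscribe, or acknowledge the last notification."""
            if not self.watching:
                self.watching = True
                self.dirty = True  # the watcher is notified right after registration
            self.acked = True
            self._flush()

        def unwatch(self):
            """IPROTO_UNWATCH: stop receiving notifications."""
            self.watching = False

        def broadcast(self, value):
            """Models box.broadcast() updating the key on the server."""
            self.value = value
            self.dirty = True
            self._flush()

        def _flush(self):
            # An event goes out only to a subscribed client that has
            # acknowledged the previous notification.
            if self.watching and self.acked and self.dirty:
                self.sent.append(self.value)
                self.acked = False
                self.dirty = False


    key = WatchedKey("v1")
    key.watch()            # subscribe: the current value is sent
    key.broadcast("v2")    # held back: no acknowledgement yet
    key.broadcast("v3")    # still held back
    key.watch()            # acknowledge: the latest value is sent
    key.unwatch()
    key.broadcast("v4")    # not sent: the client has unsubscribed

In this model the client never sees ``"v2"``, which is why a client should treat each
event as the current state of the key rather than as one entry in a change log.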

.. _box_protocol-watch:

IPROTO_WATCH
------------

Code: 0x4a.

Register a new watcher for the given notification key, or acknowledge a notification if the watcher is
already subscribed.
The watcher is notified after registration.
After that, the notification is sent every time the key is updated.
The server doesn't reply to the request unless it fails to parse the packet.

.. raw:: html
:file: images/events_watch.svg

.. _box_protocol-unwatch:

IPROTO_UNWATCH
--------------

Code: 0x4b.

Unregister a watcher subscribed to the given notification key.
The server doesn't reply to the request unless it fails to parse the packet.

.. raw:: html
:file: images/events_unwatch.svg

.. _box_protocol-event:

IPROTO_EVENT
------------

Code: 0x4c.

Sent by the server to notify a client about an update of a key.

.. raw:: html
:file: images/event.svg

``IPROTO_EVENT_DATA`` contains data sent to a remote watcher.
The parameter is optional; the default value is ``MP_NIL``.
145 changes: 145 additions & 0 deletions doc/dev_guide/internals/iproto/format.rst
@@ -0,0 +1,145 @@
.. _internals-iproto-format:

Request and response format
===========================

The types referred to in this document are `MessagePack <http://MessagePack.org>`_ types.
For their definitions, see the :ref:`MP_* MessagePack types <box_protocol-notation>` section.

.. _internals-unified_packet_structure:

Packet structure
----------------

Requests and responses have a similar structure. They contain three sections: size, header, and body.

.. raw:: html
:file: images/format.svg

It is legal to put more than one request in a packet.

Size
----

The size is an MP_UINT -- an unsigned integer, usually 32-bit.
It is the size of the header plus the size of the body.
It may be useful to compare it with the number of bytes remaining in the packet.
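
To make the size, header, and body layout concrete, here is a minimal Python sketch
that builds an ``IPROTO_PING`` request by hand, without a MessagePack library.
The helper names (``mp_uint``, ``mp_uint_map``, ``encode_ping``) are hypothetical.
The numeric constants are not defined in this section; they are assumed to match
Tarantool's iproto constants (``IPROTO_REQUEST_TYPE`` = 0x00, ``IPROTO_SYNC`` = 0x01,
``IPROTO_PING`` = 0x40).

.. code-block:: python

    import struct

    # Assumed values of Tarantool's iproto constants.
    IPROTO_REQUEST_TYPE = 0x00
    IPROTO_SYNC = 0x01
    IPROTO_PING = 0x40


    def mp_uint(n: int) -> bytes:
        """Encode a non-negative integer as the shortest MP_UINT."""
        if n <= 0x7F:
            return bytes([n])
        if n <= 0xFF:
            return b"\xcc" + bytes([n])
        if n <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", n)
        if n <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", n)
        return b"\xcf" + struct.pack(">Q", n)


    def mp_uint_map(pairs: dict) -> bytes:
        """Encode a small MP_MAP (at most 15 entries) of unsigned integers."""
        if len(pairs) > 15:
            raise ValueError("this sketch only handles maps of up to 15 entries")
        out = bytes([0x80 | len(pairs)])
        for key, value in pairs.items():
            out += mp_uint(key) + mp_uint(value)
        return out


    def encode_ping(sync: int) -> bytes:
        header = mp_uint_map({IPROTO_REQUEST_TYPE: IPROTO_PING, IPROTO_SYNC: sync})
        body = mp_uint_map({})  # an empty map; a request may also omit the body
        payload = header + body
        # The size is written as a fixed-width 32-bit MP_UINT, so a reader
        # always knows that the prefix occupies five bytes.
        size = b"\xce" + struct.pack(">I", len(payload))
        return size + payload


    packet = encode_ping(sync=1)

Writing the size as a fixed five-byte value is a design choice of this sketch: it lets
a reader fetch the prefix first and then read exactly that many bytes for the header and body.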

.. _box_protocol-header:

Header
------

The header is an MP_MAP. It may contain, in any order:

.. raw:: html
:file: images/header.svg

* Both the request and response make use of the :ref:`IPROTO_REQUEST_TYPE <internals-iproto-keys-request_type>` key.
It denotes the type of the packet.

* The request and the matching response have the same sync number (:ref:`IPROTO_SYNC <internals-iproto-keys-sync>`).

* :ref:`IPROTO_SCHEMA_VERSION <internals-iproto-keys-schema_version>` is an optional key that indicates
whether there was a major change in the schema.

* In :ref:`interactive transactions <txn_mode_stream-interactive-transactions>`,
every stream is identified by a unique :ref:`IPROTO_STREAM_ID <box_protocol-iproto_stream_id>`.

When :ref:`synchronous transactions <repl_sync>` are replicated,
the header also contains the :ref:`IPROTO_FLAGS <box_protocol-flags>` key.

Encoding and decoding
~~~~~~~~~~~~~~~~~~~~~

To see how Tarantool encodes the header, have a look at file
`xrow.c <https://github.com/tarantool/tarantool/blob/master/src/box/xrow.c>`_,
function ``xrow_header_encode``.

To see how Tarantool decodes the header, have a look at file
`net_box.c <https://github.com/tarantool/tarantool/blob/master/src/box/lua/net_box.c>`__,
function ``netbox_decode_data``.

For example, in a successful response to ``box.space:select()``,
the IPROTO_REQUEST_TYPE value will be 0 = ``IPROTO_OK`` and the
array will have all the tuples of the result.

Read the source code file `net_box.c <https://github.com/tarantool/tarantool/blob/master/src/box/lua/net_box.c>`__
where the function ``decode_metadata_optional`` is an example of how Tarantool
itself decodes extra items.

Body
----

The body is an MP_MAP. The maximum iproto packet body length is 2 GiB.

The body has the details of the request or response. In a request, it can also
be absent or be an empty map. Both these states are interpreted equally.
Responses always contain a body, even for an
:ref:`IPROTO_PING <box_protocol-ping>` request, where it is an empty MP_MAP.

Many responses contain the IPROTO_DATA map:

.. raw:: html
:file: images/body.svg

For most data-access requests (:ref:`IPROTO_SELECT <box_protocol-select>`,
:ref:`IPROTO_INSERT <box_protocol-insert>`, :ref:`IPROTO_DELETE <box_protocol-delete>`, etc.)
the body is an IPROTO_DATA map with an array of tuples that contain an array of fields.

IPROTO_DATA is what we get with net_box and the :ref:`buffer module <buffer-module>`.
If we were using net_box, we could decode it with
:ref:`msgpack.decode_unchecked() <msgpack-decode_unchecked_string>`,
or we could convert it to a string with :samp:`ffi.string({pointer},{length})`.
The :ref:`pickle.unpack() <pickle-unpack>` function might also be helpful.

.. note::

For SQL-specific requests and responses, the body is a bit different.
:ref:`Learn more <internals-iproto-sql>` about this type of packet.

.. _box_protocol-responses_error:

Error responses
---------------

Instead of :ref:`IPROTO_OK <internals-iproto-ok>`, an error response header
has IPROTO_REQUEST_TYPE = :ref:`IPROTO_TYPE_ERROR <internals-iproto-type_error>`.
Its code is ``0x8XXX``, where ``XXX`` is the error code -- a value in
`src/box/errcode.h <https://github.com/tarantool/tarantool/blob/master/src/box/errcode.h>`_.
``src/box/errcode.h`` also has some convenience macros which define hexadecimal
constants for return codes.
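
A client can separate the error flag from the error code with simple bit arithmetic.
The Python sketch below is an illustration under two assumptions: that the error flag is
the single bit ``0x8000``, and that the remaining 15 bits hold the error code. The function
name ``split_response_type`` is hypothetical. The sample value ``0x800a`` corresponds to
error code 10, which is the code of ER_SPACE_EXISTS in ``errcode.h``.

.. code-block:: python

    IPROTO_OK = 0x00
    IPROTO_TYPE_ERROR = 0x8000  # assumption: the error flag is bit 15


    def split_response_type(response_type: int):
        """Return (is_error, error_code) for an IPROTO_REQUEST_TYPE value."""
        is_error = bool(response_type & IPROTO_TYPE_ERROR)
        # Mask off the flag bit to get the code from errcode.h.
        error_code = response_type & (IPROTO_TYPE_ERROR - 1) if is_error else None
        return is_error, error_code


    ok = split_response_type(IPROTO_OK)
    failed = split_response_type(0x800A)

For a successful response the function returns no error code at all, so a caller cannot
mistake ``IPROTO_OK`` for error code 0.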

The error response body is a map that contains two keys: :ref:`IPROTO_ERROR <internals-iproto-keys-error>`
and :ref:`IPROTO_ERROR_24 <internals-iproto-keys-error>`.
While IPROTO_ERROR contains an MP_EXT value, IPROTO_ERROR_24 contains a string.
The two keys are provided to accommodate clients with older and newer Tarantool versions.

.. raw:: html
:file: images/error.svg

Error responses before 2.4.1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before Tarantool v. :doc:`2.4.1 </release/2.4.1>`, the key IPROTO_ERROR contained a string
and was identical to the current IPROTO_ERROR_24 key.

Let's consider an example. This is the fifth message, and the request was to create a duplicate
space with ``conn:eval([[box.schema.space.create('_space');]])``.
The unsuccessful response looks like this:

.. raw:: html
:file: images/error_24.svg

The tutorial :ref:`Understanding the binary protocol <box_protocol-illustration>`
shows actual byte codes of the response to the IPROTO_EVAL message.

Looking in `errcode.h <https://github.com/tarantool/tarantool/blob/master/src/box/errcode.h>`__,
we find that the error code ``0x0a`` (decimal 10) is
ER_SPACE_EXISTS, and the string associated with ER_SPACE_EXISTS is
"Space '%s' already exists".

Since version :doc:`2.4.1 </release/2.4.1>`, responses for errors have extra information
following what was described above. This extra information is given via the
MP_ERROR extension type. See details in the :ref:`MessagePack extensions
<msgpack_ext-error>` section.
24 changes: 24 additions & 0 deletions doc/dev_guide/internals/iproto/images/auth.puml
@@ -0,0 +1,24 @@
@startuml

skinparam map {
HyperlinkColor #0077FF
FontColor #313131
BorderColor #313131
BackgroundColor transparent
}

json "**IPROTO_AUTH**" as auth_request {
"Size": "MP_UINT",
"Header": {
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_REQUEST_TYPE]]": "IPROTO_AUTH",
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_SYNC]]": "MP_UINT"
},
"Body": {
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_USER_NAME]]": "MP_STR",
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_TUPLE]]": {
"MP_ARRAY": "[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/authentication Authentication mechanism]], [[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/authentication scramble]]"
}
}
}

@enduml