Use repl_timeout instead of default value in heartbeat description #3014

Merged (2 commits) on Jul 9, 2022.

doc/book/replication/repl_monitoring.rst: 45 changes (22 additions, 23 deletions)
@@ -1,13 +1,12 @@
.. _replication-monitoring:

-================================================================================
Monitoring a replica set
-================================================================================
+========================

To learn what instances belong in the replica set, and obtain statistics for all
these instances, issue a :doc:`/reference/reference_lua/box_info/replication` request:

.. code-block:: tarantoolsession

tarantool> box.info.replication
---
@@ -43,36 +42,36 @@ these instances, issue a :doc:`/reference/reference_lua/box_info/replication` re
This report is for a master-master replica set of three instances, each having
its own instance id, UUID and log sequence number.

.. image:: mm-3m-mesh.svg
:align: center

The request was issued at master #1, and the reply includes statistics for the
other two masters, given relative to master #1.

The primary indicators of replication health are:

.. _heartbeat:

* :ref:`idle <box_info_replication_upstream_idle>`, the time (in seconds) since
the instance received the last event from a master.

-A master sends heartbeat messages to a replica every second, and the master
-is programmed to disconnect if it does not see acknowledgments of the heartbeat messages
-within :ref:`replication_timeout <cfg_replication-replication_timeout>` * 4
-seconds.
+If the master has no updates to send to the replicas, it sends heartbeat messages
+every :ref:`replication_timeout <cfg_replication-replication_timeout>` seconds. The master
+is programmed to disconnect if it does not see acknowledgments of the heartbeat messages
+within ``replication_timeout`` * 4 seconds.

Therefore, in a healthy replication setup, ``idle`` should never exceed
``replication_timeout``: if it does, either the replication is lagging
seriously behind, because the master is running ahead of the replica, or the
network link between the instances is down.

* :ref:`lag <box_info_replication_upstream_lag>`, the time difference between
the local time at the instance, recorded when the event was received, and the
local time at another master recorded when the event was written to the
:ref:`write ahead log <internals-wal>` on that master.

Since the ``lag`` calculation uses the operating system clocks from two different
machines, do not be surprised if it’s negative: a time drift may lead to the
remote master clock being consistently behind the local instance's clock.

For multi-master configurations, ``lag`` is the maximal lag.
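The two health indicators described above, ``idle`` and ``lag``, can be checked mechanically. The block below is a minimal Python sketch, not Tarantool code. It assumes the upstream statistics from ``box.info.replication`` have already been copied into plain Python values. The ``Upstream`` class and the ``unhealthy_peers`` and ``overall_lag`` helpers are hypothetical names introduced for illustration. Reading "maximal lag" as the maximum over all upstream masters is also an assumption.

```python
"""Sketch: check the replication-health indicators described above.

Assumption: per-master upstream statistics from box.info.replication have
already been copied into Upstream objects. All names are hypothetical;
this is not part of Tarantool.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class Upstream:
    peer_id: int  # instance id of the remote master
    idle: float   # seconds since the last event from that master
    lag: float    # local receive time minus remote WAL write time; may be negative


def unhealthy_peers(upstreams: List[Upstream], replication_timeout: float) -> List[int]:
    # An idle master still sends a heartbeat every replication_timeout
    # seconds, so idle above that value points to serious replication lag
    # or a broken network link.
    return sorted(u.peer_id for u in upstreams if u.idle > replication_timeout)


def overall_lag(upstreams: List[Upstream]) -> float:
    # Multi-master: report the maximal lag (assumed: max over upstreams).
    # A negative lag only reflects clock drift between the two machines,
    # so it is not flagged as an error.
    return max(u.lag for u in upstreams)


if __name__ == "__main__":
    replication_timeout = 1.0  # example value only, not a recommendation
    peers = [
        Upstream(peer_id=2, idle=0.4, lag=0.002),
        Upstream(peer_id=3, idle=2.5, lag=-0.01),
    ]
    print(unhealthy_peers(peers, replication_timeout))  # [3]
    print(overall_lag(peers))  # 0.002
```

In this example only instance 3 is reported, because its ``idle`` of 2.5 seconds exceeds a ``replication_timeout`` of 1 second. Its negative ``lag`` is deliberately ignored.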
locale/ru/LC_MESSAGES/book/replication/repl_monitoring.po: 19 changes (11 additions, 8 deletions)
@@ -99,16 +99,19 @@ msgstr ""
":ref:`бездействие <box_info_replication_upstream_idle>`, время (в секундах) "
"с момента получения последнего события от мастера."

+#, fuzzy
msgid ""
-"A master sends heartbeat messages to a replica every second, and the master "
-"is programmed to disconnect if it does not see acknowledgments of the "
-"heartbeat messages within :ref:`replication_timeout <cfg_replication-"
-"replication_timeout>` * 4 seconds."
+"If the master has no updates to send to the replicas, it sends heartbeat "
+"messages every :ref:`replication_timeout <cfg_replication-"
+"replication_timeout>` seconds. The master is programmed to disconnect if it "
+"does not see acknowledgments of the heartbeat messages within "
+"``replication_timeout`` * 4 seconds."
msgstr ""
-"Мастер отправляет сообщения контрольного сигнала на реплику каждую секунду, "
-"и мастер запрограммирован на отключение, если он не получает сообщения "
-"контрольного сигнала дольше :ref:`replication_timeout <cfg_replication-"
-"replication_timeout>` * 4 секунд."
+"Если на мастере нет новых данных, требующих репликации, он отправляет на "
+"реплики сообщения контрольного сигнала каждые :ref:`replication_timeout "
+"<cfg_replication-replication_timeout>` секунд. Мастер запрограммирован на "
+"отключение, если он не получает сообщения контрольного сигнала дольше "
+"``replication_timeout`` * 4 секунд."

msgid ""
"Therefore, in a healthy replication setup, ``idle`` should never exceed "
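The reworded heartbeat description comes down to two timing rules. First, an idle master sends a heartbeat every ``replication_timeout`` seconds. Second, it disconnects when no acknowledgment arrives within ``replication_timeout`` * 4 seconds. The Python sketch below models only those two rules on a timeline measured in seconds. The function names ``heartbeat_schedule`` and ``disconnect_at`` are hypothetical. Acknowledgment times are assumed to be known in advance. Treating a gap of exactly four timeouts as still acceptable is also an assumption, not documented behavior.

```python
"""Sketch of the heartbeat timing rules for an idle master.

Times are seconds on a shared timeline. Function names are hypothetical;
this models the documented rules and is not Tarantool's implementation.
"""
from typing import List, Optional


def heartbeat_schedule(idle_since: float, until: float,
                       replication_timeout: float) -> List[float]:
    # With no updates to send, the master emits a heartbeat every
    # replication_timeout seconds after it went idle.
    times = []
    t = idle_since + replication_timeout
    while t <= until:
        times.append(t)
        t += replication_timeout
    return times


def disconnect_at(ack_times: List[float], start: float, now: float,
                  replication_timeout: float) -> Optional[float]:
    # The master drops the connection once it has gone longer than
    # replication_timeout * 4 seconds without seeing an acknowledgment.
    # Returns the moment of disconnection, or None if none happened by `now`.
    limit = replication_timeout * 4
    last_ack = start
    for t in sorted(ack_times):
        if t > now:
            break
        if t - last_ack > limit:
            return last_ack + limit
        last_ack = t
    if now - last_ack > limit:
        return last_ack + limit
    return None


if __name__ == "__main__":
    print(heartbeat_schedule(0.0, 3.0, 1.0))  # [1.0, 2.0, 3.0]
    # No acknowledgment arrives between t=3 and t=9, so with a 1-second
    # timeout the master disconnects at t=3 + 4 = 7.
    print(disconnect_at([1.0, 2.0, 3.0, 9.0], 0.0, 10.0, 1.0))  # 7.0
```

The sketch makes the point of the rewording concrete. The heartbeat period follows ``replication_timeout`` rather than a fixed one second, and the disconnect threshold scales with it.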