
Update the upgrade instructions #3080



Merged
merged 11 commits into from
Aug 26, 2022

@patiencedaur (Contributor) commented on Aug 10, 2022

@patiencedaur changed the title from "Add note about caveats when upgrading from 1.6 to 1.10" to "Update the upgrade instructions" on Aug 12, 2022
@sergepetrenko (Contributor) left a comment


Thanks for working on this!
It's good our upgrade page is getting up to date and more clear.
Please find my comments below.

@patiencedaur removed the request for review from Totktonada on August 17, 2022 09:57
* Create a 'finishing the upgrade' section with `box.schema.upgrade()`
  and other steps in it
* Clarify version numbers in the note
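The finishing steps named in this commit can be sketched in Lua. This is a hypothetical helper, not code from the docs: it takes the `box` module as a parameter so the order of calls can be checked outside Tarantool, and it assumes the rule stated later in this PR, namely that the schema upgrade runs on the master and a snapshot is then made on every node.

```lua
-- Hypothetical helper (not part of Tarantool): run the finishing steps
-- on one instance. On a live instance, pass the real module: finish_upgrade(box)
local function finish_upgrade(b)
    if not b.info.ro then
        -- Only the read-write master upgrades the schema; the system space
        -- changes then reach the replicas through replication.
        b.schema.upgrade()
    end
    -- Every node writes a fresh snapshot.
    b.snapshot()
end
```

Presumably the master goes first, so that the replicas snapshot the upgraded schema rather than the old one.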
@patiencedaur (Contributor, Author) commented:

`ro_reason`: #2445

* Add instruction on direct upgrade from 1.6 to 2.x
* Clarify the purpose of ro_reason
* Clarify the appropriate `lag` value
* Provide a simpler instruction to view `upstream` and `downstream` values
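To illustrate what `ro_reason` adds: a minimal sketch of a hypothetical helper (not a Tarantool API) that turns a `box.info`-shaped table into a short mode description. It assumes `ro_reason` may be `nil`, for example on versions that do not report it.

```lua
-- Hypothetical helper: describe whether an instance is writable and,
-- if not, why. On a live instance: describe_mode(box.info)
local function describe_mode(info)
    if not info.ro then
        return 'rw'
    end
    -- ro_reason is nil when the version does not report it
    return ('ro (%s)'):format(info.ro_reason or 'reason not reported')
end
```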
@sergepetrenko (Contributor) left a comment


Thanks for the fixes!
Some more comments from me

* Explain lag value size
* Clarify incompatibility between 1.6 and later versions
* Add schema upgrade and snapshot to instruction for 1.6 -> 1.10
@sergepetrenko (Contributor) left a comment


Thanks! Looks good to me.

@patiencedaur requested a review from veod32 on August 19, 2022 09:07

6. Update application files, if needed.
The value of the ``lag`` field can be less than or equal to ``box.cfg.replication_timeout``,
Collaborator

I would make this paragraph the third item in the numbered list, to make the preparation checks clearer:

  1. ro/rw instances
  2. upstream/downstream
  3. replication lag

Contributor Author

I think that it is unnecessary to divide the output of a single command into two separate steps.
Instead, I'll elaborate on the table output.
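The lag rule quoted above can be checked with a small helper. A minimal sketch, assuming the table has the shape of `box.info.replication` (entries keyed by replica id, each with an optional `upstream` subtable that holds `lag`); `lag_ok` is a hypothetical name, not a Tarantool function.

```lua
-- Hypothetical helper (not a Tarantool function): true if every upstream
-- lag in a box.info.replication-shaped table is <= limit.
-- On a live instance: lag_ok(box.info.replication, box.cfg.replication_timeout)
local function lag_ok(replication, limit)
    for id, r in pairs(replication) do
        local up = r.upstream
        -- The local instance has no upstream entry for itself, so it is skipped.
        if up ~= nil and up.lag ~= nil and up.lag > limit then
            return false, id
        end
    end
    return true
end
```

Returning the offending id as a second value makes it easy to see which peer to inspect first.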

@@ -0,0 +1,21 @@
1. Pick any replica in the replica set.
Collaborator

Should it be an RO instance? If yes, let's make this clear.

@@ -58,95 +25,114 @@ How to upgrade from Tarantool 1.7 to 2.x
3. Update the Tarantool server. See the installation instructions on the Tarantool
`download page <http://tarantool.org/download.html>`_.

4. Launch the updated Tarantool server using ``tarantoolctl`` or ``systemctl``.
After that, make sure to :ref:`finish the upgrade properly <admin-upgrades_db>`.
Collaborator

I suggest making this item explicitly number 4 in the numbered list again and giving the full text of the finalization procedure here, either with an include or just as plain text.

Motivation:

In this small four-step instruction, we send the user elsewhere to follow other instructions three times. That is always inconvenient; it's better to do as much as possible so that the user reads the instruction on one page. Referring elsewhere in items 2 and 3 is more or less fine, but I would rather spell out item 4 explicitly here.

Besides, we refer to this instruction from the next section (Upgrading Tarantool in a replica set with no downtime) for upgrading each instance. The result is an overly complex "nesting doll" of redirects between instructions. That's one more argument for keeping the whole instruction in this one section as much as possible.

Another argument: in item 1 we stop the instance. Logically, we should start it at the end, but the current description doesn't make it clear where that happens: maybe in item 3, maybe in item 4.
It's better to make this explicit, so that the reader sees that we stopped the instance at the beginning and started it at the end. This matters especially because we refer to the standalone instance upgrade procedure from the steps of "Upgrading ... with no downtime", and that instruction checks the instance's connection to the other replicas. Logically, the instance has to be running for that, so it's better to say so explicitly in the standalone upgrade procedure.

Contributor Author

Then we'll need to update the `box.schema.upgrade` docs :)

1. Pick any replica in the replica set.

2. Upgrade this replica to the new Tarantool version. See details in
:ref:`Upgrading a Tarantool instance <admin-upgrades_instance>`.
Collaborator

Suggested change
:ref:`Upgrading a Tarantool instance <admin-upgrades_instance>`.
:ref:`Upgrading Tarantool on a standalone instance <admin-upgrades_instance>`.

Collaborator

Since we send the user to perform the "Upgrading Tarantool on a standalone instance" procedure, where the instance is stopped (that is, it has explicit downtime), maybe it makes sense to explain why we call the replica set procedure "Upgrading ... with no downtime": the replica set keeps writing and serving data consistently on the remaining running replicas, and the replica that we stopped and upgraded is then reconnected to the replica set and "catches up" with the others. That is, if I understood the idea of this upgrade with no downtime correctly.

Maybe this is obvious to the user anyway, and maybe it isn't.


3. Make sure the replica has connected to the rest of the replica set successfully:
Collaborator

As I wrote in the comment on "Upgrading Tarantool on a standalone instance", it's better to say explicitly that the single-instance upgrade procedure stops the instance and then starts it again.
This step checks the instance's connection to all replicas in the set, so the instance obviously has to be running.
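A rough way to perform the connection check from this step: a hypothetical helper (not a Tarantool function) that lists peers whose `upstream` or `downstream` is missing or not in the `follow` state. It assumes a full-mesh replica set, where every peer other than the local instance shows both subtables in `box.info.replication`; on other topologies a missing subtable would be a false alarm.

```lua
-- Hypothetical helper: ids of peers whose upstream or downstream is
-- missing or not in the 'follow' state. Assumes a full-mesh replica set.
-- On a live instance: not_following(box.info.replication, box.info.id)
local function not_following(replication, self_id)
    local bad = {}
    for id, r in pairs(replication) do
        if id ~= self_id then
            local up_ok = r.upstream ~= nil and r.upstream.status == 'follow'
            local down_ok = r.downstream ~= nil and r.downstream.status == 'follow'
            if not (up_ok and down_ok) then
                table.insert(bad, id)
            end
        end
    end
    -- pairs() has no fixed order, so sort for stable output
    table.sort(bad)
    return bad
end
```

An empty result suggests the upgraded replica has rejoined the replica set.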

This page includes explanations and solutions to some common issues
when upgrading a replica set from Tarantool 1.6 to 1.10.

.. include:: ../_includes/1.6-to-2.x-condition.rst
Collaborator

It's not really clear why we're talking about the 1.6 > 2.x upgrade conditions in the topic for the 1.6 > 1.10 upgrade.



Let's first reiterate the upgrade procedure for any Tarantool version:
Collaborator

Maybe rephrase this intro sentence somehow?
The previous sentence says, "However, a direct upgrade of a replica set from 1.6 to 2.x is also possible, but only with downtime."
Then we have this intro and the procedure description, but it's not clear that this is the generic procedure regardless of the Tarantool version.


**Step 2:** Tarantool 1.10+ fails to recover from 1.6 xlogs unless ``box.cfg{force_recovery = true}`` is set.
There is a small difference between 1.6 and 1.10 xlogs that makes 1.6 xlogs appear erroneous to 1.10+ instances.
In order to work around this, start the instance in ``force_recovery`` mode.
Collaborator

In order to work around this, start the instance in force_recovery mode.

When and where should a user set `box.cfg{force_recovery = true}`? In the `init.lua` startup file, before launching the instance after the upgrade? It's better to make this clearer.
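One possible answer, as a sketch only: set the option in the instance file used for the first start of the new version on the 1.6 data. The `listen` value is a placeholder, not taken from the docs.

```lua
-- Instance file (for example, init.lua) for the first start of
-- Tarantool 1.10+ on files written by 1.6. Values are placeholders.
box.cfg{
    listen = 3301,
    -- Let recovery continue past the 1.6 xlog records that 1.10+ reports as errors.
    force_recovery = true,
}
```

After `box.schema.upgrade()` and `box.snapshot()`, the option can presumably be removed from the instance file again, since later restarts recover from the new snapshot.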

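-- Run on the master. problematic_space and problematic_field_no are
-- placeholders for the affected space and field number.
a = box.space.problematic_space:format()
a[problematic_field_no].type = 'number'
box.space.problematic_space:format(a)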

Once this is performed on the master, it's safe to proceed to step 8, making a snapshot on every node.
Collaborator

Where should a user run this piece of Lua code?

  • in the Tarantool console
  • or in the init.lua startup file, before launching the instance
  • or in a separate script file run after box.schema.upgrade()

It's better to make this clearer.

Collaborator

to proceed to step 8

Seeing that, I started reading the paragraph below "Step 8: The user..."
instead of going to the actual step in the procedure above.

@@ -0,0 +1,20 @@
.. _admin-upgrades-1.6-2.x_downtime:

Upgrade from 1.6 directly to 2.x
Collaborator

Maybe rename it to

Upgrade from 1.6 directly to 2.x with downtime

to make it clear what the case is (if that is indeed the case, of course).

It's better than it used to be.
Now the Upgrades section can be restructured a bit.
* Some text is repeated. This is because Sphinx cannot correctly
  enumerate lists when list items come from different `include` files.

* Clarify the "replica" term in the instructions

* Clarify how to change the master in a replica set

* Add cross-links to guide the user

* Bring indents in accordance with the style guide

* Clarify how to run code to fix xlogs when upgrading from 1.6 to 1.10
@patiencedaur merged commit 4c54f66 into latest on Aug 26, 2022
@patiencedaur deleted the gh-3042-caveats branch on August 26, 2022 13:30