Error fallback on router for faulty connections #298


Closed
Gerold103 opened this issue Oct 5, 2021 · 0 comments

Comments

@Gerold103
Collaborator

The router continues to send requests to replicas which are proven to be broken. These are orphan nodes which haven't finished recovery/bootstrap yet, or finished it with an error and are now broken. It also includes instances that haven't called vshard.storage.cfg, or have called it but haven't finished yet.

When boot is not finished, all kinds of bad behaviour are possible. The worst cases:

  • Some of vshard.storage functions are recovered in _func, some are not, so the storage is half-usable;
  • Some user functions are not recovered yet, so nothing fails right inside vshard, but it fails in the user's code.

It seems reasonable to rely on box.info.status ~= 'running' as a sign that the node is not ready to do anything. This can be checked right in the storage functions. Once they see the instance is running, the storage can reload itself to a version without these checks (so as not to call the relatively expensive box.info when it is no longer necessary).
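
For illustration, a minimal sketch of such a guard (the helper names are hypothetical, this is not vshard's actual implementation):

```Lua
-- Hypothetical sketch: refuse to serve until recovery/bootstrap is done.
local function storage_check_is_ready()
    if box.info.status ~= 'running' then
        error('storage is not ready: box.info.status = ' ..
              tostring(box.info.status))
    end
end

-- Every public storage function could start with the guard. Once the
-- status becomes 'running', the module could reload itself into a
-- version without this check, as proposed above.
local function storage_public_call(user_func, ...)
    storage_check_is_ready()
    return user_func(...)
end
```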

In case the storage functions are not available yet, netbox will return something nasty like:

  • error: Execute access to function 'test' is denied for user 'guest';
  • error: Procedure 'test' is not defined.

If the router encounters these errors for any of the vshard.storage functions, or the vshard.storage functions explicitly return an error about the instance not being 'running', the router must put such connections into a backoff state for some time before retrying. At the same time, the retry on another instance upon seeing any of these errors must be automatic, regardless of the request mode - read or write. These are not network errors, so they can be freely retried.
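
A rough sketch of the proposed router-side behaviour (helper and field names are hypothetical, and the error-code checks are an assumption about how these netbox errors would be recognized):

```Lua
local fiber = require('fiber')

local BACKOFF_INTERVAL = 5 -- seconds, an arbitrary value for the sketch

-- Errors meaning "the storage is not ready", not "the request is bad".
local function is_storage_not_ready(err)
    return err ~= nil and (err.code == box.error.ACCESS_DENIED or
                           err.code == box.error.NO_SUCH_PROC)
end

local function call_with_fallback(replicas, func, args)
    for _, replica in ipairs(replicas) do
        if replica.backoff_until == nil or
           fiber.clock() >= replica.backoff_until then
            local ok, res = pcall(replica.conn.call, replica.conn, func, args)
            if ok then
                return res
            end
            if not is_storage_not_ready(res) then
                return nil, res -- a real error, don't mask it
            end
            -- The storage is not ready - back it off and try the next one.
            replica.backoff_until = fiber.clock() + BACKOFF_INTERVAL
        end
    end
    return nil, 'no available replicas'
end
```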

See also #198 and #123.

Gerold103 added the storage, router, and feature (A new functionality) labels on Oct 5, 2021
Gerold103 changed the title from "Error fallback on router for for faulty connections" to "Error fallback on router for faulty connections" on Oct 5, 2021
sergos added the teamS Scaling label on Oct 8, 2021
Gerold103 self-assigned this on Nov 15, 2021
Gerold103 added this to the 0.2 milestone on Nov 15, 2021
Gerold103 added a commit that referenced this issue Dec 6, 2021
RO requests use the replica with the highest prio as specified in
the weights matrix.

If the best replica is not available and failover hasn't happened
yet, then RO requests used to fall back to the master, even if
there were other RO replicas with better prio.

This patch makes an RO call first try the currently selected most
prio replica. If it is not available (no other connected replicas
at all, or failover didn't happen yet), the call will walk the
prio list starting from this replica until it finds an available
one.

If that also fails, the call will walk the list from the
beginning, hoping that the unavailable replica wasn't the best one
and there might be a better option on the other side of the prio
list.

The patch was done in the scope of the replica backoff task (#298)
because the problem would also exist, and get worse, when the best
replica is in backoff rather than just disconnected.

Closes #288
Needed for #298
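
For illustration, the lookup order described above could look like this (a sketch with hypothetical names, not the actual vshard code):

```Lua
-- Walk the prio-sorted list starting from the currently selected
-- replica, then wrap around from the beginning of the list.
local function find_available_replica(prio_list, start_idx, is_available)
    for i = start_idx, #prio_list do
        if is_available(prio_list[i]) then
            return prio_list[i]
        end
    end
    for i = 1, start_idx - 1 do
        if is_available(prio_list[i]) then
            return prio_list[i]
        end
    end
    return nil
end
```
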
Gerold103 added a commit that referenced this issue Dec 17, 2021
Storage configuration takes time. Firstly, there is box.cfg{},
which can be called before vshard.storage.cfg(). Secondly,
vshard.storage.cfg() is not immediate either.

During that time accessing the storage is not safe. Attempts to
call vshard.storage functions can return weird errors, or the
functions may not even be available yet. They need to be created
in _func and granted access rights in _priv before becoming
public.

Routers used to forward errors like 'access denied' and
'no such function' to users as is, treating them as critical.

Not only was this confusing for users, it could also make an
entire replicaset unavailable for requests - the connection to the
faulty instance is alive, so the router would send all requests to
it and they would all fail, even if the replicaset has another
instance which is perfectly functional.

This patch handles such specific errors inside of the router. The
faulty replicas are put into a 'backoff' state. They remain in it
for some fixed time (5 seconds for now); new requests won't be
sent to them until that time passes. The router will use other
instances.

Backoff is activated only for vshard.* functions. If the errors
are about some user's function, it is considered a regular error,
because the router can't tell whether any side effects happened
on the remote instance before the error occurred. Hence it can't
retry on another node.

For example, if access was denied to 'vshard.storage.call', then
it is backoff. If inside of vshard.storage.call the access was
denied to 'user_test_func', then it is not backoff.

All of this works exclusively for read-only requests, of course,
because for read-write requests there is just one instance - the
master. The router has no other options, so backoff wouldn't help
here.

Part of #298
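
The vshard-vs-user-function distinction could be illustrated like this (a hypothetical helper, assuming the error object exposes code and message fields as Tarantool errors usually do; not vshard's actual internals):

```Lua
local BACKOFF_TIMEOUT = 5 -- seconds, as mentioned above

-- Backoff only if the 'access denied' / 'no such function' error names
-- the vshard function itself (e.g. 'vshard.storage.call'), not some
-- user function called inside of it.
local function is_backoff_error(err, vshard_func_name)
    if err == nil or (err.code ~= box.error.ACCESS_DENIED and
                      err.code ~= box.error.NO_SUCH_PROC) then
        return false
    end
    return err.message:find(vshard_func_name, 1, true) ~= nil
end
```
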
Gerold103 added a commit that referenced this issue Dec 17, 2021
While vshard.storage.cfg() is not done, accessing vshard functions
is not safe. It will fail with low-level errors like
'access denied' or 'no such function'.

However, there can be even worse cases. The user can have universe
access rights, and vshard can already be in the global namespace
after require(), so the vshard.storage functions are already
visible.

The previous patch fixed only the case when function access was
restricted properly, and even then fixed it only partially.

The new problems are:

- box.cfg{} is already called, but the instance is still
  'loading'. The data is not fully recovered yet, so access is not
  safe from the data consistency perspective.

- vshard.storage.cfg() is not started, or is not finished yet. In
  the meantime it might be doing something the public functions
  depend on.

This patch addresses these issues. Now all non-trivial
vshard.storage functions are disabled until vshard.storage.cfg()
is finished and the instance is fully recovered.

They raise an error with a special code. Returning it via the
'nil, err' pair wouldn't work because, firstly, some functions
return a boolean value and are not documented as ever failing;
people would miss this new error.

The second reason is that vshard.storage.call() needs to signal
the remote caller that the storage is disabled and that this was
detected before the user's function was called. If it were done
via 'nil, err', the user's function could emulate the storage
being disabled. Or, even worse, it could make some changes and
then get that error accidentally by remotely reaching another
storage which happens to be disabled. Hence it is not allowed -
it would be too easy to break something.

One option was to change the vshard.storage.call() signature to
return 'true, retvals...' when the user's function was called and
'false, err' when it wasn't, but that would break backward
compatibility. Supporting it only for new routers does not seem
possible.

The patch also drops the 'memtx_memory' setting from the config,
because an attempt to apply it after calling box.cfg() (for
example, via boot_like_vshard()) raises an error - the default
memory is bigger than this setting. It broke the new tests.

Part of #298
Closes #123
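
As a sketch of the signalling described above (the field layout of the raised error is an assumption; only the STORAGE_IS_DISABLED name comes from the patches themselves):

```Lua
-- Hypothetical guard for the public storage functions.
local storage_is_enabled = false

local function storage_api_guard(public_func_name)
    if not storage_is_enabled then
        -- Raised as a Lua exception rather than returned as 'nil, err',
        -- so that functions documented as never failing can't be misread
        -- and the remote caller can't confuse it with a user-level error.
        error({
            name = 'STORAGE_IS_DISABLED',
            message = public_func_name ..
                      ' cannot be called: the storage is disabled',
        })
    end
end
```
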
Gerold103 added a commit that referenced this issue Dec 17, 2021
The patch introduces the functions
vshard.storage.enable()/disable().

They allow manual control over whether the instance can accept
requests.

It solves the following problems which were not covered by the
previous patches:

- Even if box.cfg() is done, the status is 'running', and
  vshard.storage.cfg() is finished, the user's application may
  still not be ready to accept requests. For instance, it might
  need to create more functions and users on top of vshard, and
  wants to disable public requests until all the preliminary work
  is done.

- Even after everything is enabled, fine and dandy, the instance
  might still want to disable itself in case of an emergency, such
  as its config getting broken or becoming too outdated, desynced
  from a central storage.

vshard.storage.enable()/disable() can be called at any time -
before, during, and after vshard.storage.cfg() - to solve these
issues.

Part of #298
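
For illustration, the 'emergency' case could look like this (the config check itself is hypothetical; only vshard.storage.enable()/disable() come from this patch):

```Lua
local vshard = require('vshard')

-- Called by the application whenever it re-validates its own config.
local function on_config_check(is_config_ok)
    if not is_config_ok then
        -- Take the instance out of rotation until the config is repaired.
        vshard.storage.disable()
    else
        vshard.storage.enable()
    end
end
```
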
Gerold103 added a commit that referenced this issue Dec 17, 2021
vshard.storage.call() and most of the other vshard.storage.*
functions now raise an exception STORAGE_IS_DISABLED when the
storage is disabled.

The router wants to catch it and handle it in a special way. But
unfortunately:

- error(obj) in a Lua function is wrapped into LuajitError. 'obj'
  is saved into 'message' using its __tostring meta-method.

- It is not possible to create your own error type in a sane way.

These two facts mean that the router needs to be able to extract
the original error from LuajitError's message. In vshard, errors
are serialized into JSON, so a valid vshard error, such as
STORAGE_IS_DISABLED, can be extracted from LuajitError's message
as long as it wasn't truncated for being too long. For this
particular error that won't happen.

The patch introduces a new method, vshard.error.from_string(), to
perform this extraction for its further usage in the router.

Part of #298
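
A sketch of how the router side could use the helper (the exact return contract of vshard.error.from_string() is assumed here: the decoded vshard error table on success, nil otherwise):

```Lua
local verror = require('vshard.error')

local function classify_remote_error(netbox_err)
    -- The original vshard error travels as JSON inside LuajitError's
    -- message; try to decode it back.
    local vshard_err = verror.from_string(netbox_err.message)
    if vshard_err ~= nil and vshard_err.name == 'STORAGE_IS_DISABLED' then
        return 'backoff'
    end
    return 'regular'
end
```
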
Gerold103 added a commit that referenced this issue Dec 17, 2021
If a storage reports that it is disabled, then it will probably
take some time before it can accept new requests.

This patch makes the STORAGE_IS_DISABLED error trigger the
connection's backoff, in line with the 'access denied' and
'no such function' errors, because the reason for all three is the
same - the storage is not ready to accept requests yet.

Such requests are transparently retried now.

Closes #298

@TarantoolBot document
Title: vshard.storage.enable/disable()
`vshard.storage.disable()` makes most of the `vshard.storage`
functions throw an error - as a Lua exception, not via the
`nil, err` pattern.

`vshard.storage.enable()` reverts the disable.

By default the storage is enabled.

Additionally, the storage is forcefully disabled automatically
until `vshard.storage.cfg()` is finished and the instance has
finished recovery (its `box.info.status` is `'running'`, for
example).

Auto-disable protects against using vshard functions before the
storage's global state is fully created.

Manual `vshard.storage.disable()` helps to achieve the same for
the user's application. For instance, a user might want to do some
preparatory work after `vshard.storage.cfg` before the application
is ready for requests. Then the flow would be:
```Lua
vshard.storage.disable()
vshard.storage.cfg(...)
-- Do your preparatory work here ...
vshard.storage.enable()
```

The routers handle errors signaling that the storage is disabled in a special
way. They put connections to such instances into a backoff state for some time
and try to use other replicas. For example, assume a replicaset has replicas
'replica_1' and 'replica_2', and 'replica_1' is disabled for any reason. If a
router tries to talk to 'replica_1', it gets a special error and transparently
retries on 'replica_2'.

When 'replica_1' is enabled again, the router will notice that too and will
send requests to it again.

All of this works exclusively for read-only requests. Read-write requests can
only be sent to the master, of which there is one per replicaset, and they are
not retried.
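
From the user's side the retry is invisible; a hedged example (the function name 'get_user' and the bucket id are made up; `vshard.router.callro()` is used as the usual read-only entry point):

```Lua
local log = require('log')
local vshard = require('vshard')

local bucket_id = 42
-- Goes to whichever replica of the bucket's replicaset is enabled; a
-- disabled or backed-off replica is skipped transparently.
local user, err = vshard.router.callro(bucket_id, 'get_user',
                                       {'user_login'}, {timeout = 1})
if err ~= nil then
    -- Returned only if no replica could serve the call in time, e.g.
    -- all of them are disabled or in backoff.
    log.error('read failed: %s', tostring(err))
end
```
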
olegrok added a commit to tarantool/cartridge that referenced this issue Jan 10, 2022
In case of OperationError (the config was unsuccessfully applied
on the storage) we shouldn't send requests to such a storage.
After this feature was implemented in vshard
(tarantool/vshard#298), we can simply disable the vshard storage
on such instances. For this purpose a simple on_apply_config
trigger was implemented.

Closes #1411
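
A sketch of the idea only - the cartridge trigger registration API is not shown and the handler signature is an assumption; the vshard calls are the real API from #298:

```Lua
local vshard = require('vshard')

-- Hypothetical handler passed to cartridge's on_apply_config trigger.
local function on_apply_config(_, state)
    if state == 'OperationError' then
        -- Config failed to apply - stop serving vshard requests so the
        -- routers put this instance into backoff and use its neighbours.
        vshard.storage.disable()
    else
        vshard.storage.enable()
    end
end
```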