Skip to content

Fix ZTS crashes with persistent resources in modules #13381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 13, 2024

Conversation

nielsdos
Copy link
Member

On shutdown in ZTS the following happens:

Notice that in the last step, the modules that the persistent resource belong to may already have been destroyed. This means that accessing globals will cause a crash (I previously fixed that for statically loaded modules with ifdef magic), or when the module is dynamically loaded we'll try jumping to a destructor that is no longer loaded in memory. These scenarios cause crashes.

It's not possible to move the ts_free_id call upwards, because that may break assumptions of callers, and furthermore this would deallocate the executor globals structure, which means that any access to those will cause a segfault.

This patch adds a new API to the TSRM that allows running a callback on a certain resource type. We use this API to destroy the persistent resources in all threads prior to the module destruction, and keep the rest of the executor dtor intact.

I verified this fix on Apache with postgres, tested both with dynamically and statically loaded modules.

I tried to think of other ideas, but I couldn't find other simple ones that won't break BC.

Important: This patch is only for master because it changes public API and might be slightly dangerous.
For stable branches I propose https://gist.github.com/nielsdos/338774e915a1b528eaa8bd68ba7cc54c which is safer and does not change public API.

Fixes GH-12974. Although the issue only mentions postgres, other modules are affected too.

On shutdown in ZTS the following happens:
- https://github.com/php/php-src/blob/master/Zend/zend.c#L1124-L1125
  gets executed. This destroys global persistent resources and destroys
  the modules. Furthermore, the modules are unloaded too.
- Further down, `ts_free_id(executor_globals_id)` gets executed, which
  calls `executor_globals_dtor`. This function destroys persistent
  resources for each thread.

Notice that in the last step, the modules that the persistent resource
belong to may already have been destroyed. This means that accessing
globals will cause a crash (I previously fixed this with ifdef magic),
or when the module is dynamically loaded we'll try jumping to a
destructor that is no longer loaded in memory. These scenarios cause
crashes.

It's not possible to move the `ts_free_id` call upwards, because that
may break assumptions of callers, and furthermore this would deallocate
the executor globals structure, which means that any access to those
will cause a segfault.

This patch adds a new API to the TSRM that allows running a callback on
a certain resource type. We use this API to destroy the persistent
resources in all threads prior to the module destruction, and keep the
rest of the resource dtor intact.

I verified this fix on Apache with postgres, both dynamically and
statically.

Fixes phpGH-12974.
Zend/zend.c Outdated
Comment on lines 1130 to 1133
zend_destroy_rsrc_list(&EG(persistent_list));
#ifdef ZTS
ts_callback_id(executor_globals_id, executor_globals_persistent_list_dtor);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably may be changed into (I'm not completely sure)

#ifndef ZTS
	zend_destroy_rsrc_list(&EG(persistent_list));
#else
	ts_callback_id(executor_globals_id, executor_globals_persistent_list_dtor);
#endif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just quickly tested this by printing out all the lists that get destroyed, and I see that it now misses the global_persistent_list whereas it previously was destroyed by zend_destroy_rsrc_list(&EG(persistent_list));. So this seems wrong.

TSRM/TSRM.h Outdated
Comment on lines 107 to 109
/* Runs a callback on all resources of the given id.
* The caller is responsible for ensuring the underlying resources don't data-race. */
TSRM_API void ts_callback_id(ts_rsrc_id id, void (*cb)(void *));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to think about a better name. e.g. ts_apply_for_id()

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks right.

I thought, that extensions might be responsible to destroy their resources in all threads before unloading (through GSHUTDOWN, but this may be to later). Anyway, this would only "fix" extensions loaded by dl(), and this is already not compatible with ZTS.

Please, think in this direction a bit.
In any case the patch looks good enough.

@nielsdos
Copy link
Member Author

I had thought of destroying persistent resources via module shutdown code, but it seems modules currently do not do that.
I thought of using zend_clean_module_rsrc_dtors but that API is not public.
I also thought that it is the engine's responsibility to destroy those resources because of the code for destroying persistent resources in zend_shutdown.

@nielsdos nielsdos merged commit 5941cda into php:master Feb 13, 2024
nielsdos added a commit to nielsdos/php-src that referenced this pull request Feb 13, 2024
For master (8.4-dev) I merged phpGH-13381. But that PR changes public API
of TSRM, so cannot be used on lower branches.

This patch is a safe workaround for the issue, in combination with a
pre-existing fix using `ifdef ZTS + if (module_started)` inside pgsql
and odbc. The idea is to delay unloading modules until the persistent
resources are destroyed. This will keep the destructor code accessible
in memory.

This is not a proper fix on its own, because we still need the
workaround of not accessing globals after module destruction.
The proper fix is in master.
nielsdos added a commit that referenced this pull request Feb 20, 2024
For master (8.4-dev) I merged GH-13381. But that PR changes public API
of TSRM, so cannot be used on lower branches.

This patch is a safe workaround for the issue, in combination with a
pre-existing fix using `ifdef ZTS + if (module_started)` inside pgsql
and odbc. The idea is to delay unloading modules until the persistent
resources are destroyed. This will keep the destructor code accessible
in memory.

This is not a proper fix on its own, because we still need the
workaround of not accessing globals after module destruction.
The proper fix is in master.

Closes GH-13388.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apache crashes on shutdown when using pg_pconnect()
2 participants