Skip to content

Enhance zend_dump_op_array to Properly Represent Non-Printable Characters in String Literals #15680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
WangYihang opened this issue Aug 31, 2024 · 6 comments

Comments

@WangYihang
Copy link

WangYihang commented Aug 31, 2024

Description

Description:

While working with the zend_dump_op_array function in PHP's Zend Engine, I noticed that non-printable characters in string literals are not always represented in their escaped (readable) form. Currently, the function escapes 1 only a limited set of characters by calling:

zend_string *escaped_string = php_addcslashes(Z_STR_P(zv), "\"\\", 2);

This can result in the direct output of characters such as newlines, tabs, and null bytes, which might make the dump less clear and harder to debug.

For example, for the following php code:

<?php

// irrelevant code omited

fwrite($buffer, "--\n");

Current Output:

0040 INIT_FCALL 2 112 string("fwrite")
0041 SEND_VAR CV6($buffer) 1
0042 SEND_VAL string("--
") 2                                     <------ new line (\n) is directly printed
0043 DO_ICALL

Expected Output:

I believe that modifying the function to escape all non-printable characters would enhance the clarity of the output. This could be achieved by updating the call to:

php_addcslashes(Z_STR_P(zv), "\\\"\n\t\0", 5);

With this change, the output might look like:

0040 INIT_FCALL 2 112 string("fwrite")
0041 SEND_VAR CV6($buffer) 1
0042 SEND_VAL string("--\n") 2           <------ properly escaped with readable representations
0043 DO_ICALL

Proposal:

I propose a small enhancement to the zend_dump_op_array function to ensure that all non-printable characters are properly represented in the output. This adjustment would improve readability and make debugging easier, without significantly altering the existing functionality.

Benefits:

Enhanced Readability: Ensures that non-printable characters are clearly represented, making the output easier to understand.
Improved Debugging: Developers can quickly identify and interpret all characters within string literals.
Minor Change: The proposed adjustment is a minimal change to the existing codebase but offers significant benefits.

I would greatly appreciate feedback from the community on this suggestion.

References

Footnotes

  1. https://github.com/php/php-src/blob/82c504fa9c36bed95dc28e79b2374889d8a952e7/Zend/Optimizer/zend_dump.c#L70

@WangYihang
Copy link
Author

WangYihang commented Aug 31, 2024

Additionally, I believe it would improve debugging if all non-printable characters were consistently marked for escaping.

If there is agreement, I would be happy to submit a pull request to implement this improvement.

@devnexen
Copy link
Member

devnexen commented Sep 1, 2024

Additionally, I believe it would improve debugging if all non-printable characters were consistently marked for escaping.

If there is agreement, I would be happy to submit a pull request to implement this improvement.

I'd say go ahead and submit a PR :)

@WangYihang
Copy link
Author

WangYihang commented Sep 3, 2024

It seems there’s already a related pull request [1-2] that attempted to address this issue. However, using php_addcslashes to quote all non-printable characters isn’t ideal, as it requires a long string to specify the characters to be quoted.

I propose two potential solutions:

  • Solution I: Create a New Function: Develop a dedicated function similar to Python’s repr, specifically designed to handle all non-printable characters. While there is a var_representation [3] function that appears to do something similar, its implementation is not present in the php-src codebase, so we cannot reuse it.

  • Solution II: Extend php_addcslashes_str: Modify the php_addcslashes_str function to make the flags variable more flexible, allowing it to automatically handle non-printable characters. This approach minimizes changes to the existing codebase while enhancing functionality.

@devnexen Which of these approaches do you think would be more suitable, or is there another direction you would recommend for implementing this functionality? @SerafimArts @dstogov

WangYihang added a commit to WangYihang/php-src that referenced this issue Sep 3, 2024
…acters (phpGH-15680)

This change enhances `zend_dump_op_array` to properly represent non-printable characters in strings. This is useful for debugging purposes, as it allows developers to see the actual content of strings that contain non-printable characters.
@WangYihang
Copy link
Author

WangYihang commented Sep 3, 2024

Implemented and passed tests in #15730 (Using Solution I), please kindly review, thanks for your time.

@SerafimArts
Copy link
Contributor

SerafimArts commented Sep 4, 2024

Which of these approaches do you think would be more suitable, or is there another direction you would recommend for implementing this functionality?

@WangYihang My pull request was aimed at providing the ability to programmatically read such output without parsing mistakes/errors.

I implement the lang server plugin, which showed the bytecode that PHP generates next to the code of functions and classes, which allows you to simply brute-force optimize time critical sections of code right in the IDE (since the opcode result is immediately visible, including optimization steps). And I simply couldn't find any other simple solution other than parsing this output)))

That is, at that time it was not critical to me whether there were line breaks or not, since this does not interfere with the programmatic parsing of the opcode output. The main thing is that it starts with a quotation mark and ends with one.

nielsdos added a commit to nielsdos/php-src that referenced this issue Dec 26, 2024
…nt Non-Printable Characters in String Literals

Replaces phpGH-15730 as that PR became stale.

But instead of introducing a new helper, reuse
smart_str_append_escaped(), this also removes the dependency on
ext/standard.
nielsdos added a commit to nielsdos/php-src that referenced this issue Dec 26, 2024
…nt Non-Printable Characters in String Literals

Replaces phpGH-15730 as that PR became stale.

But instead of introducing a new helper, reuse
smart_str_append_escaped(), this also removes the dependency on
ext/standard.
nielsdos added a commit to nielsdos/php-src that referenced this issue Dec 26, 2024
…nt Non-Printable Characters in String Literals

Replaces phpGH-15730 as that PR became stale.

But instead of introducing a new helper, reuse
smart_str_append_escaped(), this also removes the dependency on
ext/standard.
nielsdos added a commit that referenced this issue Dec 27, 2024
…Non-Printable Characters in String Literals

Replaces GH-15730 as that PR became stale.

But instead of introducing a new helper, reuse
smart_str_append_escaped(), this also removes the dependency on
ext/standard.

Closes GH-15730.
Closes GH-17277.
@nielsdos
Copy link
Member

Implemented via 55afe8b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants