Support for multibyte characters in ParserError exception messages #755

Closed
terandard opened this issue Feb 23, 2025 · 2 comments · Fixed by #756
@terandard

Since v2.7.3, the size of ParserError exception messages is limited to 32 bytes.

When the message contains multibyte characters, a character can be cut off partway through its byte sequence.

> JSON.parse("あああああああああああああああああああああああ")
JSON::ParserError: unexpected character: 'ああああああああああ�'

This could potentially cause additional unexpected errors.

Would it be possible to change the limit from byte size to character count?
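
The difference between the two kinds of limit can be reproduced without the parser. The sketch below is an illustration only: the helper names are hypothetical, and `String#byteslice` merely stands in for the 32-byte cap described above; it is not the gem's actual code.

```ruby
# Illustration only: mimics a 32-byte cap with String#byteslice.
# The helper names are hypothetical; this is not the json gem's implementation.
def truncate_by_bytes(text, limit = 32)
  text.byteslice(0, limit) # counts bytes, so it can split a character
end

def truncate_by_chars(text, limit = 32)
  text[0, limit] # String#[] counts characters, so characters stay intact
end

text = "あ" * 23 # each "あ" is 3 bytes in UTF-8, 69 bytes in total

puts truncate_by_bytes(text).valid_encoding? # false: 32 is not a multiple of 3
puts truncate_by_chars(text).valid_encoding? # true
```

With a 32-byte cap the cut falls two bytes into the eleventh character, which matches the ten intact characters followed by a replacement character in the message above.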

@byroot byroot added the bug label Feb 23, 2025
@byroot
Member

byroot commented Feb 23, 2025

That's a good point.

byroot added a commit to byroot/json that referenced this issue Feb 25, 2025
Fix: ruby#755

Error messages now include a snippet of the document
that doesn't parse to help locate the issue, however
the way it was done wasn't UTF-8 aware, and it could
result in exception messages with truncated characters.

It would be nice to go a bit farther and actually support
codepoints, but it's a lot of complexity to do it in C,
perhaps if we move that logic to Ruby given it's not a
performance sensitive codepath.
@byroot byroot closed this as completed in e144793 Feb 26, 2025
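
As a rough picture of what a UTF-8-aware byte limit means, here is a minimal Ruby sketch. It assumes the input is valid UTF-8; the helper name `utf8_safe_prefix` and the approach of trimming back to a character boundary are assumptions for illustration, not the code from the referenced commit (which, per the commit message, lives in C).

```ruby
# Hypothetical helper, for illustration only; not the code from the actual fix.
# Assumes `text` is valid UTF-8.
def utf8_safe_prefix(text, max_bytes = 32)
  return text if text.bytesize <= max_bytes

  prefix = text.byteslice(0, max_bytes)
  # Drop trailing bytes until the prefix is valid again. A UTF-8 character
  # is at most 4 bytes long, so at most 3 bytes ever need to be removed.
  3.times do
    break if prefix.valid_encoding?
    prefix = prefix.byteslice(0, prefix.bytesize - 1)
  end
  prefix
end

snippet = utf8_safe_prefix("あ" * 23)
puts snippet.bytesize        # 30
puts snippet.valid_encoding? # true
```

This keeps the limit expressed in bytes and only ensures the cut lands on a character boundary; a limit counted in codepoints, as the commit message mentions, would be a separate change.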
@terandard
Author

Thanks for the quick response!
I'm looking forward to the release!

matzbot pushed a commit to ruby/ruby that referenced this issue Feb 27, 2025
Fix: ruby/json#755

ruby/json@e144793b72