Support for multibyte characters in `ParserError` exception messages #755

terandard · 2025-02-23T07:44:59Z

Since v2.7.3, the size of ParserError exception messages is limited to 32 bytes.

Limit the size of ParserError exception messages #625

When the message contains multibyte characters, a character might be cut off partway through its byte sequence.

> JSON.parse("あああああああああああああああああああああああ")
JSON::ParserError: unexpected character: 'ああああああああああ�'

The truncated message is no longer valid UTF-8, which could cause additional unexpected errors in code that handles it.

Would it be possible to change the limit from byte size to character count?
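To illustrate the difference between the two kinds of limit, here is a minimal sketch in plain Ruby. The helper names `truncate_by_bytes` and `truncate_by_chars` and the `LIMIT` constant are hypothetical, chosen for this example only; this is not how the json gem builds its messages.

```ruby
# Hypothetical helpers for illustration; not the json gem's implementation.
LIMIT = 32

# Cuts at a byte boundary, which may fall inside a multibyte character.
def truncate_by_bytes(str, limit = LIMIT)
  str.byteslice(0, limit)
end

# Cuts at a character boundary, so no character is ever split.
def truncate_by_chars(str, limit = LIMIT)
  str[0, limit]
end

source = "あ" * 23 # each "あ" is 3 bytes in UTF-8, 69 bytes in total

by_bytes = truncate_by_bytes(source)
by_chars = truncate_by_chars(source)

puts by_bytes.valid_encoding? # false: 10 whole characters plus 2 stray bytes
puts by_chars.valid_encoding? # true
```

With a 32-byte limit, the byte-based cut keeps ten complete characters (30 bytes) and two bytes of the eleventh, which matches the broken trailing character in the message above.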

byroot · 2025-02-23T07:46:05Z

That's a good point.

Fix: ruby#755 Error messages now include a snippet of the document that doesn't parse, to help locate the issue. However, the way it was done wasn't UTF-8 aware, and it could result in exception messages with truncated characters. It would be nice to go a bit further and actually support codepoints, but that's a lot of complexity to do in C. Perhaps we could do it if we move that logic to Ruby, given it's not a performance-sensitive codepath.
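As a rough idea of what a UTF-8-aware byte limit can look like when written in Ruby, the sketch below cuts at the byte limit and then drops any trailing bytes of an incomplete character. The method name `utf8_safe_snippet` is hypothetical, the input is assumed to be valid UTF-8, and this is not necessarily how the actual fix is implemented.

```ruby
# Hypothetical sketch; not the json gem's actual code.
# Assumes `str` is valid UTF-8 to begin with.
def utf8_safe_snippet(str, limit = 32)
  snippet = str.byteslice(0, limit)

  # A UTF-8 character is at most 4 bytes long, so a cut can leave at most
  # 3 dangling bytes. Drop them one at a time until the snippet is valid.
  3.times do
    break if snippet.valid_encoding?
    snippet = snippet.byteslice(0, snippet.bytesize - 1)
  end

  snippet
end

snippet = utf8_safe_snippet("あ" * 23)
puts snippet.valid_encoding? # true
puts snippet.bytesize        # 30
```

The limit stays expressed in bytes, so the snippet can be slightly shorter than the limit, but it never ends in a partial character.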

terandard · 2025-02-27T00:02:07Z

Thanks for the quick response!
I'm looking forward to the release!

byroot added the bug label Feb 23, 2025

byroot mentioned this issue Feb 25, 2025

Ensure parser error snippets are valid UTF-8 #756

Merged

byroot closed this as completed in #756 Feb 26, 2025

byroot closed this as completed in e144793 Feb 26, 2025
