
PDFBOX-5747: Surrogate pairs with combining diacritics are incorrectly ordered on text extraction #200


Open
wants to merge 1 commit into base: trunk


reckart
Member

@reckart reckart commented Jan 26, 2025

  • Changed TextPosition.insertDiacritic() to preserve surrogate pairs
  • Added unit test
  • Included example test PDF file attached to PDFBOX-5747
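
For readers unfamiliar with the failure mode, here is a minimal sketch of surrogate-aware insertion of a combining mark. It is not the code from this PR: the class `SurrogateSafeInsert` and the method `insertCombiningMark` are hypothetical names chosen for illustration, and the sketch assumes the mark belongs after the character that starts at a given `char` index. The point it illustrates is that a base character outside the Basic Multilingual Plane occupies two Java `char`s, so inserting at `index + 1` would split the pair.

```java
final class SurrogateSafeInsert {

    /**
     * Hypothetical helper (not the PDFBox implementation): inserts a combining
     * mark after the character starting at charIndex, stepping over the low
     * surrogate when that character is encoded as a surrogate pair.
     * Assumes 0 <= charIndex < base.length().
     */
    static String insertCombiningMark(String base, int charIndex, String mark) {
        int end = charIndex + 1;
        // If the base character is a high surrogate followed by a low surrogate,
        // the insertion point must move past both halves of the pair.
        if (Character.isHighSurrogate(base.charAt(charIndex))
                && end < base.length()
                && Character.isLowSurrogate(base.charAt(end))) {
            end++;
        }
        return base.substring(0, end) + mark + base.substring(end);
    }

    public static void main(String[] args) {
        // U+1D400 (MATHEMATICAL BOLD CAPITAL A) is stored as two chars in Java.
        String base = "\uD835\uDC00";
        String result = insertCombiningMark(base, 0, "\u0301");
        // Print the resulting code points in hex for inspection.
        result.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

A naive `base.substring(0, charIndex + 1) + mark + base.substring(charIndex + 1)` would place the mark between the high and low surrogate, leaving two unpaired surrogates in the extracted text. Checking `Character.isHighSurrogate` and `Character.isLowSurrogate` before choosing the insertion point keeps the pair intact.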

@reckart reckart force-pushed the bugfix/PDFBOX-5747-Surrogate-pairs-with-combining-diacritics-are-incorrectly-ordered-on-text-extraction branch from a7e4da0 to 0841f61 on January 26, 2025 10:38