
Make building of search results work for multi-byte encoded characters #3113


Merged


Kristian-Krastev

When a search is made, the information shown in each result snippet is taken from the database tables 'bookshelves', 'books', 'chapters' and 'pages', but the string data in them may use an encoding in which one character is not always one byte. For example, in UTF-8 each Cyrillic character takes two bytes, and some other characters take up to four.

The string operations used in this context are not multi-byte safe, so the returned snippets contain 'broken' text.
I think the multibyte string methods are a good solution to that problem.
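
The difference between byte-based and character-based slicing can be sketched as follows. This is only an illustration in Python, not the project's code: the function names `snippet_by_bytes` and `snippet_by_chars` and the `radius` parameter are hypothetical, and the sample text is an assumption chosen to make the cut land inside a character.

```python
# Illustrative sketch only; names and sample text are assumptions,
# not taken from the pull request's actual changes.

def snippet_by_bytes(text: str, term: str, radius: int) -> str:
    """Cut a snippet around `term` using byte offsets (not multi-byte safe)."""
    raw = text.encode("utf-8")
    needle = term.encode("utf-8")
    start = raw.find(needle)
    if start < 0:
        return ""
    end = start + len(needle)
    cut = raw[max(0, start - radius):end + radius]
    # A cut that lands inside a character leaves stray bytes behind;
    # errors="replace" turns them into U+FFFD so the damage is visible.
    return cut.decode("utf-8", errors="replace")


def snippet_by_chars(text: str, term: str, radius: int) -> str:
    """Cut a snippet around `term` using character offsets (multi-byte safe)."""
    start = text.find(term)
    if start < 0:
        return ""
    end = start + len(term)
    return text[max(0, start - radius):end + radius]


text = "добро Младо маняче за милион"
term = "Младо"

broken = snippet_by_bytes(text, term, radius=4)
clean = snippet_by_chars(text, term, radius=4)

print(repr(broken))  # contains U+FFFD replacement characters at both ends
print(repr(clean))   # 'бро Младо ман'
```

Counting in characters rather than bytes is what multibyte-aware string functions provide, for example `mb_strpos` and `mb_substr` in PHP.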

@ssddanbrown
Member

Thanks @Kristian-Krastev for offering this PR. Could you provide a minimal example of content and search term that causes breakage? Would help so I can add a test case to prevent regression.

@Kristian-Krastev
Author

Sure,
I have created a page with this content:

На мен ми трябва нещо добро
Вкарай ги готовите в мойто число
Младо маняче за милион
Накрая да забравиме кво е било

and in the search input field I enter the third line of the content:

Младо маняче за милион

@ssddanbrown
Member

Thanks for confirming the content! PR now merged for next feature release.
