Skip to content

feat(ml): better multilingual search with nllb models #13567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Mar 31, 2025
Merged

Conversation

mertalev
Copy link
Contributor

@mertalev mertalev commented Oct 17, 2024

Description

The NLLB models have the highest quality results for multilingual text in our model catalog. However, these models expect the input to explicitly indicate the language of the text. Since we don't provide this, the results are worse than other multilingual models.

This PR makes the machine learning service accept a language option that it maps to the corresponding FLORES200 token that NLLB expects. The server is updated to accept this parameter and forward it to the machine learning service, while web and mobile are updated to provide the language based on current user settings. Search still works without it, and the option only has an impact when using an NLLB model.

How Has This Been Tested?

I tested on web and can confirm the results for Turkish are significantly better. Changing the language in the user settings has a marked effect on the ranking. I haven't tested mobile so I'm not sure if all the language codes there line up with the map in the machine learning service.

Copy link
Contributor

@jrasm91 jrasm91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@@ -144,6 +144,7 @@
page: nextPage,
withExif: true,
isVisible: true,
language: $lang,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a new setting? I like to have my UI in English, but I prefer to use my native language for searching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we could have a Search Language user preference separate from the UI language.

Copy link
Contributor Author

@mertalev mertalev Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing is that the setting only applies to four non-default models, so I wonder if it'd be confusing to users. Maybe we can hide the setting if they're not using a relevant model?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to also have this be something the admin can override?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean making it use a certain search language by default for all users? That could be nice, possibly doing the same for the UI language as well. But I think we should avoid making too many changes at once in this PR, so the admin stuff can be done separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few related FRs came up that I consolidated into #13651.

@zackpollard
Copy link
Contributor

@mertalev I think this is good to go, but it needs rebasing, there's a bunch of conflicts. If you want it to go in still, can you work on that? 🙂

@mertalev
Copy link
Contributor Author

mertalev commented Mar 3, 2025

Sure! I got caught up trying to make it work in mobile, but the language codes for localization were different IIRC. The web portion should be good though after rebasing.

@mertalev mertalev marked this pull request as draft March 5, 2025 18:15
@mertalev mertalev marked this pull request as ready for review March 28, 2025 18:45
@mertalev
Copy link
Contributor Author

Tested on web and mobile with a variety of languages and confirmed the language is passed to the ML service, which uses it to pass the right language token to the NLLB model.

@bo0tzz
Copy link
Member

bo0tzz commented Mar 28, 2025

Do the model benchmarks account for this token being passed?

@mertalev
Copy link
Contributor Author

The results in the docs are when the language token is passed correctly. The models might not even make the list when no token is passed because the input is interpreted as English.

@mertalev mertalev merged commit 6789c2a into main Mar 31, 2025
96 of 99 checks passed
@mertalev mertalev deleted the feat/ml-flores branch March 31, 2025 15:06
kirill-dev-pro pushed a commit to kirill-dev-pro/immich-but-with-public-albums-within-instance that referenced this pull request Apr 6, 2025
savely-krasovsky pushed a commit to savely-krasovsky/immich that referenced this pull request Jun 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants