Skip to content

Support embedders setting and other vector/hybrid search related configuration #677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 57 commits into from
Jul 10, 2025

Conversation

Mubelotix
Copy link
Collaborator

@Mubelotix Mubelotix commented Jul 10, 2025

Pull Request

Rework of #554

Related issue

Fixes #541
Fixes #612
Fixes #621
Fixes #646

What does this PR do?

  • Adds the required settings

    • with_embedders does use the same "API" (not using impl AsRef for items passed) as with_synonyms, as this is the closest existing
    • given set_embedders has not been implemented upstream (at least when I try to PATCH the object, it does not work)
    • only {get,reset}_embedders settings have been implemented.
      Said implementation goes with the work done in Implement vector search experimental feature v2 (v1.6) meilisearch-python#924
  • adds the hybrid field to search via the vector search to add an end-to-end test of this feature with the huggingface configuration.

    userProvided seens more brittle, but we may want change to this instead
    using userProvided instead would mean (at the cost of hardcoding stuff)
    => lower cpu effort
    => no higher timeout necceeseary
    => aligning with meilisearch/meilisearch to only have a test for userProvided)

TODO:

  • find a combination of semantic search model + configuration that does not fail the assumptions (see search testcase) spectacularly

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • New Features

    • Added support for hybrid and vector-based search, including semantic search configuration and the ability to retrieve document embeddings with search results.
    • Introduced embedder configuration management, allowing users to define, fetch, and reset embedders for advanced embedding and semantic search capabilities.
  • Documentation

    • Expanded documentation and examples for new embedder-related methods.
  • Tests

    • Added tests to verify hybrid search, vector retrieval, and embedder configuration functionalities.

CommanderStorm and others added 30 commits April 17, 2024 18:18

This comment was marked as spam.

@Mubelotix
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Jul 10, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/settings.rs (1)

967-1011: Consider documenting the limitation of embedder updates.

The PR objectives mention that set_embedders via PATCH is not implemented. Consider adding a note in the documentation that embedders can only be updated through the set_settings method, not individually via a dedicated endpoint.

Add a note to the get_embedders documentation:

     /// Get [embedders](https://www.meilisearch.com/docs/learn/vector_search) of the [Index].
     ///
+    /// Note: To update embedders, use the `set_settings` method. There is no dedicated
+    /// `set_embedders` method as the PATCH endpoint is not functional.
+    ///
     /// ```
src/search.rs (1)

1722-1722: Fix the typo in the comment.

-        // Search for an Harry Potter that but with lorem ipsum's id
+        // Search for a Harry Potter but with lorem ipsum's id
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6de4acf and 5dcee1b.

📒 Files selected for processing (2)
  • src/search.rs (21 hunks)
  • src/settings.rs (13 hunks)
🧰 Additional context used
🧠 Learnings (2)
src/search.rs (1)
Learnt from: LukasKalbertodt
PR: meilisearch/meilisearch-rust#625
File: src/search.rs:368-370
Timestamp: 2025-06-12T13:28:23.700Z
Learning: In the Meilisearch Rust client, `SearchQuery` serializes its per-query federation settings under the key `federationOptions`; only the top-level multi-search parameter is named `federation`.
src/settings.rs (1)
Learnt from: LukasKalbertodt
PR: meilisearch/meilisearch-rust#625
File: src/search.rs:368-370
Timestamp: 2025-06-12T13:28:23.700Z
Learning: In the Meilisearch Rust client, `SearchQuery` serializes its per-query federation settings under the key `federationOptions`; only the top-level multi-search parameter is named `federation`.
🧬 Code Graph Analysis (1)
src/settings.rs (1)
src/indexes.rs (24)
  • client (185-187)
  • client (231-233)
  • client (319-321)
  • client (375-377)
  • client (427-429)
  • client (472-474)
  • client (521-524)
  • client (555-557)
  • client (633-635)
  • client (699-701)
  • client (969-971)
  • client (1037-1039)
  • client (1088-1090)
  • client (1132-1134)
  • index (2089-2089)
  • index (2106-2107)
  • index (2144-2144)
  • index (2170-2170)
  • index (2198-2198)
  • index (2224-2224)
  • index (2251-2251)
  • new (80-88)
  • new (1730-1736)
  • new (1912-1918)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: integration-tests
🔇 Additional comments (19)
src/settings.rs (10)

50-60: LGTM!

The EmbedderSource enum is well-structured with appropriate derive macros and serialization attributes.


62-67: LGTM!

The EmbedderDistribution struct correctly models statistical distribution parameters.


208-210: LGTM!

The embedders field is properly integrated into the Settings struct with correct serialization attributes.


432-447: LGTM!

The with_embedders method is correctly implemented. The use of Into<String> trait bound (instead of AsRef<str> like other builder methods) appears intentional per the PR objectives.


967-1011: LGTM!

The get_embedders method is well-implemented with proper error handling and converts None responses to empty HashMap for consistent API behavior.


2360-2391: LGTM!

The reset_embedders method correctly implements the DELETE endpoint for resetting embedder settings.


945-945: LGTM!

Documentation formatting fixes improve consistency across examples.

Also applies to: 1661-1661, 1789-1789, 2305-2305, 2338-2338


2597-2602: LGTM!

Test correctly verifies that embedders are empty by default.


2670-2678: LGTM!

Test properly verifies the reset embedders functionality.


2856-2872: LGTM!

Test correctly validates setting embedders through the set_settings method, which is the recommended approach per the PR objectives.

src/search.rs (9)

156-166: LGTM!

The HybridSearch struct is well-designed with clear documentation and appropriate field types.


377-387: LGTM!

The new optional fields for hybrid and vector search are properly typed and documented, following the existing pattern in the codebase.

Also applies to: 417-419


521-528: LGTM!

The with_retrieve_vectors builder method follows the established pattern and has clear documentation.


641-652: LGTM!

The with_hybrid builder method is well-documented and provides a convenient API by accepting parameters directly rather than requiring a struct.


654-663: LGTM!

The with_vector builder method is properly documented with clear guidance on when to use it.


942-973: LGTM!

The vector test structures are well-designed with proper serialization attributes and a convenient From implementation for test data creation.


984-989: LGTM!

The test helper functions vectorize and setup_hybrid_searching are well-implemented. The vector dimension (11) is consistently used across both functions.

Also applies to: 1016-1032


1691-1715: LGTM!

The test_with_vectors integration test properly verifies both cases of the retrieve_vectors flag.


1717-1738: LGTM!

The test_hybrid integration test effectively validates the hybrid search functionality, correctly asserting the expected document order when using semantic search.

@Mubelotix Mubelotix merged commit 38b7571 into dev Jul 10, 2025
6 of 7 checks passed
@Mubelotix Mubelotix deleted the hybrid branch July 10, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants