Fix match_only_text keyword multi-field bug #131383

Merged

Conversation

jordan-powers
Contributor

In #131314 we fixed match_only_text fields with ignore_above keyword multi-fields in the case that the keyword multi-field is stored. However, the issue is still present if the keyword field is not stored, but instead has doc values.

This patch fixes that case.

Follow-up to #131314.

@jordan-powers jordan-powers self-assigned this Jul 16, 2025
@jordan-powers jordan-powers added the >non-issue, auto-backport, :StorageEngine/Mapping, v8.19.0, v9.1.0, and v9.2.0 labels Jul 16, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Contributor

@lkts lkts left a comment

LGTM

  - match: { "hits.total.value": 1 }
  - match:
      hits.hits.0._source.foo: "Apache Lucene powers Elasticsearch"

Contributor

It took me a while to convince myself that there won't ever be duplicate values when some values come from doc_values and some from the original field. It might be nice to have a test that covers this case. Something like:

synthetic_source match_only_text as multi-field with ignored doc values keyword as parent with multiple values:
  - do:
      indices.create:
        index: synthetic_source_test
        body:
          settings:
            index:
              mapping.source.mode: synthetic
          mappings:
            properties:
              foo:
                type: keyword
                store: false
                doc_values: true
                ignore_above: 10
                fields:
                  text:
                    type: match_only_text

  - do:
      index:
        index: synthetic_source_test
        id: "1"
        refresh: true
        body:
          foo: ["Apache Lucene powers Elasticsearch", "Apache"]

  - do:
      search:
        index: synthetic_source_test
        body:
          query:
            match_phrase:
              foo.text: apache lucene

  - match: { "hits.total.value": 1 }
  - match:
      hits.hits.0._source.foo: ["Apache", "Apache Lucene powers Elasticsearch"]
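
The no-duplicates argument behind this test can be sketched as a toy model. This is only an illustration under stated assumptions, not Elasticsearch's actual implementation: the function name `synthesize_keyword_values` is hypothetical, and the model assumes that each value is routed to exactly one place based on `ignore_above` (doc values if it fits, a separate "ignored" store if it does not), and that doc values come back sorted and deduplicated.

```python
def synthesize_keyword_values(values, ignore_above):
    """Toy model (hypothetical) of rebuilding a keyword field for synthetic _source."""
    # Assumption: values within the limit are indexed, and doc values
    # return them sorted and deduplicated.
    from_doc_values = sorted({v for v in values if len(v) <= ignore_above})
    # Assumption: values over the limit never reach doc values and are
    # kept separately, in their original order.
    from_ignored = [v for v in values if len(v) > ignore_above]
    # The two groups are disjoint by construction, so concatenating them
    # cannot emit the same value from both sources.
    return from_doc_values + from_ignored


values = ["Apache Lucene powers Elasticsearch", "Apache"]
result = synthesize_keyword_values(values, ignore_above=10)
print(result)
```

Because the length check sends every value to exactly one of the two groups, a value can never be read back from both, which is the property the suggested test exercises with one short and one long value.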

Contributor

@parkertimmins parkertimmins left a comment

I had one test suggestion, but looks good. Nice work!

@jordan-powers jordan-powers enabled auto-merge (squash) July 16, 2025 19:38
@jordan-powers jordan-powers force-pushed the fix_match_only_text_multi_fields_3 branch from 16a8e7f to ea6e60f Compare July 17, 2025 02:04
@jordan-powers jordan-powers disabled auto-merge July 17, 2025 02:06
@jordan-powers jordan-powers merged commit 7a01565 into elastic:main Jul 17, 2025
34 checks passed
jordan-powers added a commit to jordan-powers/elasticsearch that referenced this pull request Jul 17, 2025
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.19, 9.1

jordan-powers added a commit to jordan-powers/elasticsearch that referenced this pull request Jul 17, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jul 17, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jul 17, 2025
Labels
auto-backport, >non-issue, :StorageEngine/Mapping, Team:StorageEngine, v8.19.0, v9.1.0, v9.2.0
5 participants