-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Replace percolate APIs with a percolator query #16349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2de0986
to
33c162a
Compare
This will also add lots of confusion, since percolation is a very confusing concept for people new to Elasticsearch and also to experienced users. Instead of having "percolator" a type of a query, can it be a sibling to query and highlight when using the search API?
I think this will go a long way in removing confusion. Another option is to leave the percolate API as is but internally use redirects so it will essentially use the search API via it's own Lucene Query type. |
I think part of this confusion is how the percolator feature is exposed right now? In the end what the percolator does is return documents that have a query defined under the I think this change rather removes confusion. Percolator queries are just document with a query field that is mapped as percolator field type in the mapping. The |
33c162a
to
a2f3c67
Compare
The confusion stems from the difference between what the percolator does (indexing a single doc via MemoryIndex and hitting it with many stored queries) to what 99.9% of Elasticsearch is about (indexing many documents and using a single query for search across them) to how percolator is presented to the public ("reverse search"; "indexing queries" and so on). The fact that stored queries can be filtered before execution via a query, doesn't make it much of a search. Just a filter stage before executing percolation (even if underneath a search is executed). Honestly, that's nothing a good documentation and careful working can't solve, and the current percolator documentation is quite good. What's missing is a clearer message (IMO, stop using the "reverse" terminology. Just explain technically what it does, the usage of the |
Right, the reverse part is that queries are stored as part of a document and a document is used to query queries. I still think that percolating via a query doesn't confuse things more. I think that form an api perspective the percolator doesn't need separate APIs just because how the percolator matches with queries (MemoryIndex, pre searching etc.). The I think that the special |
I agree with @synhershko, keeping a clear distinction of
But these have almost zero advantages to end users and instead actually causes unnecessary changes, confusion, etc. Instead I would suggest exploring other ways so that the |
I think you have to take a step back and forget about the implementation here. If you think about what the percolator does is:
now if you think of queries are documents (JSON) it's a perfect match for a search. It's conceptually exactly the same as MLT. You pass it a document and we process the terms in a way and return documents like this. Now the percolator query can process a document in a structured way and has a potentially costly collect method. It's like a geohash for prefiltering and then use a slower method to get exact matches. you have to plan the mind game of what would happen if we haven't had this API before. I think it's a very neat idea and might transport to users much better than a dedicated API. Anyway having a dedicated API is kinda required for BWC purposes but I think the implementation should go the path of using the search infra. We are also going that way somehow with suggest which is also a search at the end of the day.
I think this is far from correct!
|
I really like where this is going, using the search infrastructure to execute it is sooo much cleaner. As @s1monw said, we can keep the sugar percolator API on top of it, but on my end, I would have used it in the context of the search API, feels more natural (as much as possible, percolate is a mind bender :) ) |
I don't think keeping the percolate endpoint as a shortcut is a problem, and we will do it for bw comp anyways. But I think "A lot of requested features are now supported." is a big one for users, maybe we should list what these requested features are. |
I know that returning the percolator document source or part of it (via source filtering) is highly requested. Also pagination is a highly requested feature for the percolator. All these features would be supported. Also the following issues can be closed if this PR gets merged: |
a2f3c67
to
11855f7
Compare
0496013
to
552da7f
Compare
I've updated this PR (added tests & first stab at updating docs). I think it is ready for a review. |
552da7f
to
89058a3
Compare
@@ -781,9 +781,10 @@ The reason that this has changed is that on newly created indices the percolator | |||
and these query terms are used at percolate time to reduce the amount of queries the percolate API needs evaluate. | |||
This optimization didn't work in the percolate API mode where modifications to queries are immediately visible. | |||
|
|||
The percolator by defaults sets the `size` option to `10` whereas before this was set to unlimited. | |||
Percolator and multi percolate APIs have been deprecated and will be removed in the next major release. These APIs have | |||
been replaced by the `percolator` query that can be used the search and multi search APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be used IN the
49dd23f
to
fdc3ad9
Compare
builder.field(Fields._SCORE, match.getScore()); | ||
} | ||
if (match.getHighlightFields() != null) { | ||
if (match.getHighlightFields() != null && match.getHighlightFields().isEmpty() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make getHighlightFields always return a non-null value? (using Collections.emytyXXX if necessary)
I just did a 2nd deeper review and left some comments. |
@jpountz I've updated the PR. |
builder.docValues(true); | ||
builder.indexOptions(IndexOptions.NONE); | ||
builder.store(false); | ||
builder.fieldType().setDocValuesType(DocValuesType.BINARY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then maybe use a BinaryFieldMapper instead of KeywordFieldMapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I left some minor comments, otherwise LGTM. I'm looking forward to the follow-up PRs. :) |
3900ace
to
84f1af9
Compare
Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache. The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontext in a binary doc values field. This make sure that the query rewrite happens only during indexing (some queries for example fetch shapes, terms in remote indices) and the speed up the loading of the queries in the percolator query cache. Because the percolator now works inside the search infrastructure a number of features (sorting fields, pagination, fetch features) are available out of the box. The following feature requests are automatically implemented via this refactoring: Closes elastic#10741 Closes elastic#7297 Closes elastic#13176 Closes elastic#13978 Closes elastic#11264 Closes elastic#10741 Closes elastic#4317
84f1af9
to
e3b7e5d
Compare
By replacing the percolator APIs by a percolator query means that percolating will be done via the search API. This has many advantages:
Using the percolator query in the search API:
Percolating an existing document:
The search response will now include hits of type
.percolator
.The percolate and mpercolate APIs still exist in this PR, but just these APIs just build a search request and delegates the request to the search API behind the scene for bwc reason. In the next major version after 3.0 these APIs can be removed.
This PR builds, but is not ready yet. Tests need to be added and the entire percolator docs need to be revised.
Closes #10741
Closes #7297
Closes #13176
Closes #13978
Closes #11264
Closes #10741
Closes #4317