Replace percolate APIs with a percolator query #16349

martijnvg · 2016-02-01T17:27:22Z

Replacing the percolate APIs with a percolator query means that percolating will be done via the search API. This has many advantages:

Removing a lot of code
A lot of requested features are now supported.
There will never be a need to sync the search and percolate apis.

Using the percolator query in the search API:

"query" : {
   "percolator" : {
        "doc_type" : "type",
        "source" : {
            "field1" : "value",
            ...
        }
    }
}

Percolating an existing document:

"query" : {
   "percolator" : {
        "doc_type" : "type",
        "get" : {
            "index" : "_index",
            "type" : "_type",
            "id" : "_id"
        }
    }
}

The search response will now include hits of type .percolator.

The percolate and mpercolate APIs still exist in this PR, but for bwc reasons they now just build a search request and delegate it to the search API behind the scenes. In the next major version after 3.0 these APIs can be removed.
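As a rough illustration of that delegation, here is a minimal sketch of how legacy percolate parameters could be wrapped into a search request body. The class and method names (`PercolateToSearch`, `toSearchBody`) are hypothetical and not taken from this PR; only the `percolator`, `doc_type` and `source` keys come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical adapter, for illustration only: not the code in this PR. */
public class PercolateToSearch {

    /**
     * Wraps the document to percolate in a search body of the form
     * {"query": {"percolator": {"doc_type": ..., "source": ...}}}.
     */
    public static Map<String, Object> toSearchBody(String docType, Map<String, Object> docSource) {
        Map<String, Object> percolator = new LinkedHashMap<>();
        percolator.put("doc_type", docType);
        percolator.put("source", docSource);

        Map<String, Object> query = new LinkedHashMap<>();
        query.put("percolator", percolator);

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("query", query);
        return body;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("field1", "value");
        // prints {query={percolator={doc_type=type, source={field1=value}}}}
        System.out.println(toSearchBody("type", doc));
    }
}
```

The point of the sketch is only that the old endpoints need no matching logic of their own: they reshape the request and hand it to the search path.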

This PR builds, but is not ready yet. Tests need to be added and all of the percolator docs need to be revised.

Closes #10741
Closes #7297
Closes #13176
Closes #13978
Closes #11264
Closes #4317

synhershko · 2016-02-05T14:26:05Z

This will also add lots of confusion, since percolation is a very confusing concept both for people new to Elasticsearch and for experienced users. Instead of making "percolator" a type of query, can it be a sibling of query and highlight when using the search API?

POST _search
{
   "percolator" : {
        "doc_type" : "type",
        "source" : {
            "field1" : "value",
            ...
        }
    }
}

I think this will go a long way in removing confusion. Another option is to leave the percolate API as is but internally use redirects so it will essentially use the search API via its own Lucene Query type.

martijnvg · 2016-02-05T14:59:18Z

I think part of this confusion is how the percolator feature is exposed right now? In the end what the percolator does is return documents that have a query defined under the query field. Each of these documents is evaluated to see whether its query matches the provided document.

I think this change rather removes confusion. Percolator queries are just documents with a query field that is mapped as the percolator field type in the mapping. The percolator query can be used to match these queries.

synhershko · 2016-02-06T18:19:14Z

The confusion stems from the difference between what the percolator does (indexing a single doc via MemoryIndex and hitting it with many stored queries) to what 99.9% of Elasticsearch is about (indexing many documents and using a single query for search across them) to how percolator is presented to the public ("reverse search"; "indexing queries" and so on).

The fact that stored queries can be filtered before execution via a query, doesn't make it much of a search. Just a filter stage before executing percolation (even if underneath a search is executed).

Honestly, that's nothing that good documentation and careful wording can't solve, and the current percolator documentation is quite good. What's missing is a clearer message (IMO, stop using the "reverse" terminology. Just explain technically what it does, the usage of the .percolator type is confusing enough), and then also keeping this in a separate API. Bringing it within the search API would IMO make the confusion even bigger.

martijnvg · 2016-02-08T08:03:37Z

Right, the reverse part is that queries are stored as part of a document and a document is used to query queries. I still think that percolating via a query doesn't confuse things more. I think that from an API perspective the percolator doesn't need separate APIs just because of how the percolator matches with queries (MemoryIndex, pre searching etc.). The query field is a special field that can only be matched with the source of a document, which can be specified in the percolator query.

I think that the special .percolator type should go away eventually and if someone wants to store queries they would need to configure the percolator field type in the mapping.

djschny · 2016-02-09T13:56:02Z

I agree with @synhershko, keeping a clear distinction of _percolate really is much more elegant. Throwing everything under _search really muddies the water and the line of separation. We are placing internal concerns ahead of usability in this situation. The advantages of this are declared as:

Removing a lot of code
A lot of requested features are now supported.
There will never be a need to sync the search and percolate apis.

But these have almost zero advantages to end users and instead actually cause unnecessary changes, confusion, etc. Instead I would suggest exploring other ways so that the _percolate endpoint code is extremely simple and behind the scenes delegates to existing backend business logic, so that the duplicated code, syncing, etc. is avoided.

s1monw · 2016-02-09T14:22:30Z

The fact that stored queries can be filtered before execution via a query, doesn't make it much of a search. Just a filter stage before executing percolation (even if underneath a search is executed).

I think you have to take a step back and forget about the implementation here. If you think about it, what the percolator does is:

give me all the queries that match a given document

Now if you think of queries as documents (JSON), it's a perfect match for a search. It's conceptually exactly the same as MLT. You pass it a document and we process the terms in a way and return documents like this. Now the percolator query can process a document in a structured way and has a potentially costly collect method. It's like using a geohash for prefiltering and then a slower method to get exact matches.
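The prefilter-then-verify idea can be sketched in a few lines. This is a toy model under loud assumptions: it uses plain Java collections rather than Lucene or the actual percolator code, and `TwoPhasePercolation`, `StoredQuery` and `percolate` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

/** Toy model of percolation as "search over stored queries"; not the real implementation. */
public class TwoPhasePercolation {

    /** A stored query: terms for the cheap pre-filter plus the exact (slower) check. */
    public static final class StoredQuery {
        final String id;
        final Set<String> terms;
        final Predicate<String> exactMatch;

        public StoredQuery(String id, Set<String> terms, Predicate<String> exactMatch) {
            this.id = id;
            this.terms = terms;
            this.exactMatch = exactMatch;
        }
    }

    /** Returns the ids of the stored queries that match the given document text. */
    public static List<String> percolate(String docText, List<StoredQuery> queries) {
        String normalized = docText.toLowerCase(Locale.ROOT);
        Set<String> docTerms = new HashSet<>(Arrays.asList(normalized.split("\\s+")));

        List<String> matches = new ArrayList<>();
        for (StoredQuery query : queries) {
            // Phase 1 (cheap): skip queries that share no term with the document.
            if (Collections.disjoint(docTerms, query.terms)) {
                continue;
            }
            // Phase 2 (slower): run the exact check only on the surviving candidates.
            if (query.exactMatch.test(normalized)) {
                matches.add(query.id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<StoredQuery> queries = Arrays.asList(
            new StoredQuery("q1", Collections.singleton("fox"), text -> text.contains("brown fox")),
            new StoredQuery("q2", Collections.singleton("dog"), text -> text.contains("lazy dog")),
            new StoredQuery("q3", Collections.singleton("quick"), text -> text.contains("quick fox")));
        System.out.println(percolate("The quick brown fox", queries)); // prints [q1]
    }
}
```

Here q2 is dropped by the cheap term check, while q3 passes it but fails the exact check, which mirrors the geohash analogy: a coarse filter followed by a precise one.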

You have to play the mind game of what would happen if we hadn't had this API before. I think it's a very neat idea and might transport to users much better than a dedicated API. Anyway having a dedicated API is kinda required for BWC purposes but I think the implementation should go the path of using the search infra. We are also going that way somehow with suggest which is also a search at the end of the day.

But these have almost zero advantages to end users and instead actually cause unnecessary changes,

I think this is far from correct!

"A lot of requested features are now supported.": this is huge.
"There will never be a need to sync the search and percolate apis.": this has been a massive source of bugs in the past.
"Removing a lot of code": this is very important growth wise for the project. Calling this not an advantage to the user is short sighted.

kimchy · 2016-02-09T14:27:08Z

I really like where this is going, using the search infrastructure to execute it is sooo much cleaner. As @s1monw said, we can keep the sugar percolator API on top of it, but on my end, I would have used it in the context of the search API, feels more natural (as much as possible, percolate is a mind bender :) )

javanna · 2016-02-09T14:45:15Z

I don't think keeping the percolate endpoint as a shortcut is a problem, and we will do it for bw comp anyways. But I think "A lot of requested features are now supported." is a big one for users, maybe we should list what these requested features are.

martijnvg · 2016-02-09T15:10:11Z

I know that returning the percolator document source or part of it (via source filtering) is highly requested. Also pagination is a highly requested feature for the percolator. All these features would be supported.

Also the following issues can be closed if this PR gets merged:
#10741
#7297
#13176
#13978
#11264
#4317

martijnvg · 2016-03-08T10:57:47Z

I've updated this PR (added tests & first stab at updating docs). I think it is ready for a review.

clintongormley · 2016-03-08T11:52:16Z

docs/reference/migration/migrate_5_0.asciidoc

@@ -781,9 +781,10 @@ The reason that this has changed is that on newly created indices the percolator
 and these query terms are used at percolate time to reduce the amount of queries the percolate API needs evaluate.
 This optimization didn't work in the percolate API mode where modifications to queries are immediately visible.

-The percolator by defaults sets the `size` option to `10` whereas before this was set to unlimited.
+Percolator and multi percolate APIs have been deprecated and will be removed in the next major release. These APIs have
+been replaced by the `percolator` query that can be used the search and multi search APIs.


can be used IN the

jpountz · 2016-03-17T16:08:15Z

core/src/main/java/org/elasticsearch/action/percolate/PercolateResponse.java

                        builder.field(Fields._SCORE, match.getScore());
                    }
-                    if (match.getHighlightFields() != null) {
+                    if (match.getHighlightFields() != null && match.getHighlightFields().isEmpty() == false) {


Can we make getHighlightFields always return a non-null value? (using Collections.emptyXXX if necessary)
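A minimal sketch of that suggestion, assuming a simplified stand-in class: `PercolateMatch` is a hypothetical name, and `String` values replace the real highlight field type, so this is not the actual class from the PR.

```java
import java.util.Collections;
import java.util.Map;

/** Simplified, hypothetical stand-in for a percolate match. */
public class PercolateMatch {

    private final Map<String, String> highlightFields;

    public PercolateMatch(Map<String, String> highlightFields) {
        // Normalize null to an immutable empty map once, at construction time.
        this.highlightFields = highlightFields == null
            ? Collections.<String, String>emptyMap()
            : highlightFields;
    }

    /** Never returns null, so callers only need an isEmpty() check. */
    public Map<String, String> getHighlightFields() {
        return highlightFields;
    }

    public static void main(String[] args) {
        System.out.println(new PercolateMatch(null).getHighlightFields().isEmpty()); // prints true
    }
}
```

With a getter like this, the condition in the diff above could drop its null check and keep only the `isEmpty() == false` part.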

jpountz · 2016-03-18T08:38:10Z

I just did a 2nd deeper review and left some comments.

martijnvg · 2016-03-18T13:20:36Z

@jpountz I've updated the PR.

jpountz · 2016-03-18T20:56:50Z

core/src/main/java/org/elasticsearch/index/percolator/PercolatorFieldMapper.java

+            builder.docValues(true);
+            builder.indexOptions(IndexOptions.NONE);
+            builder.store(false);
+            builder.fieldType().setDocValuesType(DocValuesType.BINARY);


then maybe use a BinaryFieldMapper instead of KeywordFieldMapper?

jpountz · 2016-03-18T21:10:22Z

I left some minor comments, otherwise LGTM. I'm looking forward to the follow-up PRs. :)

Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache. The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontent in a binary doc values field. This makes sure that the query rewrite happens only during indexing (some queries, for example, fetch shapes or terms from remote indices) and speeds up the loading of the queries into the percolator query cache. Because the percolator now works inside the search infrastructure, a number of features (sorting fields, pagination, fetch features) are available out of the box. The following feature requests are automatically implemented via this refactoring: Closes elastic#10741 Closes elastic#7297 Closes elastic#13176 Closes elastic#13978 Closes elastic#11264 Closes elastic#4317

martijnvg added the WIP and :Search Relevance/Percolator labels Feb 1, 2016

martijnvg force-pushed the percolate_rewrite branch 2 times, most recently from 2de0986 to 33c162a Compare February 1, 2016 19:10

martijnvg mentioned this pull request Feb 3, 2016

Add doc match score support to percolate api #13827

Closed

martijnvg force-pushed the percolate_rewrite branch from 33c162a to a2f3c67 Compare February 6, 2016 13:25

clintongormley mentioned this pull request Feb 8, 2016

Refactor parsing of queries/filters, aggs, suggester APIs #10217

Closed

martijnvg force-pushed the percolate_rewrite branch from a2f3c67 to 11855f7 Compare February 21, 2016 18:20

martijnvg force-pushed the percolate_rewrite branch 3 times, most recently from 0496013 to 552da7f Compare March 8, 2016 10:56

martijnvg added review v5.0.0-alpha1 and removed WIP labels Mar 8, 2016

martijnvg added the >enhancement label Mar 8, 2016

martijnvg force-pushed the percolate_rewrite branch from 552da7f to 89058a3 Compare March 8, 2016 11:46

clintongormley reviewed Mar 8, 2016

martijnvg force-pushed the percolate_rewrite branch from 49dd23f to fdc3ad9 Compare March 17, 2016 11:45

jpountz reviewed Mar 17, 2016

jpountz reviewed Mar 18, 2016

martijnvg force-pushed the percolate_rewrite branch 3 times, most recently from 3900ace to 84f1af9 Compare March 21, 2016 11:01

martijnvg force-pushed the percolate_rewrite branch from 84f1af9 to e3b7e5d Compare March 21, 2016 11:36

martijnvg merged commit e3b7e5d into elastic:master Mar 21, 2016

martijnvg mentioned this pull request Mar 24, 2016

Percolator not supporting nested aggregations #16711

Closed

john-wagster mentioned this pull request Jan 28, 2025

Percolator is much slower than in ES1, and pre-selecting do not work #114392

Open
