Skip to content

Conversation

martijnvg
Copy link
Member

By replacing the percolator APIs by a percolator query means that percolating will be done via the search API. This has many advantages:

  • Removing a lot of code
  • A lot of requested features are now supported.
  • There will never be a need to sync the search and percolate apis.

Using the percolator query in the search API:

"query" : {
   "percolator" : {
        "doc_type" : "type",
        "source" : {
            "field1" : "value",
            ...
        }
    }
}

Percolating an existing document:

"query" : {
   "percolator" : {
        "doc_type" : "type",
        "get" : {
            "index" : "_index",
            "type" : "_type",
            "id" : "_id"
        }
    }
}

The search response will now include hits of type .percolator.

The percolate and mpercolate APIs still exist in this PR, but just these APIs just build a search request and delegates the request to the search API behind the scene for bwc reason. In the next major version after 3.0 these APIs can be removed.

This PR builds, but is not ready yet. Tests need to be added and the entire percolator docs need to be revised.

Closes #10741
Closes #7297
Closes #13176
Closes #13978
Closes #11264
Closes #10741
Closes #4317

@martijnvg martijnvg added WIP :Search Relevance/Percolator Reverse search: find queries that match a document labels Feb 1, 2016
@martijnvg martijnvg force-pushed the percolate_rewrite branch 2 times, most recently from 2de0986 to 33c162a Compare February 1, 2016 19:10
@synhershko
Copy link
Contributor

This will also add lots of confusion, since percolation is a very confusing concept for people new to Elasticsearch and also to experienced users. Instead of having "percolator" a type of a query, can it be a sibling to query and highlight when using the search API?

POST _search
{
   "percolator" : {
        "doc_type" : "type",
        "source" : {
            "field1" : "value",
            ...
        }
    }
}

I think this will go a long way in removing confusion. Another option is to leave the percolate API as is but internally use redirects so it will essentially use the search API via it's own Lucene Query type.

@martijnvg
Copy link
Member Author

I think part of this confusion is how the percolator feature is exposed right now? In the end what the percolator does is return documents that have a query defined under the query field. All of these documents are being evaluated if its query matches with the provided document.

I think this change rather removes confusion. Percolator queries are just document with a query field that is mapped as percolator field type in the mapping. The percolator query can be used to match these queries.

@synhershko
Copy link
Contributor

The confusion stems from the difference between what the percolator does (indexing a single doc via MemoryIndex and hitting it with many stored queries) to what 99.9% of Elasticsearch is about (indexing many documents and using a single query for search across them) to how percolator is presented to the public ("reverse search"; "indexing queries" and so on).

The fact that stored queries can be filtered before execution via a query, doesn't make it much of a search. Just a filter stage before executing percolation (even if underneath a search is executed).

Honestly, that's nothing a good documentation and careful working can't solve, and the current percolator documentation is quite good. What's missing is a clearer message (IMO, stop using the "reverse" terminology. Just explain technically what it does, the usage of the .percolator type is confusing enough), and then also keeping this in a separate API. Bringing it within the search API would IMO grow the confusion even bigger.

@martijnvg
Copy link
Member Author

Right, the reverse part is that queries are stored as part of a document and a document is used to query queries. I still think that percolating via a query doesn't confuse things more. I think that form an api perspective the percolator doesn't need separate APIs just because how the percolator matches with queries (MemoryIndex, pre searching etc.). The query field is a special field that can only be matched with the source of a document, which can be specified in the percolator query.

I think that the special .percolator type should go away eventually and if someone wants to store queries they would need to configure the percolator field type in the mapping.

@djschny
Copy link
Contributor

djschny commented Feb 9, 2016

I agree with @synhershko, keeping a clear distinction of _percolate really is much more elegant. Throwing everything under _search really muddies the water and the line of separation. We are placing internal concerns ahead of usability in this situation. The advantages of this are declared as:

  • Removing a lot of code
  • A lot of requested features are now supported.
  • There will never be a need to sync the search and percolate apis.

But these have almost zero advantages to end users and instead actually causes unnecessary changes, confusion, etc. Instead I would suggest exploring other ways so that the _percolate endpoint code is extremely simple and behind the scenes delegates to existing backend business logic, so that way the duplicated code, syncing, etc. is avoided.

@s1monw
Copy link
Contributor

s1monw commented Feb 9, 2016

The fact that stored queries can be filtered before execution via a query, doesn't make it much of a search. Just a filter stage before executing percolation (even if underneath a search is executed).

I think you have to take a step back and forget about the implementation here. If you think about what the percolator does is:

  • give me all the queries that match a given document

now if you think of queries are documents (JSON) it's a perfect match for a search. It's conceptually exactly the same as MLT. You pass it a document and we process the terms in a way and return documents like this. Now the percolator query can process a document in a structured way and has a potentially costly collect method. It's like a geohash for prefiltering and then use a slower method to get exact matches.

you have to plan the mind game of what would happen if we haven't had this API before. I think it's a very neat idea and might transport to users much better than a dedicated API. Anyway having a dedicated API is kinda required for BWC purposes but I think the implementation should go the path of using the search infra. We are also going that way somehow with suggest which is also a search at the end of the day.

But these have almost zero advantages to end users and instead actually causes unnecessary changes,

I think this is far from correct!

  • A lot of requested features are now supported. this is huge
  • There will never be a need to sync the search and percolate apis. has been a massive source of bugs in the past
  • Removing a lot of code this is very important growth wise for the project. calling this not an advantage to the user is short sighted.

@kimchy
Copy link
Member

kimchy commented Feb 9, 2016

I really like where this is going, using the search infrastructure to execute it is sooo much cleaner. As @s1monw said, we can keep the sugar percolator API on top of it, but on my end, I would have used it in the context of the search API, feels more natural (as much as possible, percolate is a mind bender :) )

@javanna
Copy link
Member

javanna commented Feb 9, 2016

I don't think keeping the percolate endpoint as a shortcut is a problem, and we will do it for bw comp anyways. But I think "A lot of requested features are now supported." is a big one for users, maybe we should list what these requested features are.

@martijnvg
Copy link
Member Author

I know that returning the percolator document source or part of it (via source filtering) is highly requested. Also pagination is a highly requested feature for the percolator. All these features would be supported.

Also the following issues can be closed if this PR gets merged:
#10741
#7297
#13176
#13978
#11264
#10741
#4317

@martijnvg martijnvg force-pushed the percolate_rewrite branch 3 times, most recently from 0496013 to 552da7f Compare March 8, 2016 10:56
@martijnvg
Copy link
Member Author

I've updated this PR (added tests & first stab at updating docs). I think it is ready for a review.

@@ -781,9 +781,10 @@ The reason that this has changed is that on newly created indices the percolator
and these query terms are used at percolate time to reduce the amount of queries the percolate API needs evaluate.
This optimization didn't work in the percolate API mode where modifications to queries are immediately visible.

The percolator by defaults sets the `size` option to `10` whereas before this was set to unlimited.
Percolator and multi percolate APIs have been deprecated and will be removed in the next major release. These APIs have
been replaced by the `percolator` query that can be used the search and multi search APIs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be used IN the

builder.field(Fields._SCORE, match.getScore());
}
if (match.getHighlightFields() != null) {
if (match.getHighlightFields() != null && match.getHighlightFields().isEmpty() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make getHighlightFields always return a non-null value? (using Collections.emytyXXX if necessary)

@jpountz
Copy link
Contributor

jpountz commented Mar 18, 2016

I just did a 2nd deeper review and left some comments.

@martijnvg
Copy link
Member Author

@jpountz I've updated the PR.

builder.docValues(true);
builder.indexOptions(IndexOptions.NONE);
builder.store(false);
builder.fieldType().setDocValuesType(DocValuesType.BINARY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then maybe use a BinaryFieldMapper instead of KeywordFieldMapper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@jpountz
Copy link
Contributor

jpountz commented Mar 18, 2016

I left some minor comments, otherwise LGTM. I'm looking forward to the follow-up PRs. :)

@martijnvg martijnvg force-pushed the percolate_rewrite branch 3 times, most recently from 3900ace to 84f1af9 Compare March 21, 2016 11:01
Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache.

The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontext
in a binary doc values field. This make sure that the query rewrite happens only during
indexing (some queries for example fetch shapes, terms in remote indices) and
the speed up the loading of the queries in the percolator query cache.

Because the percolator now works inside the search infrastructure a number of features
(sorting fields, pagination, fetch features) are available out of the box.

The following feature requests are automatically implemented via this refactoring:

Closes elastic#10741
Closes elastic#7297
Closes elastic#13176
Closes elastic#13978
Closes elastic#11264
Closes elastic#10741
Closes elastic#4317
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Relevance/Percolator Reverse search: find queries that match a document v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants