-
Notifications
You must be signed in to change notification settings - Fork 25.4k
[PROTOTYPE] Refactor how we parse queries, filter and friends #9901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This refactoring splits the parsing and the creation of the lucene query This has a couple of advantages * XContent parsing creation are in one file and can be tested more easily * the class allows a typed in-memory representation of the query that can be modified before a lucene query is build * the query can be normalized and serialized via Streamable to be used as a normalized cache key (not depending on the order of the keys in the XContent) * the query can be parsed on the coordinating node to allow document prefetching etc. forwarding to the executing nodes would work via Streamable binary representation --> elastic#8150 * for the query cache a query tree can be "walked" to rewrite range queries into match all queries with MIN/MAX terms to get cache hits for sliding windows --> elastic#9526 * code wise two classes are merged into one which is nice * filter and query can maybe share once class and we add a `toFilter(QueryParserContenxt ctx)` method that returns a filter and by default return a `new QueryWrapperFilter(toQuery(context));`
I really like having all the logic for a given query in a single place! I suspect you will find some inconsistencies around parameters that are supported in parsers but not in builders while doing this refactoring!
Do not spend too much time on filters. They are currently being removed from Lucene, so let's focus on getting queries right? |
return query.toQuery(parseContext); | ||
} | ||
|
||
public void fromXContent(QueryParseContext context) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would feel much better making fromXContent
and toQuery
private here, otherwise I feel like it is a very "stateful" looking API, because if someone tries to use toQuery
without calling fromXContent
first they'll get exceptions.
Is there a reason they should be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that is one of the big reasons why I did this. I want to have a stage where you can parse and then do something with the TermQuery instance and call toQuery on a later stage. ie. in the future fromXContent
will be called on the coordinating node to report parsing problems only once. Then we will use streamable binary representation to transport it to the executing nodes... makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I understand why it is this way.
What I am concerned about is the different ways that a TermQuery
is constructed here, there's:
new TermQuery(actualField, actualValue)
(new TermQuery()).fromXContent(context)
(new TermQuery()).parse(context) // <-- weird that this is not static
What I think would be better is maybe static methods that generate new versions for all except the plain construction version:
new TermQuery(actualField, actualValue)
TermQuery.fromXContent(context) // <-- static, returns new TermQuery
TermQuery.parse(context) // <-- static, returns new TermQuery
I dunno, maybe it's a gut feeling :), but the current implementation feels very "loose" and too flexible in what the "correct" way to create a new TermQuery, making the methods static instead of mutating the current object feels more functional (in both senses of the word!) to me.
I personally would rather have TermQuery()
constructor be private, but I guess that's an entirely different discussion about builders versus non-builders...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to have fromXContent
and parse
be static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guys please read the issue and my answers below It seems like I wasn't clear enough what this is going to do and static is not an option here sorry.
I like collapsing the two into a single class, though I'm a little worried about what we are exposing for doing the parsing (left a comment about that), but overall much cleaner! |
@dakrone from your comment I can tell that the description of this issue is not clear enough what this is going to enable in the future lemme try to clarify: Today a request is parsed on all the nodes causing lots of trouble. Yet in the future I think it makes sense to decouple that and once a request comes into the cooridinating node or even once it comes into the system alltogether ie via REST we parse the XContent and have the Today we don't do this so I just tried to model the current arch with the refactoring prototype hence the method:
makes sense now? |
Ok I think I understand, makes sense to me. +1 |
I think a common source of confusion is that currently those methods are not on the interface all queries need to implement. In the future they will be so they can't be static. |
I think we have some agreement that this refactoring can be beneficial. I'd like use to start working on it very soon maybe we can create a branch for it soon. @cbuescher do you think we can start this soon? |
@s1monw sure, will have to look at how long it takes me to do the same thing to another query on my own tomorrow. Would be great if the whole refactoring is structurally the same for all queries, since there are ~ 90 of them alone in .../index/query. |
* Produces a lucene query from this elasticsearch query | ||
*/ | ||
public Query toQuery(QueryParseContext parseContext) { | ||
if (value == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also check if fieldName
is null and fail? maybe use Preconditions here for simplicity?
I talked with @s1monw and we came up with this first rough sketch of how to do procede with the refactoring of the queries in the This is the rough plan of how to go step by step here:
I started by creating the feature branch https://github.com/elastic/elasticsearch/tree/feature/query-parse-refactoring and already stated to merged some Builder/Parser pairs there. |
I think we can close this PR, we are now working on the https://github.com/elastic/elasticsearch/tree/feature/query-parse-refactoring branch and opening PRs against it. |
…lders and QueryParsers The planed refactoring of search queries layed out in #9901 requires to split the "parse()" method in QueryParsers into two methods, first a "fromXContent(...)" method that allows parsing to an intermediate query representation (currently called FooQueryBuilder) and second a "Query toQuery(...)" method on these intermediate representations that create the actual lucene queries. This PR is a first step in that direction as it introduces the interface changes necessary for the further refactoring. It introduces the new interface methods while for now keeping the old Builder/Parsers still in place by delegating the new "toQuery()" implementations to the existing "parse()" methods, and by introducing a "catch-all" "fromXContent()" implementation in a BaseQueryParser that returns a temporary QueryBuilder wrapper implementation. This allows us to refactor the existing QueryBuilders step by step while already beeing able to start refactoring queries with nested inner queries. Closes #10580
…lders and QueryParsers The planed refactoring of search queries layed out in elastic#9901 requires to split the "parse()" method in QueryParsers into two methods, first a "fromXContent(...)" method that allows parsing to an intermediate query representation (currently called FooQueryBuilder) and second a "Query toQuery(...)" method on these intermediate representations that create the actual lucene queries. This PR is a first step in that direction as it introduces the interface changes necessary for the further refactoring. It introduces the new interface methods while for now keeping the old Builder/Parsers still in place by delegating the new "toQuery()" implementations to the existing "parse()" methods, and by introducing a "catch-all" "fromXContent()" implementation in a BaseQueryParser that returns a temporary QueryBuilder wrapper implementation. This allows us to refactor the existing QueryBuilders step by step while already beeing able to start refactoring queries with nested inner queries. Closes elastic#10580
Today we have a massive infrastructure to parse all our requests. We have client side builders and server side parsers but no real representation of the query, filter, aggregation etc until it's executed. What is produced from a XContent binary is a Lucene query directly which causes huge parse methods in separate classes etc. that hare hard to test and don't allow decoupled modifications or actions on the query itself between parsing and executing.
This PR is a small prototype how things could look in the future that would allow for more flexibility and cleaner code IMO.
This refactoring splits the parsing and the creation of the lucene query, this has a couple of advantages
toFilter(QueryParserContenxt ctx)
method that returns a filter and by default return anew QueryWrapperFilter(toQuery(context));