[PROTOTYPE] Refactor how we parse queries, filter and friends #9901

s1monw · 2015-02-26T14:39:32Z

Today we have a massive infrastructure to parse all our requests. We have client-side builders and server-side parsers, but no real representation of the query, filter, aggregation, etc. until it's executed. What an XContent binary produces is a Lucene query directly, which leads to huge parse methods in separate classes that are hard to test and don't allow decoupled modifications or actions on the query itself between parsing and executing.

This PR is a small prototype of how things could look in the future, which would allow for more flexibility and cleaner code IMO.

This refactoring splits the parsing and the creation of the Lucene query. This has a couple of advantages:

* XContent parsing and query creation are in one file and can be tested more easily
* the class allows a typed in-memory representation of the query that can be modified before a Lucene query is built
* the query can be normalized and serialized via Streamable to be used as a normalized cache key (not depending on the order of the keys in the XContent)
* the query can be parsed on the coordinating node to allow document prefetching etc.; forwarding to the executing nodes would work via the Streamable binary representation --> Should we parse search requests on the coordinating node? #8150
* for the query cache, a query tree can be "walked" to rewrite range queries into match-all queries with MIN/MAX terms to get cache hits for sliding windows --> Kibana 4 unable to utilize query cache #9526
* code-wise, two classes are merged into one, which is nice
* filter and query can maybe share one class, and we add a `toFilter(QueryParseContext ctx)` method that returns a filter and by default returns `new QueryWrapperFilter(toQuery(context));`
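A minimal sketch of the split described above, in plain Java with no Elasticsearch or Lucene dependencies. The names `ParseContextSketch`, `QuerySketch`, and `TermQuerySketch` are hypothetical. A `Map<String, Object>` stands in for parsed XContent, and a `String` stands in for the Lucene `Query`. The default `toFilter` mirrors the proposed `new QueryWrapperFilter(toQuery(context))` fallback.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for QueryParseContext; carries whatever toQuery() needs. */
class ParseContextSketch { }

/** Hypothetical common interface: parse once into this object, build the Lucene query later. */
interface QuerySketch {
    /** Fills this instance from already-parsed request content (stand-in for XContent). */
    void fromXContent(Map<String, Object> content);

    /** Builds the executable query; a String stands in for org.apache.lucene.search.Query. */
    String toQuery(ParseContextSketch context);

    /** Default filter: wrap the query, like new QueryWrapperFilter(toQuery(context)). */
    default String toFilter(ParseContextSketch context) {
        return "QueryWrapperFilter(" + toQuery(context) + ")";
    }
}

/** Typed in-memory representation of {"term": {"<field>": "<value>"}}. */
class TermQuerySketch implements QuerySketch {
    private String fieldName;
    private Object value;

    TermQuerySketch() { }

    TermQuerySketch(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    @Override
    public void fromXContent(Map<String, Object> content) {
        if (content.size() != 1) {
            throw new IllegalArgumentException("term query expects exactly one field");
        }
        Map.Entry<String, Object> entry = content.entrySet().iterator().next();
        this.fieldName = entry.getKey();
        this.value = entry.getValue();
    }

    /** The representation can be modified between parsing and toQuery(). */
    void setValue(Object value) {
        this.value = value;
    }

    @Override
    public String toQuery(ParseContextSketch context) {
        return "TermQuery(" + fieldName + ":" + value + ")";
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TermQuerySketch)) return false;
        TermQuerySketch other = (TermQuerySketch) o;
        return Objects.equals(fieldName, other.fieldName) && Objects.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fieldName, value);
    }
}
```

Because the parsed object is a plain value with `equals`/`hashCode`, it can be inspected or rewritten (here through the hypothetical `setValue`) before `toQuery` is called.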


jpountz · 2015-02-26T14:46:47Z

I really like having all the logic for a given query in a single place! I suspect you will find some inconsistencies around parameters that are supported in parsers but not in builders while doing this refactoring!

> filter and query can maybe share one class, and we add a `toFilter(QueryParseContext ctx)` method that returns a filter and by default returns `new QueryWrapperFilter(toQuery(context));`

Do not spend too much time on filters. They are currently being removed from Lucene, so let's focus on getting queries right?

dakrone · 2015-02-26T16:15:44Z

src/main/java/org/elasticsearch/index/query/TermQuery.java

+        return query.toQuery(parseContext);
+    }
+
+    public void fromXContent(QueryParseContext context) throws IOException {


I would feel much better making fromXContent and toQuery private here. Otherwise it looks like a very "stateful" API, because if someone tries to use toQuery without calling fromXContent first, they'll get exceptions.

Is there a reason they should be public?

Yes, that is one of the main reasons I did this. I want a stage where you can parse, then do something with the TermQuery instance, and call toQuery at a later stage. For example, in the future fromXContent will be called on the coordinating node so that parsing problems are reported only once. Then we will use the Streamable binary representation to transport it to the executing nodes. Makes sense?

Okay, I think I understand why it is this way.

What I am concerned about is the different ways that a TermQuery is constructed here, there's:

    new TermQuery(actualField, actualValue)
    (new TermQuery()).fromXContent(context)
    (new TermQuery()).parse(context) // <-- weird that this is not static

What I think might be better is static methods that return new instances for every case except plain construction:

    new TermQuery(actualField, actualValue)
    TermQuery.fromXContent(context) // <-- static, returns new TermQuery
    TermQuery.parse(context) // <-- static, returns new TermQuery

I dunno, maybe it's a gut feeling :), but the current implementation feels very "loose" and too flexible about what the "correct" way to create a new TermQuery is. Making the methods static instead of mutating the current object feels more functional (in both senses of the word!) to me.

I personally would rather have the TermQuery() constructor be private, but I guess that's an entirely different discussion about builders versus non-builders...
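For comparison, here is a sketch of the shape proposed above. It uses a hypothetical `StaticTermQuerySketch` and the same `Map` stand-in for XContent. The only constructor takes both field and value, and the static factory returns a new, fully populated instance instead of mutating `this`, so callers never see a half-initialized query.

```java
import java.util.Map;

/** Hypothetical immutable variant: every instance is fully initialized. */
final class StaticTermQuerySketch {
    private final String fieldName;
    private final Object value;

    /** Plain construction; there is deliberately no no-arg constructor. */
    StaticTermQuerySketch(String fieldName, Object value) {
        if (fieldName == null || value == null) {
            throw new IllegalArgumentException("field and value are required");
        }
        this.fieldName = fieldName;
        this.value = value;
    }

    /** Static factory: parses and returns a new instance rather than mutating an existing one. */
    static StaticTermQuerySketch fromXContent(Map<String, Object> content) {
        if (content.size() != 1) {
            throw new IllegalArgumentException("term query expects exactly one field");
        }
        Map.Entry<String, Object> e = content.entrySet().iterator().next();
        return new StaticTermQuerySketch(e.getKey(), e.getValue());
    }

    /** A String stands in for the Lucene query. */
    String toQuery() {
        return "TermQuery(" + fieldName + ":" + value + ")";
    }
}
```

Later in the thread, s1monw raises the trade-off: an interface that all queries implement cannot require a static `fromXContent`.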

+1 to making fromXContent and parse static

Guys, please read the issue and my answers below. It seems I wasn't clear enough about what this is going to do; static is not an option here, sorry.

dakrone · 2015-02-26T16:17:05Z

I like collapsing the two into a single class, though I'm a little worried about what we are exposing for doing the parsing (left a comment about that), but overall much cleaner!

s1monw · 2015-02-27T10:18:36Z

@dakrone from your comment I can tell that the description of this issue is not clear enough about what this is going to enable in the future, so let me try to clarify:

Today a request is parsed on all the nodes, which causes lots of trouble. In the future I think it makes sense to decouple that: once a request comes into the coordinating node, or even once it enters the system altogether (i.e. via REST), we parse the XContent and build the intermediate representation, which is what fromXContent() does. If that stage was successful, we send it on to the nodes executing the request as a binary representation via Streamable (the coordinating node calls #writeTo()). On the target nodes we then use #readFrom() to get the intermediate representation back and call toQuery in order to get the query.
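Here is a sketch of that lifecycle. It uses `java.io.DataOutput`/`DataInput` as stand-ins for Elasticsearch's `StreamOutput`/`StreamInput`, and `StreamableTermQuerySketch` is a hypothetical class. The sketch also illustrates the normalized cache key from the PR description. Because `writeTo` emits fields in a fixed order, two requests whose JSON keys arrived in different orders serialize to the same bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

/** Hypothetical Streamable-style term query; DataOutput/DataInput stand in for StreamOutput/StreamInput. */
class StreamableTermQuerySketch {
    String fieldName;
    String value;
    float boost = 1.0f;

    /** Coordinating node: parse {"<field>": {"value": ..., "boost": ...}} once. */
    void fromXContent(String field, Map<String, Object> options) {
        this.fieldName = field;
        this.value = String.valueOf(options.get("value"));
        Object b = options.get("boost");
        if (b != null) {
            this.boost = ((Number) b).floatValue();
        }
    }

    /** Coordinating node: fields are always written in the same order, whatever the JSON key order was. */
    void writeTo(DataOutput out) throws IOException {
        out.writeUTF(fieldName);
        out.writeUTF(value);
        out.writeFloat(boost);
    }

    /** Executing node: rebuild the intermediate representation from bytes. */
    void readFrom(DataInput in) throws IOException {
        fieldName = in.readUTF();
        value = in.readUTF();
        boost = in.readFloat();
    }

    /** Executing node: only now build the (stand-in) Lucene query. */
    String toQuery() {
        return "TermQuery(" + fieldName + ":" + value + ")^" + boost;
    }

    /** Serialized form, usable as an order-independent cache key. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeTo(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static StreamableTermQuerySketch fromBytes(byte[] data) {
        try {
            StreamableTermQuerySketch q = new StreamableTermQuerySketch();
            q.readFrom(new DataInputStream(new ByteArrayInputStream(data)));
            return q;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```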

Today we don't do this, so I modeled the current architecture with the refactoring prototype; hence the method:

    @Override
+    public Query parse(QueryParseContext parseContext) throws IOException, QueryParsingException {
+        TermQuery query = new TermQuery();
+        query.fromXContent(parseContext);
+        return query.toQuery(parseContext);
+    }

Makes sense now?

rjernst · 2015-02-27T18:48:03Z

Ok I think I understand, makes sense to me.

+1

s1monw · 2015-03-02T08:54:26Z

I think a common source of confusion is that those methods are currently not on the interface that all queries need to implement. In the future they will be, so they can't be static.
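The sketch below shows why the methods end up as instance methods. It uses a hypothetical `ParsedQuerySketch` interface and a registry keyed by query name. Both are invented for illustration and are not the eventual Elasticsearch design. A Java interface cannot declare abstract static methods. So if every query must provide `fromXContent`, that method has to be an instance method called through the interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical interface every query implements; abstract static methods are not allowed here. */
interface ParsedQuerySketch {
    void fromXContent(Map<String, Object> body);

    String toQuery();
}

class MatchAllSketch implements ParsedQuerySketch {
    public void fromXContent(Map<String, Object> body) {
        // match_all takes no parameters in this sketch
    }

    public String toQuery() {
        return "MatchAllDocsQuery";
    }
}

class TermSketch implements ParsedQuerySketch {
    private String field;
    private Object value;

    public void fromXContent(Map<String, Object> body) {
        Map.Entry<String, Object> e = body.entrySet().iterator().next();
        field = e.getKey();
        value = e.getValue();
    }

    public String toQuery() {
        return "TermQuery(" + field + ":" + value + ")";
    }
}

/** Generic code only knows the interface and dispatches through instance methods. */
class QueryRegistrySketch {
    private final Map<String, Supplier<ParsedQuerySketch>> factories = new HashMap<>();

    QueryRegistrySketch() {
        factories.put("match_all", MatchAllSketch::new);
        factories.put("term", TermSketch::new);
    }

    ParsedQuerySketch parse(String name, Map<String, Object> body) {
        Supplier<ParsedQuerySketch> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown query [" + name + "]");
        }
        ParsedQuerySketch query = factory.get();
        query.fromXContent(body); // polymorphic call through the interface
        return query;
    }
}
```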

s1monw · 2015-03-03T16:10:21Z

I think we have some agreement that this refactoring can be beneficial. I'd like us to start working on it very soon; maybe we can create a branch for it. @cbuescher do you think we can start this soon?

cbuescher · 2015-03-03T17:31:23Z

@s1monw sure, I'll have to see how long it takes me to do the same thing to another query on my own tomorrow. It would be great if the whole refactoring were structurally the same for all queries, since there are ~90 of them in .../index/query alone.

kimchy · 2015-03-03T18:52:42Z

src/main/java/org/elasticsearch/index/query/TermQuery.java

+     * Produces a lucene query from this elasticsearch query
+     */
+    public Query toQuery(QueryParseContext parseContext) {
+        if (value == null) {


I think we should also check whether fieldName is null and fail? Maybe use Preconditions here for simplicity?
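Here is a sketch of the suggested fail-fast check on a hypothetical `ValidatedTermQuerySketch`. To keep the example free of dependencies, it uses the JDK's `Objects.requireNonNull` in place of `Preconditions.checkNotNull`.

```java
import java.util.Objects;

/** Hypothetical term query showing fail-fast validation at the top of toQuery(). */
class ValidatedTermQuerySketch {
    private final String fieldName;
    private final Object value;

    ValidatedTermQuerySketch(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    /** Produces a (stand-in) Lucene query; throws NullPointerException with a clear message if a part is missing. */
    String toQuery() {
        // Objects.requireNonNull plays the role of Preconditions.checkNotNull here
        Objects.requireNonNull(fieldName, "[term] query requires a field name");
        Objects.requireNonNull(value, "[term] query requires a value");
        return "TermQuery(" + fieldName + ":" + value + ")";
    }
}
```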

cbuescher · 2015-03-20T22:10:21Z

I talked with @s1monw and we came up with this first rough sketch of how to proceed with the refactoring of the queries in the org.elasticsearch.index.query package. I'll start in small incremental steps, not including the filters for now.

This is the rough plan of how to go step by step here:

* move all the `*Parser` code to the corresponding `*Builder`, and make all Builders implement QueryParser
* split the existing parse() method, following this prototype, into Query toQuery() and fromXContent(), while still keeping the existing Query parse() method
* write tests using each query's doXContent -> fromXContent methods
* make queries implement Streamable, and write the serialization code and tests
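This sketch shows the round-trip test pattern from the third step on a hypothetical `RoundTripTermQuerySketch`. Here `toXContent` renders to a `Map`, standing in for `doXContent` writing JSON, and `fromXContent` parses that `Map` back. The check is simply that rendering and then re-parsing yields an equal object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical query with both directions of the XContent mapping in one class. */
class RoundTripTermQuerySketch {
    String fieldName;
    Object value;
    float boost = 1.0f;

    /** Stand-in for doXContent: renders {"term": {field: {"value": v, "boost": b}}}. */
    Map<String, Object> toXContent() {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("value", value);
        options.put("boost", boost);
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put(fieldName, options);
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put("term", inner);
        return outer;
    }

    /** Stand-in for fromXContent: the inverse of toXContent, filling this instance. */
    @SuppressWarnings("unchecked")
    void fromXContent(Map<String, Object> xContent) {
        Map<String, Object> inner = (Map<String, Object>) xContent.get("term");
        Map.Entry<String, Object> field = inner.entrySet().iterator().next();
        Map<String, Object> options = (Map<String, Object>) field.getValue();
        fieldName = field.getKey();
        value = options.get("value");
        boost = ((Number) options.get("boost")).floatValue();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RoundTripTermQuerySketch)) return false;
        RoundTripTermQuerySketch other = (RoundTripTermQuerySketch) o;
        return Objects.equals(fieldName, other.fieldName)
                && Objects.equals(value, other.value)
                && Float.compare(boost, other.boost) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(fieldName, value, boost);
    }
}

/** The test pattern: render, parse back into a fresh instance, compare. */
class RoundTripCheck {
    static boolean roundTrips(RoundTripTermQuerySketch original) {
        RoundTripTermQuerySketch parsed = new RoundTripTermQuerySketch();
        parsed.fromXContent(original.toXContent());
        return original.equals(parsed);
    }
}
```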

I started by creating the feature branch https://github.com/elastic/elasticsearch/tree/feature/query-parse-refactoring and have already started to merge some Builder/Parser pairs there.

javanna · 2015-03-31T16:41:40Z

I think we can close this PR, we are now working on the https://github.com/elastic/elasticsearch/tree/feature/query-parse-refactoring branch and opening PRs against it.

…lders and QueryParsers

The planned refactoring of search queries laid out in #9901 requires splitting the "parse()" method in QueryParsers into two methods: first, a "fromXContent(...)" method that parses into an intermediate query representation (currently called FooQueryBuilder), and second, a "Query toQuery(...)" method on these intermediate representations that creates the actual Lucene queries. This PR is a first step in that direction, as it introduces the interface changes necessary for the further refactoring. It introduces the new interface methods while, for now, keeping the old Builders/Parsers in place by delegating the new "toQuery()" implementations to the existing "parse()" methods, and by introducing a "catch-all" "fromXContent()" implementation in a BaseQueryParser that returns a temporary QueryBuilder wrapper implementation. This allows us to refactor the existing QueryBuilders step by step while already being able to start refactoring queries with nested inner queries. Closes #10580
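Here is a sketch of the transitional "catch-all" described in this commit message. The names `BuilderSketch`, `BaseQueryParserSketch`, and `LegacyTermParserSketch` are hypothetical, and the same `Map`/`String` stand-ins are used. A parser that has not been refactored yet implements only the legacy `parse()`. The inherited `fromXContent()` returns a wrapper whose `toQuery()` delegates to that method. In this simplified version, the legacy parsing is deferred until `toQuery()` is called.

```java
import java.util.Map;

/** Hypothetical intermediate representation produced by fromXContent(). */
interface BuilderSketch {
    String toQuery();
}

/** Hypothetical base parser: the legacy parse() still does all the work. */
abstract class BaseQueryParserSketch {
    /** Legacy one-step path: XContent straight to a (stand-in) Lucene query. */
    abstract String parse(Map<String, Object> xContent);

    /**
     * Catch-all fromXContent(): until a query is refactored, return a temporary
     * wrapper whose toQuery() simply delegates to the old parse().
     */
    BuilderSketch fromXContent(Map<String, Object> xContent) {
        return () -> parse(xContent);
    }
}

/** A not-yet-refactored parser: it only implements the legacy method. */
class LegacyTermParserSketch extends BaseQueryParserSketch {
    @Override
    String parse(Map<String, Object> xContent) {
        Map.Entry<String, Object> e = xContent.entrySet().iterator().next();
        return "TermQuery(" + e.getKey() + ":" + e.getValue() + ")";
    }
}
```

Callers can already program against the new two-step API, and each parser can later override `fromXContent()` with a real intermediate representation.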


dakrone reviewed Feb 26, 2015

kimchy reviewed Mar 3, 2015

This was referenced Mar 3, 2015:

* Roadmap for 2.0 #9970 (Closed)
* Replace hand parsing of requests/queries/mappings/etc with a grammar #8965 (Closed)

s1monw assigned cbuescher Mar 20, 2015

clintongormley mentioned this pull request Mar 23, 2015:

* Refactor parsing of queries/filters, aggs, suggester APIs #10217 (Closed)

dakrone mentioned this pull request Mar 30, 2015:

* Query Refactoring: Merging Parser and Builder classes #10324 (Merged)

javanna closed this Mar 31, 2015

This was referenced Apr 7, 2015:

* Refactor MatchAllQueryBuilder, TermQueryBuilder, IdsQueryBuilder #10454 (Closed)
* Query refactoring: Introduce toQuery() and fromXContent() methods in QueryBuilders and QueryParsers #10580 (Closed)
