s1monw
Contributor

@s1monw s1monw commented Feb 26, 2015

Today we have a massive infrastructure to parse all our requests. We have client-side builders and server-side parsers, but no real representation of the query, filter, aggregation etc. until it's executed. What is produced from an XContent binary is a Lucene query directly, which leads to huge parse methods in separate classes that are hard to test and don't allow decoupled modifications or actions on the query itself between parsing and executing.

This PR is a small prototype of how things could look in the future; it would allow for more flexibility and cleaner code IMO.

This refactoring splits the parsing from the creation of the Lucene query, which has a couple of advantages:

  • XContent parsing and creation are in one file and can be tested more easily
  • the class allows a typed in-memory representation of the query that can be modified before a Lucene query is built
  • the query can be normalized and serialized via Streamable to be used as a normalized cache key (not depending on the order of the keys in the XContent)
  • the query can be parsed on the coordinating node to allow document prefetching etc.; forwarding to the executing nodes would work via the Streamable binary representation --> Should we parse search requests on the coordinating node? #8150
  • for the query cache, a query tree can be "walked" to rewrite range queries into match-all queries with MIN/MAX terms to get cache hits for sliding windows --> Kibana 4 unable to utilize query cache #9526
  • code-wise, two classes are merged into one, which is nice
  • filter and query can maybe share one class, and we add a toFilter(QueryParseContext ctx) method that returns a filter and by default returns a new QueryWrapperFilter(toQuery(context));
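
As a rough illustration of the first three points, here is a minimal sketch of a typed representation that is parsed first, can be modified, and only later turned into a query. Everything in it is an assumption made for illustration: TermQuerySketch, fromMap and setBoost are hypothetical names, a plain Map stands in for XContent, and toQuery returns a String rather than a Lucene Query.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical typed representation of a term query (not the class from this PR).
final class TermQuerySketch {
    private String fieldName;
    private Object value;
    private float boost = 1.0f;

    // Stand-in for fromXContent(): reads parsed key/value pairs, ignoring key order.
    void fromMap(Map<String, Object> source) {
        fieldName = (String) source.get("field");
        value = source.get("value");
        Object rawBoost = source.get("boost");
        if (rawBoost != null) {
            boost = ((Number) rawBoost).floatValue();
        }
    }

    // The representation can be changed between parsing and query creation.
    void setBoost(float boost) {
        this.boost = boost;
    }

    // Stand-in for toQuery(): returns a description instead of a Lucene Query.
    String toQuery() {
        Objects.requireNonNull(fieldName, "fieldName");
        Objects.requireNonNull(value, "value");
        return fieldName + ":" + value + "^" + boost;
    }

    // Equality on the typed fields makes the object usable as a normalized cache key.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof TermQuerySketch)) {
            return false;
        }
        TermQuerySketch that = (TermQuerySketch) other;
        return Objects.equals(fieldName, that.fieldName)
                && Objects.equals(value, that.value)
                && Float.compare(boost, that.boost) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(fieldName, value, boost);
    }
}

final class TermQuerySketchDemo {
    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("field", "user");
        first.put("value", "kimchy");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("value", "kimchy");
        second.put("field", "user");

        TermQuerySketch a = new TermQuerySketch();
        a.fromMap(first);
        TermQuerySketch b = new TermQuerySketch();
        b.fromMap(second);
        System.out.println(a.equals(b)); // key order in the source does not affect equality
        a.setBoost(2.0f);                // modified after parsing, before query creation
        System.out.println(a.toQuery());
    }
}
```

Because equality is defined on the typed fields rather than on the source bytes, two requests that differ only in key order map to the same cache key in this sketch.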

@jpountz
Contributor

jpountz commented Feb 26, 2015

I really like having all the logic for a given query in a single place! I suspect you will find some inconsistencies around parameters that are supported in parsers but not in builders while doing this refactoring!

filter and query can maybe share one class, and we add a toFilter(QueryParseContext ctx) method that returns a filter and by default returns a new QueryWrapperFilter(toQuery(context));

Do not spend too much time on filters. They are currently being removed from Lucene, so let's focus on getting queries right?

return query.toQuery(parseContext);
}

public void fromXContent(QueryParseContext context) throws IOException {
Member

I would feel much better making fromXContent and toQuery private here, otherwise I feel like it is a very "stateful" looking API, because if someone tries to use toQuery without calling fromXContent first they'll get exceptions.

Is there a reason they should be public?

Contributor Author

Yes, that is one of the big reasons why I did this. I want to have a stage where you can parse, then do something with the TermQuery instance, and call toQuery at a later stage. I.e., in the future fromXContent will be called on the coordinating node to report parsing problems only once. Then we will use the Streamable binary representation to transport it to the executing nodes... makes sense?

Member

Okay, I think I understand why it is this way.

What I am concerned about is the different ways that a TermQuery is constructed here, there's:

new TermQuery(actualField, actualValue)
(new TermQuery()).fromXContent(context)
(new TermQuery()).parse(context) // <-- weird that this is not static

What I think would be better is maybe static methods that generate new versions for all except the plain construction version:

new TermQuery(actualField, actualValue)
TermQuery.fromXContent(context) // <-- static, returns new TermQuery
TermQuery.parse(context) // <-- static, returns new TermQuery

I dunno, maybe it's a gut feeling :), but the current implementation feels very "loose" and too flexible about what the "correct" way to create a new TermQuery is. Making the methods static instead of mutating the current object feels more functional (in both senses of the word!) to me.

I personally would rather have TermQuery() constructor be private, but I guess that's an entirely different discussion about builders versus non-builders...

Member

+1 to have fromXContent and parse be static

Contributor Author

Guys, please read the issue and my answers below. It seems like I wasn't clear enough about what this is going to do, and static is not an option here, sorry.

@dakrone
Member

dakrone commented Feb 26, 2015

I like collapsing the two into a single class, though I'm a little worried about what we are exposing for doing the parsing (left a comment about that), but overall much cleaner!

@s1monw
Contributor Author

s1monw commented Feb 27, 2015

@dakrone from your comment I can tell that the description of this issue is not clear enough about what this is going to enable in the future. Let me try to clarify:

Today a request is parsed on all the nodes, causing lots of trouble. In the future I think it makes sense to decouple that: once a request comes into the coordinating node, or even once it comes into the system altogether, i.e. via REST, we parse the XContent and have the intermediate representation, which is what fromXContent() does. Then, if that stage was successful, we send it on to the nodes executing the request as a binary representation via Streamable (the coordinating node calls #writeTo()). On the target nodes we then use #readFrom() to get the intermediate representation back and call toQuery in order to get the query.

Today we don't do this, so I just tried to model the current architecture with the refactoring prototype, hence the method:

    @Override
    public Query parse(QueryParseContext parseContext) throws IOException, QueryParsingException {
        TermQuery query = new TermQuery();
        query.fromXContent(parseContext);
        return query.toQuery(parseContext);
    }

makes sense now?
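
The future flow described above (parse once on the coordinating node, ship a binary form, rebuild the query on the executing node) could look roughly like the sketch below. It is only an illustration under stated assumptions: WireTermQuery and WireDemo are hypothetical names, java.io data streams stand in for Elasticsearch's stream classes, and toQuery returns a String instead of a Lucene Query.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical intermediate representation with a writeTo/readFrom pair,
// mirroring the shape of Streamable but using plain java.io streams.
final class WireTermQuery {
    String fieldName;
    String value;

    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(fieldName);
        out.writeUTF(value);
    }

    void readFrom(DataInputStream in) throws IOException {
        fieldName = in.readUTF();
        value = in.readUTF();
    }

    // Stand-in for toQuery(): a description instead of a Lucene Query.
    String toQuery() {
        return fieldName + ":" + value;
    }
}

final class WireDemo {
    // Coordinating node: validate once, then serialize the representation.
    static byte[] coordinate(String fieldName, String value) {
        if (fieldName == null || value == null) {
            throw new IllegalArgumentException("term query needs a field and a value");
        }
        WireTermQuery query = new WireTermQuery();
        query.fieldName = fieldName;
        query.value = value;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            query.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Executing node: no parsing of the original request, only deserialization.
    static String execute(byte[] wire) {
        WireTermQuery query = new WireTermQuery();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            query.readFrom(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return query.toQuery();
    }

    public static void main(String[] args) {
        System.out.println(execute(coordinate("user", "kimchy")));
    }
}
```

In this sketch a malformed request fails in coordinate() and is reported once, before anything is sent to the executing side.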

@rjernst
Member

rjernst commented Feb 27, 2015

Ok I think I understand, makes sense to me.

+1

@s1monw
Contributor Author

s1monw commented Mar 2, 2015

I think a common source of confusion is that currently those methods are not on the interface all queries need to implement. In the future they will be so they can't be static.
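
A small sketch of that point: once fromXContent and toQuery are declared on a shared interface, generic code can drive any query type through them, which only works with instance methods because Java does not dispatch static methods polymorphically. QuerySketch, TermSketch, MatchAllSketch and GenericParser are hypothetical names, and a Map stands in for the parse context.

```java
import java.util.Map;

// Hypothetical shared interface; Map<String, Object> stands in for the XContent parse context.
interface QuerySketch {
    void fromXContent(Map<String, Object> source);

    String toQuery(); // stand-in for returning a Lucene Query
}

final class TermSketch implements QuerySketch {
    private String fieldName;
    private Object value;

    @Override
    public void fromXContent(Map<String, Object> source) {
        fieldName = (String) source.get("field");
        value = source.get("value");
    }

    @Override
    public String toQuery() {
        return fieldName + ":" + value;
    }
}

final class MatchAllSketch implements QuerySketch {
    @Override
    public void fromXContent(Map<String, Object> source) {
        // match_all carries no parameters in this sketch
    }

    @Override
    public String toQuery() {
        return "*:*";
    }
}

final class GenericParser {
    // Works for any query type because both calls are dispatched on the instance.
    static String parse(QuerySketch query, Map<String, Object> source) {
        query.fromXContent(source);
        return query.toQuery();
    }
}
```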

@s1monw
Contributor Author

s1monw commented Mar 3, 2015

I think we have some agreement that this refactoring can be beneficial. I'd like us to start working on it very soon; maybe we can create a branch for it. @cbuescher do you think we can start this soon?

@cbuescher
Member

@s1monw sure, will have to look at how long it takes me to do the same thing to another query on my own tomorrow. Would be great if the whole refactoring is structurally the same for all queries, since there are ~ 90 of them alone in .../index/query.

* Produces a lucene query from this elasticsearch query
*/
public Query toQuery(QueryParseContext parseContext) {
if (value == null) {
Member

I think we should also check if fieldName is null and fail? maybe use Preconditions here for simplicity?
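
A minimal sketch of such a check, assuming the comment refers to Guava's Preconditions; to stay self-contained it uses java.util.Objects.requireNonNull from the JDK as a stand-in. NullCheckedTermQuery is a hypothetical class, and toQuery returns a String rather than a Lucene Query.

```java
import java.util.Objects;

// Hypothetical query representation that fails fast on missing state.
final class NullCheckedTermQuery {
    private final String fieldName;
    private final Object value;

    NullCheckedTermQuery(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    // Stand-in for toQuery(): both fields are checked before anything is built.
    String toQuery() {
        Objects.requireNonNull(fieldName, "fieldName must not be null");
        Objects.requireNonNull(value, "value must not be null");
        return fieldName + ":" + value;
    }
}
```

Checking at the top of toQuery keeps the failure close to the missing state, which matters when the object was populated earlier by a separate parsing step.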

@cbuescher
Member

I talked with @s1monw and we came up with this first rough sketch of how to proceed with the refactoring of the queries in the org.elasticsearch.index.query package. I'll start in small incremental steps, not including the filters at the moment.

This is the rough plan of how to go step by step here:

  • move all the *Parser code to the corresponding *Builder, make all Builders implement QueryParser
  • split the existing parse() method according to this prototype into Query toQuery() and fromXContent(), and still keep the existing Query parse() method
  • write tests using each query's doXContent -> fromXContent methods
  • make queries implement Streamable, write serialization and tests
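
The doXContent -> fromXContent tests from the plan above could take roughly the following shape: render a query, parse the rendered form into a fresh instance, and compare the two. This is only a sketch under assumptions: RoundTripTermQuery and RoundTripCheck are hypothetical names, and a Map stands in for the XContent output and input.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical query class that can both render itself and read itself back.
final class RoundTripTermQuery {
    private String fieldName;
    private String value;

    RoundTripTermQuery() {
    }

    RoundTripTermQuery(String fieldName, String value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    // Stand-in for doXContent(): renders the query as key/value pairs.
    Map<String, Object> doXContent() {
        Map<String, Object> rendered = new LinkedHashMap<>();
        rendered.put("field", fieldName);
        rendered.put("value", value);
        return rendered;
    }

    // Stand-in for fromXContent(): fills this instance from rendered pairs.
    void fromXContent(Map<String, Object> source) {
        fieldName = (String) source.get("field");
        value = (String) source.get("value");
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof RoundTripTermQuery)) {
            return false;
        }
        RoundTripTermQuery that = (RoundTripTermQuery) other;
        return Objects.equals(fieldName, that.fieldName) && Objects.equals(value, that.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fieldName, value);
    }
}

final class RoundTripCheck {
    // True when rendering and re-parsing yields an equal query.
    static boolean survivesRoundTrip(RoundTripTermQuery original) {
        RoundTripTermQuery parsed = new RoundTripTermQuery();
        parsed.fromXContent(original.doXContent());
        return original.equals(parsed);
    }
}
```

A check of this shape would also surface the builder/parser inconsistencies mentioned earlier in the thread, since a parameter that is rendered but not parsed (or the reverse) breaks the equality.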

I started by creating the feature branch https://github.com/elastic/elasticsearch/tree/feature/query-parse-refactoring and have already started to merge some Builder/Parser pairs there.

@javanna
Member

javanna commented Mar 31, 2015

I think we can close this PR, we are now working on the https://github.com/elastic/elasticsearch/tree/feature/query-parse-refactoring branch and opening PRs against it.

@javanna javanna closed this Mar 31, 2015
cbuescher pushed a commit that referenced this pull request Apr 17, 2015
…lders and QueryParsers

The planned refactoring of search queries laid out in #9901 requires splitting the "parse()"
method in QueryParsers into two methods: first a "fromXContent(...)" method that allows parsing
to an intermediate query representation (currently called FooQueryBuilder), and second a
"Query toQuery(...)" method on these intermediate representations that creates the actual Lucene queries.

This PR is a first step in that direction, as it introduces the interface changes necessary for the further
refactoring. It introduces the new interface methods while for now keeping the old Builders/Parsers
in place by delegating the new "toQuery()" implementations to the existing "parse()" methods, and by
introducing a "catch-all" "fromXContent()" implementation in a BaseQueryParser that returns a temporary
QueryBuilder wrapper implementation. This allows us to refactor the existing QueryBuilders step by step
while already being able to start refactoring queries with nested inner queries.

Closes #10580
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
…lders and QueryParsers