Query refactoring: TermQueryBuilder refactoring and test #10669

cbuescher · 2015-04-20T08:17:25Z

Split the parse(QueryParseContext ctx) method into a parsing and a query building part, adding Streamable for serialization and hashCode(), equals() for better testing.
Add basic unit test for Builder and Parser.

PR goes against the query-refactoring feature branch.
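A minimal sketch of the kind of round-trip check that serialization plus equals()/hashCode() makes possible. Everything here is an assumption for illustration: TermSketch is a hypothetical stand-in for the builder, and plain java.io data streams stand in for the Streamable mechanism mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical stand-in for a query builder; not the real TermQueryBuilder.
final class TermSketch {
    private final String fieldName;
    private final String value;
    private final float boost;

    TermSketch(String fieldName, String value, float boost) {
        this.fieldName = fieldName;
        this.value = value;
        this.boost = boost;
    }

    // Write every field that participates in equals(), in a fixed order.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(fieldName);
        out.writeUTF(value);
        out.writeFloat(boost);
    }

    // Read the fields back in the same order they were written.
    static TermSketch readFrom(DataInputStream in) throws IOException {
        String fieldName = in.readUTF();
        String value = in.readUTF();
        float boost = in.readFloat();
        return new TermSketch(fieldName, value, boost);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TermSketch)) return false;
        TermSketch other = (TermSketch) o;
        return fieldName.equals(other.fieldName)
                && value.equals(other.value)
                && Float.compare(boost, other.boost) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(fieldName, value, boost);
    }

    // Serialize to bytes and read a copy back; a test can assert copy.equals(original).
    static TermSketch roundTrip(TermSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readFrom(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without equals() and hashCode(), a test like this could only compare the copy field by field.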

javanna · 2015-04-20T09:56:36Z

src/main/java/org/elasticsearch/index/query/TermQueryBuilder.java


-    private float boost = -1;
+    float boost = 1.0f;


why package private? Other members are private?

changed that in new version of this PR

cbuescher · 2015-04-22T16:35:33Z

Rebased this PR on top of current tip of feature branch and changed test to make use of new BaseQueryTestCase.

javanna · 2015-04-22T23:05:14Z

src/main/java/org/elasticsearch/index/query/TermQueryBuilder.java

+    public Query toQuery(QueryParseContext parseContext) throws QueryParsingException, IOException {
+        Query query = null;
+        Preconditions.checkNotNull(this.fieldName, "Fieldname of a TermQuery cannot be null.");
+        Preconditions.checkNotNull(this.value, "Value of a TermQuery cannot be null.");


about these checks... we check the same right after parsing (at some point will be moved to coord node), is that correct? and we check again here (which will stay on the data node). I wonder if we should really repeat these checks, or share them between the two methods and introduce some validation method.

I added them here because in theory one could create query objects without parsing them. Maybe a void validate() on each query builder that also the parsers can use to check before they return a new builder from the fromXContent() method?

I added a validation method for TermQueryBuilder and reused that in the parser. Only downside of this is where previously we had a QueryParseException we now have an ElasticsearchNullPointerException, or we need to try/catch and rethrow which IMHO defeats the purpose. Have a look and let me know what you think.

First thing which I keep forgetting: you are right, queries can be created without going through fromXContent, java api will do just that. There are different places where we need the same validation, I think having a validate method is a good choice.

That said, one question is where do we call it from? fromXContent after parsing (on coordinating node in the future) makes perfect sense, but that is called only when the query is provided as json. Java api allows to create the intermediate query representation directly, meaning that the fromXContent step will be skipped there. What can happen in this case is 1) that the query gets serialized to other nodes without being validated, and validated on the data nodes multiple times leading to multiple errors, which is not quite what we want (we want to catch errors once and earlier), 2) that the coordinating node is the data node where the query gets executed, query might not even get serialized, we execute toQuery straight-away.

I believe the proper way to validate the query is hooking into the existing request validation mechanism (ActionRequest#validate). Every request gets already validated on the coordinating node. The SearchRequest should simply call query#validate then I think (only problem being that we cannot quite do it yet, we need to refactor the search request first as right now the whole search request is just a big json object)?

As for exceptions, if we adopt this approach I think we should be consistent with the existing validation infra and have ActionRequestValidationExceptions.

Does this make sense to you guys?

If so, we need to decide what to do for now, I think having the validate method on the base class makes sense as Lee mentioned. For now we could call it from a single place though (toQuery is safer as it covers all the cases), once we move to parsing to the coord node we need to move it to the validation api instead.

+1 to this, I think this is a great idea, for now we can do the validate method and figure out where to hook it in at a later time.

Also as an aside, I'd love to get rid of Preconditions.checkNotNull and just use Objects.requireNonNull since it's part of the JVM, but that's just personal preference.

When I add this to the QueryBuilder interface, I'll have to add an empty impl to the BaseQueryBuilder for now (can be removed later once all queries have a validate() method). For the details I would like to open a separate issue in which we can then discuss the exact signature, which exceptions to throw, Preconditions vs. Objects.requireNonNull etc... Let me know if you agree with this plan.

Opened #10777 for separate discussion on how to handle validation.
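For illustration, a minimal sketch of the shape discussed in this thread: one validate() method, an empty default on the base class, and calls from both the parsing path and the execution path. All class and method names below are hypothetical, and the JDK's Objects.requireNonNull is used in place of Preconditions.checkNotNull. It is not the code from this PR.

```java
import java.util.Objects;

// Hypothetical base class: the empty validate() covers builders not yet refactored.
abstract class BaseBuilderSketch {
    void validate() {
        // no-op until every builder overrides it
    }
}

// Hypothetical term builder; only the validation wiring is of interest here.
final class ValidatedTermSketch extends BaseBuilderSketch {
    private final String fieldName;
    private final Object value;

    ValidatedTermSketch(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    // Single home for the null checks; requireNonNull throws NullPointerException.
    @Override
    void validate() {
        Objects.requireNonNull(fieldName, "field name cannot be null");
        Objects.requireNonNull(value, "value cannot be null");
    }

    // Parsing path (stand-in for fromXContent): validate before returning the builder.
    static ValidatedTermSketch fromParsedValues(String fieldName, Object value) {
        ValidatedTermSketch builder = new ValidatedTermSketch(fieldName, value);
        builder.validate();
        return builder;
    }

    // Execution path (stand-in for toQuery): also covers builders created directly in Java.
    String toQueryString() {
        validate();
        return fieldName + ":" + value;
    }
}
```

Calling validate() from the execution path is the "safer" single call site mentioned above, since it is reached whether or not the builder came from the parser.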

javanna · 2015-04-22T23:23:13Z

left a few comments

cbuescher · 2015-04-23T13:19:16Z

I pulled out the validation of the two important fields into a method shared by the builder and parser parts, but I'm not sure if this is the best way to go. Happy about comments there. Hope I addressed the rest of your comments. I would prefer tracking the use of QueryParseContext in toQuery() in a separate issue before making huge changes there. Same goes for generifying named queries (lookup but also maybe simplification).

dakrone · 2015-04-23T18:31:45Z

src/main/java/org/elasticsearch/index/query/TermQueryBuilder.java

@@ -105,10 +113,28 @@ public TermQueryBuilder(String name, boolean value) {
     * @param value The value of the term
     */
    public TermQueryBuilder(String name, Object value) {
-        this.name = name;
+        this.fieldName = name;


Can you change name to fieldName in the constructor too?

cbuescher · 2015-04-24T12:25:52Z

Changed variable name in constructor and added validate() to the QueryBuilder interface. As long as this is not implemented by all queries, added empty impl to BaseQueryBuilder. Also opened two separate issues to keep track of further ideas for validation (#10777) and for generifying named queries (#10776).
I'm still not sure about how to improve the tests for toQuery() because the resulting lucene queries hide a lot of details, so assertions depend on explicit casts and knowing implementation details that might break often. I'm inclined to leave the toQuery() tests here very general and rely on other integration tests with real cluster setup / mappings etc. to check that queries work. Thoughts on this appreciated.

dakrone · 2015-04-24T19:47:15Z

src/main/java/org/elasticsearch/index/query/TermQueryBuilder.java

+        return query;
+    }
+
+    public void validate() {


Maybe add the @Override annotation here?

cbuescher · 2015-04-27T09:31:08Z

src/main/java/org/elasticsearch/index/query/TermQueryParser.java

@@ -80,27 +74,19 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars
            }
            parser.nextToken();
        } else {
-            value = parser.text();
+            value = parser.objectText();


One question I had myself here: I had to change this for the fromXContent() test to work. Otherwise, if the value is e.g. an int, it is written to the XContent query as a number value. With the original parser.text() it is then read in as String (which doesn't seem to matter for the later toQuery(), since there BytesRefs.toBytesRef(value) is used, which calls toString() anyway) but blows up the newly introduced equality test. I changed this to objectText() because this does some checking based on the JsonToken type.

Thinking about it, I tend to want to revert this change since I'm not sure about its implications. The problem then is that Object value can be of different type in original query and after fromXContent(), but in the end it only matters that toQuery() produces the same result, and there every Object is converted to lucene BytesRef. That's why I tend to use that conversion also in the TermQueryBuilder#equals(). Will push that change to show what I mean.

we discussed this and we went for always keeping BytesRef within TermQueryBuilder rather than mixing up String and BytesRef depending on whether we received the query via java api or we parsed it through json. Java api will still have a setter that accepts a string, but internally that setter will convert from String to BytesRef.

We do have to fix the above parser.text() which looks like a bug that never manifested as when parsing and executing on the same (data) node everything still works. But we are currently parsing the value in the term query short format (term: { field: value}) always as a string.
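A rough sketch of why a single internal representation helps with the equality problem described above. This is an assumption-laden illustration: a UTF-8 byte[] plays the role that Lucene's BytesRef plays in the discussion, and NormalizedTermSketch is a hypothetical class, not the real builder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch: a byte[] stands in for Lucene's BytesRef.
final class NormalizedTermSketch {
    private final String fieldName;
    private final byte[] valueBytes;

    // Accept any value type, but keep only the UTF-8 bytes of its string form.
    NormalizedTermSketch(String fieldName, Object value) {
        this.fieldName = Objects.requireNonNull(fieldName, "field name cannot be null");
        Objects.requireNonNull(value, "value cannot be null");
        this.valueBytes = String.valueOf(value).getBytes(StandardCharsets.UTF_8);
    }

    // Equality compares the normalized bytes, so 42 and "42" are the same term.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NormalizedTermSketch)) return false;
        NormalizedTermSketch other = (NormalizedTermSketch) o;
        return fieldName.equals(other.fieldName) && Arrays.equals(valueBytes, other.valueBytes);
    }

    @Override
    public int hashCode() {
        return 31 * fieldName.hashCode() + Arrays.hashCode(valueBytes);
    }
}
```

With this shape, a builder created with an int and one whose value was read back from JSON as a string compare equal, which is the property the round-trip test needs.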

cbuescher · 2015-04-28T14:17:37Z

@javanna went through the comments and current diff again, here are the things that are open from my point of view:

using context.setMapUnmappedFieldAsString(true) in the BaseQueryTestCase
using BytesRefs.toBytesRef(value) in equals() and hashCode() from my last update commit

Anything else open here that I'm missing?

javanna · 2015-04-28T14:40:44Z

src/main/java/org/elasticsearch/index/query/TermQueryBuilder.java

+    @Override
+    public void validate() throws ElasticsearchNullPointerException {
+        Preconditions.checkNotNull(this.fieldName, "Fieldname of a TermQuery cannot be null.");
+        Preconditions.checkNotNull(this.value, "Value of a TermQuery cannot be null.");


I think we have to check if these are empty too and barf in that case?

one more thing: s/Fieldname/ field name
I think in general the first line of an error should be lowercase, that's pretty much the convention we use in the existing validation code.

javanna · 2015-04-30T09:34:19Z

src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java

@@ -388,6 +388,9 @@ public void writeGenericValue(@Nullable Object value) throws IOException {
        } else if (type == double[].class) {
            writeByte((byte) 20);
            writeDoubleArray((double[]) value);
+        } else if (type == BytesRef.class) {
+            writeByte((byte) 21);
+            writeBytesRef((BytesRef) value);


if you rebase you'll get this change that was made upstream and merged back into our branch

Also extended BaseQueryTestCase so it has helper methods for parsing the query header and extended the toQuery() test method so it passes down the parse context to the subclass to make assertions on side effects calling toQuery() has on the parseContext.

cbuescher · 2015-05-05T09:22:14Z

Rebased this PR on current feature branch and changed validation exception class to own version similar to the previously suggested ActionRequestValidationException. Also opened separate issue #10974 for keeping track of adding tests for invalid json and set the TermQueryBuilder test to 20 repetitions.
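As an illustration of the error-collecting style of validation referred to here, the following is a hedged sketch only. ValidationErrorsSketch and CheckedTermSketch are hypothetical names, not the QueryValidationException or builder from this PR. It assumes the convention of returning null when the query is valid, checks for empty as well as null field names, and uses lowercase messages as suggested in the review.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical exception that collects messages; not the real QueryValidationException.
final class ValidationErrorsSketch extends IllegalArgumentException {
    private final List<String> errors = new ArrayList<>();

    void addError(String message) {
        errors.add(message);
    }

    List<String> errors() {
        return Collections.unmodifiableList(errors);
    }

    @Override
    public String getMessage() {
        return "validation failed: " + String.join("; ", errors);
    }
}

// Hypothetical builder showing only the validation logic.
final class CheckedTermSketch {
    private final String fieldName;
    private final Object value;

    CheckedTermSketch(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    // Returns null when valid, otherwise one exception holding every problem found.
    ValidationErrorsSketch validate() {
        ValidationErrorsSketch result = null;
        if (fieldName == null || fieldName.isEmpty()) {
            result = addError("field name cannot be null or empty", result);
        }
        if (value == null) {
            result = addError("value cannot be null", result);
        }
        return result;
    }

    // Create the exception lazily so the valid case allocates nothing.
    private static ValidationErrorsSketch addError(String message, ValidationErrorsSketch existing) {
        ValidationErrorsSketch result = existing == null ? new ValidationErrorsSketch() : existing;
        result.addError(message);
        return result;
    }
}
```

Collecting all messages before reporting lets a caller see every problem with a query at once instead of fixing them one exception at a time.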

javanna · 2015-05-05T14:47:56Z

src/main/java/org/elasticsearch/index/query/QueryBuilder.java

+
+    /**
+     * Validate the query.
+     * @return {@code null} if query is valid, otherwise {@link ActionRequestValidationException} containing error messages,


need to update exception in javadocs

javanna · 2015-05-05T14:57:48Z

LGTM besides the minor comments left, if you can address them that would be great, this is good to merge then!

cbuescher · 2015-05-05T21:22:19Z

Thanks for the quick response, pushed this to the feature branch.

cbuescher · 2015-05-06T08:16:02Z

Thanks for the review and for closing the issue, somehow it got opened
again yesterday automatically.

On Wed, May 6, 2015 at 7:34 AM, Luca Cavanna [email protected]
wrote:

Closed #10669.

Christoph Büscher

kevinkluge added the in progress label Apr 20, 2015

cbuescher added review labels Apr 20, 2015

javanna reviewed Apr 20, 2015

javanna mentioned this pull request Apr 20, 2015

Refactor MatchAllQueryBuilder, TermQueryBuilder, IdsQueryBuilder #10454

Closed

cbuescher force-pushed the feature/query-refactoring-termquery branch from cfc2069 to 8f4218e Compare April 22, 2015 16:31

javanna reviewed Apr 22, 2015

dakrone reviewed Apr 23, 2015

This was referenced Apr 24, 2015

Investigate more generic handling of queryName field in QueryBuilders #10776

Closed

Add validation method to QueryBuilders #10777

Closed

dakrone reviewed Apr 24, 2015

cbuescher force-pushed the feature/query-refactoring-termquery branch from 99690c7 to 80d3571 Compare April 26, 2015 22:39

cbuescher reviewed Apr 27, 2015

cbuescher force-pushed the feature/query-refactoring-termquery branch from bcfb8e2 to f8625b4 Compare April 28, 2015 13:05

javanna reviewed Apr 28, 2015

javanna reviewed Apr 30, 2015

Christoph Büscher added 4 commits May 4, 2015 20:23

Add //norelease to validation method in BaseQueryBuilder

9eaf56c

Added QueryValidationException to have own exception type for queries

04c4c8b

Added 20 iteration repeat to test

21afc61

cbuescher force-pushed the feature/query-refactoring-termquery branch from 8d18c4f to 21afc61 Compare May 5, 2015 09:18

cbuescher mentioned this pull request May 5, 2015

Query refactoring: IdsQuery #10670

Closed

cbuescher self-assigned this May 5, 2015

javanna reviewed May 5, 2015

Minor changes adressing last comments

11e841c

cbuescher force-pushed the feature/query-refactoring-termquery branch from 5e1d936 to 11e841c Compare May 5, 2015 21:16

cbuescher closed this May 5, 2015

kevinkluge reopened this May 5, 2015

kevinkluge removed in progress labels May 5, 2015

javanna closed this May 6, 2015

clintongormley mentioned this pull request Sep 8, 2015

Refactor parsing of queries/filters, aggs, suggester APIs #10217

Closed

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query Refactoring labels Feb 14, 2018

cbuescher deleted the feature/query-refactoring-termquery branch March 20, 2024 20:15
