cbuescher
Member

Split the parse(QueryParseContext ctx) method into a parsing and a query building part, adding Streamable for serialization and hashCode(), equals() for better testing.
Add basic unit test for Builder and Parser.

PR goes against the query-refactoring feature branch.
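The description above pairs serialization with equals()/hashCode() so that tests can compare a query before and after a round trip. Below is a minimal, dependency-free sketch of that idea. RoundTripTermQuery and its writeTo/readFrom methods are hypothetical stand-ins built on plain java.io streams. They are not the actual Elasticsearch Streamable or StreamInput/StreamOutput API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical query builder with Streamable-style serialization and value-based equality.
// Assumes fieldName and value are non-null (writeUTF rejects null).
final class RoundTripTermQuery {
    private final String fieldName;
    private final String value;
    private final float boost;

    RoundTripTermQuery(String fieldName, String value, float boost) {
        this.fieldName = fieldName;
        this.value = value;
        this.boost = boost;
    }

    // Fields are written in a fixed order...
    void writeTo(DataOutput out) throws IOException {
        out.writeUTF(fieldName);
        out.writeUTF(value);
        out.writeFloat(boost);
    }

    // ...and must be read back in exactly the same order.
    static RoundTripTermQuery readFrom(DataInput in) throws IOException {
        String fieldName = in.readUTF();
        String value = in.readUTF();
        float boost = in.readFloat();
        return new RoundTripTermQuery(fieldName, value, boost);
    }

    // Serialize and deserialize in memory, as a unit test would.
    static RoundTripTermQuery roundTrip(RoundTripTermQuery query) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            query.writeTo(new DataOutputStream(buffer));
            return readFrom(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RoundTripTermQuery)) return false;
        RoundTripTermQuery other = (RoundTripTermQuery) o;
        return Float.compare(boost, other.boost) == 0
                && Objects.equals(fieldName, other.fieldName)
                && Objects.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fieldName, value, boost);
    }
}
```

A unit test can then assert that `roundTrip(query).equals(query)`. That check is only meaningful because equals() compares every serialized field.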


private float boost = -1;
float boost = 1.0f;
Member

Why package-private? The other members are private.

Member Author

Changed that in the new version of this PR.

@cbuescher
Member Author

Rebased this PR on top of the current tip of the feature branch and changed the test to use the new BaseQueryTestCase.

public Query toQuery(QueryParseContext parseContext) throws QueryParsingException, IOException {
Query query = null;
Preconditions.checkNotNull(this.fieldName, "Fieldname of a TermQuery cannot be null.");
Preconditions.checkNotNull(this.value, "Value of a TermQuery cannot be null.");
Member

About these checks: we check the same things right after parsing (which at some point will be moved to the coordinating node), is that correct? And we check again here (which will stay on the data node). I wonder whether we should really repeat these checks, or instead share them between the two methods by introducing a validation method.
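Here is a minimal sketch of the "share them between the two methods" option. It assumes the checks move into one validate() method that both the parsing path and toQuery() call. ValidatedTermQuery and fromParsed() are hypothetical simplifications. A Map stands in for the parsed JSON, a string stands in for the Lucene query, and Objects.requireNonNull stands in for Guava's Preconditions.checkNotNull so the snippet has no dependencies.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical builder whose null checks live in exactly one place.
final class ValidatedTermQuery {
    private final String fieldName;
    private final Object value;

    ValidatedTermQuery(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    // The checks previously duplicated in the parser and in toQuery().
    void validate() {
        Objects.requireNonNull(fieldName, "Fieldname of a TermQuery cannot be null.");
        Objects.requireNonNull(value, "Value of a TermQuery cannot be null.");
    }

    // Parsing path (fromXContent in the PR): build the query, then reuse validate().
    static ValidatedTermQuery fromParsed(Map<String, Object> parsed) {
        ValidatedTermQuery query = new ValidatedTermQuery((String) parsed.get("field"), parsed.get("value"));
        query.validate();
        return query;
    }

    // Query-building path: runs the same checks, because Java API callers never parse.
    String toQuery() {
        validate();
        return fieldName + ":" + value;
    }
}
```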

Member Author

I added them here because, in theory, one could create query objects without parsing them. Maybe add a void validate() method to each query builder that the parsers can also use before they return a new builder from fromXContent()?

Member Author

I added a validation method to TermQueryBuilder and reused it in the parser. The only downside is that where we previously had a QueryParsingException we now get an ElasticsearchNullPointerException, unless we try/catch and rethrow, which IMHO defeats the purpose. Have a look and let me know what you think.

Member

First, something I keep forgetting: you are right, queries can be created without going through fromXContent; the Java API does exactly that. We need the same validation in several places, so I think having a validate method is a good choice.

That said, one question is where we call it from. Calling it from fromXContent after parsing (on the coordinating node in the future) makes perfect sense, but fromXContent is only called when the query is provided as JSON. The Java API allows creating the intermediate query representation directly, so the fromXContent step is skipped there. Two things can happen in that case. 1) The query gets serialized to other nodes without being validated and is then validated on each data node, which leads to multiple errors. That is not quite what we want, because we want to catch errors once and early. 2) The coordinating node is also the data node where the query gets executed, so the query might not even get serialized and we execute toQuery straight away.

I believe the proper way to validate the query is to hook into the existing request validation mechanism (ActionRequest#validate). Every request is already validated on the coordinating node. I think SearchRequest should then simply call query#validate. The only problem is that we cannot quite do that yet: we first need to refactor the search request, since right now the whole search request is just one big JSON object.

As for exceptions, if we adopt this approach I think we should be consistent with the existing validation infrastructure and use ActionRequestValidationException.
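Below is a sketch of how the ActionRequest#validate idea could look, under stated assumptions. validate() returns null when everything is fine. Otherwise it returns an exception object that collects all error messages, which is the convention ActionRequestValidationException follows. The request simply delegates to its query. QueryValidationErrors, ValidatableQuery, CheckedTermQuery and SearchRequestSketch are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ActionRequestValidationException: collects every message.
final class QueryValidationErrors extends RuntimeException {
    private final List<String> errors = new ArrayList<>();

    void add(String error) { errors.add(error); }

    List<String> errors() { return Collections.unmodifiableList(errors); }

    @Override
    public String getMessage() { return "validation failed: " + errors; }

    // Create the container lazily, so "no errors" can be represented as null.
    static QueryValidationErrors addError(String error, QueryValidationErrors errors) {
        if (errors == null) errors = new QueryValidationErrors();
        errors.add(error);
        return errors;
    }
}

interface ValidatableQuery {
    // Returns null when the query is valid, otherwise all collected errors.
    QueryValidationErrors validate();
}

final class CheckedTermQuery implements ValidatableQuery {
    private final String fieldName;
    private final Object value;

    CheckedTermQuery(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    @Override
    public QueryValidationErrors validate() {
        QueryValidationErrors errors = null;
        if (fieldName == null) errors = QueryValidationErrors.addError("field name is missing", errors);
        if (value == null) errors = QueryValidationErrors.addError("value is missing", errors);
        return errors;
    }
}

// The request validates its query on the coordinating node, before anything is sent.
final class SearchRequestSketch {
    private final ValidatableQuery query;

    SearchRequestSketch(ValidatableQuery query) { this.query = query; }

    QueryValidationErrors validate() {
        if (query == null) return QueryValidationErrors.addError("query is missing", null);
        return query.validate(); // the request "simply calls query#validate"
    }
}
```

Collecting all messages, instead of throwing on the first one, lets a single validation pass report every problem to the caller at once.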

Does this make sense to you guys?

If so, we need to decide what to do for now. I think having the validate method on the base class makes sense, as Lee mentioned. For now we could call it from a single place, though (toQuery is safer, as it covers all the cases). Once we move parsing to the coordinating node, we need to move the call to the validation API instead.
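The interim plan (validate() on the base class, called from toQuery()) might be sketched as follows. BaseQuerySketch, MatchAllSketch and TermSketch are hypothetical, and strings stand in for Lucene queries. The empty default lets queries that are not yet refactored compile unchanged.

```java
// Hypothetical base class: validate() has an empty default and toQuery() always runs it,
// so queries built through the Java API are checked just like parsed ones.
abstract class BaseQuerySketch {
    // Empty default for queries that do not implement validation yet.
    void validate() {
    }

    final String toQuery() {
        validate();
        return doToQuery();
    }

    protected abstract String doToQuery();
}

// A query that has not been refactored: it inherits the no-op validate().
final class MatchAllSketch extends BaseQuerySketch {
    @Override
    protected String doToQuery() {
        return "*:*";
    }
}

// A refactored query that overrides validate().
final class TermSketch extends BaseQuerySketch {
    private final String fieldName;
    private final String value;

    TermSketch(String fieldName, String value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    @Override
    void validate() {
        if (fieldName == null || value == null) {
            throw new IllegalStateException("field name and value of a term query are required");
        }
    }

    @Override
    protected String doToQuery() {
        return fieldName + ":" + value;
    }
}
```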

Member

+1 to this, I think this is a great idea. For now we can add the validate method and figure out where to hook it in at a later time.

Also, as an aside, I'd love to get rid of Preconditions.checkNotNull and just use Objects.requireNonNull, since it's part of the JDK, but that's just personal preference.
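For comparison, here are the two Preconditions.checkNotNull calls from the diff rewritten with java.util.Objects.requireNonNull. This sketch assumes Java 7 or later. Both methods throw a NullPointerException carrying the given message and return the checked reference, so the swap is mechanical. RequireNonNullTermQuery is a hypothetical class.

```java
import java.util.Objects;

// Hypothetical class showing Objects.requireNonNull as a drop-in for checkNotNull.
final class RequireNonNullTermQuery {
    private final String fieldName;
    private final Object value;

    RequireNonNullTermQuery(String fieldName, Object value) {
        // requireNonNull returns its argument, so check and assignment share one line.
        this.fieldName = Objects.requireNonNull(fieldName, "Fieldname of a TermQuery cannot be null.");
        this.value = Objects.requireNonNull(value, "Value of a TermQuery cannot be null.");
    }

    String fieldName() {
        return fieldName;
    }

    Object value() {
        return value;
    }
}
```

This version checks at construction time, which is stricter than the validate() approach discussed in this thread. It only illustrates the API swap.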

Member Author

When I add this to the QueryBuilder interface, I'll have to add an empty implementation to BaseQueryBuilder for now (it can be removed later once all queries have a validate() method). For the details, I would like to open a separate issue where we can discuss the exact signature, which exceptions to throw, Preconditions vs. Objects.requireNonNull, etc. Let me know if you agree with this plan.

Member Author

Opened #10777 for separate discussion on how to handle validation.

@javanna
Member

javanna commented Apr 22, 2015

left a few comments

@cbuescher
Member Author

I pulled the validation of the two important fields out into a method shared by the builder and the parser, but I'm not sure this is the best way to go, so comments are welcome there. I hope I addressed the rest of your comments. I would prefer to track the use of QueryParseContext in toQuery() in a separate issue before making big changes there. The same goes for generifying named queries (lookup, but maybe also simplification).

@@ -105,10 +113,28 @@ public TermQueryBuilder(String name, boolean value) {
* @param value The value of the term
*/
public TermQueryBuilder(String name, Object value) {
this.name = name;
this.fieldName = name;
Member

Can you change name to fieldName in the constructor too?

Member Author

done

@cbuescher
Member Author

Changed the variable name in the constructor and added validate() to the QueryBuilder interface. Until all queries implement it, BaseQueryBuilder provides an empty implementation. Also opened two separate issues to track further ideas for validation (#10777) and for generifying named queries (#10776).
I'm still not sure how to improve the tests for toQuery(). The resulting Lucene queries hide a lot of details, so assertions depend on explicit casts and on knowing implementation details that might break often. I'm inclined to keep the toQuery() tests here very general and rely on other integration tests with a real cluster setup, mappings, etc. to check that queries work. Thoughts on this are appreciated.

return query;
}

public void validate() {
Member

Maybe add the @Override annotation here?

cbuescher force-pushed the feature/query-refactoring-termquery branch from 99690c7 to 80d3571 April 26, 2015 22:39
@@ -80,27 +74,19 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars
}
parser.nextToken();
} else {
value = parser.text();
value = parser.objectText();
Member Author

One question I had myself here: I had to change this for the fromXContent() test to work. Otherwise, if the value is e.g. an int, it is written to the XContent query as a number, and the original parser.text() then reads it back as a String. That doesn't seem to matter for the later toQuery(), since BytesRefs.toBytesRef(value) is used there and it calls toString() anyway. But it breaks the newly introduced equality test. I changed this to objectText() because it does some checking based on the JsonToken type.

Member Author

Thinking about it, I tend to want to revert this change, since I'm not sure about its implications. The problem then is that Object value can have a different type in the original query than after fromXContent(). In the end, though, it only matters that toQuery() produces the same result, and there every Object is converted to a Lucene BytesRef. That's why I tend to use that conversion in TermQueryBuilder#equals() as well. I'll push that change to show what I mean.
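Below is a sketch of the equals()/hashCode() idea, assuming values are compared through their byte representation. BytesRefs.toBytesRef(value) turns any non-BytesRef value into bytes via toString(). Here, UTF-8 bytes in a byte[] stand in for Lucene's BytesRef, and NormalizingTermQuery is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical builder that compares values the way toQuery() would see them.
final class NormalizingTermQuery {
    private final String fieldName;
    private final Object value;

    NormalizingTermQuery(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    // Mimics BytesRefs.toBytesRef for non-BytesRef values: toString(), then UTF-8 bytes.
    private static byte[] toBytes(Object value) {
        return value == null ? null : value.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NormalizingTermQuery)) return false;
        NormalizingTermQuery other = (NormalizingTermQuery) o;
        return Objects.equals(fieldName, other.fieldName)
                && Arrays.equals(toBytes(value), toBytes(other.value));
    }

    @Override
    public int hashCode() {
        // Must use the same normalization as equals() to keep the contract.
        return 31 * Objects.hashCode(fieldName) + Arrays.hashCode(toBytes(value));
    }
}
```

With this, an Integer 5 set through the Java API and the String "5" read back by parser.text() compare as equal, which is the mismatch described above.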

Member

We discussed this and decided to always keep a BytesRef within TermQueryBuilder, rather than mixing String and BytesRef depending on whether we received the query via the Java API or parsed it from JSON. The Java API will still have a setter that accepts a String, but internally that setter will convert the String to a BytesRef.

We do have to fix the parser.text() call above. It looks like a bug that never manifested itself, because everything still works when parsing and executing on the same (data) node. But we currently always parse the value in the short term query format (term: { field: value }) as a string.
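The agreed design can be sketched under stated assumptions: the builder always stores the term as bytes and converts a String once, at the entry point. A byte[] stands in for Lucene's BytesRef, and BytesTermQuery is a hypothetical name. With this design, equals() and hashCode() no longer need to normalize anything.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical builder that keeps the term as bytes regardless of how it was created.
final class BytesTermQuery {
    private final String fieldName;
    private final byte[] value; // stand-in for BytesRef

    // Java API entry point: accepts a String and converts it once, on the way in.
    BytesTermQuery(String fieldName, String value) {
        this(fieldName, value == null ? null : value.getBytes(StandardCharsets.UTF_8));
    }

    // Entry point for callers (e.g. a parser or deserializer) that already have bytes.
    BytesTermQuery(String fieldName, byte[] value) {
        this.fieldName = fieldName;
        this.value = value == null ? null : value.clone(); // defensive copy
    }

    String valueAsString() {
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BytesTermQuery)) return false;
        BytesTermQuery other = (BytesTermQuery) o;
        return Objects.equals(fieldName, other.fieldName) && Arrays.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        return 31 * Objects.hashCode(fieldName) + Arrays.hashCode(value);
    }
}
```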

cbuescher force-pushed the feature/query-refactoring-termquery branch from bcfb8e2 to f8625b4 April 28, 2015 13:05
@cbuescher
Member Author

@javanna went through the comments and current diff again, here are the things that are open from my point of view:

  • using context.setMapUnmappedFieldAsString(true) in the BaseQueryTestCase
  • using BytesRefs.toBytesRef(value) in equals() and hashCode() from my last update commit

Anything else open here that I'm missing?

@Override
public void validate() throws ElasticsearchNullPointerException {
Preconditions.checkNotNull(this.fieldName, "Fieldname of a TermQuery cannot be null.");
Preconditions.checkNotNull(this.value, "Value of a TermQuery cannot be null.");
Member

I think we have to check if these are empty too and barf in that case?

Member

One more thing: s/Fieldname/field name/.
I think in general the first letter of an error message should be lowercase; that's pretty much the convention we use in the existing validation code.
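Below is a sketch of both review suggestions: empty values are rejected as well as null ones, and each message starts with a lowercase letter. It assumes a value counts as empty when its string form is empty. StrictTermQuery, and returning a List<String> of errors, are illustrative choices rather than the PR's actual signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder applying the null-or-empty checks and lowercase messages.
final class StrictTermQuery {
    private final String fieldName;
    private final Object value;

    StrictTermQuery(String fieldName, Object value) {
        this.fieldName = fieldName;
        this.value = value;
    }

    // Returns every problem at once; an empty list means the query is valid.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (fieldName == null || fieldName.isEmpty()) {
            errors.add("field name of a term query cannot be null or empty");
        }
        // Assumption: a value is "empty" when its string form is empty.
        if (value == null || value.toString().isEmpty()) {
            errors.add("value of a term query cannot be null or empty");
        }
        return errors;
    }
}
```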

@@ -388,6 +388,9 @@ public void writeGenericValue(@Nullable Object value) throws IOException {
} else if (type == double[].class) {
writeByte((byte) 20);
writeDoubleArray((double[]) value);
} else if (type == BytesRef.class) {
writeByte((byte) 21);
writeBytesRef((BytesRef) value);
Member

if you rebase you'll get this change that was made upstream and merged back into our branch
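The diff adds a type tag for BytesRef to writeGenericValue: one tag byte identifies the type, and the payload follows. Below is a minimal sketch of that pattern and its read side. A byte[] stands in for BytesRef, and the length-prefixed payload encoding is an assumption. GenericValues and the -1 tag for null are hypothetical; only tag 21 comes from the diff.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical type-tagged serialization: tag byte first, then a type-specific payload.
final class GenericValues {
    static final byte NULL_TAG = -1;      // assumption for this sketch
    static final byte BYTES_REF_TAG = 21; // tag used for BytesRef in the diff above

    static void writeGenericValue(DataOutputStream out, Object value) throws IOException {
        if (value == null) {
            out.writeByte(NULL_TAG);
        } else if (value instanceof byte[]) { // byte[] stands in for BytesRef
            byte[] bytes = (byte[]) value;
            out.writeByte(BYTES_REF_TAG);
            out.writeInt(bytes.length);       // length prefix, then the raw bytes
            out.write(bytes);
        } else {
            throw new IOException("can't write type [" + value.getClass().getName() + "]");
        }
    }

    static Object readGenericValue(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case NULL_TAG:
                return null;
            case BYTES_REF_TAG: {
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                return bytes;
            }
            default:
                throw new IOException("can't read unknown type [" + tag + "]");
        }
    }

    // Write then read back in memory, wrapping IOException for convenience in tests.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeGenericValue(new DataOutputStream(buffer), value);
            return readGenericValue(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The reader must handle every tag the writer can emit, in the same encoding, which is why adding a new type means touching both sides.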

Christoph Büscher added 4 commits May 4, 2015 20:23
Also extended BaseQueryTestCase so it has helper methods for parsing the query header, and extended the toQuery() test method so it passes the parse context down to the subclass, which can then make assertions about the side effects that calling toQuery() has on the parseContext.
cbuescher force-pushed the feature/query-refactoring-termquery branch from 8d18c4f to 21afc61 May 5, 2015 09:18
@cbuescher
Member Author

Rebased this PR on the current feature branch and changed the validation exception class to our own version, similar to the previously suggested ActionRequestValidationException. Also opened a separate issue, #10974, to track adding tests for invalid JSON, and set the TermQueryBuilder test to 20 repetitions.

cbuescher self-assigned this May 5, 2015

/**
* Validate the query.
* @return {@code null} if query is valid, otherwise {@link ActionRequestValidationException} containing error messages,
Member

Need to update the exception in the javadocs.

@javanna
Member

javanna commented May 5, 2015

LGTM besides the minor comments left, if you can address them that would be great, this is good to merge then!

cbuescher force-pushed the feature/query-refactoring-termquery branch from 5e1d936 to 11e841c May 5, 2015 21:16
@cbuescher
Member Author

Thanks for the quick response, pushed this to the feature branch.

@cbuescher
Member Author

Thanks for the review and for closing the issue; somehow it got reopened automatically yesterday.

On Wed, May 6, 2015 at 7:34 AM, Luca Cavanna wrote:

Closed #10669.



Christoph Büscher

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query Refactoring labels Feb 14, 2018
cbuescher deleted the feature/query-refactoring-termquery branch March 20, 2024 20:15
Labels
:Search/Search Search-related issues that do not fall into other categories
Projects
None yet
6 participants