-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Query refactoring: TermQueryBuilder refactoring and test #10669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query refactoring: TermQueryBuilder refactoring and test #10669
Conversation
|
||
private float boost = -1; | ||
float boost = 1.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why package private? Other members are private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed that in new version of this PR
cfc2069
to
8f4218e
Compare
Rebased this PR on top of current tip of feature branch and changed test to make use of new BaseQueryTestCase. |
public Query toQuery(QueryParseContext parseContext) throws QueryParsingException, IOException { | ||
Query query = null; | ||
Preconditions.checkNotNull(this.fieldName, "Fieldname of a TermQuery cannot be null."); | ||
Preconditions.checkNotNull(this.value, "Value of a TermQuery cannot be null."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
about these checks... we check the same right after parsing (at some point will be moved to coord node), is that correct? and we check again here (which will stay on the data node). I wonder if we should really repeat these checks, or share them between the two methods and introduce some validation method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added them here because in theory one could create query objects without parsing them. Maybe a void validate()
on each query builder that also the parsers can use to check before they return a new builder from the fromXContent() method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a validation method for TermQueryBuilder and reused that in the parser. Only downside of this is where previously we had a QueryParseException we now have a ElasticsearchNullPointerException, or we need to try/catch and rethrow which IMHO defeats the purpose. Have a look and let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First thing which I keep forgetting: you are right, queries can be created without going through fromXContent
, java api will do just that. There are different places where we need the same validation, I think having a validate method is a good choice.
That said, one question is where do we call it from? fromXContent
after parsing (on coordinating node in the future) makes perfect sense, but that is called only when the query is provided as json. Java api allows to create the intermediate query representation directly, meaning that the fromXContent
step will be skipped there. What can happen in this case is 1) that the query gets serialized to other nodes without being validated, and validated on the data nodes multiple times leading to multiple errors, which is not quite what we want (we want to catch errors once and earlier), 2) that the coordinating node is the data node where the query gets executed, query might not even get serialized, we execute toQuery
straight-away.
I believe the proper way to validate the query is hooking into the existing request validation mechanism (ActionRequest#validate
). Every request gets already validated on the coordinating node. The SearchRequest
should simply call query#validate then I think (only problem being that we cannot quite do it yet, we need to refactor the search request first as right now the whole search request is just a big json object)?
As for exceptions, if we adopt this approach I think we should be consistent with the existing validation infra and have ActionRequestValidationException
s.
Does this make sense to you guys?
If so, we need to decide what to do for now, I think having the validate method on the base class makes sense as Lee mentioned. For now we could call it from a single place though (toQuery
is safer as it covers all the cases), once we move to parsing to the coord node we need to move it to the validation api instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this, I think this is a great ide, for now we can do the validate method and figure out where to hook it in at a later time.
Also as an aside, I'd love to get rid of Preconditions.checkNotNull
and just use Objects.requireNonNull
since it's part of the JVM, but that's just personal preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I add this to the QueryBuilder interface, I'll have add an empty impl to the BaseQueryBuilder for now (can be removed later once all queries have validate() method). For the details I would like to open a separate issue in which we can then have discussion about exact signature, which exceptions to be thrown, Preconditions vs. Objects.requireNonNull etc... Let me know if you agree with this plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #10777 for separate discussion on how to handle validation.
left a few comments |
I pulled out the validation of the two important fields into a method shared by builder and parser part, but not sure if this is the best way to go. Happy about comments there. Hope I adressed the rest of your comments, I would prefer tracking the use of QueryParseContext in toQuery() in a separate issue before making huge changes there. Same goes for generifying named queries (lookup but also maybe simplification) |
@@ -105,10 +113,28 @@ public TermQueryBuilder(String name, boolean value) { | |||
* @param value The value of the term | |||
*/ | |||
public TermQueryBuilder(String name, Object value) { | |||
this.name = name; | |||
this.fieldName = name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change name
to fieldName
in the constructor too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Changed variable name in constructor and added validate() to the QueryBuilder interface. As long as this is not implemented by all queries, added empty impl to BaseQueryBuilder. Also opened two separate issues to keep track of further ideas for validation (#10777) and for generifying named queries (#10776). |
return query; | ||
} | ||
|
||
public void validate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the @Override
annotation here?
99690c7
to
80d3571
Compare
@@ -80,27 +74,19 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars | |||
} | |||
parser.nextToken(); | |||
} else { | |||
value = parser.text(); | |||
value = parser.objectText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question I had myself here: I had to change this for the fromXContentent()
test to work. Otherwise, if the value is e.g. an int, it is written to the XContent query as a number value. With the original parser.text()
it is then read in as String (which doesn't seem to matter for the later toQuery(), since there BytesRefs.toBytesRef(value) is used, which calls toString() anyway) but blows up the newly introduced equality test. I changes this to objectText()
because this does some checking based in the JsonToken type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, I tend to want to revert this change since I'm not sure about it's implications. The problem then is that Object value
can be of different type in original query and after fromXContent()
, but in the end it only matters that toQuery()
produces the same result, and there every Object is converted to lucene BytesRef. Thats why I tend to use that conversion also in the TermQueryBuilder#equals()
. Will push that change to show what I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we discussed this and we went for always keeping BytesRef
within TermQueryBuilder
rather than mixing up String
and BytesRef
depending on whether we received the query via java api or we parsed it through json. Java api will still have a setter that accepts a string, but internally that setter will convert from String
to BytesRef
.
We do have to fix the above parser.text()
which looks like a bug that never manifested as when parsing and executing on the same (data) node everything still works. But we are currenly parsing the value in the term query short format (term: { field: value}
) always as a string.
bcfb8e2
to
f8625b4
Compare
@javanna went through the comments and current diff again, here are the things that are open from my point of view:
Anything else open here that I'm missing? |
@Override | ||
public void validate() throws ElasticsearchNullPointerException { | ||
Preconditions.checkNotNull(this.fieldName, "Fieldname of a TermQuery cannot be null."); | ||
Preconditions.checkNotNull(this.value, "Value of a TermQuery cannot be null."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to check if these are empty too and barf in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more thing: s/Fieldname/ field name
I think in general the first line of an error should be lowercase, that's pretty much the convention we use in the existing validation code.
@@ -388,6 +388,9 @@ public void writeGenericValue(@Nullable Object value) throws IOException { | |||
} else if (type == double[].class) { | |||
writeByte((byte) 20); | |||
writeDoubleArray((double[]) value); | |||
} else if (type == BytesRef.class) { | |||
writeByte((byte) 21); | |||
writeBytesRef((BytesRef) value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you rebase you'll get this change that was made upstream and merged back into our branch
Also extended BaseQueryTestCase so it has helper methods for parsing the query header and extended the toQuery() test method so it passes down parse context to sublass to make assertions on side effects calling toQuery() has on the parseContext.
8d18c4f
to
21afc61
Compare
Rebased this PR on current feature branch and changed validation exception class to own version similar to the previously suggested ActionRequestValidationException. Also opened separate issue #10974 for keeping track of adding tests for invalid json and set the TermQueryBuilder test to 20 repetitions. |
|
||
/** | ||
* Validate the query. | ||
* @return {@code null} if query is valid, otherwise {@link ActionRequestValidationException} containing error messages, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to update exception in javadocs
LGTM besides the minor comments left, if you can address them that would be great, this is good to merge then! |
5e1d936
to
11e841c
Compare
Thanks for the quick response, pushed this to the feature branch. |
Thanks for the review and for closing the issue, somehow it got opened On Wed, May 6, 2015 at 7:34 AM, Luca Cavanna [email protected]
Christoph Büscher |
Split the parse(QueryParseContext ctx) method into a parsing and a query building part, adding Streamable for serialization and hashCode(), equals() for better testing.
Add basic unit test for Builder and Parser.
PR goes agains query-refacoring feature branch.