-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Refactoring of FuzzyQuery #11865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring of FuzzyQuery #11865
Conversation
import org.elasticsearch.common.ParseField; | ||
import org.elasticsearch.common.inject.Inject; | ||
import org.elasticsearch.common.unit.Fuzziness; | ||
import org.elasticsearch.common.xcontent.XContentParser; | ||
import org.elasticsearch.index.mapper.FieldMapper; | ||
import org.elasticsearch.index.mapper.MappedFieldType; | ||
import org.elasticsearch.index.query.support.QueryParsers; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add some text or remove the empty comment?
Did a round of review, agree with most of the changes. I left some comments, mostly concerning questions of where to set default Rewrite method when in filter context and some thoughts concerning leaving internal query value representation as |
Thanks for the review. I went for storing |
@alexksikes I went through your last commit, left a few comments. Sorry I didn't realize they show up in such a strange way if I do the comments on the commit directly here in Github, will try to avoid this in the future. |
@Override | ||
public void writeTo(StreamOutput out) throws IOException { | ||
out.writeString(this.fieldName); | ||
out.writeGenericValue(this.value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since value
internally is a String now, we can change read/write here as well.
Went through the changes another time, would like for @javanna to have another look and for an opinion about how much we should restrict the current constructor that allows (but won't work) for any |
Thanks for the review. I restricted the set of accepted values for now. |
} else if (value instanceof Float) { | ||
return new Fuzziness((Float) value); | ||
} else if (value instanceof String) { | ||
return new Fuzziness((String) value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a new constructor Fuzziness(Object) instead of these ifs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexksikes did you see this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why it is rendered like this in github, but I am using a factory Fuzziness.newFuziness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see but I wonder if we should remove this method and simply accept Object and add a Fuzziness constructor that accepts Object instead
I had a quick look at this and left a few comments. There is an inconsistency between parser and builder here in the existing code. The builder supports an object for the value mamber but the parser only parses a string. I am not 100% sure which of the two is wrong. Feels like fuzzy query should allow for an object similar to what the term query does, but then our MappedFieldType#fuzzyQuery method takes a string as argument, so object wouldn't really work out, that would lead us to support strings only then? Thoughts @jpountz ? |
Also @alexksikes would be nice if you can rebase, it is hard to review at this point after #11974 got in. Thanks! |
@javanna I looked at how classes that extend MappedFieldType implement fuzzyQuery, and they seem to all need to perform some parsing of the string value. So maybe we could make it an object and update all fuzzyQuery implementations to parse the object to the appropriate type (Date, Long, ...) using exactly the same logic as the termQuery method? |
Agreed @jpountz thanks for looking. We will do that upstream rather than in the query-refactoring branch then. Makes sense @alexksikes ? |
+1 to doing it in master |
Yes agreed. Thanks @jpountz. |
5ac4b5f
to
5f655f5
Compare
INT_FIELD_NAME, "type=integer", | ||
DOUBLE_FIELD_NAME, "type=double", | ||
BOOLEAN_FIELD_NAME, "type=boolean", | ||
STRING_FIELD_NAME, "type=string", | ||
DATE_FIELD_NAME, "type=date", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain the reason behind this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the ordering changed
left come comments |
f196754
to
45e6afd
Compare
@javanna I addressed the comments and rebased. Thanks for the review. |
|
||
public static String randomTimeValue() { | ||
final String[] values = new String[]{"d", "H", "ms", "s", "S", "w"}; | ||
return randomIntBetween(0, 3) + randomFrom(values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't we say that we would change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you meant between 0 - 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I meant that I don't understand why we randomize between 0 and 3. Can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I guess that is more of a fantasy related to the name of the method, although 0s or 0d could be corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lemme clarify I mean that if we add this method to ESTestCase it should be generic enough. I guess there are much more valid values possible for a time value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK then the value will be 0 - 1000
I did another round and replied to your comments. @cbuescher wanna have a quick look too? |
FuzzyQueryBuilder fuzzyQueryBuilder = new FuzzyQueryBuilder(in.readString(), in.readGenericValue()); | ||
fuzzyQueryBuilder.fuzziness = Fuzziness.readFuzzinessFrom(in); | ||
fuzzyQueryBuilder.prefixLength = in.readVInt(); | ||
fuzzyQueryBuilder.maxExpansions = in.readVInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those two fields are now non-optional, so could use readInt()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since these are more likely to be small, better use the V
methods. There are always written and yes there are none optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I'm sorry I mixed VInt with optional, my fault. Must be the heat. Please ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, the heat does affect me too :-)
@alexksikes did a very quick round and left only a few comments, mostly small stuff and questions for clarfication |
3ebce4b
to
fae6cb7
Compare
@javanna rebased and comments addressed. Thanks for the review as always. |
private Fuzziness(int fuzziness) { | ||
Preconditions.checkArgument(fuzziness >= 0 && fuzziness <= 2, "Valid edit distances are [0, 1, 2] but was [" + fuzziness + "]"); | ||
this.fuzziness = Integer.toString(fuzziness); | ||
} | ||
|
||
private Fuzziness(String fuzziness) { | ||
this.fuzziness = fuzziness; | ||
if (fuzziness == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be replaced with Objects.requireNotNull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see it throws an NPE exception, not as neat as the error we return to the user, unless you want to wrap the call in a try catch and still return the nicer error to the user.
left a small comment |
@@ -121,7 +130,7 @@ public int asDistance() { | |||
} | |||
|
|||
public int asDistance(String text) { | |||
if (this == AUTO) { //AUTO | |||
if (fuzziness.equals("AUTO")) { //AUTO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to leverage our own equals method rather than a string comparison. AUTO.equals(this) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha yes!
LGTM |
f96badb
to
626b1ab
Compare
Relates to #10217
This PR is against the query-refactoring branch.