Skip to content

[SPARK-8432] [SQL] fix hashCode() and equals() of BinaryType in Row #6876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

davies
Copy link
Contributor

@davies davies commented Jun 18, 2015

Also added more tests in LiteralExpressionSuite

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35112 has finished for PR 6876 at commit 53c38b1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

for (int i = 0; i < arr.length; i++) {
hash = hash * 37 + (int)arr[i];
}
return hash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it same with java.util.Arrays.hashCode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, we should use that.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35114 has finished for PR 6876 at commit 0fff25d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35144 has finished for PR 6876 at commit 5819d33.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35146 has finished for PR 6876 at commit 6ad2a90.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor Author

davies commented Jun 18, 2015

@marmbrus Could you help to review this one?

/**
* A generic version of Row.equals(Row), which is used for tests.
*/
@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing: can you add some javadoc to this class to explain what its used for and why its in Java?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Row is a trait, UnsafeRow and SpecificRow are both in Java, they can not inherit some default implementations from Row, so created BaseRow in Java for them. Right now, we have InternalRow, will be clean these in another PR.

@marmbrus
Copy link
Contributor

@davies, thanks for working on this! I'm okay with this approach, but did you consider the alternative, where we instead change the internal type of BytesType to be a wrapper class instead of Array[Byte], and define a good equality function for that wrapper?

@davies
Copy link
Contributor Author

davies commented Jun 18, 2015

@marmbrus We're working to have more efficient representation in catalyst, putting Array[Byte] inside a wrapper sounds not in the same direction. I'd like to go this approach.

@marmbrus
Copy link
Contributor

I think using a wrapper might be necessary for efficiency. For example, we will want to reuse the same byte array when reading from something like parquet, instead of needing to allocate one of the exact size each time (think java.nio.ByteBuffer). However, I'm fine merging this first.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35189 timed out for PR 6876 at commit d96929b after a configured wait of 175m.

override def copy(): InternalRow = this

override def equals(o: Any): Boolean = {
if (!o.isInstanceOf[Row]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we change it to isInstanceOf[InternalRow] after #6869?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@@ -31,8 +45,16 @@ class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
}

test("int literals") {
checkEvaluation(Literal(1), 1)
checkEvaluation(Literal(0L), 0L)
List(0, 1, Int.MinValue, Int.MaxValue).foreach {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pretty weird way of indenting. you can do

List(0, 1, Int.MinValue, Int.MaxValue).foreach { d =>
  ...
}

or

for (d <- List(0, 1, Int.MinValue, Int.MaxValue)) {
  ...
}

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35225 has finished for PR 6876 at commit 41caec6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35283 has finished for PR 6876 at commit bd20780.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #937 has finished for PR 6876 at commit a0626ed.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35284 has finished for PR 6876 at commit a0626ed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

test this please

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #939 has finished for PR 6876 at commit a0626ed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SerializableConfiguration(@transient var value: Configuration) extends Serializable
    • class SerializableJobConf(@transient var value: JobConf) extends Serializable

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35307 has finished for PR 6876 at commit a0626ed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35322 has finished for PR 6876 at commit 32d9811.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -127,6 +127,7 @@ object GenerateProjection extends CodeGenerator[Seq[Expression], Projection] {
case FloatType => s"Float.floatToIntBits($col)"
case DoubleType =>
s"(int)(Double.doubleToLongBits($col) ^ (Double.doubleToLongBits($col) >>> 32))"
case BinaryType => s"java.util.Arrays.hashCode($col)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also update equals for generated code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's already done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, genEqual has already handled BinaryType.

@SparkQA
Copy link

SparkQA commented Jun 21, 2015

Test build #946 has finished for PR 6876 at commit 32d9811.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@SparkQA
Copy link

SparkQA commented Jun 22, 2015

Test build #35483 has finished for PR 6876 at commit 429c2c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAlias(child: Expression) extends NamedExpression
    • abstract class ExtractValueWithStruct extends ExtractValue

@marmbrus
Copy link
Contributor

Thanks, merging to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants