Skip to content

[SPARK-18134][SQL] Orderable MapType #19330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jinxing64
Copy link

@jinxing64 jinxing64 commented Sep 23, 2017

What changes were proposed in this pull request?

We can make MapType orderable, and thus usable in aggregates and joins.

Closes #15970

How was this patch tested?

Unit tests.

case map: MapType =>
val compareFunc = freshName("compareMap")
val compare = this.addReferenceObj("compare", map.interpretedOrdering,
classOf[Ordering[MapData]].getCanonicalName)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: we should code generate this.

@jinxing64
Copy link
Author

It seems #15970 is not being worked.
I resolved conflicts and add some tests in this pr.

@SparkQA
Copy link

SparkQA commented Sep 23, 2017

Test build #82106 has finished for PR 19330 at commit 2e2b98d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

hvanhovell commented Sep 23, 2017

@jinxing64 thanks for taking over. I have glanced over the PR, and I miss the explicit sorting of maps. We can't assume that maps are sorted (or even have unique keys) out of the box, for example the following example should evaluate to true but it won't in your PR: map(1, 2, 3, 4) = map(3, 4, 1, 2).

@SparkQA
Copy link

SparkQA commented Sep 23, 2017

Test build #82107 has finished for PR 19330 at commit bd50495.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Copy link
Contributor

@hvanhovell : based on your comment over the jira, it seemed that the approach to be used is yet to be finalised. Are we moving ahead with this approach ?

@jinxing64
Copy link
Author

jinxing64 commented Sep 24, 2017

@hvanhovell @tejasapatil
Thanks a lot for comment.
I got your point. I will refine soon.

@jinxing64 jinxing64 force-pushed the SPARK-18134-v2 branch 2 times, most recently from 33e532b to 0385487 Compare September 26, 2017 04:57
@jinxing64 jinxing64 changed the title Orderable MapType [SPARK-18134][SQL] Orderable MapType Sep 26, 2017
@SparkQA
Copy link

SparkQA commented Sep 26, 2017

Test build #82174 has finished for PR 19330 at commit 0385487.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@SparkQA
Copy link

SparkQA commented Sep 26, 2017

Test build #82173 has finished for PR 19330 at commit 33e532b.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@jinxing64
Copy link
Author

Jenkins, retest this plesase.

@SparkQA
Copy link

SparkQA commented Sep 26, 2017

Test build #82196 has finished for PR 19330 at commit 5279338.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@jinxing64
Copy link
Author

It seems the failed SparkR unit tests is not related.
In current change, I added trait OrderSpecified, expressions(BinaryComparison, Max, Min, SortArray, SortOrder) using ordering of data type extends OrderSpecified. During analyzing, expressions with map type output will be wrapped in OrderMaps , thus changed to be an ordered map.
Expressions with map type output in Aggregate, Distinct and SetOperation will also be wrapped in OrderMaps.

@kiszk
Copy link
Member

kiszk commented Sep 27, 2017

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Sep 27, 2017

Test build #82232 has finished for PR 19330 at commit 5279338.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@jinxing64
Copy link
Author

Conflicts resolved.

@SparkQA
Copy link

SparkQA commented Sep 27, 2017

Test build #82242 has finished for PR 19330 at commit e071892.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@kiszk
Copy link
Member

kiszk commented Sep 27, 2017

@jinxing64 While I am not familiar with R, may we have to add one more value for ordered at this line?

keyType: DataType,
valueType: DataType,
valueContainsNull: Boolean,
ordered: Boolean = false) extends DataType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to take care of ordered in buildFormattedString and jsonValue?

@SparkQA
Copy link

SparkQA commented Sep 28, 2017

Test build #82264 has finished for PR 19330 at commit 0b2b52c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2017

Test build #82272 has finished for PR 19330 at commit 80b6238.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2017

Test build #82282 has finished for PR 19330 at commit a9d09f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64
Copy link
Author

@kiszk
Thanks a lot for comments. Tests passed now. In current change ordered is included in jsonValue. But I'm not sure it is appropriate.

Thanks again for taking time looking into this :)

@kiszk
Copy link
Member

kiszk commented Sep 29, 2017

@jinxing64 I would like to hear opinions from other reviewers whether ordered should be included in these methods like ToString, buildFormattedString, and jsonValue.
IMO, I think we should make these methods consistent, ordered is included or not. The method prints the status of MapType

@jinxing64 jinxing64 force-pushed the SPARK-18134-v2 branch 2 times, most recently from 5445834 to a472dae Compare November 15, 2017 12:13
@SparkQA
Copy link

SparkQA commented Nov 15, 2017

Test build #83892 has finished for PR 19330 at commit 5445834.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@SparkQA
Copy link

SparkQA commented Nov 15, 2017

Test build #83893 has finished for PR 19330 at commit a472dae.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@SparkQA
Copy link

SparkQA commented Nov 16, 2017

Test build #83926 has finished for PR 19330 at commit b1886f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@SparkQA
Copy link

SparkQA commented Nov 16, 2017

Test build #83929 has finished for PR 19330 at commit 6aa4b8b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@jinxing64
Copy link
Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Nov 16, 2017

Test build #83936 has finished for PR 19330 at commit 6aa4b8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@maropu
Copy link
Member

maropu commented Jan 9, 2018

Any update?

@jinxing64
Copy link
Author

@maropu @kiszk
In current change, ordered is excluded from toString, buildFormattedString, jsonValue; I prefer to keep ordered internal and used only when ordering.
Actually this change has been working in my cluster for a while.
Do you agree on the approach -- inject OrderMaps when needed ?
I'm happy to keep working on this, please comment.

@maropu
Copy link
Member

maropu commented Jan 11, 2018

Aha, I think we can't touch the stable interface (MapType) until the next release (v3.0). So, I think we need to hold this for a while. Or, we need to look for other approaches to fix this.

@xxzzycq
Copy link

xxzzycq commented Feb 27, 2018

Does the community currently have a join and group by code that supports mapType?

@jinxing64
Copy link
Author

@xxzzycq
Currently no

@maropu
Copy link
Member

maropu commented Oct 4, 2018

I think we can resume this pr for v3.0. Are u still there and can you resume this? @jinxing64
Also, can you add Closes #15970 in the PR description?

@jinxing64
Copy link
Author

@maropu
Thanks, and yes I'm still here and I can keep going if this pr is interested.
I will update this pr this weekend.

@maropu
Copy link
Member

maropu commented Oct 11, 2018

Thanks!

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97739 has finished for PR 19330 at commit 6aa4b8b.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97751 has finished for PR 19330 at commit 6aa4b8b.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97767 has finished for PR 19330 at commit 6aa4b8b.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • case class Max(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Min(child: Expression) extends DeclarativeAggregate with OrderSpecified
  • case class Least(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class Greatest(children: Seq[Expression]) extends Expression with OrderSpecified
  • case class OrderMaps(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class InterpretedOrdering(orders: Seq[SortOrder]) extends Ordering[InternalRow]
  • case class In(value: Expression, list: Seq[Expression]) extends Predicate with OrderSpecified
  • case class InSet(child: Expression, hset: Set[Any])
  • abstract class BinaryComparison extends BinaryOperator with Predicate with OrderSpecified

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97817 has started for PR 19330 at commit 6aa4b8b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants