Skip to content

[SPARK-26555][SQL] make ScalaReflection subtype checking thread safe #24085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

mwlon
Copy link

@mwlon mwlon commented Mar 14, 2019

What changes were proposed in this pull request?

Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).

How was this patch tested?

Existing tests and a new one for the new subtype checking function.

@@ -66,19 +66,25 @@ object ScalaReflection extends ScalaReflection {
*/
def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T])

private[catalyst] def isSubtype(tpe1: `Type`, tpe2: `Type`): Boolean = {
this.synchronized {
tpe1 <:< tpe2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, crazy. This is a global lock though and I'm afraid it will bottleneck/deadlock something. I'm wondering if there is any other way, or if this even catches all the thread-safety issues

Copy link
Author

@mwlon mwlon Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experiments it does. Are you familiar with a situation where it might not?

One thing I'm not totally certain about is whether it's as efficient (performance-wise) to lock here (which means accessing the lock many times per function) vs. locking once each function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the problem, I would bet this creates some thread contention bottlenecks. It's a problem although somewhat niche w.r.t. Spark as I'm not sure you'd perform operations in parallel, usually, that called this code. It's more of a Scala bug; lightweight workarounds on the Spark end are OK but I'm unsure if this, even if it's the whole fix, is worth the contention. That said it's only an educated guess.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important to workaround this in Spark, since it breaks thread safety in many important functions like createDataset. I can give examples of where it is necessary to use multithreading.

Trying this with concurrency 5, this seems to slow down subtype checking by ~4%, which I wouldn't worry about since it's never the bottleneck in a Spark application. I'm happy to change this to larger blocks of locking rather than a helper function, but it's important that we resolve this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree it's worth the cost, for correctness. I'd hope there's a finer-grained way to lock to avoid this, but I don't know if there is, as it's quite specific to Scala internals. Let me ask dev@ for any ideas or input.

@@ -66,19 +66,25 @@ object ScalaReflection extends ScalaReflection {
*/
def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T])

private[catalyst] def isSubtype(tpe1: `Type`, tpe2: `Type`): Boolean = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some comments to explain it and link to the scala bug report?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Added a comment

@skonto
Copy link
Contributor

skonto commented Mar 15, 2019

There used to be a similar logic: https://github.com/apache/spark/pull/1423/files for a similar issue for 2.10

@srowen
Copy link
Member

srowen commented Mar 15, 2019

Good point @skonto , that looks like it was added to address a Scala bug that had been fixed, and then removed as a result. That's good precedent for reintroducing it.

@cloud-fan
Copy link
Contributor

interesting. We removed that hack because we believe the bug has been fixed in Scala 2.11. Seems we were wrong...

@srowen
Copy link
Member

srowen commented Mar 15, 2019

@cloud-fan yeah I think something else was fixed and this is a different issue of the same type: scala/bug#10766 (comment)

@SparkQA
Copy link

SparkQA commented Mar 15, 2019

Test build #4626 has finished for PR 24085 at commit d4c50ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Mar 18, 2019

I'd like to merge this back to 2.3 if there are no concerns. It was indeed a mechanism that already existed and looks like it is needed post-2.10 anyway.

@mwlon
Copy link
Author

mwlon commented Mar 18, 2019

Shall we go ahead and merge this?

@cloud-fan
Copy link
Contributor

thanks, merging to master!

We need individual PRs for 2.4 and 2.3, as ScalaReflection has been changed quite a bit.

@cloud-fan cloud-fan closed this in d5c08fc Mar 19, 2019
JoshRosen pushed a commit to JoshRosen/spark that referenced this pull request Jun 19, 2019
Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).

Existing tests and a new one for the new subtype checking function.

Closes apache#24085 from mwlon/SPARK-26555.

Authored-by: mwlon <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@JoshRosen
Copy link
Contributor

I've opened a 2.4.x backport for this at #24913

JoshRosen pushed a commit that referenced this pull request Jun 20, 2019
…thread safe

This is a Spark 2.4.x backport of #24085. Original description follows below:

## What changes were proposed in this pull request?

Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).

## How was this patch tested?

Existing tests and a new one for the new subtype checking function.

Closes #24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport.

Authored-by: mwlon <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
…thread safe

This is a Spark 2.4.x backport of apache#24085. Original description follows below:

## What changes were proposed in this pull request?

Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).

## How was this patch tested?

Existing tests and a new one for the new subtype checking function.

Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport.

Authored-by: mwlon <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
…thread safe

This is a Spark 2.4.x backport of apache#24085. Original description follows below:

## What changes were proposed in this pull request?

Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).

## How was this patch tested?

Existing tests and a new one for the new subtype checking function.

Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport.

Authored-by: mwlon <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
igreenfield pushed a commit to axiomsl/spark that referenced this pull request Nov 4, 2019
…thread safe

This is a Spark 2.4.x backport of apache#24085. Original description follows below:

## What changes were proposed in this pull request?

Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).

## How was this patch tested?

Existing tests and a new one for the new subtype checking function.

Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport.

Authored-by: mwlon <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
    ## What changes were proposed in this pull request?

    Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).

    ## How was this patch tested?

    Existing tests and a new one for the new subtype checking function.

    Closes apache#24085 from mwlon/SPARK-26555.

    Authored-by: mwlon <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>

RB=1924052
BUG=LIHADOOP-50758
G=superfriends-reviewers
R=fli,latang,mshen,zolin,yezhou
A=mshen
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants