-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-26555][SQL] make ScalaReflection subtype checking thread safe #24085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -66,19 +66,25 @@ object ScalaReflection extends ScalaReflection { | |||
*/ | |||
def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T]) | |||
|
|||
private[catalyst] def isSubtype(tpe1: `Type`, tpe2: `Type`): Boolean = { | |||
this.synchronized { | |||
tpe1 <:< tpe2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, crazy. This is a global lock though and I'm afraid it will bottleneck/deadlock something. I'm wondering if there is any other way, or if this even catches all the thread-safety issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experiments it does. Are you familiar with a situation where it might not?
One thing I'm not totally certain about is whether it's as efficient (performance-wise) to lock here (which means accessing the lock many times per function) vs. locking once each function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's the problem, I would bet this creates some thread contention bottlenecks. It's a problem although somewhat niche w.r.t. Spark as I'm not sure you'd perform operations in parallel, usually, that called this code. It's more of a Scala bug; lightweight workarounds on the Spark end are OK but I'm unsure if this, even if it's the whole fix, is worth the contention. That said it's only an educated guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is important to workaround this in Spark, since it breaks thread safety in many important functions like createDataset
. I can give examples of where it is necessary to use multithreading.
Trying this with concurrency 5, this seems to slow down subtype checking by ~4%, which I wouldn't worry about since it's never the bottleneck in a Spark application. I'm happy to change this to larger blocks of locking rather than a helper function, but it's important that we resolve this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree it's worth the cost, for correctness. I'd hope there's a finer-grained way to lock to avoid this, but I don't know if there is, as it's quite specific to Scala internals. Let me ask dev@ for any ideas or input.
@@ -66,19 +66,25 @@ object ScalaReflection extends ScalaReflection { | |||
*/ | |||
def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T]) | |||
|
|||
private[catalyst] def isSubtype(tpe1: `Type`, tpe2: `Type`): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add some comments to explain it and link to the scala bug report?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Added a comment
There used to be a similar logic: https://github.com/apache/spark/pull/1423/files for a similar issue for 2.10 |
Good point @skonto , that looks like it was added to address a Scala bug that had been fixed, and then removed as a result. That's good precedent for reintroducing it. |
interesting. We removed that hack because we believe the bug has been fixed in Scala 2.11. Seems we were wrong... |
@cloud-fan yeah I think something else was fixed and this is a different issue of the same type: scala/bug#10766 (comment) |
Test build #4626 has finished for PR 24085 at commit
|
I'd like to merge this back to 2.3 if there are no concerns. It was indeed a mechanism that already existed and looks like it is needed post-2.10 anyway. |
Shall we go ahead and merge this? |
thanks, merging to master! We need individual PRs for 2.4 and 2.3, as |
Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). Existing tests and a new one for the new subtype checking function. Closes apache#24085 from mwlon/SPARK-26555. Authored-by: mwlon <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
I've opened a 2.4.x backport for this at #24913 |
…thread safe This is a Spark 2.4.x backport of #24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes #24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <[email protected]> Signed-off-by: Josh Rosen <[email protected]>
…thread safe This is a Spark 2.4.x backport of apache#24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <[email protected]> Signed-off-by: Josh Rosen <[email protected]>
…thread safe This is a Spark 2.4.x backport of apache#24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <[email protected]> Signed-off-by: Josh Rosen <[email protected]>
…thread safe This is a Spark 2.4.x backport of apache#24085. Original description follows below: ## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes apache#24913 from JoshRosen/joshrosen/SPARK-26555-branch-2.4-backport. Authored-by: mwlon <[email protected]> Signed-off-by: Josh Rosen <[email protected]>
## What changes were proposed in this pull request? Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766). ## How was this patch tested? Existing tests and a new one for the new subtype checking function. Closes apache#24085 from mwlon/SPARK-26555. Authored-by: mwlon <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> RB=1924052 BUG=LIHADOOP-50758 G=superfriends-reviewers R=fli,latang,mshen,zolin,yezhou A=mshen
What changes were proposed in this pull request?
Make ScalaReflection subtype checking thread safe by adding a lock. There is a thread safety bug in the <:< operator in all versions of scala (scala/bug#10766).
How was this patch tested?
Existing tests and a new one for the new subtype checking function.