[SPARK-8707] RDD#toDebugString fails if any cached RDD has invalid partitions #7127
We may not need to introduce another version of toDebugString with an argument to get around this issue. The reported problem is that if there are any unrelated invalid RDDs in the same application, toDebugString tries to evaluate all of them unnecessarily and fails. So we should restrict toDebugString to evaluating only its own partitions. |
add to whitelist |
retest this please |
Test build #36350 has finished for PR 7127 at commit
|
Addressed comments. Thanks, @kmadhugit |
Test build #36539 has finished for PR 7127 at commit
|
Test build #41341 has finished for PR 7127 at commit
|
Test build #41580 has finished for PR 7127 at commit
|
}

@DeveloperApi
def getRDDStorageInfo(filter: RDD[_] => Boolean): Array[RDDInfo] = {
I would just make this private[spark]. Let's try to limit the number of things we expose.
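To illustrate why a filter argument helps here, the following is a minimal, self-contained sketch and not Spark's actual code. FakeRdd, FakeInfo, StorageInfoSketch and storageInfo are hypothetical stand-ins introduced only for this example. The idea is that partitions are evaluated only for the RDDs the filter accepts, so an unrelated RDD with invalid partitions is never touched.

```scala
// Hypothetical stand-ins; these are NOT Spark's RDD or RDDInfo classes.
final case class FakeRdd(id: Int, partitionCount: () => Int)
final case class FakeInfo(id: Int, numPartitions: Int)

object StorageInfoSketch {
  // Apply the filter first, then evaluate partitions only for the RDDs that pass.
  // RDDs rejected by the filter never have partitionCount() called on them.
  def storageInfo(persistent: Seq[FakeRdd], filter: FakeRdd => Boolean): Seq[FakeInfo] =
    persistent.filter(filter).map(r => FakeInfo(r.id, r.partitionCount()))
}
```

In this sketch, a caller interested in a single RDD would pass something like `_.id == 1`, which mirrors the intent of restricting toDebugString to its own partitions.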
LGTM, left a few minor comments. I'll merge this once you address them. |
retest this please |
Test build #41888 has finished for PR 7127 at commit
|
I've merged this into master after applying the change myself |
Added numPartitions(evaluate: Boolean) to RDD. With evaluate=true, the method is the same as partitions.length. With evaluate=false, it looks only at checkpointed or already evaluated partitions to get the number of partitions, and returns -1 if neither is available. RDDInfo.partitionNum calls numPartitions only when it is accessed.
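The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: SketchRDD and its computePartitions parameter are hypothetical, and checkpoint handling is left out for brevity.

```scala
// Hypothetical stand-in for an RDD; NOT Spark's actual class.
class SketchRDD(computePartitions: () => Array[Int]) {
  // Holds the partitions once they have been computed; None until then.
  private var cached: Option[Array[Int]] = None

  // Computes the partitions on first access and reuses them afterwards.
  def partitions: Array[Int] = {
    if (cached.isEmpty) cached = Some(computePartitions())
    cached.get
  }

  // evaluate = true  : same as partitions.length (may trigger computation).
  // evaluate = false : uses only already-computed partitions, else returns -1.
  def numPartitions(evaluate: Boolean): Int =
    if (evaluate) partitions.length
    else cached.map(_.length).getOrElse(-1)
}
```

With this shape, calling numPartitions(evaluate = false) on an RDD whose partitions cannot be computed returns -1 instead of throwing.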