[SPARK-10289] [SQL] A direct write API for testing Parquet #8454
Conversation
Test build #41613 has started for PR 8454.

(Commit range: 149c23c to 85747e4.)

Test build #41618 has finished for PR 8454.
@@ -17,11 +17,15 @@

 package org.apache.spark.sql.execution.datasources.parquet

-import scala.collection.JavaConverters._
+import scala.collection.JavaConverters.{collectionAsScalaIterableConverter, mapAsJavaMapConverter, seqAsJavaListConverter}
Why the specific imports?
I thought we should be explicit and avoid wildcard imports according to our style guide. But just realized it's OK to have them for implicit methods.
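The point being discussed is that the `JavaConverters` wildcard import exists to bring implicit conversion methods (such as `.asScala` and `.asJava`) into scope, which is why a style guide that otherwise discourages wildcard imports can permit it here. A minimal, self-contained sketch (not Spark code; the object name is invented for illustration):

```scala
// The wildcard import puts the implicit converter methods in scope,
// enabling `.asScala` / `.asJava` on Java and Scala collections.
import scala.collection.JavaConverters._

object ConvertersDemo {
  def main(args: Array[String]): Unit = {
    val javaList = new java.util.ArrayList[Int]()
    javaList.add(1)
    javaList.add(2)
    // `.asScala` is provided by an implicit conversion from the import above.
    val scalaSeq = javaList.asScala.toSeq
    println(scalaSeq.sum) // prints 3
  }
}
```

Importing the individual converters by name (as the diff above does) works too, but one must know the converter names (e.g. `collectionAsScalaIterableConverter`) rather than the extension methods they enable, which is why the wildcard form is the common idiom.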
Seems useful, and only touches test code, so I'm gonna merge into master and 1.5.

Actually, it does not apply cleanly to branch-1.5, so I'll hold off.

It's OK not to have this merged into branch-1.5. I've resolved SPARK-10289.
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR #8509, which had once been considered too big a change to be merged into branch-1.5 at the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. Also added more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 LOC; all other changes are for testing. In particular, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe since #8454 only touches testing code.

Author: Cheng Lian <[email protected]>

Closes #8583 from liancheng/spark-10301/for-1.5.
(cherry picked from commit fca16c5)

Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala
This PR introduces a direct write API for testing Parquet. It's a DSL-flavored version of the `writeDirect` method that comes with the parquet-avro testing code. With this API, it's much easier to construct arbitrary Parquet structures, which is especially useful when adding regression tests for various compatibility corner cases. Sample usage of this API can be found in the new test case added in `ParquetThriftCompatibilitySuite`.
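To illustrate the general shape of such a DSL (nested closures describing a record field by field, instead of hand-constructing writer objects), here is a hypothetical, self-contained Scala sketch. It is not the actual Spark test API; all names (`RecordBuilder`, `writeRecord`, `render`) are invented for illustration, and it renders to a string rather than writing real Parquet pages:

```scala
import scala.collection.mutable.ListBuffer

object DirectWriteSketch {
  // Accumulates field descriptions as the caller's block runs.
  final class RecordBuilder {
    private val fields = ListBuffer.empty[String]
    // Mirrors the DSL shape: a field is identified by index and name,
    // and its value is supplied by the trailing argument.
    def field(index: Int, name: String)(value: Any): Unit =
      fields += s"$name@$index=$value"
    def render: String = fields.mkString("{", ", ", "}")
  }

  // Analogous in spirit to a direct-write helper: the caller describes
  // the record through a block, and the builder assembles the structure.
  def writeRecord(f: RecordBuilder => Unit): String = {
    val b = new RecordBuilder
    f(b)
    b.render
  }

  def main(args: Array[String]): Unit = {
    val rendered = writeRecord { rec =>
      rec.field(0, "id")(1)
      rec.field(1, "name")("parquet")
    }
    println(rendered) // prints {id@0=1, name@1=parquet}
  }
}
```

The design point is that nested closures let a test spell out an arbitrary structure inline, which is what makes constructing compatibility corner cases (deeply nested groups, unusual repetition patterns) much less verbose than assembling writer objects imperatively.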