
[SPARK-10289] [SQL] A direct write API for testing Parquet #8454


Conversation

liancheng
Contributor

This PR introduces a direct write API for testing Parquet. It's a DSL-flavored version of the `writeDirect` method that comes with the parquet-avro testing code. With this API, it's much easier to construct arbitrary Parquet structures. It's especially useful when adding regression tests for various compatibility corner cases.

Sample usage of this API can be found in the new test case added in ParquetThriftCompatibilitySuite.
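The PR itself doesn't show the DSL in this conversation, but the test build output below reports that it adds an `implicit class RecordConsumerDSL(consumer: RecordConsumer)`. A minimal, self-contained sketch of the idea, modeled on Parquet's `RecordConsumer` start/end call pairs (the `message`/`group`/`field` method names are assumptions, not necessarily the exact API merged in this PR), might look like:

```scala
import org.apache.parquet.io.api.RecordConsumer

object ParquetWriteDSL {
  // Hypothetical sketch: wraps a RecordConsumer so that nested
  // startX()/endX() calls are paired automatically by closures,
  // instead of being hand-written (and easily mismatched) in tests.
  implicit class RecordConsumerDSL(consumer: RecordConsumer) {
    def message(f: => Unit): Unit = {
      consumer.startMessage()
      f
      consumer.endMessage()
    }

    def group(f: => Unit): Unit = {
      consumer.startGroup()
      f
      consumer.endGroup()
    }

    def field(index: Int, name: String)(f: => Unit): Unit = {
      consumer.startField(name, index)
      f
      consumer.endField(name, index)
    }
  }

  // Usage sketch: write one record with a single int field without
  // manually pairing startMessage/endMessage and startField/endField.
  def writeOne(consumer: RecordConsumer): Unit = consumer.message {
    consumer.field(0, "id") {
      consumer.addInteger(42)
    }
  }
}
```

The appeal for compatibility tests is that arbitrary (even deliberately unusual) Parquet structures can be written directly, rather than going through a higher-level writer that normalizes them.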

@SparkQA

SparkQA commented Aug 26, 2015

Test build #41613 has started for PR 8454 at commit 149c23c.

@liancheng force-pushed the spark-10289/parquet-testing-direct-write-api branch from 149c23c to 85747e4 on August 26, 2015 09:57
@SparkQA

SparkQA commented Aug 26, 2015

Test build #41618 has finished for PR 8454 at commit 85747e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class RecordConsumerDSL(consumer: RecordConsumer)

@@ -17,11 +17,15 @@

package org.apache.spark.sql.execution.datasources.parquet

-import scala.collection.JavaConverters._
+import scala.collection.JavaConverters.{collectionAsScalaIterableConverter, mapAsJavaMapConverter, seqAsJavaListConverter}
Contributor
Why the specific imports?

Contributor Author
I thought we should be explicit and avoid wildcard imports according to our style guide, but I just realized it's OK to have them for implicit methods.

@marmbrus
Contributor

Seems useful, and it only touches test code, so I'm going to merge it into master and 1.5.

@marmbrus
Contributor

Actually does not apply cleanly to branch-1.5, so I'll hold off.

@asfgit asfgit closed this in 24ffa85 Aug 29, 2015
@liancheng liancheng deleted the spark-10289/parquet-testing-direct-write-api branch August 31, 2015 10:57
@liancheng
Contributor Author

It's OK not to have this merged into branch-1.5. I've resolved SPARK-10289.

asfgit pushed a commit that referenced this pull request Sep 9, 2015
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR #8509, which had once been considered too big a change to merge into branch-1.5 at the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. It also adds more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 lines of code; all other changes are for testing. In particular, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe since #8454 only touches testing code.

Author: Cheng Lian <[email protected]>

Closes #8583 from liancheng/spark-10301/for-1.5.
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR apache#8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR apache#8509, which had once been considered too big a change to merge into branch-1.5 at the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. It also adds more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 lines of code; all other changes are for testing. In particular, PR apache#8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR apache#8515 depends on it. This should be safe since apache#8454 only touches testing code.

Author: Cheng Lian <[email protected]>

Closes apache#8583 from liancheng/spark-10301/for-1.5.

(cherry picked from commit fca16c5)

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala