
[SPARK-10289] [SQL] A direct write API for testing Parquet #8454


Conversation

liancheng
Contributor

This PR introduces a direct write API for testing Parquet. It's a DSL-flavored version of the `writeDirect` method that comes with the parquet-avro testing code. With this API, it's much easier to construct arbitrary Parquet structures. It's especially useful when adding regression tests for various compatibility corner cases.

Sample usage of this API can be found in the new test case added in ParquetThriftCompatibilitySuite.
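The PR itself doesn't show the DSL in this conversation, but the test build output below reports that it adds an `implicit class RecordConsumerDSL(consumer: RecordConsumer)`. A minimal, self-contained sketch of the idea, modeled on Parquet's `RecordConsumer` start/end call pairs (the `message`/`group`/`field` method names are assumptions, not necessarily the exact API merged in this PR), might look like:

```scala
import org.apache.parquet.io.api.RecordConsumer

object ParquetWriteDSL {
  // Hypothetical sketch: wraps a RecordConsumer so that nested
  // startX()/endX() calls are paired automatically by closures,
  // instead of being hand-written (and easily mismatched) in tests.
  implicit class RecordConsumerDSL(consumer: RecordConsumer) {
    def message(f: => Unit): Unit = {
      consumer.startMessage()
      f
      consumer.endMessage()
    }

    def group(f: => Unit): Unit = {
      consumer.startGroup()
      f
      consumer.endGroup()
    }

    def field(index: Int, name: String)(f: => Unit): Unit = {
      consumer.startField(name, index)
      f
      consumer.endField(name, index)
    }
  }

  // Usage sketch: write one record with a single int field without
  // manually pairing startMessage/endMessage and startField/endField.
  def writeOne(consumer: RecordConsumer): Unit = consumer.message {
    consumer.field(0, "id") {
      consumer.addInteger(42)
    }
  }
}
```

The appeal for compatibility tests is that arbitrary (even deliberately unusual) Parquet structures can be written directly, rather than going through a higher-level writer that normalizes them.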

@SparkQA

SparkQA commented Aug 26, 2015

Test build #41613 has started for PR 8454 at commit 149c23c.

@liancheng force-pushed the spark-10289/parquet-testing-direct-write-api branch from 149c23c to 85747e4 on August 26, 2015 09:57
@SparkQA

SparkQA commented Aug 26, 2015

Test build #41618 has finished for PR 8454 at commit 85747e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class RecordConsumerDSL(consumer: RecordConsumer)

@@ -17,11 +17,15 @@

package org.apache.spark.sql.execution.datasources.parquet

-import scala.collection.JavaConverters._
+import scala.collection.JavaConverters.{collectionAsScalaIterableConverter, mapAsJavaMapConverter, seqAsJavaListConverter}
Contributor
Why the specific imports?

Contributor Author
I thought we should be explicit and avoid wildcard imports according to our style guide, but I just realized it's OK to have them for implicit methods.

@marmbrus
Contributor

Seems useful, and it only touches test code, so I'm going to merge it into master and 1.5.

@marmbrus
Contributor

Actually does not apply cleanly to branch-1.5, so I'll hold off.

@asfgit asfgit closed this in 24ffa85 Aug 29, 2015
@liancheng liancheng deleted the spark-10289/parquet-testing-direct-write-api branch August 31, 2015 10:57
@liancheng
Contributor Author

It's OK not to have this merged into branch-1.5. I've resolved SPARK-10289.

asfgit pushed a commit that referenced this pull request Sep 9, 2015
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR #8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR #8509, which had once been considered too big a change to merge into branch-1.5 at the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. It also adds more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 lines of code; all other changes are for testing. In particular, PR #8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR #8515 depends on it. This should be safe since #8454 only touches testing code.

Author: Cheng Lian <[email protected]>

Closes #8583 from liancheng/spark-10301/for-1.5.
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…or nested structs

We used to work around SPARK-10301 with a quick fix in branch-1.5 (PR apache#8515), but it doesn't cover the case described in SPARK-10428. So this PR backports PR apache#8509, which had once been considered too big a change to merge into branch-1.5 at the last minute, to fix both SPARK-10301 and SPARK-10428 for Spark 1.5. It also adds more test cases for SPARK-10428.

This PR looks big, but the essential change is only ~200 lines of code; all other changes are for testing. In particular, PR apache#8454 is also backported here because the `ParquetInteroperabilitySuite` introduced in PR apache#8515 depends on it. This should be safe since apache#8454 only touches testing code.

Author: Cheng Lian <[email protected]>

Closes apache#8583 from liancheng/spark-10301/for-1.5.

(cherry picked from commit fca16c5)

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala