
[SPARK-8501] [SQL] Avoids reading schema from empty ORC files #7199


Closed
wants to merge 2 commits into from


liancheng
Contributor

ORC writes an empty schema (struct<>) to ORC files containing zero rows. This is fine for Hive, where the metastore manages the table schema, but it causes trouble when Spark SQL reads raw ORC files, because the schema then has to be discovered from the files themselves.

Note that the ORC data source itself always avoids writing empty ORC files, but the problem still arises when reading Hive tables that contain empty part-files.
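
The idea of skipping empty files during schema discovery can be illustrated with a minimal, library-free sketch. It is not the code in this patch: `OrcFooter` and `infer_schema` are hypothetical names introduced here, and the footer metadata is modeled as plain Python values rather than read through a real ORC reader.

```python
from typing import NamedTuple, Optional, Sequence

# Schema string that ORC records for a file with zero rows (from the PR description).
EMPTY_SCHEMA = "struct<>"


class OrcFooter(NamedTuple):
    """Hypothetical stand-in for the metadata found in an ORC file footer."""
    path: str
    schema: str
    num_rows: int


def infer_schema(footers: Sequence[OrcFooter]) -> Optional[str]:
    """Return the schema of the first file that actually carries one.

    Files with zero rows or an empty schema are skipped, so an empty
    part-file cannot shadow the real schema of its siblings.
    Returns None when no file has a usable schema.
    """
    for footer in footers:
        if footer.num_rows == 0 or footer.schema == EMPTY_SCHEMA:
            continue  # nothing to learn from an empty file
        return footer.schema
    return None
```

With this approach, a directory whose first part-file is empty still yields the schema of a later, non-empty part-file. A caller has to handle the `None` case, which arises when every file is empty.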

@liancheng
Contributor Author

cc @yhuai @zhzhan

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36437 has finished for PR 7199 at commit a290221.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 3, 2015

Test build #36456 has finished for PR 7199 at commit ad5b0ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 3, 2015

Test build #36459 has finished for PR 7199 at commit bb8cd95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

Merging to master. This PR is backported to branch-1.4 by #7200.

asfgit pushed a commit that referenced this pull request Jul 3, 2015
…rt to 1.4)

This PR backports #7199 to branch-1.4

Author: Cheng Lian <[email protected]>

Closes #7200 from liancheng/spark-8501-for-1.4 and squashes the following commits:

725e9e3 [Cheng Lian] Addresses comments
0fa25af [Cheng Lian] Avoids reading schema from empty ORC files
@asfgit asfgit closed this in 20a4d7d Jul 3, 2015
@liancheng liancheng deleted the spark-8501 branch September 27, 2016 14:25