Skip to content

[SPARK-8148] Do not use FloatType in partition column inference. #6692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

rxin
Copy link
Contributor

@rxin rxin commented Jun 7, 2015

Use DoubleType instead to be more stable and robust.

@rxin
Copy link
Contributor Author

rxin commented Jun 7, 2015

cc @liancheng

@rxin
Copy link
Contributor Author

rxin commented Jun 7, 2015

The other thing we should consider, although I'm less sure about, is whether we should skip IntegerType and go straight to LongType.

@SparkQA
Copy link

SparkQA commented Jun 7, 2015

Test build #34396 has finished for PR 6692 at commit f88b4e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 7, 2015

Test build #34398 has finished for PR 6692 at commit 0ac8c31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var isNull: Term, var primitive: Term)
    • class CodeGenContext
    • case class Pow(left: Expression, right: Expression)
    • case class Rint(child: Expression) extends UnaryMathExpression(math.rint, "ROUND")
    • case class ToDegrees(child: Expression) extends UnaryMathExpression(math.toDegrees, "DEGREES")
    • case class ToRadians(child: Expression) extends UnaryMathExpression(math.toRadians, "RADIANS")

@liancheng
Copy link
Contributor

I'm worrying about skipping FloatType (and possibly IntegerType) might break existing user code because the partition column data types gets changed. Especially when the inferred schema gets persisted in places like Parquet file metadata and metastore.

@rxin
Copy link
Contributor Author

rxin commented Jun 8, 2015

If it is persisted in the metastore, then inference is no longer used, isn't it?

And why would partition columns be stored in Parquet file metadata?

@liancheng
Copy link
Contributor

Had offline discussion with @rxin. There can be rare corner cases where compatibility issues may arise. However, stop using FloatType in this case can eliminate these corner cases. So we would like to have this and add a note in release notes.

@SparkQA
Copy link

SparkQA commented Jun 8, 2015

Test build #34419 has finished for PR 6692 at commit 4fd761f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Use DoubleType instead to be more stable and robust.
@SparkQA
Copy link

SparkQA commented Jun 8, 2015

Test build #34457 has finished for PR 6692 at commit 6742ecc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 5185389 Jun 8, 2015
@SparkQA
Copy link

SparkQA commented Jun 8, 2015

Test build #891 timed out for PR 6692 at commit 4fd761f after a configured wait of 175m.

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Use DoubleType instead to be more stable and robust.

Author: Reynold Xin <[email protected]>

Closes apache#6692 from rxin/SPARK-8148 and squashes the following commits:

6742ecc [Reynold Xin] [SPARK-8148] Do not use FloatType in partition column inference.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants