SPARK-6548 Adding stddev to DataFrame functions #6297

Closed
wants to merge 56 commits into from

JihongMA
Contributor

Adds STDDEV support for DataFrame, using a one-pass online/parallel algorithm to compute the variance. Please review the code change.
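
For readers unfamiliar with one-pass online/parallel variance, here is a minimal sketch of the general technique: a Welford-style running update plus a pairwise merge of partial states. It assumes nothing about this PR's actual code; `OnlineStddev`, `Moments`, `add`, `merge`, and `partial` are hypothetical names chosen for illustration.

```scala
// Sketch only: the names below are hypothetical and are not the classes added by this PR.
object OnlineStddev {
  /** Running state: count, mean, and sum of squared deviations from the mean (M2). */
  final case class Moments(n: Long, mean: Double, m2: Double) {
    /** Welford-style update: fold one value into the state without a second pass. */
    def add(x: Double): Moments = {
      val n1 = n + 1
      val delta = x - mean
      val mean1 = mean + delta / n1
      Moments(n1, mean1, m2 + delta * (x - mean1))
    }

    /** Pairwise merge of two partial states, e.g. one per partition. */
    def merge(that: Moments): Moments =
      if (n == 0) that
      else if (that.n == 0) this
      else {
        val total = n + that.n
        val delta = that.mean - mean
        Moments(
          total,
          mean + delta * that.n / total,
          m2 + that.m2 + delta * delta * n * that.n / total)
      }

    /** Population standard deviation: divide M2 by n. */
    def stddevPop: Double = if (n == 0) Double.NaN else math.sqrt(m2 / n)

    /** Sample standard deviation: divide M2 by n - 1. */
    def stddevSamp: Double = if (n < 2) Double.NaN else math.sqrt(m2 / (n - 1))
  }

  val empty: Moments = Moments(0L, 0.0, 0.0)

  /** Aggregate one partition's values into a partial state. */
  def partial(values: Seq[Double]): Moments =
    values.foldLeft(empty)((acc, x) => acc.add(x))
}
```

Each partition can build its own `Moments` with `partial`, and the partial states can then be combined with `merge` in any order. Returning `NaN` for empty or single-element input is a choice made for this sketch, not a statement about how the PR handles those cases.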

JihongMA and others added 29 commits May 5, 2015 21:17
This reverts commit c40701a.
This reverts commit 3e7d889.
This reverts commit 9c84695.

Conflicts:

	docs/running-on-yarn.md
This reverts commit a399aa6.

Conflicts:

	docs/running-on-yarn.md
@@ -292,6 +293,7 @@ class SqlParser extends AbstractSparkSQLParser with DataTypeParser {
| AVG ~ "(" ~> expression <~ ")" ^^ { case exp => Average(exp) }
| MIN ~ "(" ~> expression <~ ")" ^^ { case exp => Min(exp) }
| MAX ~ "(" ~> expression <~ ")" ^^ { case exp => Max(exp) }
| STDDEV ~ "(" ~> expression <~ ")" ^^ { case exp => Stddev(exp)}
Contributor

We have changed how these plug in. You'll need to change the FunctionRegistry now.
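
To illustrate the idea behind this comment, the following toy model resolves a function call by name through a registry instead of through a dedicated grammar rule. It is not Spark's `FunctionRegistry`: `ToyRegistry`, `Expr`, `Column`, `Builder`, and `lookup` are all hypothetical, and the real registration API should be checked in the Spark source.

```scala
// Toy model only: this is not Spark's FunctionRegistry; all names are hypothetical.
object ToyRegistry {
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Stddev(child: Expr) extends Expr

  /** A builder turns the argument expressions of a call into one expression. */
  type Builder = Seq[Expr] => Expr

  /** Builder for stddev: accepts exactly one argument. */
  private val stddevBuilder: Builder = {
    case Seq(child) => Stddev(child)
    case args =>
      throw new IllegalArgumentException(s"stddev expects 1 argument, got ${args.length}")
  }

  /** Names are stored lower-cased so that lookup is case-insensitive. */
  private val builders: Map[String, Builder] = Map("stddev" -> stddevBuilder)

  /** Resolve a call such as STDDEV(x) by name rather than by a parser rule. */
  def lookup(name: String, args: Seq[Expr]): Expr =
    builders.get(name.toLowerCase) match {
      case Some(build) => build(args)
      case None => throw new NoSuchElementException(s"undefined function $name")
    }
}
```

With this shape, adding a new function means adding one map entry; the parser only needs a generic rule for function calls.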

@JihongMA
Contributor Author

Please don't test it yet; I need to make changes to accommodate the API change introduced by another JIRA.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38399 has finished for PR 6297 at commit 87fd2dc.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class InternalRow extends Serializable
    • case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]
    • case class ComputePartialStd(child: Expression) extends AggregateExpression
    • case class CombinePartialStd(child: Expression) extends AggregateExpression
    • case class ComputePartialStdFunction (
    • case class CombinePartialStdFunction(
    • case class StddevFunction(
    • class GenericRow(protected[sql] val values: Array[Any]) extends Row
    • class GenericInternalRow(protected[sql] val values: Array[Any]) extends InternalRow
    • class GenericInternalRowWithSchema(values: Array[Any], val schema: StructType)
    • class GenericMutableRow(val values: Array[Any]) extends MutableRow

@yhuai
Contributor

yhuai commented Jul 29, 2015

@JihongMA Will you have time to implement the function based on the new API? It would be good if we could merge it before the 1.5 deadline for new features (end of this month).

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41730 has finished for PR 6297 at commit 25425ac.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(child: Expression, isSample: Boolean) extends UnaryExpression with AggregateExpression1
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41732 has finished for PR 6297 at commit f4c725c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41748 has finished for PR 6297 at commit 0902ceb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@SparkQA

SparkQA commented Sep 4, 2015

Test build #42006 has finished for PR 6297 at commit a81d0fc.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@JihongMA
Contributor Author

JihongMA commented Sep 4, 2015

The R style check failure is caused by the commit for SPARK-8951.

@SparkQA

SparkQA commented Sep 6, 2015

Test build #42062 has finished for PR 6297 at commit 6035648.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))

private val resultType = child.dataType match {
case DecimalType.Fixed(p, s) =>
Contributor

I think it should always return Double, because Sqrt() only works with Double; other databases also just return Double/float.
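
A small sketch of what this suggestion amounts to for decimal input: cast to `Double` up front and return a `Double`, since `scala.math.sqrt` is defined on `Double`. The helper `stddevOf` is hypothetical and, for brevity, uses a plain two-pass population formula rather than the PR's one-pass algorithm.

```scala
// Sketch only: stddevOf is a hypothetical helper, not code from this PR.
object DoubleResult {
  /** Population standard deviation of decimal inputs, always returned as Double. */
  def stddevOf(values: Seq[BigDecimal]): Double = {
    require(values.nonEmpty, "need at least one value")
    // Cast once at the start; the result type is Double regardless of input precision.
    val xs = values.map(_.toDouble)
    val mean = xs.sum / xs.length
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / xs.length
    // math.sqrt takes and returns Double.
    math.sqrt(variance)
  }
}
```

The trade-off is that very high-precision decimal inputs lose precision in the cast, which is usually acceptable for a statistic that is itself a square root.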

@SparkQA

SparkQA commented Sep 12, 2015

Test build #42366 has finished for PR 6297 at commit 6351fc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@davies
Contributor

davies commented Sep 12, 2015

LGTM, merging this into master, thanks!
