address review comments #8

Merged: 3 commits into rdblue:unify-create-table on Nov 23, 2020

Conversation

cloud-fan (Author):

No description provided.

-    val serdeInfo =
-      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+    val serdeInfo = getSerdeInfo(ctx.rowFormat.asScala, ctx.createFileFormat.asScala, ctx)
cloud-fan (Author):

make a method for it, so that we can reuse it in SparkSqlAstBuilder
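The extracted helper might look roughly like this (a sketch reconstructed from the inlined code it replaces; RowFormatContext and CreateFileFormatContext are the ANTLR-generated context classes, and visitRowFormat / visitCreateFileFormat are the existing visitors):

// Sketch: fold every parsed ROW FORMAT / STORED AS clause into one
// SerdeInfo, letting SerdeInfo.merge reject conflicting clauses.
protected def getSerdeInfo(
    rowFormatCtx: Seq[RowFormatContext],
    createFileFormatCtx: Seq[CreateFileFormatContext],
    ctx: ParserRuleContext): Option[SerdeInfo] = {
  val rowFormatSerdeInfo = rowFormatCtx.map(visitRowFormat)
  val fileFormatSerdeInfo = createFileFormatCtx.map(visitCreateFileFormat)
  (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
}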

operationNotAllowed("REPLACE ... IF NOT EXISTS, use CREATE IF NOT EXISTS instead", ctx)
}

assert(!temp && !ifNotExists && !external)
cloud-fan (Author):

visitReplaceTableHeader simply returns false for these three properties, so it's simpler to use an assert here.
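For reference, a sketch of the header visitor this relies on (modeled on Spark's AstBuilder; treat the exact shape as an assumption): the REPLACE TABLE grammar has no TEMPORARY, EXTERNAL, or IF NOT EXISTS clauses, so all three flags can only be false.

override def visitReplaceTableHeader(
    ctx: ReplaceTableHeaderContext): TableHeader = withOrigin(ctx) {
  val multipartIdentifier = visitMultipartIdentifier(ctx.multipartIdentifier)
  // No TEMPORARY / EXTERNAL / IF NOT EXISTS in this rule, so hard-code false.
  (multipartIdentifier, false, false, false)
}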

rdblue (Owner):

I think it is bad practice for a method to assume it will get certain results from other methods. While using an assert handles correctness, if there is a change that violates the assertion, a user would get a nearly unusable error.

I think this change should be reverted so that the error messages are meaningful.
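A sketch of the suggested alternative (the messages here are hypothetical; only the IF NOT EXISTS check above comes from the actual diff): check each flag explicitly so a violation surfaces as a meaningful parse error instead of an assertion failure.

// Hypothetical per-flag checks instead of a single assert:
if (temp) {
  operationNotAllowed("REPLACE TEMPORARY TABLE", ctx)
}
if (external) {
  operationNotAllowed("REPLACE EXTERNAL TABLE", ctx)
}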


case Some(query) =>
ReplaceTableAsSelectStatement(table, query, partitioning, bucketSpec, properties,
provider, options, location, comment, writeOptions = Map.empty, serdeInfo,
orCreate = orCreate)

case _ =>
ReplaceTableStatement(table, schema.getOrElse(new StructType), partitioning,
cloud-fan (Author) commented on Nov 16, 2020:

Previously, when the table column list was not specified, we ignored partition columns with data types. That was fine before syntax merging, as REPLACE TABLE had no partition columns with data types, but now it's better to make it consistent with CREATE TABLE. I also added a test to check it: https://github.com/rdblue/spark/pull/8/files#diff-b9e91f767e5562861565b0ce78759af3bcb7fff405a81e928894641147db2ae4R293

@@ -63,11 +63,11 @@ case class SerdeInfo(
serdeProperties: Map[String, String] = Map.empty) {
// this uses assertions because validation is done in validateRowFormatFileFormat etc.
assert(storedAs.isEmpty || formatClasses.isEmpty,
s"Conflicting STORED AS $storedAs and INPUTFORMAT/OUTPUTFORMAT $formatClasses values")
cloud-fan (Author):

It's a bit weird to print a Scala Option directly.
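A quick self-contained illustration of the problem:

val storedAs: Option[String] = Some("parquet")
// Interpolating the Option prints the wrapper, not the value:
println(s"Conflicting STORED AS $storedAs")        // Conflicting STORED AS Some(parquet)
println(s"Conflicting STORED AS ${storedAs.get}")  // Conflicting STORED AS parquet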

@@ -85,7 +85,7 @@ case class SerdeInfo(
def merge(other: SerdeInfo): SerdeInfo = {
def getOnly[T](desc: String, left: Option[T], right: Option[T]): Option[T] = {
(left, right) match {
case (Some(l), Some(r)) if l != r =>
cloud-fan (Author):

Otherwise the assert below is useless.
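A sketch of the intended behavior after adding the guard (the thrown error stands in for whatever check follows; only the `if l != r` guard comes from the diff): real conflicts are rejected up front, while equal or one-sided values fall through and are kept.

def getOnly[T](desc: String, left: Option[T], right: Option[T]): Option[T] = {
  (left, right) match {
    case (Some(l), Some(r)) if l != r =>
      throw new IllegalArgumentException(s"Conflicting $desc values: $l and $r")
    case (Some(_), _) => left // equal pair, or only the left side is set
    case _ => right           // only the right side is set, or neither
  }
}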

CreateTable(tableDesc, mode, None)

case None =>
val (storageFormat, provider) = getStorageFormatAndProvider(
cloud-fan (Author):

I did some refactoring to avoid duplicated code between CREATE TABLE and CTAS (buildCatalogTable and buildHiveCatalogTable).

rdblue (Owner):

Looks fine to me. I like that it should have fewer changes from master.

bucketSpec: Option[BucketSpec],
properties: Map[String, String],
provider: String,
private def getStorageFormatAndProvider(
cloud-fan (Author):

This method closely follows the original logic for creating a Hive table in SparkSqlAstBuilder.

// The parser guarantees that USING and STORED AS/ROW FORMAT won't co-exist.
assert(maybeSerdeInfo.isEmpty)
nonHiveStorageFormat -> provider.get
} else if (maybeSerdeInfo.isDefined) {
cloud-fan (Author):

The logic here roughly merges three serde infos: the user-specified one, the one inferred from STORED AS, and the default one.
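The precedence can be pictured as an orElse chain (a minimal sketch; the three Option values are hypothetical stand-ins for the three sources):

val userSpecifiedSerde: Option[String] = None                // from ROW FORMAT SERDE '...'
val storedAsSerde: Option[String] = Some("ParquetHiveSerDe") // inferred from STORED AS
val defaultSerde: Option[String] = Some("LazySimpleSerDe")   // session default

// The user-specified serde wins, then STORED AS, then the default:
val serde: Option[String] = userSpecifiedSerde.orElse(storedAsSerde).orElse(defaultSerde)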

@@ -363,6 +363,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
}
}

private def toStorageFormat(
cloud-fan (Author):

We can get rid of the duplicated code here once we move CREATE TABLE LIKE and INSERT DIRECTORY to v2 commands.

-    val serdeInfo =
-      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+    val serdeInfo = getSerdeInfo(Seq(ctx.rowFormat), Seq(ctx.createFileFormat), ctx)
val path = string(ctx.path)
// The path field is required
if (path.isEmpty) {
operationNotAllowed("INSERT OVERWRITE DIRECTORY must be accompanied by path", ctx)
}

val default = HiveSerDe.getDefaultStorage(conf)
cloud-fan (Author):

I don't know why INSERT DIRECTORY considers the default serde info but CREATE TABLE LIKE does not. I'm going to fix it later and keep the behavior unchanged here.

rdblue (Owner) left a review:

@cloud-fan, it mostly looks good but I did find a few things to change and at least one bug from using _._1 instead of _._2. Let me know when you've had time to update this.

operationNotAllowed("REPLACE ... IF NOT EXISTS, use CREATE IF NOT EXISTS instead", ctx)
}

assert(!temp && !ifNotExists && !external)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is bad practice for a method to assume it will get certain results from other methods. While using an assert handles correctness, if there is a change that violates the assertion, a user would get a nearly unusable error.

I think this change should be reverted so that the error messages are meaningful.

serde = serdeInfo.serde.orElse(hiveSerde.serde),
properties = serdeInfo.serdeProperties)
case _ =>
operationNotAllowed(s"STORED AS with file format '${serdeInfo.storedAs.get}'", ctx)
rdblue (Owner):
This is okay, but my intent was to avoid mixing parsing logic and validation where possible. The parser should return what was parsed to Spark, which should decide whether it is supported.

I don't think this is a blocker because it is in the SparkSqlParser instead of the one in catalyst. We can fix this when moving these commands to v2.
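A sketch of that separation (the split itself is assumed; HiveSerDe.sourceToSerDe is the existing lookup): the parser would keep whatever format name it saw in SerdeInfo, and a later analysis step would decide whether it maps to a known serde.

// In the parser: record what was parsed, no validation.
// Later, during analysis (sketch):
serdeInfo.storedAs.foreach { format =>
  if (HiveSerDe.sourceToSerDe(format).isEmpty) {
    throw new AnalysisException(s"Unsupported file format in STORED AS: $format")
  }
}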

cloud-fan (Author):

@rdblue thanks for the suggestions! Pushed a commit to update it, please take another look, thanks!

"CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead",
ctx)
operationNotAllowed("CREATE OR REPLACE TEMPORARY TABLE is not supported yet. " +
"Please use CREATE OR REPLACE TEMPORARY VIEW as an alternative.", ctx)
rdblue (Owner):

I think this should use the original error message, "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead".

Using operationNotAllowed means that the message is prefixed with "Operation not allowed: ", so adding "is not supported yet." is not helpful and just makes the message harder to read. In addition, "yet" implies that this will be supported, and there is no reason to imply that.
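For illustration, the prefix makes the short message complete on its own (a sketch of the rendered output, based on the prefix described above):

operationNotAllowed(
  "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead", ctx)
// => Operation not allowed: CREATE OR REPLACE TEMPORARY TABLE ...,
//    use CREATE TEMPORARY VIEW instead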

.getOrElse(StructType(partCols))
if (temp) {
operationNotAllowed("CREATE TEMPORARY TABLE is not supported yet. " +
"Please use CREATE TEMPORARY VIEW as an alternative.", ctx)
rdblue (Owner):

This should use the other error message, "CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead".

As I noted on the similar case, "is not supported yet" is both redundant and misleading. I don't think that Spark intends to implement CREATE TEMPORARY TABLE. Even if it may be implemented, it has not been supported for years, so there is no value in implying that it will be supported.

Please update to use the simpler and clearer error message.

rdblue (Owner) left a review:

I think the only things that need to be fixed are the error messages that were changed and are now longer and less clear.

rdblue merged this pull request into rdblue:unify-create-table on Nov 23, 2020.
rdblue (Owner) commented on Nov 23, 2020:

Thanks, @cloud-fan. I've merged this.

rdblue pushed a commit that referenced this pull request on Nov 23, 2020.