address review comments #8

Merged: 3 commits into rdblue:unify-create-table on Nov 23, 2020

Conversation

cloud-fan (Author):

No description provided.

-    val serdeInfo =
-      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+    val serdeInfo = getSerdeInfo(ctx.rowFormat.asScala, ctx.createFileFormat.asScala, ctx)
cloud-fan (Author):

make a method for it, so that we can reuse it in SparkSqlAstBuilder
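The extracted helper might look roughly like this (a sketch reconstructed from the inlined code it replaces; RowFormatContext and CreateFileFormatContext are the ANTLR-generated context classes, and visitRowFormat / visitCreateFileFormat are the existing visitors):

// Sketch: fold every parsed ROW FORMAT / STORED AS clause into one
// SerdeInfo, letting SerdeInfo.merge reject conflicting clauses.
protected def getSerdeInfo(
    rowFormatCtx: Seq[RowFormatContext],
    createFileFormatCtx: Seq[CreateFileFormatContext],
    ctx: ParserRuleContext): Option[SerdeInfo] = {
  val rowFormatSerdeInfo = rowFormatCtx.map(visitRowFormat)
  val fileFormatSerdeInfo = createFileFormatCtx.map(visitCreateFileFormat)
  (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
}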

operationNotAllowed("REPLACE ... IF NOT EXISTS, use CREATE IF NOT EXISTS instead", ctx)
}

assert(!temp && !ifNotExists && !external)
cloud-fan (Author):

visitReplaceTableHeader simply returns false for these three properties, so it's simpler to use an assert here.
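For reference, a sketch of the header visitor this relies on (modeled on Spark's AstBuilder; treat the exact shape as an assumption): the REPLACE TABLE grammar has no TEMPORARY, EXTERNAL, or IF NOT EXISTS clauses, so all three flags can only be false.

override def visitReplaceTableHeader(
    ctx: ReplaceTableHeaderContext): TableHeader = withOrigin(ctx) {
  val multipartIdentifier = visitMultipartIdentifier(ctx.multipartIdentifier)
  // No TEMPORARY / EXTERNAL / IF NOT EXISTS in this rule, so hard-code false.
  (multipartIdentifier, false, false, false)
}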

rdblue (Owner):

I think it is bad practice for a method to assume it will get certain results from other methods. While using an assert handles correctness, if there is a change that violates the assertion, a user would get a nearly unusable error.

I think this change should be reverted so that the error messages are meaningful.
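A sketch of the suggested alternative (the messages here are hypothetical; only the IF NOT EXISTS check above comes from the actual diff): check each flag explicitly so a violation surfaces as a meaningful parse error instead of an assertion failure.

// Hypothetical per-flag checks instead of a single assert:
if (temp) {
  operationNotAllowed("REPLACE TEMPORARY TABLE", ctx)
}
if (external) {
  operationNotAllowed("REPLACE EXTERNAL TABLE", ctx)
}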


case Some(query) =>
ReplaceTableAsSelectStatement(table, query, partitioning, bucketSpec, properties,
provider, options, location, comment, writeOptions = Map.empty, serdeInfo,
orCreate = orCreate)

case _ =>
ReplaceTableStatement(table, schema.getOrElse(new StructType), partitioning,
cloud-fan (Author) commented on Nov 16, 2020:

Previously, when the table column list was not specified, we ignored partition columns with data types. That was fine before syntax merging, as REPLACE TABLE had no partition columns with data types, but now it's better to make it consistent with CREATE TABLE. I also added a test to check it: https://github.com/rdblue/spark/pull/8/files#diff-b9e91f767e5562861565b0ce78759af3bcb7fff405a81e928894641147db2ae4R293

@@ -63,11 +63,11 @@ case class SerdeInfo(
serdeProperties: Map[String, String] = Map.empty) {
// this uses assertions because validation is done in validateRowFormatFileFormat etc.
assert(storedAs.isEmpty || formatClasses.isEmpty,
s"Conflicting STORED AS $storedAs and INPUTFORMAT/OUTPUTFORMAT $formatClasses values")
cloud-fan (Author):

It's a bit weird to print a Scala Option directly.
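A quick self-contained illustration of the problem:

val storedAs: Option[String] = Some("parquet")
// Interpolating the Option prints the wrapper, not the value:
println(s"Conflicting STORED AS $storedAs")        // Conflicting STORED AS Some(parquet)
println(s"Conflicting STORED AS ${storedAs.get}")  // Conflicting STORED AS parquet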

@@ -85,7 +85,7 @@ case class SerdeInfo(
def merge(other: SerdeInfo): SerdeInfo = {
def getOnly[T](desc: String, left: Option[T], right: Option[T]): Option[T] = {
(left, right) match {
case (Some(l), Some(r)) if l != r =>
cloud-fan (Author):

Otherwise the assert below is useless.
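A sketch of the intended behavior after adding the guard (the thrown error stands in for whatever check follows; only the `if l != r` guard comes from the diff): real conflicts are rejected up front, while equal or one-sided values fall through and are kept.

def getOnly[T](desc: String, left: Option[T], right: Option[T]): Option[T] = {
  (left, right) match {
    case (Some(l), Some(r)) if l != r =>
      throw new IllegalArgumentException(s"Conflicting $desc values: $l and $r")
    case (Some(_), _) => left // equal pair, or only the left side is set
    case _ => right           // only the right side is set, or neither
  }
}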

CreateTable(tableDesc, mode, None)

case None =>
val (storageFormat, provider) = getStorageFormatAndProvider(
cloud-fan (Author):

I did some refactoring to avoid duplicated code between CREATE TABLE and CTAS (buildCatalogTable and buildHiveCatalogTable).

rdblue (Owner):

Looks fine to me. I like that it should have fewer changes from master.

bucketSpec: Option[BucketSpec],
properties: Map[String, String],
provider: String,
private def getStorageFormatAndProvider(
cloud-fan (Author):

This method closely follows the original logic for creating a Hive table in SparkSqlAstBuilder.

// The parser guarantees that USING and STORED AS/ROW FORMAT won't co-exist.
assert(maybeSerdeInfo.isEmpty)
nonHiveStorageFormat -> provider.get
} else if (maybeSerdeInfo.isDefined) {
cloud-fan (Author):

The logic here roughly merges three serde infos: the user-specified one, the one inferred from STORED AS, and the default one.
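The precedence can be pictured as an orElse chain (a minimal sketch; the three Option values are hypothetical stand-ins for the three sources):

val userSpecifiedSerde: Option[String] = None                // from ROW FORMAT SERDE '...'
val storedAsSerde: Option[String] = Some("ParquetHiveSerDe") // inferred from STORED AS
val defaultSerde: Option[String] = Some("LazySimpleSerDe")   // session default

// The user-specified serde wins, then STORED AS, then the default:
val serde: Option[String] = userSpecifiedSerde.orElse(storedAsSerde).orElse(defaultSerde)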

@@ -363,6 +363,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
}
}

private def toStorageFormat(
cloud-fan (Author):

We can get rid of the duplicated code here once we move CREATE TABLE LIKE and INSERT DIRECTORY to v2 commands.

-    val serdeInfo =
-      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+    val serdeInfo = getSerdeInfo(Seq(ctx.rowFormat), Seq(ctx.createFileFormat), ctx)
val path = string(ctx.path)
// The path field is required
if (path.isEmpty) {
operationNotAllowed("INSERT OVERWRITE DIRECTORY must be accompanied by path", ctx)
}

val default = HiveSerDe.getDefaultStorage(conf)
cloud-fan (Author):

I don't know why INSERT DIRECTORY considers the default serde info but CREATE TABLE LIKE does not. I'm going to fix it later and keep the behavior unchanged here.

rdblue (Owner) left a review:

@cloud-fan, it mostly looks good but I did find a few things to change and at least one bug from using _._1 instead of _._2. Let me know when you've had time to update this.

operationNotAllowed("REPLACE ... IF NOT EXISTS, use CREATE IF NOT EXISTS instead", ctx)
}

assert(!temp && !ifNotExists && !external)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is bad practice for a method to assume it will get certain results from other methods. While using an assert handles correctness, if there is a change that violates the assertion, a user would get a nearly unusable error.

I think this change should be reverted so that the error messages are meaningful.

serde = serdeInfo.serde.orElse(hiveSerde.serde),
properties = serdeInfo.serdeProperties)
case _ =>
operationNotAllowed(s"STORED AS with file format '${serdeInfo.storedAs.get}'", ctx)
rdblue (Owner):
This is okay, but my intent was to avoid mixing parsing logic and validation where possible. The parser should return what was parsed to Spark, which should decide whether it is supported.

I don't think this is a blocker because it is in the SparkSqlParser instead of the one in catalyst. We can fix this when moving these commands to v2.
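A sketch of that separation (the split itself is assumed; HiveSerDe.sourceToSerDe is the existing lookup): the parser would keep whatever format name it saw in SerdeInfo, and a later analysis step would decide whether it maps to a known serde.

// In the parser: record what was parsed, no validation.
// Later, during analysis (sketch):
serdeInfo.storedAs.foreach { format =>
  if (HiveSerDe.sourceToSerDe(format).isEmpty) {
    throw new AnalysisException(s"Unsupported file format in STORED AS: $format")
  }
}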

cloud-fan (Author):

@rdblue thanks for the suggestions! Pushed a commit to update it, please take another look, thanks!

"CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead",
ctx)
operationNotAllowed("CREATE OR REPLACE TEMPORARY TABLE is not supported yet. " +
"Please use CREATE OR REPLACE TEMPORARY VIEW as an alternative.", ctx)
rdblue (Owner):

I think this should use the original error message, "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead".

Using operationNotAllowed means that the message is prefixed with "Operation not allowed: ", so adding "is not supported yet." is not helpful and just makes the message harder to read. In addition, "yet" implies that this will be supported, and there is no reason to imply that.
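For illustration, the prefix makes the short message complete on its own (a sketch of the rendered output, based on the prefix described above):

operationNotAllowed(
  "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead", ctx)
// => Operation not allowed: CREATE OR REPLACE TEMPORARY TABLE ...,
//    use CREATE TEMPORARY VIEW instead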

.getOrElse(StructType(partCols))
if (temp) {
operationNotAllowed("CREATE TEMPORARY TABLE is not supported yet. " +
"Please use CREATE TEMPORARY VIEW as an alternative.", ctx)
rdblue (Owner):

This should use the other error message, "CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead".

As I noted on the similar case, "is not supported yet" is both redundant and misleading. I don't think that Spark intends to implement CREATE TEMPORARY TABLE. Even if it may be implemented, it has not been supported for years, so there is no value in implying that it will be supported.

Please update to use the simpler and clearer error message.

rdblue (Owner) left a review:

I think the only things that need to be fixed are the error messages that were changed and are now longer and less clear.

rdblue merged this pull request into rdblue:unify-create-table on Nov 23, 2020.
rdblue (Owner) commented on Nov 23, 2020:

Thanks, @cloud-fan. I've merged this.

rdblue pushed a commit that referenced this pull request on Nov 23, 2020.