address review comments #8
Conversation
-    val serdeInfo =
-      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+    val serdeInfo = getSerdeInfo(ctx.rowFormat.asScala, ctx.createFileFormat.asScala, ctx)
Make a method for it, so that we can reuse it in SparkSqlAstBuilder.
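The extraction being suggested can be sketched as follows. This is a simplified, hypothetical shape: `SerdeInfo` here is a toy stand-in for the real case class, and the real helper operates on parser contexts in AstBuilder rather than on `SerdeInfo` sequences directly.

```scala
// Simplified stand-in for the real SerdeInfo case class.
case class SerdeInfo(storedAs: Option[String] = None, serde: Option[String] = None) {
  // Later settings fill in whatever the earlier ones left unspecified.
  def merge(other: SerdeInfo): SerdeInfo =
    SerdeInfo(storedAs.orElse(other.storedAs), serde.orElse(other.serde))
}

// The shared helper: both AstBuilder and SparkSqlAstBuilder can call this
// instead of repeating the reduceLeftOption merge inline at each call site.
def getSerdeInfo(rowFormat: Seq[SerdeInfo], fileFormat: Seq[SerdeInfo]): Option[SerdeInfo] =
  (fileFormat ++ rowFormat).reduceLeftOption((x, y) => x.merge(y))
```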
     operationNotAllowed("REPLACE ... IF NOT EXISTS, use CREATE IF NOT EXISTS instead", ctx)
   }
+  assert(!temp && !ifNotExists && !external)
visitReplaceTableHeader simply returns false for all 3 of these properties, so it's simpler to use an assert here.
I think it is bad practice for a method to assume it will get certain results from other methods. While an assert handles correctness, if a later change violates the assertion, a user would get a nearly unusable error.
I think this change should be reverted so that the error messages remain meaningful.
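To make the trade-off concrete, here is a minimal sketch of the two failure modes. `operationNotAllowed` is simplified to a plain exception, and the method name is illustrative, not the actual parser code.

```scala
// Simplified stand-in for the parser's operationNotAllowed helper.
def operationNotAllowed(msg: String): Nothing =
  throw new UnsupportedOperationException(s"Operation not allowed: $msg")

def checkReplaceTableHeader(temp: Boolean, ifNotExists: Boolean): Unit = {
  // Explicit check: the user sees an actionable message.
  if (ifNotExists) {
    operationNotAllowed("REPLACE ... IF NOT EXISTS, use CREATE IF NOT EXISTS instead")
  }
  // Bare assert: if a later change violates the assumption, the user only
  // sees an opaque AssertionError with no guidance on what to do instead.
  assert(!temp)
}
```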
     case Some(query) =>
       ReplaceTableAsSelectStatement(table, query, partitioning, bucketSpec, properties,
         provider, options, location, comment, writeOptions = Map.empty, serdeInfo,
         orCreate = orCreate)
     case _ =>
       ReplaceTableStatement(table, schema.getOrElse(new StructType), partitioning,
Previously, when the table column list was not specified, we ignored the partition columns with data types. That was fine before syntax merging, as there were no partition columns with data types in REPLACE TABLE, but now it's better to make it consistent with CREATE TABLE. I also added a test to check it: https://github.com/rdblue/spark/pull/8/files#diff-b9e91f767e5562861565b0ce78759af3bcb7fff405a81e928894641147db2ae4R293
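The fallback being described can be illustrated with a toy model. `Field` here is a hypothetical stand-in for Spark's StructField; the real code expresses the same idea as `schema.getOrElse(StructType(partCols))`.

```scala
// Toy stand-in for StructField: a column name plus its data type.
case class Field(name: String, dataType: String)

// When no column list is given, fall back to the partition columns, keeping
// their data types, instead of silently dropping them.
def tableSchema(columns: Option[Seq[Field]], partCols: Seq[Field]): Seq[Field] =
  columns.getOrElse(partCols)
```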
@@ -63,11 +63,11 @@ case class SerdeInfo(
     serdeProperties: Map[String, String] = Map.empty) {
   // this uses assertions because validation is done in validateRowFormatFileFormat etc.
   assert(storedAs.isEmpty || formatClasses.isEmpty,
     s"Conflicting STORED AS $storedAs and INPUTFORMAT/OUTPUTFORMAT $formatClasses values")
It's a bit weird to print a Scala Option directly.
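The readability issue is easy to demonstrate. This is a minimal sketch, not the actual message-building code; the placeholder text is illustrative.

```scala
// Interpolating an Option directly leaks the Some(...) wrapper into the message.
def rawMessage(storedAs: Option[String]): String =
  s"Conflicting STORED AS $storedAs"

// Unwrapping first yields the message a user would actually expect to read.
def cleanMessage(storedAs: Option[String]): String =
  s"Conflicting STORED AS ${storedAs.getOrElse("<unspecified>")}"
```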
@@ -85,7 +85,7 @@ case class SerdeInfo(
   def merge(other: SerdeInfo): SerdeInfo = {
     def getOnly[T](desc: String, left: Option[T], right: Option[T]): Option[T] = {
       (left, right) match {
         case (Some(l), Some(r)) if l != r =>
Otherwise the assert below is useless.
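A self-contained sketch of the pattern under discussion: with the `if l != r` guard, equal duplicates fall through while genuine conflicts fail loudly. The error type and message here are illustrative, not the real implementation.

```scala
// Keep a value only if the two sides don't genuinely conflict.
def getOnly[T](desc: String, left: Option[T], right: Option[T]): Option[T] =
  (left, right) match {
    case (Some(l), Some(r)) if l != r =>
      // Without the guard, even two identical values would be rejected here,
      // making any later equality assert pointless.
      throw new IllegalArgumentException(s"Cannot merge $desc: $l and $r")
    case (Some(l), _) => Some(l) // covers equal duplicates and left-only
    case (_, r)       => r       // right-only, or neither side set
  }
```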
       CreateTable(tableDesc, mode, None)
     case None =>
       val (storageFormat, provider) = getStorageFormatAndProvider(
I did some refactoring to avoid duplicated code between CREATE TABLE and CTAS: buildCatalogTable and buildHiveCatalogTable.
Looks fine to me. I like that it should have fewer changes from master.
     bucketSpec: Option[BucketSpec],
     properties: Map[String, String],
     provider: String,
   private def getStorageFormatAndProvider(
This method closely follows the original logic for creating Hive tables in SparkSqlAstBuilder.
   // The parser guarantees that USING and STORED AS/ROW FORMAT won't co-exist.
   assert(maybeSerdeInfo.isEmpty)
   nonHiveStorageFormat -> provider.get
 } else if (maybeSerdeInfo.isDefined) {
The logic here roughly merges 3 serde infos: the user-specified one, the one inferred from STORED AS, and the default one.
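The precedence being described can be sketched like this. `Serde` is a toy stand-in and the names are illustrative; the real code merges the actual SerdeInfo values inside getStorageFormatAndProvider.

```scala
// Toy serde descriptor: each field is optionally specified.
case class Serde(format: Option[String], serde: Option[String]) {
  // Fields on the left win; the right side only fills in gaps.
  def merge(other: Serde): Serde =
    Serde(format.orElse(other.format), serde.orElse(other.serde))
}

// User-specified settings beat what STORED AS implies, which beats the default.
def resolveSerde(user: Serde, fromStoredAs: Serde, default: Serde): Serde =
  user.merge(fromStoredAs).merge(default)
```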
@@ -363,6 +363,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
   }
 }

 private def toStorageFormat(
We can get rid of the duplicated code here once we move CREATE TABLE LIKE and INSERT DIRECTORY to v2 commands.
-    val serdeInfo =
-      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+    val serdeInfo = getSerdeInfo(Seq(ctx.rowFormat), Seq(ctx.createFileFormat), ctx)
     val path = string(ctx.path)
     // The path field is required
     if (path.isEmpty) {
       operationNotAllowed("INSERT OVERWRITE DIRECTORY must be accompanied by path", ctx)
     }
     val default = HiveSerDe.getDefaultStorage(conf)
I don't know why INSERT DIRECTORY considers the default serde info but CREATE TABLE LIKE does not. I'm going to fix it later and keep the behavior unchanged here.
@cloud-fan, it mostly looks good, but I did find a few things to change and at least one bug from using _._1 instead of _._2. Let me know when you've had time to update this.
       serde = serdeInfo.serde.orElse(hiveSerde.serde),
       properties = serdeInfo.serdeProperties)
     case _ =>
       operationNotAllowed(s"STORED AS with file format '${serdeInfo.storedAs.get}'", ctx)
This is okay, but my intent was to avoid mixing parsing logic and validation where possible. The parser should return what was parsed to Spark, which should decide whether it is supported.
I don't think this is a blocker because it is in the SparkSqlParser instead of the one in catalyst. We can fix this when moving these commands to v2.
@rdblue thanks for the suggestions! Pushed a commit to update it, please take another look, thanks!
-        "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead",
-        ctx)
+      operationNotAllowed("CREATE OR REPLACE TEMPORARY TABLE is not supported yet. " +
+        "Please use CREATE OR REPLACE TEMPORARY VIEW as an alternative.", ctx)
I think this should use the original error message: "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead".
Using operationNotAllowed means the message is prefixed with "Operation not allowed: ", so adding "is not supported yet." is not helpful and just makes the message harder to read. In addition, "yet" implies that this will be supported, and there is no reason to do so.
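The redundancy is visible if you sketch what the helper produces. The prefix below is the one quoted in the review; the helper itself is a simplified stand-in for the real operationNotAllowed.

```scala
// operationNotAllowed prepends this fixed prefix to whatever message it gets.
def render(msg: String): String = s"Operation not allowed: $msg"

// The longer wording duplicates what the prefix already says.
val verbose = render("CREATE OR REPLACE TEMPORARY TABLE is not supported yet. " +
  "Please use CREATE OR REPLACE TEMPORARY VIEW as an alternative.")

// The original wording reads cleanly after the prefix.
val concise = render("CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead")
```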
       .getOrElse(StructType(partCols))
     if (temp) {
+      operationNotAllowed("CREATE TEMPORARY TABLE is not supported yet. " +
+        "Please use CREATE TEMPORARY VIEW as an alternative.", ctx)
This should use the other error message: "CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead".
As I noted on the similar case, "is not supported yet" is both redundant and misleading. I don't think that Spark intends to implement CREATE TEMPORARY TABLE. Even if it may be implemented, it has not been supported for years, so there is no value in implying that it will be supported.
Please update to use the simpler and clearer error message.
I think the only things that need to be fixed are the error messages that were changed and are now longer and less clear.
Thanks, @cloud-fan. I've merged this.