[SPARK-4912][SQL] Persistent tables for the Spark SQL data sources api #3960
Conversation
Test build #25277 has started for PR 3960 at commit
Test build #25277 has finished for PR 3960 at commit
Test FAILed.
Test build #25282 has started for PR 3960 at commit
Test build #25282 has finished for PR 3960 at commit
Test PASSed.
…hSchema2

Conflicts:
  sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
  sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala
  sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala
  sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
  sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala
  sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
Test build #25372 has started for PR 3960 at commit
Test build #25372 has finished for PR 3960 at commit
Test FAILed.
Test build #25387 has started for PR 3960 at commit
Test build #25387 has finished for PR 3960 at commit
Test PASSed.
@@ -50,8 +52,75 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
  /** Connection to hive metastore. Usages should lock on `this`. */
  protected[hive] val client = Hive.get(hive.hiveconf)

  // TODO: Use this everywhere instead of tuples or databaseName, tableName,.
  /** A fully qualified identifier for a table (i.e., database.tableName) */
  case class TableIdent(database: String, name: String) {
how about QualifiedTable?
Test build #25429 has finished for PR 3960 at commit
Test PASSed.
Test build #25435 has started for PR 3960 at commit
checkAnswer(
  sql("SELECT * FROM jsonTable"),
  jsonFile("src/test/resources/sample.json").collect().toSeq)
Maven and sbt use different working directories when running tests, so I am not sure specifying the path this way will work. You probably need to use getClass.getResource to get the absolute path.
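The suggestion above can be sketched as follows. This is a hypothetical helper, not code from the PR; the object and method names are illustrative. The idea is to look the file up on the classpath instead of relying on the working directory, which differs between sbt and maven.

```scala
// Illustrative helper (not from the PR): resolve a test resource to an
// absolute filesystem path via the classpath, so the lookup does not
// depend on the build tool's working directory.
object TestResources {
  def absolutePath(name: String): String = {
    // getResource searches the classpath; returns null if not found.
    val url = getClass.getResource(name)
    require(url != null, s"resource not found on classpath: $name")
    new java.io.File(url.toURI).getAbsolutePath
  }
}
```

A test could then write something like `jsonFile(TestResources.absolutePath("/sample.json"))` instead of hard-coding `src/test/resources/sample.json`.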
Test build #25438 has started for PR 3960 at commit
Test build #25435 has finished for PR 3960 at commit
Test PASSed.
Test build #25438 has finished for PR 3960 at commit
Test FAILed.
@@ -310,4 +311,17 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
      case _ => Nil
    }
  }

  object DDLStrategy extends Strategy {
Can we avoid making this a separate strategy?
It was originally in CommandStrategy. I was trying to find a good place for these, but I did not find a suitable Strategy. Any suggestion?
@scwf Actually, I think it is better to put all rules for the data source API in the same place.
@yhuai, I mean that since CreateTableUsing and CreateTempTableUsing are commands, we'd better make them follow this strategy:

object BasicOperators extends Strategy {
  def numPartitions = self.numPartitions
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case r: RunnableCommand => ExecutedCommand(r) :: Nil

I will try this.
Actually, I am not sure we should put them in BasicOperators. We cannot just create a RunnableCommand in ddl.scala, since SQLContext does not allow persistent tables and we need to throw the error in SparkStrategies. Also, I feel the code is clearer when we put everything related to the data source API together.
@yhuai, I wrote a draft version of this; can you take a look? (https://github.com/scwf/spark/compare/apache:master...scwf:createDataSourceTable?expand=1)
The reason we put case r: RunnableCommand => ExecutedCommand(r) in BasicOperators is that there is no need to create a new strategy for only one rule. And after we refactor the command implementation in Spark SQL, we should make newly added commands follow RunnableCommand where possible; then we can avoid adding a new strategy for each newly added command.
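The dispatch pattern being debated above can be sketched in miniature. This is an illustrative, heavily simplified model, not actual Spark internals; the trait and class names mirror the discussion but the implementations are stand-ins. It shows why one generic RunnableCommand rule can subsume a dedicated strategy per command.

```scala
// Simplified sketch (not real Spark code) of the planning idea discussed
// above: commands that know how to run themselves are all planned by one
// generic pattern-match, so adding a new command needs no new strategy.
trait LogicalPlan
trait SparkPlan { def execute(): String }

// A logical command that carries its own execution logic.
trait RunnableCommand extends LogicalPlan { def run(): String }

// Physical wrapper that simply delegates to the wrapped command.
case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan {
  def execute(): String = cmd.run()
}

// Stand-in for a command like CreateTempTableUsing from this PR.
case class CreateTempTableUsing(table: String) extends RunnableCommand {
  def run(): String = s"registered temp table $table"
}

object BasicOperators {
  // One generic rule covers every RunnableCommand.
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case r: RunnableCommand => ExecutedCommand(r) :: Nil
    case _                  => Nil
  }
}
```

The trade-off raised by @yhuai still applies: a generic rule cannot easily reject plans that are only invalid in a particular context (e.g. persistent tables under plain SQLContext), which is an argument for keeping data-source rules in one dedicated place.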
/cc @marmbrus
Jenkins, retest this please.
Test build #25456 has started for PR 3960 at commit
Test build #25456 has finished for PR 3960 at commit
Test PASSed.
Test build #25482 has started for PR 3960 at commit
@@ -50,8 +52,76 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
  /** Connection to hive metastore. Usages should lock on `this`. */
  protected[hive] val client = Hive.get(hive.hiveconf)

  // TODO: Use this everywhere instead of tuples or databaseName, tableName,.
  /** A fully qualified identifier for a table (i.e., database.tableName) */
  case class QualifiedTableName(database: String, name: String) {
This doesn't really match the rest of the API any more, now that we have the concept of a tableIdentifier. We can fix this in a follow-up PR.
LGTM once tests pass.
Test build #25482 has finished for PR 3960 at commit
Test PASSed.
With the changes in this PR, users can persist the metadata of tables created through the data source API in the Hive metastore via DDL statements.
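As a rough illustration, the DDL shape discussed in this PR looks something like the following. This is a sketch only; the table name, provider, and path below are illustrative, and the exact syntax accepted should be checked against the merged ddl.scala parser.

```scala
// Sketch (assumes a HiveContext bound to `sqlContext`; names and path are
// illustrative): persisting a data-source table via DDL.
sqlContext.sql(
  """CREATE TABLE jsonTable
    |USING org.apache.spark.sql.json
    |OPTIONS (path '/data/sample.json')""".stripMargin)
```

Because the table is created against a HiveContext, its metadata is recorded in the metastore and the table survives across sessions, unlike a temporary table registered only in the catalog of the current SQLContext.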