[SPARK-4912][SQL] Persistent tables for the Spark SQL data sources api #3960

Closed · wants to merge 17 commits

Conversation

@yhuai (Contributor) commented Jan 8, 2015

With the changes in this PR, users can persist the metadata of tables created through the data source API in the Hive metastore, using DDL statements.
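
For illustration, a sketch of the kind of DDL this enables (the table name and path here are made up; the USING/OPTIONS form follows the data source API):

  // With a HiveContext, the table's metadata is persisted in the
  // metastore, so the table is still visible in later sessions.
  hiveContext.sql(
    """CREATE TABLE persistedJson
      |USING org.apache.spark.sql.json
      |OPTIONS (path '/data/sample.json')
    """.stripMargin)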

@SparkQA commented Jan 8, 2015

Test build #25277 has started for PR 3960 at commit 49bf1ac.

  • This patch merges cleanly.

@SparkQA commented Jan 9, 2015

Test build #25277 has finished for PR 3960 at commit 49bf1ac.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DefaultSource extends SchemaRelationProvider
    • case class ParquetRelation2(
    • trait SchemaRelationProvider
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25277/
Test FAILed.

@SparkQA commented Jan 9, 2015

Test build #25282 has started for PR 3960 at commit 172db80.

  • This patch merges cleanly.

@SparkQA commented Jan 9, 2015

Test build #25282 has finished for PR 3960 at commit 172db80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DefaultSource extends SchemaRelationProvider
    • case class ParquetRelation2(
    • trait SchemaRelationProvider
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25282/
Test PASSed.

…hSchema2

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
	sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala
	sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala
	sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
	sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala
	sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala

@SparkQA commented Jan 11, 2015

Test build #25372 has started for PR 3960 at commit feb88aa.

  • This patch merges cleanly.

@SparkQA commented Jan 11, 2015

Test build #25372 has finished for PR 3960 at commit feb88aa.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DefaultSource extends SchemaRelationProvider
    • case class ParquetRelation2(
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25372/
Test FAILed.

@yhuai yhuai changed the title [WIP][SPARK-4912][SQL] Persistent tables for the Spark SQL data sources api [SPARK-4912][SQL] Persistent tables for the Spark SQL data sources api Jan 12, 2015
@SparkQA commented Jan 12, 2015

Test build #25387 has started for PR 3960 at commit 7fc4b56.

  • This patch merges cleanly.

@SparkQA commented Jan 12, 2015

Test build #25387 has finished for PR 3960 at commit 7fc4b56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TableIdent(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25387/
Test PASSed.

@@ -50,8 +52,75 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
/** Connection to hive metastore. Usages should lock on `this`. */
protected[hive] val client = Hive.get(hive.hiveconf)

// TODO: Use this everywhere instead of tuples or databaseName, tableName,.
/** A fully qualified identifier for a table (i.e., database.tableName) */
case class TableIdent(database: String, name: String) {
Contributor

how about QualifiedTable?

@SparkQA commented Jan 12, 2015

Test build #25429 has finished for PR 3960 at commit 5315dfc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class QualifiedTableName(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25429/
Test PASSed.

@SparkQA commented Jan 12, 2015

Test build #25435 has started for PR 3960 at commit 4456e98.

  • This patch merges cleanly.


checkAnswer(
  sql("SELECT * FROM jsonTable"),
  jsonFile("src/test/resources/sample.json").collect().toSeq)
Contributor

Maven and sbt use different working directories when running tests, so I am not sure that specifying the path this way will work. You probably need to use getClass.getResource to get the absolute path.
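
A minimal sketch of that suggestion (the resourcePath helper is hypothetical; the resource name is the one used in the test above):

  // Hypothetical helper: resolve a classpath resource to an absolute
  // path, so the test works under both Maven and sbt even though their
  // working directories differ.
  def resourcePath(name: String): String =
    getClass.getClassLoader.getResource(name).getPath

  checkAnswer(
    sql("SELECT * FROM jsonTable"),
    jsonFile(resourcePath("sample.json")).collect().toSeq)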

@SparkQA commented Jan 13, 2015

Test build #25438 has started for PR 3960 at commit c07cbc6.

  • This patch merges cleanly.

@SparkQA commented Jan 13, 2015

Test build #25435 has finished for PR 3960 at commit 4456e98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25435/
Test PASSed.

@SparkQA commented Jan 13, 2015

Test build #25438 has finished for PR 3960 at commit c07cbc6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25438/
Test FAILed.

@@ -310,4 +311,17 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
case _ => Nil
}
}

object DDLStrategy extends Strategy {
Contributor

Can we avoid making this a separate strategy?

Contributor (Author)

It was originally in CommandStrategy. I was trying to find a good place for these rules, but I did not find a suitable existing Strategy. Any suggestions?

Contributor (Author)

@scwf Actually, I think it is better to put all rules for the data source API in the same place.

Contributor

@yhuai, I mean that since CreateTableUsing and CreateTempTableUsing are commands, we'd better make them follow this strategy:

  object BasicOperators extends Strategy {
    def numPartitions = self.numPartitions

    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
      case r: RunnableCommand => ExecutedCommand(r) :: Nil

I will give this a try.

Contributor (Author)

Actually, I am not sure we should put them in BasicOperators. We cannot just create a RunnableCommand in ddl.scala, because SQLContext does not allow persistent tables and we need to throw that error in SparkStrategies. Also, I feel the code is clearer when everything related to the data source API is kept together.
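
For concreteness, a minimal sketch of what such a dedicated strategy could look like. This is a sketch only: it assumes the scope of SparkStrategies.scala (Strategy, ExecutedCommand, LogicalPlan, SparkPlan in scope), and the constructor fields of CreateTableUsing and CreateTempTableUsing are assumptions for illustration, not necessarily the ones in this PR:

  object DDLStrategy extends Strategy {
    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
      // Temporary tables only need the in-memory catalog, so any
      // SQLContext can execute them.
      case CreateTableUsing(tableName, provider, temporary, options) if temporary =>
        ExecutedCommand(CreateTempTableUsing(tableName, provider, options)) :: Nil

      // Persistent tables require a metastore, which a plain SQLContext
      // does not have, so the error is raised here during planning.
      case CreateTableUsing(_, _, _, _) =>
        sys.error("CREATE TABLE is only supported with HiveContext; " +
          "use CREATE TEMPORARY TABLE instead.")

      case _ => Nil
    }
  }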

Contributor

@yhuai, I wrote a draft version of this; can you have a look? (https://github.com/scwf/spark/compare/apache:master...scwf:createDataSourceTable?expand=1)

The reason we put `case r: RunnableCommand => ExecutedCommand(r)` in BasicOperators is that there is no need to create a new strategy for only one rule.

And after we refactor the command implementation in Spark SQL, newly added commands should follow RunnableCommand where possible, so that we can avoid adding a new strategy for each new command.

/cc @marmbrus

@rxin (Contributor) commented Jan 13, 2015

Jenkins, retest this please.

@SparkQA commented Jan 13, 2015

Test build #25456 has started for PR 3960 at commit c07cbc6.

  • This patch merges cleanly.

@SparkQA commented Jan 13, 2015

Test build #25456 has finished for PR 3960 at commit c07cbc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class QualifiedTableName(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25456/
Test PASSed.

@SparkQA commented Jan 13, 2015

Test build #25482 has started for PR 3960 at commit 069c235.

  • This patch merges cleanly.

@@ -50,8 +52,76 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
/** Connection to hive metastore. Usages should lock on `this`. */
protected[hive] val client = Hive.get(hive.hiveconf)

// TODO: Use this everywhere instead of tuples or databaseName, tableName,.
/** A fully qualified identifier for a table (i.e., database.tableName) */
case class QualifiedTableName(database: String, name: String) {
Contributor

This doesn't really match the rest of the API any more now that we have the concept of a tableIdentifier. We can fix this in a followup PR.

@marmbrus (Contributor)

LGTM once tests pass.

@SparkQA commented Jan 13, 2015

Test build #25482 has finished for PR 3960 at commit 069c235.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class QualifiedTableName(database: String, name: String)
    • case class CreateMetastoreDataSource(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25482/
Test PASSed.
