[SPARK-4574][SQL] Adding support for defining schema in foreign DDL commands. #3431
Conversation
Test build #23787 has started for PR 3431 at commit
Test build #23787 has finished for PR 3431 at commit
Test PASSed.
Test build #24044 has started for PR 3431 at commit
Test build #24044 has finished for PR 3431 at commit
Test PASSed.
Cool feature :) I wanted to include this in the first cut but ran out of time. It's too late for 1.2, but I'll try to review this soon.
Test build #24396 has started for PR 3431 at commit
Test build #24396 has finished for PR 3431 at commit
Test PASSed.
import org.apache.spark.sql.sources._

private[sql] class DefaultSource extends RelationProvider {
  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
-     parameters: Map[String, String]): BaseRelation = {
+     parameters: Map[String, String],
+     schema: Option[StructType]): BaseRelation = {
We cannot change the function signature; otherwise we will break existing libraries. Instead, I think we need to create a new interface, SchemaRelationProvider maybe?
Or use a default value for schema: `schema: Option[StructType] = None`?
Default values do not preserve binary compatibility, only source compatibility.
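The resolution suggested above — a separate interface rather than a changed signature or a default argument — could look roughly like the following sketch. This is not the PR's actual code: `SQLContext`, `StructType`, and `BaseRelation` here are empty stand-ins for Spark's real classes so the snippet compiles on its own, and the `schema` parameter is shown as a plain `StructType` because the thread later drops the `Option` wrapper.

```scala
// Stand-ins for Spark's classes, for illustration only.
class SQLContext
class StructType
trait BaseRelation

// Existing trait, left untouched so already-compiled data sources keep working.
trait RelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation
}

// New trait for data sources that accept a user-specified schema.
trait SchemaRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation
}
```

Because the old trait's method is byte-for-byte unchanged, existing libraries stay binary compatible, while new sources can opt into the schema-aware trait.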
Thanks for working on this! I know a couple of people who want to use this in data sources they are writing.
ping. any progress here?
Hi @marmbrus, still working on this; I will update it tomorrow.
Test build #24897 has started for PR 3431 at commit
Test build #24897 has finished for PR 3431 at commit
Test FAILed.
Test build #24898 has started for PR 3431 at commit
Test build #24898 has finished for PR 3431 at commit
Test FAILed.
retest this please
Test build #25283 has started for PR 3431 at commit
protected def cleanIdentifier(ident: String): String = ident match {
  case escapedIdentifier(i) => i
  case plainIdent => plainIdent
}
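For context, the helper under review strips one pair of surrounding backticks from an identifier. A minimal self-contained sketch, assuming `escapedIdentifier` is a regex capturing the text between backticks (the definition is not shown in this diff hunk):

```scala
// Assumed definition: captures the text between a single pair of backticks.
val escapedIdentifier = "`([^`]+)`".r

def cleanIdentifier(ident: String): String = ident match {
  case escapedIdentifier(i) => i  // `tbl` becomes tbl
  case plainIdent => plainIdent   // plain identifiers pass through unchanged
}
```

As the review notes below, this becomes redundant once the parser's `ident` rule handles backticks itself.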
It seems that when we use `ident`, the parser automatically takes care of backticks, so we can remove this. Sorry, I just noticed it.
ok
Thank you :)
Test build #25287 has started for PR 3431 at commit
Test build #25283 has finished for PR 3431 at commit
Test PASSed.
Test build #25287 has finished for PR 3431 at commit
Test PASSed.
def createRelation(
    sqlContext: SQLContext,
    parameters: Map[String, String],
    schema: Option[StructType]): BaseRelation
Why is this an Option? We have two traits, and Option is not very friendly to Java.
My initial idea was to stay compatible with the old trait. Since we will have two traits, I will fix this.
@scwf I have done it and will open a PR against your branch.
Remove Option from createRelation.
ok, merged!
Test build #25354 has started for PR 3431 at commit
Test build #25354 has finished for PR 3431 at commit
Test PASSed.
Thanks for working on this, guys! Merging to master. @yhuai, can you clarify the difference between SchemaRelationProvider and RelationProvider in the Scaladoc in your next PR?
Yeah, no problem.
With changes in this PR, users can persist metadata of tables created based on the data source API in the metastore through DDLs.

Author: Yin Huai <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes #3960 from yhuai/persistantTablesWithSchema2 and squashes the following commits:

069c235 [Yin Huai] Make exception messages user friendly.
c07cbc6 [Yin Huai] Get the location of test file in a correct way.
4456e98 [Yin Huai] Test data.
5315dfc [Yin Huai] rxin's comments.
7fc4b56 [Yin Huai] Add DDLStrategy and HiveDDLStrategy to plan DDLs based on the data source API.
aeaf4b3 [Yin Huai] Add comments.
06f9b0c [Yin Huai] Revert unnecessary changes.
feb88aa [Yin Huai] Merge remote-tracking branch 'apache/master' into persistantTablesWithSchema2
172db80 [Yin Huai] Fix unit test.
49bf1ac [Yin Huai] Unit tests.
8f8f1a1 [Yin Huai] [SPARK-4574][SQL] Adding support for defining schema in foreign DDL commands. #3431
f47fda1 [Yin Huai] Unit tests.
2b59723 [Michael Armbrust] Set external when creating tables
c00bb1b [Michael Armbrust] Don't use reflection to read options
1ea6e7b [Michael Armbrust] Don't fail when trying to uncache a table that doesn't exist
6edc710 [Michael Armbrust] Add tests.
d7da491 [Michael Armbrust] First draft of persistent tables.
Adding support for defining a schema in foreign DDL commands. With this PR, users can define the schema explicitly instead of having it inferred from the file, so DDL commands like the following are supported:
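The concrete DDL example is truncated in this copy of the thread. An illustrative sketch of the before/after, where the table name, data source class, and path are all made up for illustration:

```sql
-- Before: schema is inferred from the underlying file.
CREATE TEMPORARY TABLE avroTable
USING org.apache.spark.sql.avro
OPTIONS (path "/data/events.avro")

-- With this PR: the schema can be declared explicitly in the DDL.
CREATE TEMPORARY TABLE avroTable (name string, age int)
USING org.apache.spark.sql.avro
OPTIONS (path "/data/events.avro")
```

Under the API discussed above, the first form would be routed to a RelationProvider and the second to a SchemaRelationProvider.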