[SPARK-6888][SQL] Export driver quirks #5498


Closed
wants to merge 4 commits

Conversation

rtreffer
Contributor

Make it possible to (temporarily) override the driver quirks. This
can be used to overcome problems with specific schemas or to
add new JDBC driver support on the fly.

A very simple implementation that dumps the type loading can be done like this (spark-shell):

```scala
class DumpQuirk extends org.apache.spark.sql.jdbc.DriverQuirks {
  def canHandle(url: String): Boolean = true
  def getCatalystType(sqlType: Int, typeName: String, size: Int, md: org.apache.spark.sql.types.MetadataBuilder): org.apache.spark.sql.types.DataType = {
    println("" + (sqlType, typeName, size, md))
    null
  }
  def getJDBCType(dt: org.apache.spark.sql.types.DataType): (String, Option[Int]) = (null, None)
}
org.apache.spark.sql.jdbc.DriverQuirks.registerQuirks(new DumpQuirk())
```

Note that this pull request is against 1.3; I could not create a distribution from the current master.

@srowen
Member

srowen commented Apr 13, 2015

@rtreffer
Contributor Author

Added a ticket: https://issues.apache.org/jira/browse/SPARK-6888
Will add that to the commit after some sleep .zZzZzZ

Make it possible to (temporary) overwrite the driver quirks. This
can be used to overcome problems with specific schemas or to
add new jdbc driver support on the fly.
@rtreffer rtreffer force-pushed the export-driver-quirks branch from dca9372 to 9ca66d9 Compare April 14, 2015 09:58
@rtreffer rtreffer changed the title Export driver quirks [SPARK-6888][SQL] Export driver quirks Apr 14, 2015
@liancheng
Contributor

ok to test.


```scala
private var quirks = List[DriverQuirks]()

def registerQuirks(quirk: DriverQuirks) {
```
Contributor


Please add return type explicitly for all public methods.
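Concretely, the fix for the `registerQuirks` method above might look like this (a sketch only; the prepend ordering is an assumption, not necessarily what the patch does):

```scala
// Sketch: explicit Unit return type on the public registration method.
// New quirks are prepended so they are consulted before built-in ones
// (an assumed design choice, not confirmed by this PR).
def registerQuirks(quirk: DriverQuirks): Unit = {
  quirks = quirk :: quirks
}
```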

@rtreffer
Contributor Author

@liancheng thank you, I will update the patch.
Just one question: should I squash/amend the fixes, or should I add a second commit?

```scala
  } else {
    r.getCatalystType(sqlType, typeName, size, md)
  }
)
```
Contributor


How about this:

```scala
quirks.map(_.getCatalystType(sqlType, typeName, size, md)).collectFirst {
  case dataType if dataType != null => dataType
}.orNull
```
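The suggested pattern can be tried in isolation; here is a minimal sketch with plain functions standing in for quirks (all names below are hypothetical stand-ins, not Spark API):

```scala
// Each stand-in "quirk" returns null when it cannot map the JDBC type,
// mirroring the DriverQuirks contract.
val quirks: List[Int => String] = List(
  sqlType => if (sqlType == java.sql.Types.VARCHAR) "StringType" else null,
  sqlType => if (sqlType == java.sql.Types.INTEGER) "IntegerType" else null
)

// Returns the first non-null mapping, or null if no quirk matches.
def resolve(sqlType: Int): String =
  quirks.map(_(sqlType)).collectFirst {
    case dataType if dataType != null => dataType
  }.orNull

println(resolve(java.sql.Types.INTEGER)) // IntegerType
println(resolve(java.sql.Types.BLOB))    // null
```

One design note: `.map` evaluates every quirk eagerly; `quirks.view.map(...)` would stop at the first non-null hit, which may matter if individual lookups ever become expensive.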

@liancheng
Contributor

You may just add new commits to this PR. Also, would you please add tests for this feature?

@SparkQA

SparkQA commented Apr 15, 2015

Test build #30332 has finished for PR 5498 at commit 9ca66d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rtreffer
Contributor Author

I still have to write tests for AggregatedQuirks.

@marmbrus
Contributor

If we are going to make this a public API we should consider a clearer name, perhaps `JDBCTypeMapping`? You will also need to reopen the PR against master, as we don't want to add new APIs in a maintenance branch.

```diff
@@ -39,33 +39,68 @@ import java.sql.Types
  * if `getJDBCType` returns `(null, None)`, the default type handling is used
  * for the given Catalyst type.
  */
-private[sql] abstract class DriverQuirks {
+abstract class DriverQuirks {
   def canHandle(url : String): Boolean
```
Contributor


Add Scaladoc describing the contract for each of these methods.
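A sketch of what such Scaladoc could look like (the wording is illustrative only, not the final API documentation; `MetadataBuilder` and `DataType` are the `org.apache.spark.sql.types` classes already used above):

```scala
abstract class DriverQuirks {
  /**
   * Returns true if this set of quirks can handle the given JDBC URL,
   * e.g. a MySQL implementation might check url.startsWith("jdbc:mysql").
   */
  def canHandle(url: String): Boolean

  /**
   * Maps a JDBC type to a Catalyst type, or returns null if the default
   * mapping should be used for this (sqlType, typeName, size) combination.
   */
  def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): DataType

  /**
   * Maps a Catalyst type to a database-specific column type name and JDBC
   * type, or returns (null, None) to fall back to the default handling.
   */
  def getJDBCType(dt: DataType): (String, Option[Int])
}
```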

@SparkQA

SparkQA commented Apr 15, 2015

Test build #30375 has finished for PR 5498 at commit 7f23484.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rtreffer
Contributor Author

@marmbrus thank you, I'll fix those issues and open a new one when done.

Regarding naming/api:

It is quite common to have one class per SQL/JDBC dialect, often named exactly that way (e.g. MySQLDialect in Hibernate). I've found quite a few projects that use the same naming (via GitHub search).
In that case I'd like to match those names and add a default implementation per method (returning the neutral element).

On the other hand, it currently only does type mapping, so JDBCTypeMapping would be a very valid name too. It would restrict the use case more (which can be good or bad).

I guess you know better what would suit Spark :-)

@marmbrus
Contributor

Dialect seems reasonable to me.

@marmbrus
Contributor

We will also want to mark all of these @DeveloperApi

@SparkQA

SparkQA commented Apr 17, 2015

Test build #30463 has finished for PR 5498 at commit 22d65ca.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rtreffer
Contributor Author

Replaced by #5555
