[SPARK-5009] [SQL] Long keyword support in SQL Parsers #3926

Closed
wants to merge 1 commit into from

chenghao-intel
Contributor

  • SqlLexical.allCaseVersions causes a StackOverflowError when a keyword is too long; this patch fixes that by normalizing all of the keywords in SqlLexical.
  • It also introduces a unified SparkSQLParser to share the common code.
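
To illustrate the first bullet: enumerating every upper/lower-case spelling of a keyword yields 2^n strings for a keyword of length n, whereas normalizing both the reserved words and the incoming token needs only a single lookup. The sketch below is a stand-alone illustration of that contrast, not the code from the patch. `KeywordSketch`, `normalize`, and `isKeyword` are hypothetical names, and the recursive `allCaseVersions` here is a simplified stand-in rather than the method in `SqlLexical`.

```scala
import java.util.Locale

object KeywordSketch {
  // Simplified stand-in (not Spark's code): list every upper/lower-case
  // spelling of `s`. The result has 2^n entries for a string of length n,
  // and the recursion goes one frame deeper for each extra character.
  def allCaseVersions(s: String): List[String] =
    if (s.isEmpty) List("")
    else {
      val rest = allCaseVersions(s.tail)
      rest.map(r => s.head.toLower.toString + r) ++
        rest.map(r => s.head.toUpper.toString + r)
    }

  // Hypothetical normalizer: fold everything to one canonical case.
  def normalize(s: String): String = s.toLowerCase(Locale.ROOT)

  // Reserved words are stored once, already normalized.
  val reserved: Set[String] = Set("SELECT", "FROM", "WHERE").map(normalize)

  // A token is a keyword if its normalized form is reserved.
  def isKeyword(token: String): Boolean = reserved.contains(normalize(token))
}
```

With normalization, the cost of recognizing a keyword no longer depends on how many case variants it has; a 30-character keyword is one set lookup rather than 2^30 alternatives.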

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25146 has started for PR 3926 at commit 98023a8.

  • This patch merges cleanly.

@OopsOutOfMemory
Contributor

LGTM.
Could you add tests for this?

@chenghao-intel
Contributor Author

@OopsOutOfMemory Yeah, I will do that after #3924 is merged. :)

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25146 has finished for PR 3926 at commit 98023a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25146/

// NOTE: The Keyword properties are defined by the subclass, so this method cannot be
// called while the parent class is being instantiated, because the subclass's fields
// are not initialized yet. Use `def` instead of `val` to defer the evaluation.
protected def reservedWords: Seq[Keyword] =
Contributor

why not lazy val?

Contributor Author

Yeah, you're right, will fix this.
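
The initialization-order issue behind this exchange can be reproduced in isolation. In the sketch below (all names are hypothetical and not taken from the patch), a parent-class `val` that reads a subclass-defined member is evaluated while the subclass's fields are still null, whereas a `lazy val` is evaluated on first access, after construction has finished.

```scala
abstract class BaseParserSketch {
  // Supplied by the subclass (hypothetical stand-in for the keyword list).
  protected def keywords: Seq[String]

  // Eager val: runs inside the parent constructor, before the subclass's
  // own fields have been assigned, so `keywords` still returns null here.
  val eagerCount: Int = Option(keywords).map(_.size).getOrElse(-1)

  // Lazy val: evaluated on first access, i.e. after construction finished,
  // and cached afterwards (unlike a `def`, which recomputes on every call).
  lazy val lazyCount: Int = keywords.size
}

class MyParserSketch extends BaseParserSketch {
  private val words = Seq("SELECT", "FROM")
  protected def keywords: Seq[String] = words
}
```

Both `def` and `lazy val` avoid the null read; `lazy val` additionally computes the value only once, which is presumably why the reviewer suggested it.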

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25157 has started for PR 3926 at commit 01ff9c6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25157 has finished for PR 3926 at commit 01ff9c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25157/

@chenghao-intel
Contributor Author

@marmbrus I have another idea for this fix and need more time to work on it. Can you review the SQL parser refactoring in #3924? I will rebase after #3924 is merged.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25210 has started for PR 3926 at commit c620afa.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25210 has finished for PR 3926 at commit c620afa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25210/

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25212 has started for PR 3926 at commit dd0e60a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25212 has finished for PR 3926 at commit dd0e60a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25212/

@chenghao-intel changed the title from "[SPARK-5009][SQL] [WIP] Long keyword support in SQL Parsers" to "[SPARK-5009] [SQL] Long keyword support in SQL Parsers" on Jan 8, 2015
@chenghao-intel
Contributor Author

Removed the WIP, and updated the description.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25228 has started for PR 3926 at commit 536e592.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25228 has finished for PR 3926 at commit 536e592.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25228/

- def apply(input: String): LogicalPlan = phrase(start)(new lexical.Scanner(input)) match {
-   case Success(plan, _) => plan
-   case failureOrError => sys.error(failureOrError.toString)
+ def apply(input: String): LogicalPlan = {
Contributor

@chenghao-intel
Could

def apply(input: String): LogicalPlan

be changed to

def apply(input: String): Option[LogicalPlan]

? As it stands, it is not consistent with DDLParser.

Contributor Author

You're right. We could probably combine a couple of the parsers so that they work in delegation mode, but for now I have simply written another version of def apply in DDLParser.
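
One way to read "delegation mode" is that a parser returning Option signals "not mine" with None, and a wrapper then falls back to the next parser. The following is only a sketch of that idea, using string matching in place of real grammars. `Plan`, `parseDdl`, `parseSql`, and `parse` are hypothetical names and do not reflect the actual DDLParser or SqlParser code.

```scala
import java.util.Locale

object ParserDelegationSketch {
  sealed trait Plan
  case class DdlPlan(text: String) extends Plan
  case class QueryPlan(text: String) extends Plan

  private def startsWithKeyword(input: String, keyword: String): Boolean =
    input.trim.toUpperCase(Locale.ROOT).startsWith(keyword + " ")

  // Hypothetical DDL parser: returns None when the input is not DDL,
  // so the caller can try something else.
  def parseDdl(input: String): Option[Plan] =
    if (startsWithKeyword(input, "CREATE")) Some(DdlPlan(input.trim)) else None

  // Hypothetical SQL parser: last in the chain, so it fails loudly.
  def parseSql(input: String): Plan =
    if (startsWithKeyword(input, "SELECT")) QueryPlan(input.trim)
    else sys.error(s"Unsupported statement: $input")

  // Delegation: try the DDL parser first, then fall back to the SQL parser.
  def parse(input: String): Plan = parseDdl(input).getOrElse(parseSql(input))
}
```

Under this reading, the differing return types are deliberate: only the last parser in the chain needs to raise an error.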

@marmbrus
Contributor

This is awesome, thanks for cleaning this up. One question though, do we really want to have case insensitive keywords? Are there any systems that actually do that? If it is something we want to keep then maybe you can add some documentation to the normalizer classes.

@marmbrus
Contributor

BTW, I'm going to try to merge #3431 first, which might conflict with this.

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25385 has started for PR 3926 at commit 5ca74b4.

  • This patch merges cleanly.

@chenghao-intel
Contributor Author

@marmbrus You're right, SQL keywords should always be case insensitive. I've updated the code for this and rebased to the latest master.

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25385 has finished for PR 3926 at commit 5ca74b4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical extends StdLexical

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25385/

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25390 has started for PR 3926 at commit 4828f46.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25390 has finished for PR 3926 at commit 4828f46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical extends StdLexical

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25390/

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25511 has started for PR 3926 at commit f3c0abc.

  • This patch does not merge cleanly.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25513 has started for PR 3926 at commit 686660f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25513 has finished for PR 3926 at commit 686660f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical extends StdLexical

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25513/

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25511 has finished for PR 3926 at commit f3c0abc.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical extends StdLexical

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25511/

@chenghao-intel
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 20, 2015

Test build #25810 has started for PR 3926 at commit 686660f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 20, 2015

Test build #25810 has finished for PR 3926 at commit 686660f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25810/

@chenghao-intel
Contributor Author

cc @rxin @marmbrus

@@ -25,15 +25,42 @@ import scala.util.parsing.input.CharArrayReader.EofCh

import org.apache.spark.sql.catalyst.plans.logical._

private[sql] object KeywordNormalizer {
Contributor

This is kind of a nit, but since this is only used in AbstractSparkSQLParser and its subclasses, I'd just make it a protected method to avoid the syntactic overhead of a whole separate object. I believe you are doing further refactoring, so maybe that can be done in a follow-up.

Contributor Author

It's also used within SqlLexical.processIdent, but you're right, we should keep its visibility minimal. I will do that in #4015.

@asfgit closed this in 8361078 on Jan 21, 2015
@marmbrus
Contributor

Thanks for doing this, much better than the previous hack!

Merged to master.

bomeng pushed a commit to Huawei-Spark/spark that referenced this pull request Jan 22, 2015
* `SqlLexical.allCaseVersions` causes a `StackOverflowError` when a keyword is too long; this patch fixes that by normalizing all of the keywords in `SqlLexical`.
* It also introduces a unified SparkSQLParser to share the common code.

Author: Cheng Hao <[email protected]>

Closes apache#3926 from chenghao-intel/long_keyword and squashes the following commits:

686660f [Cheng Hao] Support Long Keyword and Refactor the SQLParsers