[SPARK-5009] [SQL] Long keyword support in SQL Parsers #3926

chenghao-intel · 2015-01-07T06:22:59Z

`SqlLexical.allCaseVersions` causes a stack overflow if a keyword is too long. This patch fixes that by normalizing all of the keywords in `SqlLexical`.
It also introduces a unified SparkSQLParser to share the common code.
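
To illustrate the problem described above, here is a minimal sketch, not the code from this patch. It assumes, as the name suggests, that `allCaseVersions` enumerates every upper/lower-case spelling of a keyword, which yields 2^n strings for an n-letter keyword. The object name `KeywordCaseSketch`, the `normalize` helper, and the sample keyword set are hypothetical.

```scala
import java.util.Locale

object KeywordCaseSketch {
  // Hypothetical re-creation: enumerate every upper/lower-case spelling.
  // An n-letter keyword produces 2^n strings, so long keywords are impractical.
  def allCaseVersions(s: String): Seq[String] =
    s.foldLeft(Seq("")) { (acc, c) =>
      acc.flatMap(prefix => Seq(prefix + c.toLower, prefix + c.toUpper))
    }

  // Assumed alternative: normalize once and compare, instead of enumerating.
  def normalize(s: String): String = s.toLowerCase(Locale.ROOT)

  // Sample keyword set, for illustration only.
  val reserved: Set[String] = Set("SELECT", "FROM", "WHERE").map(normalize)

  def isKeyword(ident: String): Boolean = reserved.contains(normalize(ident))
}
```

With normalization, `isKeyword("sElEcT")` needs a single set lookup, whereas enumerating the variants of a 20-letter keyword would already produce over a million strings.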

SparkQA · 2015-01-07T06:27:33Z

Test build #25146 has started for PR 3926 at commit 98023a8.

This patch merges cleanly.

OopsOutOfMemory · 2015-01-07T06:44:41Z

LGTM.
Could you add tests for this?

chenghao-intel · 2015-01-07T06:53:14Z

@OopsOutOfMemory Yes, I will do that after #3924 is merged. :)

SparkQA · 2015-01-07T07:35:27Z

Test build #25146 has finished for PR 3926 at commit 98023a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

AmplabJenkins · 2015-01-07T07:35:31Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25146/

marmbrus · 2015-01-07T08:00:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SparkSQLParser.scala

+  // NOTICE, Since the Keyword properties defined by sub class, we couldn't call this
+  // method during the parent class instantiation, because the sub class instance
+  // isn't created yet. Using `def` instead of the `val` for the lazy initialization.
+  protected def reservedWords: Seq[Keyword] =


why not lazy val?

Yeah, you're right, will fix this.
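
The initialization-order concern in the code comment, and why `lazy val` addresses it, can be sketched as follows. This is a simplified illustration, not the patch itself: `BaseParser`, `MyParser`, and the `keywords` member are hypothetical, and the keywords are listed explicitly here because the thread does not show how the patch collects them.

```scala
object InitOrderSketch {
  case class Keyword(str: String)

  abstract class BaseParser {
    // Subclasses supply their keywords (hypothetical member).
    protected def keywords: Seq[Keyword]

    // Eager val: evaluated inside the BaseParser constructor, before the
    // subclass fields are assigned, so `keywords` is still null here.
    val eagerReserved: Seq[String] =
      Option(keywords).getOrElse(Nil).map(_.str)

    // lazy val: evaluated on first access, after construction has finished,
    // and cached afterwards (unlike a def, which recomputes on every call).
    lazy val lazyReserved: Seq[String] = keywords.map(_.str)
  }

  class MyParser extends BaseParser {
    protected val SELECT = Keyword("SELECT")
    protected val FROM = Keyword("FROM")
    protected val keywords: Seq[Keyword] = Seq(SELECT, FROM)
  }
}
```

Both `def` and `lazy val` defer evaluation until the subclass instance exists; `lazy val` additionally avoids rebuilding the list each time it is read.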

SparkQA · 2015-01-07T08:57:33Z

Test build #25157 has started for PR 3926 at commit 01ff9c6.

This patch merges cleanly.

SparkQA · 2015-01-07T10:05:06Z

Test build #25157 has finished for PR 3926 at commit 01ff9c6.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

AmplabJenkins · 2015-01-07T10:05:09Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25157/

chenghao-intel · 2015-01-07T14:49:17Z

@marmbrus I have another idea for this fix and need more time to work on it. Can you review the SQL parser refactoring in #3924? I will rebase after #3924 is merged.

SparkQA · 2015-01-08T08:32:35Z

Test build #25210 has started for PR 3926 at commit c620afa.

This patch merges cleanly.

SparkQA · 2015-01-08T08:40:12Z

Test build #25210 has finished for PR 3926 at commit c620afa.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

AmplabJenkins · 2015-01-08T08:40:15Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25210/

SparkQA · 2015-01-08T08:57:44Z

Test build #25212 has started for PR 3926 at commit dd0e60a.

This patch merges cleanly.

SparkQA · 2015-01-08T10:05:53Z

Test build #25212 has finished for PR 3926 at commit dd0e60a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

AmplabJenkins · 2015-01-08T10:05:56Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25212/

chenghao-intel · 2015-01-08T15:12:26Z

Removed the WIP, and updated the description.

SparkQA · 2015-01-08T15:12:37Z

Test build #25228 has started for PR 3926 at commit 536e592.

This patch merges cleanly.

SparkQA · 2015-01-08T16:18:50Z

Test build #25228 has finished for PR 3926 at commit 536e592.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical(normalizer: KeywordNormalizer) extends StdLexical

AmplabJenkins · 2015-01-08T16:18:53Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25228/

OopsOutOfMemory · 2015-01-09T06:41:21Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SparkSQLParser.scala

-  def apply(input: String): LogicalPlan = phrase(start)(new lexical.Scanner(input)) match {
-    case Success(plan, _) => plan
-    case failureOrError => sys.error(failureOrError.toString)
+  def apply(input: String): LogicalPlan = {


@chenghao-intel
Could `def apply(input: String): LogicalPlan` be changed to `def apply(input: String): Option[LogicalPlan]`?
As written, it is not consistent with DDLParser.

You're right. We can probably combine the parsers so that they work in a delegation mode, but for now I have simply written another version of `def apply` in DDLParser.
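
One way the "delegation mode" mentioned above could look is sketched below. It is only an illustration under stated assumptions, not code from this PR or from Spark: `Plan` stands in for `LogicalPlan`, and `OptionalParser`, `PrefixParser`, and `parse` are hypothetical names. Each parser returns `None` for input it does not recognize, so parsers can be tried in order.

```scala
import java.util.Locale

object ParserDelegationSketch {
  // Hypothetical stand-in for Catalyst's LogicalPlan.
  final case class Plan(description: String)

  // Each parser returns None when the input is not in its dialect.
  trait OptionalParser {
    def apply(input: String): Option[Plan]
  }

  // Toy parser that only recognizes statements starting with a given word.
  final class PrefixParser(keyword: String, label: String) extends OptionalParser {
    def apply(input: String): Option[Plan] = {
      val trimmed = input.trim
      if (trimmed.toUpperCase(Locale.ROOT).startsWith(keyword)) Some(Plan(label + ": " + trimmed))
      else None
    }
  }

  // Try each parser in order. orElse takes its argument by name, so later
  // parsers are not run once an earlier one has succeeded.
  def parse(input: String, parsers: Seq[OptionalParser]): Option[Plan] =
    parsers.foldLeft(Option.empty[Plan])((found, p) => found.orElse(p(input)))
}
```

Returning `Option` from every `apply` is what makes this chaining straightforward; a parser that throws on failure would need a wrapper before it could take part.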

marmbrus · 2015-01-10T21:49:35Z

This is awesome, thanks for cleaning this up. One question though, do we really want to have case insensitive keywords? Are there any systems that actually do that? If it is something we want to keep then maybe you can add some documentation to the normalizer classes.

marmbrus · 2015-01-10T21:50:39Z

BTW, I'm going to try to merge #3431 first, which might conflict with this.

SparkQA · 2015-01-12T00:52:37Z

Test build #25385 has started for PR 3926 at commit 5ca74b4.

This patch merges cleanly.

chenghao-intel · 2015-01-12T00:56:20Z

@marmbrus You're right, SQL keywords should always be case insensitive. I've updated the code for this and rebased to the latest master.

SparkQA · 2015-01-12T01:34:09Z

Test build #25385 has finished for PR 3926 at commit 5ca74b4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical extends StdLexical

AmplabJenkins · 2015-01-12T01:34:12Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25385/

SparkQA · 2015-01-12T02:02:36Z

Test build #25390 has started for PR 3926 at commit 4828f46.

This patch merges cleanly.

SparkQA · 2015-01-12T03:09:13Z

Test build #25390 has finished for PR 3926 at commit 4828f46.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical extends StdLexical

AmplabJenkins · 2015-01-12T03:09:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25390/

SparkQA · 2015-01-14T05:02:32Z

Test build #25511 has started for PR 3926 at commit f3c0abc.

This patch does not merge cleanly.

SparkQA · 2015-01-14T05:12:37Z

Test build #25513 has started for PR 3926 at commit 686660f.

This patch merges cleanly.

SparkQA · 2015-01-14T06:18:41Z

Test build #25513 has finished for PR 3926 at commit 686660f.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical extends StdLexical

AmplabJenkins · 2015-01-14T06:18:44Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25513/

SparkQA · 2015-01-14T06:39:39Z

Test build #25511 has finished for PR 3926 at commit f3c0abc.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical extends StdLexical

AmplabJenkins · 2015-01-14T06:39:43Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25511/

chenghao-intel · 2015-01-20T07:50:29Z

retest this please

SparkQA · 2015-01-20T07:52:36Z

Test build #25810 has started for PR 3926 at commit 686660f.

This patch merges cleanly.

SparkQA · 2015-01-20T09:04:10Z

Test build #25810 has finished for PR 3926 at commit 686660f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-20T09:04:13Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25810/

chenghao-intel · 2015-01-21T01:26:26Z

cc @rxin @marmbrus

marmbrus · 2015-01-21T21:05:22Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/AbstractSparkSQLParser.scala

@@ -25,15 +25,42 @@ import scala.util.parsing.input.CharArrayReader.EofCh

 import org.apache.spark.sql.catalyst.plans.logical._

+private[sql] object KeywordNormalizer {


This is kind of a nit, but since this is only used in AbstractSparkSQLParser and its subclasses, I'd just make it a protected method to avoid the syntactic overhead of a whole separate object. I believe you are doing further refactoring, so maybe that can be done in a followup.

It's also used within `SqlLexical.processIdent`, but you're right, we'd better keep its visibility minimal. I will do that in #4015.

marmbrus · 2015-01-21T21:09:47Z

Thanks for doing this, much better than the previous hack!

Merged to master.

* The `SqlLexical.allCaseVersions` will cause `StackOverflowException` if the key word is too long, the patch will fix that by normalizing all of the keywords in `SqlLexical`.
* And make a unified SparkSQLParser for sharing the common code.

Author: Cheng Hao <[email protected]>

Closes apache#3926 from chenghao-intel/long_keyword and squashes the following commits:

686660f [Cheng Hao] Support Long Keyword and Refactor the SQLParsers

chenghao-intel mentioned this pull request Jan 7, 2015

[SPARK-5009][SQL][Bug FIx] allCaseVersions leads to stackoverflow. #3909

Closed

marmbrus reviewed Jan 7, 2015

chenghao-intel force-pushed the long_keyword branch from 01ff9c6 to c620afa Compare January 8, 2015 08:30

chenghao-intel force-pushed the long_keyword branch from c620afa to dd0e60a Compare January 8, 2015 08:56

chenghao-intel changed the title ~~[SPARK-5009][SQL] [WIP] Long keyword support in SQL Parsers~~ [SPARK-5009] [SQL] Long keyword support in SQL Parsers Jan 8, 2015

OopsOutOfMemory reviewed Jan 9, 2015

chenghao-intel force-pushed the long_keyword branch from 536e592 to 080410a Compare January 12, 2015 00:48

chenghao-intel force-pushed the long_keyword branch from 4828f46 to f3c0abc Compare January 14, 2015 04:57

Support Long Keyword and Refactor the SQLParsers

686660f

chenghao-intel force-pushed the long_keyword branch from f3c0abc to 686660f Compare January 14, 2015 05:08

marmbrus reviewed Jan 21, 2015

asfgit closed this in 8361078 Jan 21, 2015
