-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-5009] [SQL] Long keyword support in SQL Parsers #3926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #25146 has started for PR 3926 at commit
|
LGTM. |
@OopsOutOfMemory Yea, I will do that after #3924 merged. :) |
Test build #25146 has finished for PR 3926 at commit
|
Test PASSed. |
// NOTICE, Since the Keyword properties defined by sub class, we couldn't call this | ||
// method during the parent class instantiation, because the sub class instance | ||
// isn't created yet. Using `def` instead of the `val` for the lazy initialization. | ||
protected def reservedWords: Seq[Keyword] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not lazy val
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, will fix this.
Test build #25157 has started for PR 3926 at commit
|
Test build #25157 has finished for PR 3926 at commit
|
Test PASSed. |
01ff9c6
to
c620afa
Compare
Test build #25210 has started for PR 3926 at commit
|
Test build #25210 has finished for PR 3926 at commit
|
Test FAILed. |
c620afa
to
dd0e60a
Compare
Test build #25212 has started for PR 3926 at commit
|
Test build #25212 has finished for PR 3926 at commit
|
Test PASSed. |
Removed the WIP, and updated the description. |
Test build #25228 has started for PR 3926 at commit
|
Test build #25228 has finished for PR 3926 at commit
|
Test PASSed. |
def apply(input: String): LogicalPlan = phrase(start)(new lexical.Scanner(input)) match { | ||
case Success(plan, _) => plan | ||
case failureOrError => sys.error(failureOrError.toString) | ||
def apply(input: String): LogicalPlan = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenghao-intel
May
def apply(input: String): LogicalPlan
to be
def apply(input: String): Option[LogicalPlan]
?
It's not consistent with DDLParser
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we probably can combine couple of Parser
work in delegation mode, but currently, I just simply wrote another version of the def apply
in DDLParser
.
This is awesome, thanks for cleaning this up. One question though, do we really want to have case insensitive keywords? Are there any systems that actually do that? If it is something we want to keep then maybe you can add some documentation to the normalizer classes. |
BTW, I'm going to try to merge #3431 first, which might conflict with this. |
536e592
to
080410a
Compare
Test build #25385 has started for PR 3926 at commit
|
@marmbrus You're right, SQL keywords should always be case insensitive. I've updated the code for this and rebased to the latest master. |
Test build #25385 has finished for PR 3926 at commit
|
Test FAILed. |
Test build #25390 has started for PR 3926 at commit
|
Test build #25390 has finished for PR 3926 at commit
|
Test PASSed. |
4828f46
to
f3c0abc
Compare
Test build #25511 has started for PR 3926 at commit
|
f3c0abc
to
686660f
Compare
Test build #25513 has started for PR 3926 at commit
|
Test build #25513 has finished for PR 3926 at commit
|
Test PASSed. |
Test build #25511 has finished for PR 3926 at commit
|
Test PASSed. |
retest this please |
Test build #25810 has started for PR 3926 at commit
|
Test build #25810 has finished for PR 3926 at commit
|
Test PASSed. |
@@ -25,15 +25,42 @@ import scala.util.parsing.input.CharArrayReader.EofCh | |||
|
|||
import org.apache.spark.sql.catalyst.plans.logical._ | |||
|
|||
private[sql] object KeywordNormalizer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of a nit, but since this is only used in AbstractSparkSQLParser and its subclasses I'd just make it a protected method to avoid the syntatic overhead of a whole separate object. I believe you are doing further refactoring so maybe that can be done in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also used withinSqlLexical.processIdent
, but you're right, we'd better keep it the minimize visibility. I will do that in #4015 .
Thanks for doing this, much better than the previous hack! Merged to master. |
* The `SqlLexical.allCaseVersions` will cause `StackOverflowException` if the key word is too long, the patch will fix that by normalizing all of the keywords in `SqlLexical`. * And make a unified SparkSQLParser for sharing the common code. Author: Cheng Hao <[email protected]> Closes apache#3926 from chenghao-intel/long_keyword and squashes the following commits: 686660f [Cheng Hao] Support Long Keyword and Refactor the SQLParsers
SqlLexical.allCaseVersions
will causeStackOverflowException
if the key word is too long, the patch will fix that by normalizing all of the keywords inSqlLexical
.