Debug option #116

@paulochf

Is your feature request related to a problem? Please describe.
I tried to use the library to parse HQL DDLs, but for some of them I got the error below:

---------------------------------------------------------------------------
DDLParserError                            Traceback (most recent call last)
/var/folders/gv/rh_2w83x16s3t1bkll6gd9ym0000gq/T/ipykernel_7778/1646707155.py in <module>
      2     print("="*40, tbl_name)
      3     if tbl_file["parse"] != "PARSE" or tbl_name not in {"silver.hvc_general", "silver.hvc_relog", "silver.hvc_telemetry"}: continue
----> 4     contents = parse_from_file(tbl_file["path"], output_mode="hql")
      5 
      6 

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/ddl_parser.py in parse_from_file(file_path, **kwargs)
    203     """get useful data from ddl"""
    204     with open(file_path, "r") as df:
--> 205         return DDLParser(df.read()).run(file_path=file_path, **kwargs)

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in run(self, dump, dump_path, file_path, output_mode, group_by_type, json_dump)
    270             Dict == one entity from ddl - one table or sequence or type.
    271         """
--> 272         self.tables = self.parse_data()
    273         self.tables = result_format(self.tables, output_mode, group_by_type)
    274         if dump:

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in parse_data(self)
    184 
    185         for num, self.line in enumerate(lines):
--> 186             self.process_line(num != len(lines) - 1)
    187         if self.comments:
    188             self.tables.append({"comments": self.comments})

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in process_line(self, last_line)
    216         self.set_default_flags_in_lexer()
    217 
--> 218         self.process_statement()
    219 
    220     def process_statement(self):

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in process_statement(self)
    220     def process_statement(self):
    221         if not self.set_line and self.statement:
--> 222             self.parse_statement()
    223         if self.new_statement:
    224             self.statement = self.line

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in parse_statement(self)
    227 
    228     def parse_statement(self) -> None:
--> 229         _parse_result = yacc.parse(self.statement)
    230         if _parse_result:
    231             self.tables.append(_parse_result)

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/ply/yacc.py in parse(self, input, lexer, debug, tracking, tokenfunc)
    331             return self.parseopt(input, lexer, debug, tracking, tokenfunc)
    332         else:
--> 333             return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
    334 
    335 

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/ply/yacc.py in parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
   1061                 if not lookahead:
   1062                     if not lookaheadstack:
-> 1063                         lookahead = get_token()     # Get the next token
   1064                     else:
   1065                         lookahead = lookaheadstack.pop()

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/ply/lex.py in token(self)
    384                     tok.lexpos = lexpos
    385                     self.lexpos = lexpos
--> 386                     newtok = self.lexerrorf(tok)
    387                     if lexpos == self.lexpos:
    388                         # Error method didn't change text position at all. This is an error.

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/ddl_parser.py in t_error(self, t)
    193 
    194     def t_error(self, t):
--> 195         raise DDLParserError("Unknown symbol %r" % (t.value[0],))
    196 
    197     def p_error(self, p):

DDLParserError: Unknown symbol "'"

The error was hard to track down in a big DDL file. After reproducing it with a shorter example, I was able to isolate the cause: the following column comment.

column_name STRING COMMENT 'yada yada yada don’t bla bla bla',   -- the problem was the typographic single quote (U+2019, &rsquo; in HTML) in "don’t".
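For anyone hitting the same error, here is a minimal sketch of how the offending character could be located before parsing. It assumes the culprit is a non-ASCII character, as it was in my case; `find_non_ascii` is a hypothetical helper, not part of simple_ddl_parser.

```python
import unicodedata
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (line, column, character, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; lines and columns are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) > 127:
                name = unicodedata.name(char, "UNKNOWN")
                hits.append((line_no, col_no, char, name))
    return hits


sample = "column_name STRING COMMENT 'yada yada yada don’t bla bla bla',"
for line_no, col_no, char, name in find_non_ascii(sample):
    print(f"line {line_no}, column {col_no}: {char!r} ({name})")
```

Legitimate non-ASCII text in comments would be reported too, so this only narrows down the candidates.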

Hence, a debugging option that shows what the lexer or parser was processing when the error occurred would be helpful. ply already supports something like this: yacc.parse accepts a debug argument, as the traceback above shows.

Describe the solution you'd like
parse_from_file(tbl_file["path"], output_mode="hql", debug=True)
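Until an option like this exists, a rough workaround is to parse the file one statement at a time and report the first statement that fails. The sketch below assumes statements are separated by `;` and takes the parser as a callable; `first_failing_statement` is a hypothetical helper, not part of the library.

```python
from typing import Callable, Optional, Tuple


def first_failing_statement(
    ddl: str, parse: Callable[[str], object]
) -> Optional[Tuple[int, str, Exception]]:
    """Return (index, statement, error) for the first statement that fails to parse.

    Hypothetical helper. The split is naive: a ";" inside a quoted string
    or a comment would also be treated as a statement boundary.
    """
    statements = [s.strip() for s in ddl.split(";") if s.strip()]
    for index, statement in enumerate(statements, start=1):
        try:
            parse(statement + ";")
        except Exception as exc:  # broad on purpose: report any parser error
            return index, statement, exc
    return None
```

With simple_ddl_parser, the callable could be something like `lambda s: DDLParser(s).run(output_mode="hql")`, based on the call shown in the traceback.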

Describe alternatives you've considered
Show the character that caused the problem, along with where it occurs in the input.
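As a sketch of what such a message could look like: ply tokens carry a `lexpos` offset, so an error handler could turn that offset into a line, a column, and a caret under the offending character. `describe_position` is a hypothetical helper, and whether the offset maps back to the original file inside simple_ddl_parser is an assumption I have not checked.

```python
def describe_position(text: str, pos: int) -> str:
    """Build an error message pointing at text[pos] with a line, column, and caret.

    Hypothetical helper for illustration; pos is a 0-based offset into text.
    """
    line_no = text.count("\n", 0, pos) + 1
    line_start = text.rfind("\n", 0, pos) + 1  # 0 when pos is on the first line
    line_end = text.find("\n", pos)
    if line_end == -1:
        line_end = len(text)
    col_no = pos - line_start + 1
    source_line = text[line_start:line_end]
    caret = " " * (col_no - 1) + "^"
    return (
        f"Unknown symbol {text[pos]!r} at line {line_no}, column {col_no}\n"
        f"{source_line}\n{caret}"
    )


ddl = "CREATE TABLE t (\n  column_name STRING COMMENT 'don’t'\n);"
print(describe_position(ddl, ddl.index("’")))
```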
