Commit 90253ab

add several fixes for HQL & new statements support
1 parent 925f570 commit 90253ab

File tree

13 files changed, +489 -21 lines

CHANGELOG.txt

Lines changed: 22 additions & 0 deletions

@@ -1,3 +1,25 @@
+**v0.24.0**
+
+## Fixes:
+
+### HQL:
+
+1. More than 2 tblproperties are now parsed correctly https://github.com/xnuinside/simple-ddl-parser/pull/104
+
+
+### Common:
+
+2. 'set' in lower case is now also parsed correctly.
+3. Names like 'schema', 'database', 'table' can now be used as names in CREATE DATABASE | SCHEMA | TABLESPACE | DOMAIN | TYPE statements and after INDEX and CONSTRAINT.
+4. Creation of empty tables is also parsed correctly (like CREATE Table table;).
+
+## New Statements Support:
+
+### HQL:
+1. Added support for CLUSTERED BY - https://github.com/xnuinside/simple-ddl-parser/issues/103
+2. Added support for INTO ... BUCKETS
+3. CREATE REMOTE DATABASE | SCHEMA
+
 **v0.23.0**
 
 Big refactoring: less code complexity & increase code coverage. Radon added to pre-commit hooks.
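As a usage illustration for HQL fix 1, here is a hedged sketch of a statement with more than two TBLPROPERTIES entries. The table and property names are invented for the example, and the exact output layout is an expectation rather than verified output of this commit.

```python
from simple_ddl_parser import DDLParser

# HQL fix 1: a TBLPROPERTIES clause with more than two key/value pairs.
# Table and property names below are illustrative only.
ddl = """
CREATE TABLE sales (
    id INT,
    amount DECIMAL(10, 2)
)
TBLPROPERTIES (
    'parquet.compression'='SNAPPY',
    'transactional'='true',
    'external.table.purge'='true'
);
"""

result = DDLParser(ddl).run(output_mode="hql")
# The parsed table dict is expected to carry all three properties,
# not just the first two.
print(result)
```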

README.md

Lines changed: 23 additions & 0 deletions

@@ -308,6 +308,7 @@ You also can provide a path where you want to have a dumps with schema with argu
 - FIELDS TERMINATED BY, LINES TERMINATED BY, COLLECTION ITEMS TERMINATED BY, MAP KEYS TERMINATED BY
 - TBLPROPERTIES ('parquet.compression'='SNAPPY' & etc.)
 - SKEWED BY
+- CLUSTERED BY
 
 ### MySQL
 
@@ -388,6 +389,28 @@ for help with debugging & testing support for BigQuery dialect DDLs:
 
 
 ## Changelog
+**v0.24.0**
+
+## Fixes:
+
+### HQL:
+
+1. More than 2 tblproperties are now parsed correctly https://github.com/xnuinside/simple-ddl-parser/pull/104
+
+
+### Common:
+
+2. 'set' in lower case is now also parsed correctly.
+3. Names like 'schema', 'database', 'table' can now be used as names in CREATE DATABASE | SCHEMA | TABLESPACE | DOMAIN | TYPE statements and after INDEX and CONSTRAINT.
+4. Creation of empty tables is also parsed correctly (like CREATE Table table;).
+
+## New Statements Support:
+
+### HQL:
+1. Added support for CLUSTERED BY - https://github.com/xnuinside/simple-ddl-parser/issues/103
+2. Added support for INTO ... BUCKETS
+3. CREATE REMOTE DATABASE | SCHEMA
+
 **v0.23.0**
 
 Big refactoring: less code complexity & increase code coverage. Radon added to pre-commit hooks.

docs/README.rst

Lines changed: 32 additions & 1 deletion

@@ -25,7 +25,7 @@ Build with ply (lex & yacc in python). A lot of samples in 'tests/.
 Is it Stable?
 ^^^^^^^^^^^^^
 
-Yes, library already has about 7000+ downloads per day.
+Yes, library already has about 7000+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.
 
 As maintainer, I guarantee that any backward incompatible changes will not be done in patch or minor version. Only additionals & new features.
 
@@ -342,6 +342,7 @@ HQL Dialect statements
 * FIELDS TERMINATED BY, LINES TERMINATED BY, COLLECTION ITEMS TERMINATED BY, MAP KEYS TERMINATED BY
 * TBLPROPERTIES ('parquet.compression'='SNAPPY' & etc.)
 * SKEWED BY
+* CLUSTERED BY
 
 MySQL
 ^^^^^
@@ -447,6 +448,36 @@ for help with debugging & testing support for BigQuery dialect DDLs:
 Changelog
 ---------
 
+**v0.24.0**
+
+Fixes:
+------
+
+HQL:
+^^^^
+
+
+#. More than 2 tblproperties are now parsed correctly https://github.com/xnuinside/simple-ddl-parser/pull/104
+
+Common:
+^^^^^^^
+
+
+#. 'set' in lower case is now also parsed correctly.
+#. Names like 'schema', 'database', 'table' can now be used as names in CREATE DATABASE | SCHEMA | TABLESPACE | DOMAIN | TYPE statements and after INDEX and CONSTRAINT.
+#. Creation of empty tables is also parsed correctly (like CREATE Table table;).
+
+New Statements Support:
+-----------------------
+
+HQL:
+^^^^
+
+
+#. Added support for CLUSTERED BY - https://github.com/xnuinside/simple-ddl-parser/issues/103
+#. Added support for INTO ... BUCKETS
+#. CREATE REMOTE DATABASE | SCHEMA
+
 **v0.23.0**
 
 Big refactoring: less code complexity & increase code coverage. Radon added to pre-commit hooks.

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "simple-ddl-parser"
-version = "0.23.0"
+version = "0.24.0"
 description = "Simple DDL Parser to parse SQL & dialects like HQL, TSQL (MSSQL), Oracle, AWS Redshift, Snowflake, MySQL, PostgreSQL, etc ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.; sequences, alters, custom types & other entities from ddl."
 authors = ["Iuliia Volkova <[email protected]>"]
 license = "MIT"

simple_ddl_parser/ddl_parser.py

Lines changed: 32 additions & 3 deletions

@@ -84,11 +84,13 @@ def set_lexer_tags(self, t):
     def t_STRING(self, t):
         r"((\')([a-zA-Z_,`0-9:><\=\-\+.\~\%$\!() {}\[\]\/\\\"\#\*&^|?;±§@~]*)(\')){1}"
         t.type = "STRING"
+        self.lexer.last_token = t.type
         return t
 
     def t_DQ_STRING(self, t):
         r"((\")([a-zA-Z_,`0-9:><\=\-\+.\~\%$\!() {}'\[\]\/\\\\#\*&^|?;±§@~]*)(\")){1}"
         t.type = "DQ_STRING"
+        self.lexer.last_token = t.type
         return t
 
     def is_token_column_name(self, t):
@@ -103,9 +105,31 @@ def is_token_column_name(self, t):
             and t.value.upper() not in tok.first_liners
         )
 
+    def is_creation_name(self, t):
+        """many of reserved words can be used as column name,
+        to decide is it a column name or not we need do some checks"""
+        skip_id_tokens = ["(", ")", ","]
+        return (
+            t.value not in skip_id_tokens
+            and t.value.upper() not in ["IF"]
+            and self.lexer.last_token
+            in [
+                "SCHEMA",
+                "TABLE",
+                "DATABASE",
+                "TYPE",
+                "DOMAIN",
+                "TABLESPACE",
+                "INDEX",
+                "CONSTRAINT",
+                "EXISTS",
+            ]
+        )
+
     def t_ID(self, t):
         r"([0-9]\.[0-9])\w|([a-zA-Z_,0-9:><\/\=\-\+\~\%$\*\()!{}\[\]\`\[\]]+)"
         t.type = tok.symbol_tokens.get(t.value, "ID")
+
         if t.type == "LP":
             self.lexer.lp_open += 1
             self.lexer.columns_def = True
@@ -114,17 +138,22 @@ def t_ID(self, t):
 
         elif self.is_token_column_name(t):
             t.type = "ID"
+        elif t.type != "DQ_STRING" and self.is_creation_name(t):
+            t.type = "ID"
         else:
             t = self.tokens_not_columns_names(t)
 
-        # capitalize tokens
-        if t.type != "ID" and t.type not in ["LT", "RT"]:
-            t.value = t.value.upper()
+        self.capitalize_tokens(t)
 
         if t.type == "COMMA" and self.lexer.lt_open:
             t.type = "COMMAT"
+
         return self.set_last_token(t)
 
+    def capitalize_tokens(self, t):
+        if t.type != "ID" and t.type not in ["LT", "RT"]:
+            t.value = t.value.upper()
+
     def set_last_token(self, t):
         self.lexer.last_token = t.type
         if t.type in ["RP", "LP"]:
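The new `is_creation_name` check lets reserved words that follow a creation keyword (or INDEX, CONSTRAINT, EXISTS) be lexed as plain identifiers. Below is a hedged sketch of the kind of input this unlocks; the statements are illustrative and the exact result dict is not asserted here.

```python
from simple_ddl_parser import DDLParser

# Reserved words used as object names right after a creation keyword,
# the situation is_creation_name() was added to recognize.
ddl = """
CREATE DOMAIN domain AS CHAR(10);
CREATE TYPE type AS ENUM ('a', 'b');
"""

result = DDLParser(ddl).run(group_by_type=True)
# Expectation: the 'domains' and 'types' groups contain entries whose names
# are the literal strings 'domain' and 'type' instead of a parse failure.
print(result)
```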

simple_ddl_parser/dialects/hql.py

Lines changed: 13 additions & 0 deletions

@@ -8,6 +8,19 @@ def p_expression_location(self, p):
         p_list = list(p)
         p[0]["location"] = p_list[-1]
 
+    def p_expression_clustered(self, p):
+        """expr : expr ID ON LP pid RP
+        | expr ID BY LP pid RP"""
+        p[0] = p[1]
+        p_list = list(p)
+        p[0][f"{p_list[2].lower()}_{p_list[3].lower()}"] = p_list[-2]
+
+    def p_expression_into_buckets(self, p):
+        """expr : expr INTO ID ID"""
+        p[0] = p[1]
+        p_list = list(p)
+        p[0][f"{p_list[2].lower()}_{p_list[-1].lower()}"] = p_list[-2]
+
     def p_row_format(self, p):
         """row_format : ROW FORMAT SERDE
         | ROW FORMAT
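The two rules above attach keys derived from the matched tokens: for `CLUSTERED BY (...)` something like `clustered_by`, and for `INTO n BUCKETS` something like `into_buckets`. A hedged usage sketch follows; the DDL is illustrative and the key names are inferred from the f-strings above, not verified output of this commit.

```python
from simple_ddl_parser import DDLParser

# New HQL clauses handled by p_expression_clustered and
# p_expression_into_buckets above.
ddl = """
CREATE TABLE user_events (
    user_id INT,
    event_name STRING
)
CLUSTERED BY (user_id) INTO 4 BUCKETS;
"""

result = DDLParser(ddl).run(output_mode="hql")
# Expectation based on the grammar rules: keys like "clustered_by"
# (the bucketing columns) and "into_buckets" (the bucket count).
print(result)
```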

simple_ddl_parser/dialects/sql.py

Lines changed: 25 additions & 15 deletions

@@ -26,22 +26,28 @@ def p_expression_partition_by(self, p: List) -> None:
 
 
 class Database:
+    def p_expression_create_database(self, p: List) -> None:
+        """expr : expr database_base"""
+        p[0] = p[1]
+        p_list = list(p)
+        p[0].update(p_list[-1])
+
     def p_database_base(self, p: List) -> None:
         """database_base : CREATE DATABASE id
+        | CREATE ID DATABASE id
         | database_base clone
         """
-        p[0] = p[1]
+        if isinstance(p[1], dict):
+            p[0] = p[1]
+        else:
+            p[0] = {}
         p_list = list(p)
         if isinstance(p_list[-1], dict):
             p[0].update(p_list[-1])
         else:
             p[0]["database_name"] = p_list[-1]
-
-    def p_expression_create_database(self, p: List) -> None:
-        """expr : expr database_base"""
-        p[0] = p[1]
-        p_list = list(p)
-        p[0].update(p_list[-1])
+        if len(p_list) == 5:
+            p[0][p[2].lower()] = True
 
 
 class TableSpaces:
@@ -372,9 +378,12 @@ def set_properties_for_schema_and_database(self, p: List, p_list: List) -> None:
         if not p[0].get("properties"):
             if len(p_list) == 3:
                 properties = p_list[-1]
-            else:
+            elif len(p_list) > 3:
                 properties = {p_list[-3]: p_list[-1]}
-            p[0]["properties"] = properties
+            else:
+                properties = {}
+            if properties:
+                p[0]["properties"] = properties
         else:
             p[0]["properties"].update({p_list[-3]: p_list[-1]})
 
@@ -385,8 +394,10 @@ def set_auth_property_in_schema(self, p: List, p_list: List) -> None:
         p[0] = {"schema_name": p_list[2], auth.lower(): p_list[-1]}
 
     def p_c_schema(self, p: List) -> None:
-        """c_schema : CREATE SCHEMA"""
-        pass
+        """c_schema : CREATE SCHEMA
+        | CREATE ID SCHEMA"""
+        if len(p) == 4:
+            p[0] = {"remote": True}
 
     def p_create_schema(self, p: List) -> None:
         """create_schema : c_schema id id
@@ -409,7 +420,7 @@ def p_create_schema(self, p: List) -> None:
             auth_index = p_list.index(auth)
             self.set_auth_property_in_schema(p, p_list)
 
-        elif isinstance(p_list[-1], str):
+        if isinstance(p_list[-1], str):
             if auth_index:
                 schema_name = p_list[auth_index - 1]
                 if schema_name is None:
@@ -427,7 +438,7 @@ def set_project_in_schema(data: Dict, p_list: List, auth_index: int) -> Dict:
         return data
 
     def p_create_database(self, p: List) -> None:
-        """create_database : CREATE DATABASE id
+        """create_database : database_base
         | create_database id id id
         | create_database id id STRING
         | create_database options
@@ -703,6 +714,7 @@ def extract_check_data(self, p, p_list):
     def p_expression_table(self, p: List) -> None:
         """expr : table_name defcolumn
         | table_name LP defcolumn
+        | table_name
         | expr COMMA defcolumn
         | expr COMMA
         | expr COMMA constraint
@@ -1142,7 +1154,6 @@ def p_expression_alter(self, p: List) -> None:
         | alter_default
         """
         p[0] = p[1]
-        print(p[0], "expe")
         if len(p) == 3:
             p[0].update(p[2])
 
@@ -1152,7 +1163,6 @@ def p_alter_unique(self, p: List) -> None:
         """
 
         p_list = remove_par(list(p))
-        print(p_list, "unique")
        p[0] = p[1]
        p[0]["unique"] = {"constraint_name": None, "columns": p_list[-1]}
        if "constraint" in p[2]:

simple_ddl_parser/parser.py

Lines changed: 1 addition & 1 deletion

@@ -126,7 +126,7 @@ def process_set(self) -> None:
         self.tables.append({"name": name, "value": value})
 
     def parse_set_statement(self):
-        if re.match(r"SET", self.line):
+        if re.match(r"SET", self.line.upper()):
            self.set_was_in_line = True
            if not self.set_line:
                self.set_line = self.line
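Because the line is upper-cased before the regex check, lowercase `set` statements are now recognized by `parse_set_statement`. A hedged sketch is below; the SET syntax shown is illustrative, and the `{"name": ..., "value": ...}` shape is taken from `process_set` in the same file.

```python
from simple_ddl_parser import DDLParser

# Lowercase 'set' now matches because parse_set_statement() upper-cases the
# line before re.match(r"SET", ...).
ddl = """
set names utf8;

CREATE TABLE events (id INT);
"""

result = DDLParser(ddl).run()
# Expectation: the set statement shows up as a {"name": ..., "value": ...}
# entry alongside the parsed 'events' table.
print(result)
```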

simple_ddl_parser/tokens.py

Lines changed: 1 addition & 0 deletions

@@ -66,6 +66,7 @@
     "PARTITION": "PARTITION",
     "BY": "BY",
     # hql
+    "INTO": "INTO",
     "STORED": "STORED",
     "LOCATION": "LOCATION",
     "ROW": "ROW",

tests/test_create_database.py

Lines changed: 20 additions & 0 deletions

@@ -26,3 +26,23 @@ def test_parse_properties_in_create_db():
         "ddl_properties": [],
     }
     assert expected == result
+
+
+def test_create_database_database():
+    expected = {
+        "databases": [{"database_name": "database"}],
+        "ddl_properties": [],
+        "domains": [],
+        "schemas": [{"schema_name": "SCHEMA"}],
+        "sequences": [],
+        "tables": [],
+        "types": [],
+    }
+
+    ddl = """
+
+    CREATE DATABASE database;
+    CREATE SCHEMA SCHEMA;
+    """
+    result = DDLParser(ddl).run(group_by_type=True, output_mode="hql")
+    assert expected == result
