Description
If a column has a null value in every row, %%sql drops that entire column from the result.
To reproduce, create a table where a column has only null values, e.g.
%%sql
insert into table
values (1, null),
(2, null),
(3, null)
I have attached screenshots showing the results from %%sql and from spark.sql().
Screen Shot 2019-12-26 at 2.50.52 pm.pdf
Versions:
- SparkMagic 0.12.0
- Livy 0.6.0
- Kernel: Spark
Additional context
I believe the problem is that the JSON output omits null values, so when the data is converted into dicts and then into a dataframe, there is no way to know that a column is missing:
https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/utils/utils.py#L52
https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/livyclientlib/sqlquery.py#L58
We need a way to pick up the schema before populating all the data.
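To illustrate the idea, here is a minimal sketch in plain Python, not SparkMagic's actual code. It assumes the rows arrive as JSON strings with null fields omitted, and that the column names can be obtained separately from the schema. The helper `records_with_schema`, the sample rows, and the column names `id` and `value` are all hypothetical.

```python
import json


def records_with_schema(json_rows, columns):
    """Parse JSON rows and ensure every schema column is present in each record.

    `columns` is assumed to come from the query's schema (hypothetical input).
    """
    records = []
    for row in json_rows:
        parsed = json.loads(row)
        # dict.get returns None for keys that the serializer left out
        records.append({col: parsed.get(col) for col in columns})
    return records


# Hypothetical rows as they might look when null fields are omitted
json_rows = ['{"id": 1}', '{"id": 2}', '{"id": 3}']

# Inferring columns from the data alone loses the all-null column
lossy = [json.loads(row) for row in json_rows]
observed_columns = sorted({key for rec in lossy for key in rec})

# Supplying the column names up front keeps it, filled with None
schema_columns = ["id", "value"]
full = records_with_schema(json_rows, schema_columns)

print(observed_columns)
print(full[0])
```

Records normalized this way carry every schema column, so a dataframe built from them would keep the all-null column instead of silently dropping it.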