SQL magic drops column if all row value is NaN #611

@benhyy

Description

If a column has a null value in every row/record, %%sql drops that entire column from the result.

To reproduce, create a table where a column has only null values, e.g.
%%sql
insert into table
values (1, null),
(2, null),
(3, null)

I have attached screenshots comparing the results from %%sql and spark.sql().

Screen Shot 2019-12-26 at 2.50.52 pm.pdf

Versions:

  • SparkMagic 0.12.0
  • Livy 0.6.0
  • Kernel: Spark

Additional context
I believe the problem is that the JSON output leaves out null values. When that data is converted into dicts and then into a dataframe, a column that is null in every row never appears as a key, so the dataframe cannot know the column exists:

https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/utils/utils.py#L52
https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/livyclientlib/sqlquery.py#L58

We need a way to pick up the schema before populating all the data.
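
As a rough illustration of that idea (not sparkmagic's actual code), the sketch below uses only the standard library. The JSON lines, the column names `a` and `b`, and the helpers `columns_seen` and `apply_schema` are all hypothetical, and the schema is assumed to have been fetched separately before the rows.

```python
import json

# Hypothetical stand-in for the rows returned for the table above: one JSON
# object per row, with the all-null second column (called "b" here) left out.
json_lines = ['{"a": 1}', '{"a": 2}', '{"a": 3}']


def columns_seen(records):
    """Column names that can be inferred from the data alone, in first-seen order."""
    seen = []
    for record in records:
        for key in record:
            if key not in seen:
                seen.append(key)
    return seen


def apply_schema(records, columns):
    """Rebuild each record against a known column list, filling gaps with None."""
    return [{name: record.get(name) for name in columns} for record in records]


records = [json.loads(line) for line in json_lines]

# Inferring columns from the data loses "b", because no row ever mentions it.
inferred = columns_seen(records)

# Assumption: the column list was obtained before the data was populated.
schema = ["a", "b"]
fixed = apply_schema(records, schema)

print(inferred)
print(fixed[0])
```

If the records end up in pandas, passing the known column list to the constructor, as in `pandas.DataFrame(records, columns=schema)`, should have a similar effect, since columns missing from the dicts are then filled with NaN.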

Labels: kind:bug (An unexpected error or issue with sparkmagic)
