[FEA] Refactoring JSON reader tree algorithms with Compressed Sparse Row (CSR) #15903

The steady addition of features to the JSON reader has left some code paths error-prone (see #15750) and difficult to maintain. Support for mixed types, coercing nested types to strings, arrays of arrays, null literals, and more has been added over the past few releases (see [comment](https://github.com/rapidsai/cudf/issues/15750#issuecomment-2121737479)), which has stretched the original design of token-to-tree and tree-to-column processing.

| Status | Topic | 
|---|---|
| 🔄 | Introduce column vertex structure and graph traversal to the tree representation. Make sure to maintain the pandas requirements for handling array-of-arrays and null literals. |
| | Introduce mixed type handling with pruning for non-conforming dtypes (updated Spark requirement). Also consider the case where a dtype is not provided for a column with mixed types. | 
| | Add an option for cross-column pruning, for cases when validation fails and all values in the row become null |
| | https://github.com/rapidsai/cudf/issues/15278 |
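
To make the CSR idea in the first topic concrete, here is a minimal host-side sketch, not the cuDF implementation. It assumes the column tree is available as a parent-index array (with `-1` marking the root) and stores only parent-to-child edges; the names `parents_to_csr` and `children` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple


def parents_to_csr(parent_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Convert a parent-index tree into CSR adjacency (children only).

    parent_ids[v] is the parent of vertex v, or -1 for the root.
    Returns (row_offsets, col_indices): the children of vertex v are
    col_indices[row_offsets[v]:row_offsets[v + 1]].
    """
    n = len(parent_ids)

    # Count the children of each vertex, shifted by one slot so that the
    # in-place prefix sum below yields the row offsets directly.
    row_offsets = [0] * (n + 1)
    for p in parent_ids:
        if p >= 0:
            row_offsets[p + 1] += 1
    for i in range(n):
        row_offsets[i + 1] += row_offsets[i]

    # Scatter each child into its parent's row, in increasing child order.
    col_indices = [0] * row_offsets[n]
    cursor = row_offsets[:-1].copy()  # next free slot in each row
    for child, p in enumerate(parent_ids):
        if p >= 0:
            col_indices[cursor[p]] = child
            cursor[p] += 1
    return row_offsets, col_indices


def children(row_offsets: List[int], col_indices: List[int], v: int) -> List[int]:
    """Return the child vertices of v from the CSR arrays."""
    return col_indices[row_offsets[v]:row_offsets[v + 1]]


# Column tree for the record {"a": [1, 2], "b": {"c": null}}:
#   0: root struct, 1: list "a", 2: element of "a", 3: struct "b", 4: field "c"
parent_ids = [-1, 0, 1, 0, 3]
row_offsets, col_indices = parents_to_csr(parent_ids)
```

With the adjacency in this form, looking up the children of a column vertex is a contiguous slice rather than a scan over the parent array, which is the property that graph traversal over the column tree would rely on. Whether the real refactor also stores the parent edge in each row is not stated in this issue.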


