Open
Labels: Python (Affects Python cuDF API.), Spark (Functionality that helps Spark RAPIDS), cuIO (cuIO issue), feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code.)
Description
The steady addition of features to the JSON reader has left some code paths error-prone (see #15750) and difficult to maintain. Support for mixed types, coercion of nested types to strings, arrays of arrays, null literals, and more has been added over the past few releases (see comment), stretching the original design of token-to-tree and tree-to-column processing.
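The two processing stages named above can be illustrated with a toy model. The sketch below is plain Python, not the libcudf implementation: `Node`, `build_tree`, and `column_kinds` are hypothetical names, and treating null literals as conforming to any column type is an assumption. It shows why mixed types strain a path-based tree-to-column step: the same path can carry a scalar in one record and a struct in another.

```python
from collections import defaultdict
from typing import Any, NamedTuple


class Node(NamedTuple):
    row: int     # index of the JSON record this node came from
    parent: int  # index of the parent node, -1 for a record root
    path: tuple  # key path from the record root, "[]" for list elements
    kind: str    # "struct", "list", "null", or "value"
    value: Any   # scalar for "value" nodes, otherwise None


def kind_of(value: Any) -> str:
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "list"
    return "null" if value is None else "value"


def build_tree(records: list) -> list:
    """Toy token-to-tree stage: one node per JSON value, linked by parent index."""
    nodes: list = []

    def visit(value: Any, row: int, parent: int, path: tuple) -> None:
        idx = len(nodes)
        kind = kind_of(value)
        nodes.append(Node(row, parent, path, kind, value if kind == "value" else None))
        if kind == "struct":
            for key, child in value.items():
                visit(child, row, idx, path + (key,))
        elif kind == "list":
            for child in value:
                visit(child, row, idx, path + ("[]",))

    for row, record in enumerate(records):
        visit(record, row, -1, ())
    return nodes


def column_kinds(nodes: list) -> dict:
    """Toy tree-to-column stage: nodes sharing a path form one column.

    Returns the set of non-null kinds seen per path; more than one kind
    marks a mixed-type column. Null literals are skipped (assumption:
    a null conforms to any column type).
    """
    kinds: dict = defaultdict(set)
    for node in nodes:
        seen = kinds[node.path]  # creates the column entry even for nulls
        if node.kind != "null":
            seen.add(node.kind)
    return dict(kinds)


records = [{"a": 1, "b": [1, 2]}, {"a": {"x": 2}, "b": None}]
kinds = column_kinds(build_tree(records))
mixed = sorted(path for path, seen in kinds.items() if len(seen) > 1)
print(mixed)  # [('a',)]
```

Here path `("a",)` holds an integer in the first record and a struct in the second, so it is the only column flagged as mixed; the null literal under `("b",)` does not conflict with the list in the first record.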
| Status | Topic |
|---|---|
| 🔄 | Introduce a column vertex structure and graph traversal to the tree representation. Make sure to maintain the pandas requirements for handling arrays of arrays and null literals. |
| | Introduce mixed-type handling with pruning for non-conforming dtypes (updated Spark requirement). Also consider the case where a dtype is not provided for a column with mixed types. |
| | Add a cross-column pruning option: when validation fails for a value, all values in that row become null. |
| | #15278 |
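The two pruning items in the table can be sketched on row-major Python dicts. This is a hypothetical model: `VALIDATORS`, `prune_rows`, and the `cross_column` flag are illustrative names, not cuDF options, and passing values through unchanged for columns without a requested dtype is an assumption, since the issue leaves that case open.

```python
# Hypothetical validators: does a parsed value conform to the requested dtype?
VALIDATORS = {
    "int64": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float64": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "str": lambda v: isinstance(v, str),
}


def prune_rows(rows: list, dtypes: dict, cross_column: bool = False) -> list:
    """Null out values that do not conform to the requested dtype.

    Assumption: columns missing from `dtypes` pass through unchanged.
    With cross_column=True, one failed validation nulls the whole row.
    """
    out = []
    for row in rows:
        pruned = {}
        failed = False
        for col, value in row.items():
            dtype = dtypes.get(col)
            if value is None or dtype is None or VALIDATORS[dtype](value):
                pruned[col] = value
            else:
                pruned[col] = None  # per-column pruning of a non-conforming value
                failed = True
        if cross_column and failed:
            pruned = {col: None for col in pruned}  # cross-column pruning
        out.append(pruned)
    return out


rows = [{"a": 1, "b": "x"}, {"a": {"k": 1}, "b": "y"}]
dtypes = {"a": "int64", "b": "str"}
per_column = prune_rows(rows, dtypes)
whole_row = prune_rows(rows, dtypes, cross_column=True)
print(per_column)  # [{'a': 1, 'b': 'x'}, {'a': None, 'b': 'y'}]
print(whole_row)   # [{'a': 1, 'b': 'x'}, {'a': None, 'b': None}]
```

The struct in the second row's `a` fails the `int64` check; per-column pruning nulls only that value, while cross-column pruning also nulls the valid `b` in the same row.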