Closed
Labels
Python, bug, cuIO, good first issue
Description
Describe the bug
cudf.read_json raises a misleading error when it is given a path to a file that does not exist.
Steps/Code to reproduce bug
>>> import cudf
>>> df = cudf.read_json("nosuchfile.json")
/conda/envs/rapids/lib/python3.10/site-packages/cudf-23.6.0-py3.10-linux-x86_64.egg/cudf/io/json.py:121: UserWarning: Using CPU via Pandas to read JSON dataset, this may be GPU accelerated in the future
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/rapids/lib/python3.10/site-packages/cudf-23.6.0-py3.10-linux-x86_64.egg/cudf/io/json.py", line 143, in read_json
    pd_value = pd.read_json(
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 757, in read_json
    return json_reader.read()
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 915, in read
    obj = self._get_object_parser(self.data)
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 937, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 1064, in parse
    self._parse_no_numpy()
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 1321, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Unexpected character found when decoding 'null'
Using engine="cudf" gives a different, but equally misleading, error:
>>> df = cudf.read_json("nosuchfile.json", engine="cudf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/rapids/lib/python3.10/site-packages/cudf-23.6.0-py3.10-linux-x86_64.egg/cudf/io/json.py", line 111, in read_json
    df = libjson.read_json(
  File "json.pyx", line 50, in cudf._lib.json.read_json
  File "json.pyx", line 138, in cudf._lib.json.read_json
RuntimeError: CUDF failure at: /cudf/cpp/src/io/json/json_column.cu:958: Input needs to be an array of arrays or an array of (nested) objects
I spent way too much time trying to debug my JSON file format before I realized I had a typo in the path name.
Expected behavior
The other cudf.read_* functions raise FileNotFoundError if the file cannot be found. cudf.read_json should do the same.
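Both tracebacks above are consistent with the string "nosuchfile.json" being parsed as JSON text rather than opened as a file. One possible workaround until this is fixed is to validate the path before calling the reader. The sketch below is only an illustration: ensure_local_file_exists is a hypothetical helper, not part of cuDF or pandas, and the rules it uses to skip inline JSON strings and remote URLs are simplifying assumptions.

```python
import errno
import os
from urllib.parse import urlparse


def ensure_local_file_exists(path_or_buf):
    """Hypothetical helper: fail early with FileNotFoundError for a missing local path.

    Returns the argument unchanged when it is not a local path or when the file exists.
    """
    # File-like objects and other non-path inputs are passed through untouched.
    if not isinstance(path_or_buf, (str, os.PathLike)):
        return path_or_buf

    text = os.fspath(path_or_buf)
    if not isinstance(text, str):
        return path_or_buf

    # Assumption: a string that starts with '{' or '[' is inline JSON, not a path.
    if text.lstrip().startswith(("{", "[")):
        return path_or_buf

    # Assumption: a multi-character scheme (s3, https, ...) means a remote source.
    # Single-letter "schemes" are treated as Windows drive letters, i.e. local paths.
    if len(urlparse(text).scheme) > 1:
        return path_or_buf

    if not os.path.exists(text):
        # Same exception type the other readers are reported to raise.
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), text)

    return path_or_buf
```

With this in place, cudf.read_json(ensure_local_file_exists("nosuchfile.json")) would fail with FileNotFoundError before the reader is ever called, regardless of the engine. A real fix would presumably live inside read_json itself and reuse whatever path handling the other readers already share.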
galipremsagar