cudf.read_json returns ValueError when given file is not found #13026

@davidwendt

Describe the bug
cudf.read_json raises a misleading error when given a path to a file that does not exist. Instead of reporting the missing file, it fails with a JSON parsing error.

Steps/Code to reproduce bug

>>> import cudf
>>> df = cudf.read_json("nosuchfile.json")
/conda/envs/rapids/lib/python3.10/site-packages/cudf-23.6.0-py3.10-linux-x86_64.egg/cudf/io/json.py:121: UserWarning: Using CPU via Pandas to read JSON dataset, this may be GPU accelerated in the future
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/rapids/lib/python3.10/site-packages/cudf-23.6.0-py3.10-linux-x86_64.egg/cudf/io/json.py", line 143, in read_json
    pd_value = pd.read_json(
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 757, in read_json
    return json_reader.read()
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 915, in read
    obj = self._get_object_parser(self.data)
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 937, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 1064, in parse
    self._parse_no_numpy()
  File "/conda/envs/rapids/lib/python3.10/site-packages/pandas/io/json/_json.py", line 1321, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Unexpected character found when decoding 'null'

Using engine="cudf" gives the following result:

>>> df = cudf.read_json("nosuchfile.json", engine="cudf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/rapids/lib/python3.10/site-packages/cudf-23.6.0-py3.10-linux-x86_64.egg/cudf/io/json.py", line 111, in read_json
    df = libjson.read_json(
  File "json.pyx", line 50, in cudf._lib.json.read_json
  File "json.pyx", line 138, in cudf._lib.json.read_json
RuntimeError: CUDF failure at: /cudf/cpp/src/io/json/json_column.cu:958: Input needs to be an array of arrays or an array of (nested) objects

I spent way too much time trying to debug my JSON file format until I realized I had a typo in the path name.

Expected behavior
The other cudf.read_* functions raise FileNotFoundError if the file cannot be found. cudf.read_json should do the same.
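
Until that is fixed, a caller-side guard can surface the missing file early. The sketch below is only an illustration: `check_json_path` is a hypothetical helper, not part of cuDF. It assumes that a string ending in `.json` with no URL scheme is meant as a local path. That heuristic is needed because a string argument may also be a URL or literal JSON text.

```python
import errno
import os


def check_json_path(path):
    """Return `path` unchanged, or raise FileNotFoundError if it looks like
    a local .json file path that does not exist.

    Hypothetical helper, not part of cuDF.
    """
    if isinstance(path, (str, bytes, os.PathLike)):
        p = os.fsdecode(path)
        # Assumption: no URL scheme and a ".json" suffix means "local file path".
        # Anything else (URLs, raw JSON text, file objects) is passed through.
        looks_like_local_path = "://" not in p and p.endswith(".json")
        if looks_like_local_path and not os.path.exists(p):
            raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), p)
    return path
```

With this in place, `cudf.read_json(check_json_path("nosuchfile.json"))` would fail with FileNotFoundError before either engine tries to parse anything.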

Labels: Python (Affects Python cuDF API.), bug (Something isn't working), cuIO (cuIO issue), good first issue (Good for newcomers)
