Can we switch re to regex?  #407

Closed
@kthyng

I have a limited understanding of the difference between the two regular expression packages, but as of Python 3.11 re no longer accepts patterns in which a "global flag" such as (?i) appears anywhere other than the start of the pattern, whereas regex does. I have been setting up my custom vocabularies in a way that can leave such a flag in the middle of a pattern, because individual patterns are joined together with |.

For example,

import cf_xarray as cfx
import xarray as xr

vocab = {"sea_ice_u": {"name": "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)"}}
ds = xr.Dataset()
ds["sea_ice_velocity_x"] = [0,1,2]

with cfx.set_options(custom_criteria=vocab):
    ds.cf["sea_ice_u"]

This currently raises:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 2034, in __getitem__
    return _getitem(self, key)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 685, in _getitem
    names = _get_all(obj, k)
            ^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 385, in _get_all
    results = apply_mapper(all_mappers, obj, key, error=False, default=None)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 117, in apply_mapper
    results.append(_apply_single_mapper(mapper))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 101, in _apply_single_mapper
    results = mapper(obj, key)
              ^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/site-packages/cf_xarray/accessor.py", line 214, in _get_custom_criteria
    if re.match(patterns, obj[var].attrs.get(criterion, "")):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/__init__.py", line 166, in match
    return _compile(pattern, flags).match(string)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/__init__.py", line 294, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_compiler.py", line 743, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_parser.py", line 980, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kthyng/miniconda3/envs/omsa3/lib/python3.11/re/_parser.py", line 841, in _parse
    raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 48

But if I replace re with regex (with some renaming, since the variable that holds the regular expressions in accessor.py is also called "regex"), I get back:

<xarray.DataArray 'sea_ice_velocity_x' (sea_ice_velocity_x: 3)>
array([0, 1, 2])
Coordinates:
  * sea_ice_velocity_x  (sea_ice_velocity_x) int64 0 1 2

I suppose there is a reason that re no longer allows this, but I would prefer to be able to do it. What do others think? @dcherian, you might be the other person who has used custom vocabularies?
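
For comparison, here is a rough sketch of two ways the vocabulary pattern above could be rewritten so that it still compiles with the standard-library re, without any change to cf_xarray. The helpers `hoist_ignorecase` and `join_scoped` are hypothetical names made up for this illustration; they are not part of cf_xarray or re. The string replacement is deliberately naive and assumes that "(?i)" never appears escaped or inside a character class.

```python
import re

# The two sub-patterns from the vocabulary above, before being joined with "|".
parts = [
    "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)",
    "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)",
]


def hoist_ignorecase(pattern: str) -> str:
    """Hypothetical helper: keep one leading (?i) and drop every repeat.

    Naive on purpose: assumes "(?i)" only ever occurs as an inline flag.
    """
    return "(?i)" + pattern.replace("(?i)", "")


def join_scoped(patterns: list[str]) -> str:
    """Hypothetical helper: join sub-patterns with "|", turning a leading
    global (?i) into a scoped (?i:...) group that is legal anywhere."""
    scoped = []
    for pattern in patterns:
        if pattern.startswith("(?i)"):
            scoped.append("(?i:" + pattern[len("(?i)"):] + ")")
        else:
            scoped.append("(?:" + pattern + ")")
    return "|".join(scoped)


hoisted = hoist_ignorecase("|".join(parts))
scoped = join_scoped(parts)

for candidate in (hoisted, scoped):
    # Both rewrites compile and accept the variable name from the example.
    assert re.match(candidate, "sea_ice_velocity_x")
    # Names containing "qc" are still excluded by the negative lookahead.
    assert re.match(candidate, "sea_ice_u_qc") is None
```

Either resulting string could be used as the "name" entry in the vocab dictionary. Scoped groups keep each sub-pattern self-contained, which may suit vocabularies that are assembled from independent pieces; hoisting the flag is simpler when every piece is case-insensitive anyway.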

Labels: enhancement (New feature or request)
