Description
This issue proposes exposing Xarray's Variable class as a stand-alone array class with named axes (dims) and arbitrary metadata (attrs) but without coordinates (indexes). Yes, this already exists, but the Variable class is currently inseparable from our Pandas dependency, despite not utilizing any of its functionality.
What would this entail?
The biggest change would be making Pandas an optional dependency and isolating any imports of it. This change could be confined to the Variable object or could be propagated further as the Explicit Indexes work proceeds (#1603).
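As a rough illustration of what "isolating any imports" could look like, here is a minimal sketch of a common optional-dependency pattern. The helper names `optional_import` and `require_pandas` are hypothetical and are not part of Xarray; how the real code base would be reorganized is an open question.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical module-level guard; this is an assumption for illustration,
# not how Xarray is organized today.
pd = optional_import("pandas")


def require_pandas(feature):
    """Return pandas, or raise a clear error if an index-backed feature needs it."""
    if pd is None:
        raise ImportError(f"{feature} requires pandas, which is not installed")
    return pd
```

With a guard like this, code paths that only touch dims, attrs, and the underlying data would never import Pandas, while index-backed features would call `require_pandas(...)` at the point of use.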
Why?
Within Xarray, the Variable class is a vital building block for many of our internal data structures. Recently, the utility of a simple array with named dimensions has been highlighted by a few potential user communities:
- Scikit-learn: SLEP 8: Propagating feature names scikit-learn/enhancement_proposals#18
- PyTorch: (https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html, http://nlp.seas.harvard.edu/NamedTensor)
An example from the SLEP linked above of why users may not want Pandas as a dependency of Xarray:
@amueller: ...If we go this route, I think we need to make xarray, and therefore pandas, a mandatory dependency...
...
@adrinjalali: ...And we still do have the option of making a NamedArray. xarray uses the pandas' index classes for the indexing and stuff, which is something we really don't need...
Since we already have a class that meets these applications' use cases, it seems only prudent to evaluate the feasibility of exposing Variable as a low-level API object.
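For readers less familiar with the class, the following short sketch shows what Variable already offers today through Xarray's public API: named dimensions and attrs, with no coordinates or indexes attached. The data values and the "units" attribute are made up for illustration, and with the current code base importing xarray still pulls in Pandas.

```python
import numpy as np
import xarray as xr

# A 2x3 array with named axes and free-form metadata, but no coordinates.
var = xr.Variable(("x", "y"), np.arange(6).reshape(2, 3), attrs={"units": "m"})

# Reductions and transposes are addressed by dimension name, not axis number.
row_means = var.mean(dim="y")
flipped = var.transpose("y", "x")

print(var.dims, var.attrs)
print(row_means.dims, flipped.shape)
```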
In conclusion, I'm not sure this is currently worth the effort, but it's probably worth exploring at this point.