Skip to content

Commit ec215da

Browse files
aulemahalmax-sixty
andauthored
Implementation of polyfit and polyval (#3733)
* [WIP] Implementation of polyfit and polyval - minimum testing - no docs * Formatting with black, flake8 * Fix failing test * More intelligent skipna switching * Add docs | Change coeff order to fit numpy | move polyval * Move doc patching to class * conditional doc patching * Fix windows fail - more efficient nan skipping * Fix typo in least_squares * Move polyfit to dataset * Add more tests | fix some edge cases * Skip test without dask * Fix 1D case | add docs * skip polyval test without dask * Explicit docs | More restrictive polyval * Small typo in polyfit docstrings * Apply suggestions from code review Co-Authored-By: Maximilian Roos <[email protected]> * Polyfit : fix style in docstring | add see also section * Clean up docstrings and documentation. * Move whats new entry to 0.16 | fix PEP8 issue in test_dataarray Co-authored-by: Maximilian Roos <[email protected]>
1 parent f583ac7 commit ec215da

14 files changed

+488
-1
lines changed

doc/api.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ Top-level functions
3030
zeros_like
3131
ones_like
3232
dot
33+
polyval
3334
map_blocks
3435
show_versions
3536
set_options
@@ -172,6 +173,7 @@ Computation
172173
Dataset.quantile
173174
Dataset.differentiate
174175
Dataset.integrate
176+
Dataset.polyfit
175177

176178
**Aggregation**:
177179
:py:attr:`~Dataset.all`
@@ -352,6 +354,7 @@ Computation
352354
DataArray.quantile
353355
DataArray.differentiate
354356
DataArray.integrate
357+
DataArray.polyfit
355358
DataArray.str
356359

357360
**Aggregation**:

doc/computation.rst

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -401,6 +401,32 @@ trapezoidal rule using their coordinates,
401401
and integration along multidimensional coordinate are not supported.
402402

403403

404+
.. _compute.polyfit:
405+
406+
Fitting polynomials
407+
===================
408+
409+
Xarray objects provide an interface for performing linear or polynomial regressions
410+
using the least-squares method. :py:meth:`~xarray.DataArray.polyfit` computes the
411+
best fitting coefficients along a given dimension and for a given order,
412+
413+
.. ipython:: python
414+
415+
x = xr.DataArray(np.arange(10), dims=['x'], name='x')
416+
a = xr.DataArray(3 + 4 * x, dims=['x'], coords={'x': x})
417+
out = a.polyfit(dim='x', deg=1, full=True)
418+
out
419+
420+
The method outputs a dataset containing the coefficients (and more if `full=True`).
421+
The inverse operation is done with :py:meth:`~xarray.polyval`,
422+
423+
.. ipython:: python
424+
425+
xr.polyval(coord=x, coeffs=out.polyfit_coefficients)
426+
427+
.. note::
428+
These methods replicate the behaviour of :py:func:`numpy.polyfit` and :py:func:`numpy.polyval`.
429+
404430
.. _compute.broadcasting:
405431

406432
Broadcasting by dimension name

doc/whats-new.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,8 @@ Breaking changes
2424

2525
New Features
2626
~~~~~~~~~~~~
27+
- Added :py:meth:`DataArray.polyfit` and :py:func:`xarray.polyval` for fitting polynomials. (:issue:`3349`)
28+
By `Pascal Bourgault <https://github.com/aulemahal>`_.
2729
- Control over attributes of result in :py:func:`merge`, :py:func:`concat`,
2830
:py:func:`combine_by_coords` and :py:func:`combine_nested` using
2931
combine_attrs keyword argument. (:issue:`3865`, :pull:`3877`)

xarray/__init__.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@
1717
from .core.alignment import align, broadcast
1818
from .core.combine import auto_combine, combine_by_coords, combine_nested
1919
from .core.common import ALL_DIMS, full_like, ones_like, zeros_like
20-
from .core.computation import apply_ufunc, dot, where
20+
from .core.computation import apply_ufunc, dot, polyval, where
2121
from .core.concat import concat
2222
from .core.dataarray import DataArray
2323
from .core.dataset import Dataset
@@ -65,6 +65,7 @@
6565
"open_mfdataset",
6666
"open_rasterio",
6767
"open_zarr",
68+
"polyval",
6869
"register_dataarray_accessor",
6970
"register_dataset_accessor",
7071
"save_mfdataset",

xarray/core/computation.py

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1306,3 +1306,35 @@ def where(cond, x, y):
13061306
dataset_join="exact",
13071307
dask="allowed",
13081308
)
1309+
1310+
1311+
def polyval(coord, coeffs, degree_dim="degree"):
1312+
"""Evaluate a polynomial at specific values
1313+
1314+
Parameters
1315+
----------
1316+
coord : DataArray
1317+
The 1D coordinate along which to evaluate the polynomial.
1318+
coeffs : DataArray
1319+
Coefficients of the polynomials.
1320+
degree_dim : str, default "degree"
1321+
Name of the polynomial degree dimension in `coeffs`.
1322+
1323+
See also
1324+
--------
1325+
xarray.DataArray.polyfit
1326+
numpy.polyval
1327+
"""
1328+
from .dataarray import DataArray
1329+
from .missing import get_clean_interp_index
1330+
1331+
x = get_clean_interp_index(coord, coord.name)
1332+
1333+
deg_coord = coeffs[degree_dim]
1334+
1335+
lhs = DataArray(
1336+
np.vander(x, int(deg_coord.max()) + 1),
1337+
dims=(coord.name, degree_dim),
1338+
coords={coord.name: coord, degree_dim: np.arange(deg_coord.max() + 1)[::-1]},
1339+
)
1340+
return (lhs * coeffs).sum(degree_dim)

xarray/core/dask_array_ops.py

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -95,3 +95,30 @@ def func(x, window, axis=-1):
9595
# crop boundary.
9696
index = (slice(None),) * axis + (slice(drop_size, drop_size + orig_shape[axis]),)
9797
return out[index]
98+
99+
100+
def least_squares(lhs, rhs, rcond=None, skipna=False):
101+
import dask.array as da
102+
103+
lhs_da = da.from_array(lhs, chunks=(rhs.chunks[0], lhs.shape[1]))
104+
if skipna:
105+
added_dim = rhs.ndim == 1
106+
if added_dim:
107+
rhs = rhs.reshape(rhs.shape[0], 1)
108+
results = da.apply_along_axis(
109+
nputils._nanpolyfit_1d,
110+
0,
111+
rhs,
112+
lhs_da,
113+
dtype=float,
114+
shape=(lhs.shape[1] + 1,),
115+
rcond=rcond,
116+
)
117+
coeffs = results[:-1, ...]
118+
residuals = results[-1, ...]
119+
if added_dim:
120+
coeffs = coeffs.reshape(coeffs.shape[0])
121+
residuals = residuals.reshape(residuals.shape[0])
122+
else:
123+
coeffs, residuals, _, _ = da.linalg.lstsq(lhs_da, rhs)
124+
return coeffs, residuals

xarray/core/dataarray.py

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3275,6 +3275,68 @@ def map_blocks(
32753275

32763276
return map_blocks(func, self, args, kwargs)
32773277

3278+
def polyfit(
3279+
self,
3280+
dim: Hashable,
3281+
deg: int,
3282+
skipna: bool = None,
3283+
rcond: float = None,
3284+
w: Union[Hashable, Any] = None,
3285+
full: bool = False,
3286+
cov: bool = False,
3287+
):
3288+
"""
3289+
Least squares polynomial fit.
3290+
3291+
This replicates the behaviour of `numpy.polyfit` but differs by skipping
3292+
invalid values when `skipna = True`.
3293+
3294+
Parameters
3295+
----------
3296+
dim : hashable
3297+
Coordinate along which to fit the polynomials.
3298+
deg : int
3299+
Degree of the fitting polynomial.
3300+
skipna : bool, optional
3301+
If True, removes all invalid values before fitting each 1D slices of the array.
3302+
Default is True if data is stored in a dask.array or if there is any
3303+
invalid values, False otherwise.
3304+
rcond : float, optional
3305+
Relative condition number to the fit.
3306+
w : Union[Hashable, Any], optional
3307+
Weights to apply to the y-coordinate of the sample points.
3308+
Can be an array-like object or the name of a coordinate in the dataset.
3309+
full : bool, optional
3310+
Whether to return the residuals, matrix rank and singular values in addition
3311+
to the coefficients.
3312+
cov : Union[bool, str], optional
3313+
Whether to return to the covariance matrix in addition to the coefficients.
3314+
The matrix is not scaled if `cov='unscaled'`.
3315+
3316+
Returns
3317+
-------
3318+
polyfit_results : Dataset
3319+
A single dataset which contains:
3320+
3321+
polyfit_coefficients
3322+
The coefficients of the best fit.
3323+
polyfit_residuals
3324+
The residuals of the least-square computation (only included if `full=True`)
3325+
[dim]_matrix_rank
3326+
The effective rank of the scaled Vandermonde coefficient matrix (only included if `full=True`)
3327+
[dim]_singular_value
3328+
The singular values of the scaled Vandermonde coefficient matrix (only included if `full=True`)
3329+
polyfit_covariance
3330+
The covariance matrix of the polynomial coefficient estimates (only included if `full=False` and `cov=True`)
3331+
3332+
See also
3333+
--------
3334+
numpy.polyfit
3335+
"""
3336+
return self._to_temp_dataset().polyfit(
3337+
dim, deg, skipna=skipna, rcond=rcond, w=w, full=full, cov=cov
3338+
)
3339+
32783340
def pad(
32793341
self,
32803342
pad_width: Mapping[Hashable, Union[int, Tuple[int, int]]] = None,

0 commit comments

Comments
 (0)