Getting started

Welcome to `xarray-einstats`!

xarray-einstats is an open source Python library that is part of the ArviZ project. It acts as a bridge between the xarray library for labelled arrays and libraries for raw arrays such as NumPy or SciPy.

Xarray has “Compatibility with the broader ecosystem” as one of its main goals, which is what allows xarray-einstats to perform this bridge role with minimal code and duplication.

Overview

xarray-einstats provides wrappers for:

Most of the functions in numpy.linalg
A subset of scipy.stats
rearrange and reduce from einops

These wrappers have the same names and functionality as the original functions. The difference in behaviour is that the wrappers do not make assumptions about the meaning of a dimension based on its position, nor do they have arguments like axis or axes. Instead, they have a dims argument that takes dimension names rather than integers indicating the positions of the dimensions on which to act.
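
To make the contrast between positional axes and named dimensions concrete, here is a minimal sketch that uses only NumPy and xarray. It relies on `xr.apply_ufunc` to move the named dimensions into the positions `np.linalg.det` expects. This illustrates the general idea of name-based wrapping and is not necessarily how xarray-einstats implements its wrappers. The array and its dimension names (`row`, `batch`, `col`) are made up for illustration.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical batch of 2x2 matrices whose matrix dimensions are
# deliberately NOT the last two axes.
da = xr.DataArray(
    rng.normal(size=(2, 5, 2)),
    dims=("row", "batch", "col"),
)

# Raw NumPy: np.linalg.det acts on the last two axes, so the caller has to
# know the layout and move the "row" axis there by position.
det_numpy = np.linalg.det(np.moveaxis(da.values, 0, -2))

# Name-based: xr.apply_ufunc moves the named core dimensions to the end,
# so the call does not depend on the order of the dimensions in `da`.
det_named = xr.apply_ufunc(
    np.linalg.det,
    da,
    input_core_dims=[["row", "col"]],
)
```

Both calls compute the same determinants, but only the second one keeps working unchanged if the dimensions of `da` are reordered.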

It also provides a handful of re-implemented functions:

These are partially reimplemented because the original functions do not yet support multidimensional and/or batched computations. They share their names with functions in NumPy or SciPy, but they implement only a subset of the features. The goal is for them to eventually become wrappers too.

Using `xarray-einstats`

Dataset and GroupBy inputs

While the DataArray is the base xarray object, other xarray objects are also key when using the library. These objects, such as Dataset, are implemented as collections of DataArray objects, and all include a .map method to apply the same function to each of their child DataArrays.

We can use .map to apply the same function to all 4 child DataArrays in ds, but this will not always be possible. When using .map, the function provided is applied to every child DataArray with the same **kwargs.

If we try doing:

ds.map(stats.circmean, dims=("chain", "draw"))

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 1
----> 1 ds.map(stats.circmean, dims=("chain", "draw"))

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray/core/dataset.py:5949, in Dataset.map(self, func, keep_attrs, args, **kwargs)
   5947 if keep_attrs is None:
   5948     keep_attrs = _get_keep_attrs(default=False)
-> 5949 variables = {
   5950     k: maybe_wrap_array(v, func(v, *args, **kwargs))
   5951     for k, v in self.data_vars.items()
   5952 }
   5953 if keep_attrs:
   5954     for k, v in variables.items():

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray/core/dataset.py:5950, in <dictcomp>(.0)
   5947 if keep_attrs is None:
   5948     keep_attrs = _get_keep_attrs(default=False)
   5949 variables = {
-> 5950     k: maybe_wrap_array(v, func(v, *args, **kwargs))
   5951     for k, v in self.data_vars.items()
   5952 }
   5953 if keep_attrs:
   5954     for k, v in variables.items():

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray_einstats/stats.py:489, in circmean(da, dims, high, low, nan_policy, **kwargs)
    487 if nan_policy is not None:
    488     circmean_kwargs["nan_policy"] = nan_policy
--> 489 return _apply_reduce_func(stats.circmean, da, dims, kwargs, circmean_kwargs)

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray_einstats/stats.py:432, in _apply_reduce_func(func, da, dims, kwargs, func_kwargs)
    430 if not isinstance(dims, str):
    431     aux_dim = f"__aux_dim__:{','.join(dims)}"
--> 432     da = da.stack({aux_dim: dims})
    433     core_dims = [aux_dim]
    434 else:

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray/core/dataarray.py:2739, in DataArray.stack(self, dimensions, create_index, index_cls, **dimensions_kwargs)
   2674 def stack(
   2675     self: T_DataArray,
   2676     dimensions: Mapping[Any, Sequence[Hashable]] | None = None,
   (...)
   2679     **dimensions_kwargs: Sequence[Hashable],
   2680 ) -> T_DataArray:
   2681     """
   2682     Stack any number of existing dimensions into a single new dimension.
   2683 
   (...)
   2737     DataArray.unstack
   2738     """
-> 2739     ds = self._to_temp_dataset().stack(
   2740         dimensions,
   2741         create_index=create_index,
   2742         index_cls=index_cls,
   2743         **dimensions_kwargs,
   2744     )
   2745     return self._from_temp_dataset(ds)

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray/core/dataset.py:4593, in Dataset.stack(self, dimensions, create_index, index_cls, **dimensions_kwargs)
   4591 result = self
   4592 for new_dim, dims in dimensions.items():
-> 4593     result = result._stack_once(dims, new_dim, index_cls, create_index)
   4594 return result

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray/core/dataset.py:4524, in Dataset._stack_once(self, dims, new_dim, index_cls, create_index)
   4522 product_vars: dict[Any, Variable] = {}
   4523 for dim in dims:
-> 4524     idx, idx_vars = self._get_stack_index(dim, create_index=create_index)
   4525     if idx is not None:
   4526         product_vars.update(idx_vars)

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray/core/dataset.py:4480, in Dataset._get_stack_index(self, dim, multi, create_index)
   4478     var = self._variables[dim]
   4479 else:
-> 4480     _, _, var = _get_virtual_variable(self._variables, dim, self.dims)
   4481 # dummy index (only `stack_coords` will be used to construct the multi-index)
   4482 stack_index = PandasIndex([0], dim)

File ~/checkouts/readthedocs.org/user_builds/xarray-einstats/envs/v0.4.0/lib/python3.10/site-packages/xarray/core/dataset.py:178, in _get_virtual_variable(variables, key, dim_sizes)
    176 split_key = key.split(".", 1)
    177 if len(split_key) != 2:
--> 178     raise KeyError(key)
    180 ref_name, var_name = split_key
    181 ref_var = variables[ref_name]

KeyError: 'chain'

we get an exception. The chain and draw dimensions are not present in all child DataArrays. Instead, we could apply the function only to the variables that have both the chain and draw dimensions.
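
One way to restrict the computation to the variables that have both dimensions is to select them before calling `.map`. The sketch below assumes a made-up Dataset and uses a hypothetical `mean_over` helper as a stand-in for `stats.circmean`, so that it runs with xarray and NumPy alone. Any function called as `func(da, dims=...)` could be passed instead.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical dataset: "score" lacks the chain and draw dimensions.
ds = xr.Dataset(
    {
        "mu": (("chain", "draw", "team"), rng.normal(size=(4, 10, 3))),
        "sigma": (("chain", "draw"), rng.normal(size=(4, 10))),
        "score": (("match",), rng.normal(size=6)),
    }
)


def mean_over(da, dims):
    """Stand-in reduction, called the same way as stats.circmean(da, dims=...)."""
    return da.mean(dim=dims)


dims = ("chain", "draw")

# Keep only the variables that contain every dimension to reduce.
names = [
    name for name, da in ds.data_vars.items() if set(dims) <= set(da.dims)
]

# .map now only sees variables on which the reduction is well defined.
reduced = ds[names].map(mean_over, dims=dims)
```

Here `reduced` contains only `mu` and `sigma`, and `score` is left out instead of raising a `KeyError`.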

Attention

In general, you should prefer using the .map method over passing non-DataArray objects directly to xarray_einstats functions. .map ensures that no unexpected broadcasting takes place between the child DataArrays. The examples below illustrate the difference.

However, if you use functions that reduce dimensions on non-DataArray inputs whose child DataArrays all contain every dimension to reduce, you will not trigger any such broadcasting. This behaviour is covered by our test suite to ensure it stays this way.

It is also possible to do

Here, all child DataArrays have both the chain and draw dimensions so, as expected, the result is the same. There are some cases, however, in which not using .map triggers broadcasting operations that are generally not the desired outcome.

If we use the .map method, the function is applied to each child DataArray independently of the others:

whereas without using the .map method, extra broadcasting can happen:

The behaviour on DataArrayGroupBy objects, for example, is very similar to what we have shown for Datasets.

When we apply a “group by” operation over the team dimension, we generate a DataArrayGroupBy with 3 groups.

gb = da.groupby("team")
gb

DataArrayGroupBy, grouped over 'team'
3 groups with labels 'a', 'b', 'c'.

on which we can use .map to apply a function from xarray-einstats over all groups independently:

which as expected has performed the operation group-wise, yielding a different result than either

stats.median_abs_deviation(da, dims=["draw", "team"])

<xarray.DataArray 'mu' (chain: 4)>
0.3444 0.5968 0.4553 0.4069
Coordinates:
  * chain    (chain) int64 0 1 2 3
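
The difference between a group-wise reduction and a pooled one can be sketched with plain xarray. The data below is made up, and `mad` is a hypothetical helper that computes an unscaled median absolute deviation with xarray methods. It stands in for `stats.median_abs_deviation` so that the snippet runs without xarray-einstats installed.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical posterior-like array with a labelled team dimension.
da = xr.DataArray(
    rng.normal(size=(4, 10, 3)),
    dims=("chain", "draw", "team"),
    coords={"chain": np.arange(4), "team": ["a", "b", "c"]},
    name="mu",
)


def mad(da, dims):
    """Unscaled median absolute deviation over the named dimension(s)."""
    center = da.median(dim=dims)
    return abs(da - center).median(dim=dims)


# Group-wise: each team is reduced over draw independently of the others.
per_team = da.groupby("team").map(mad, dims="draw")

# Pooled: draw and team are reduced together, so team disappears.
pooled = mad(da, dims=("draw", "team"))
```

The group-wise result keeps one value per chain and team, whereas the pooled result has one value per chain only.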
