Intro to the linear algebra module¶

Most of the functions in the linear algebra module are thin wrappers of very few lines, with an API nearly identical to that of their numpy counterparts. In general, the only thing you need to do is pass the input DataArray and indicate which dimensions correspond to the matrices. There are only a couple of exceptions, each of which has its own section.

import xarray_einstats
from xarray_einstats.tutorial import generate_matrices_dataarray

We start by generating synthetic data to work with:

da = generate_matrices_dataarray(7)
da

<xarray.DataArray (batch: 10, experiment: 3, dim: 4, dim2: 4)> Size: 4kB
0.7075 1.025 0.5685 0.8951 0.2065 3.384 ... 1.239 0.4527 0.5749 0.4766 0.859
Dimensions without coordinates: batch, experiment, dim, dim2

The data represents a collection of matrices, with dim and dim2 indicating the matrix dimensions. The whole array is 4d, with 30 matrices in total from 10 batches and 3 experiments.

General linalg functions¶

You can get the trace of all 30 matrices in a single line. You only need the input DataArray and the dimensions corresponding to the matrices:

da.linalg.trace(dims=["dim", "dim2"])

<xarray.DataArray (batch: 10, experiment: 3)> Size: 240B
4.854 4.74 4.457 2.637 2.79 3.163 1.998 ... 2.804 4.58 2.888 4.936 5.983 4.07
Dimensions without coordinates: batch, experiment
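
For comparison, here is a minimal sketch of the same reduction in plain numpy, where the matrix axes have to be identified by position instead of by name. The array `arr` is a hypothetical stand-in with the same shape as da, not the tutorial data, and the sketch only assumes that the wrapper behaves like its numpy counterpart on the chosen dimensions.

```python
import numpy as np

# Hypothetical stand-in for `da`: axes are (batch, experiment, dim, dim2)
rng = np.random.default_rng(0)
arr = rng.random((10, 3, 4, 4))

# With bare numpy the matrix axes are given by position, not by name
traces = np.trace(arr, axis1=-2, axis2=-1)
print(traces.shape)  # (10, 3)
```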

The main feature of the wrappers is that they know the expected shape of the output, so you don’t need to take care of it. See how the inverse, which doesn’t reduce the matrix dimensions, can be called with the exact same arguments:

da.linalg.inv(dims=["dim", "dim2"])

<xarray.DataArray (batch: 10, experiment: 3, dim: 4, dim2: 4)> Size: 4kB
11.26 -2.363 -10.84 -0.2744 10.99 -2.017 ... -3.444 0.7703 0.316 0.01949 -1.162
Dimensions without coordinates: batch, experiment, dim, dim2
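
The shape-preserving behaviour can be checked with a small numpy sketch. The array `arr` is again a hypothetical stand-in (not the tutorial data), built so that every matrix is guaranteed to be invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
# Adding 4 * identity makes every 4x4 matrix strictly diagonally dominant,
# which guarantees that it is invertible
arr = rng.random((10, 3, 4, 4)) + 4 * np.eye(4)

# np.linalg.inv inverts each trailing 4x4 matrix and keeps the leading axes
inverses = np.linalg.inv(arr)
print(inverses.shape)  # (10, 3, 4, 4)

# Each matrix times its inverse should give the identity
print(np.allclose(arr @ inverses, np.eye(4)))  # True
```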

Even a qr decomposition, which returns multiple matrices (which could even have different shapes), needs only these two arguments to work. Note that batched qr decomposition requires numpy>=1.22.

q, r = da.linalg.qr(dims=["dim", "dim2"])

<xarray.DataArray (batch: 10, experiment: 3, dim: 4, dim2: 4)> Size: 4kB
-0.5452 0.01652 -0.5624 -0.6214 -0.1592 ... -0.3322 -0.4013 0.2607 0.8128
Dimensions without coordinates: batch, experiment, dim, dim2

<xarray.DataArray (batch: 10, experiment: 3, dim: 4, dim2: 4)> Size: 4kB
-1.298 -1.975 -1.858 -1.228 0.0 -3.137 ... -0.4307 1.052 0.0 0.0 0.0 -0.6995
Dimensions without coordinates: batch, experiment, dim, dim2
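
The two outputs above are the q and r factors. As a rough sanity check of what a batched qr decomposition returns, the following numpy sketch (on a hypothetical stand-in array `arr`, not the tutorial data) verifies the defining properties of the factors.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((10, 3, 4, 4))  # stand-in with the same shape as `da`

# Stacked input to np.linalg.qr needs numpy>=1.22
q, r = np.linalg.qr(arr)

# q @ r recovers every input matrix
print(np.allclose(q @ r, arr))  # True
# r is upper triangular (np.triu acts on the last two axes)
print(np.allclose(r, np.triu(r)))  # True
# the columns of q are orthonormal
print(np.allclose(np.swapaxes(q, -1, -2) @ q, np.eye(4)))  # True
```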

Tip

Do you always follow the same convention to name your matrix dimensions and feel that having to repeat them in every call is unnecessary? Take a look at xarray_einstats.linalg.get_default_dims to see how to modify the default dims used by the linalg wrappers.

matmul: 1st exception¶

The general representation of a matrix multiplication is:

\[ \mathcal{M}_1^{N\times K} * \mathcal{M}_2^{K\times M} = \mathcal{M}^{N\times M} \tag{1} \]

Conceptually, there are 3 dimensions involved in the operation because the 2nd dimension of \(\mathcal{M}_1\) needs to be the same as the 1st dimension of \(\mathcal{M}_2\). Moreover, when working with square matrices, \(N = M = K\) and there is only 1 dimension.

When working with xarray, however, dimension names can’t be repeated, so, as we have already seen, conceptually equivalent dimensions may end up with different names, e.g. dim and dim2.

Taking all of this into account, matmul’s dims argument supports indicating the dimensions in 3 different ways. The following table summarizes the inputs dims accepts and how they are interpreted:

`dims`	dim_a1	dim_a2	dim_b1	dim_b2
`[dim1, dim2]`	dim1	dim2	dim1	dim2
`[dim1, dim2, dim3]`	dim1	dim2	dim2	dim3
`[[dim_a1, dim_a2], [dim_b1, dim_b2]]`	dim_a1	dim_a2	dim_b1	dim_b2

where dim_a1 and dim_a2 are the matrix dimensions of the first matrix, and dim_b1 and dim_b2 are the matrix dimensions of the second matrix. As in (1), the dimensions present in the output are dim_a1 and dim_b2.
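
The table can also be read as a small normalization step. The sketch below uses a hypothetical helper, `normalize_matmul_dims`, written only to mirror the three rows of the table. It is not the library's implementation and makes no claim about how matmul parses its arguments internally.

```python
def normalize_matmul_dims(dims):
    """Expand a matmul ``dims`` spec into (dim_a1, dim_a2, dim_b1, dim_b2).

    Hypothetical helper that mirrors the table above, not library code.
    """
    if len(dims) == 3:
        # three strings: the middle dimension is shared by both matrices
        dim1, dim2, dim3 = dims
        return dim1, dim2, dim2, dim3
    if len(dims) == 2:
        first, second = dims
        if isinstance(first, str):
            # two strings: both matrices use the same two dimension names
            return first, second, first, second
        # two pairs: every dimension is given explicitly
        return (*first, *second)
    raise ValueError("dims must have 2 or 3 elements")


print(normalize_matmul_dims(["dim", "dim2"]))
print(normalize_matmul_dims(["batch", "experiment", "dim2"]))
print(normalize_matmul_dims([["dim", "dim2"], ["different_dim", "different_dim2"]]))
```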

List of two elements¶

This first example uses square matrices, so when doing a matrix multiplication, the two dimensions are common in both inputs. You only need a list with two strings to indicate how to perform the multiplication:

from xarray_einstats import linalg

linalg.matmul(da, da, dims=["dim", "dim2"])

<xarray.DataArray (batch: 10, experiment: 3, dim: 4, dim2: 4)> Size: 4kB
1.845 5.326 2.407 3.89 3.378 14.68 5.449 ... 5.586 6.55 1.279 1.373 1.791 2.658
Dimensions without coordinates: batch, experiment, dim, dim2
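
In numpy terms, this square case is the familiar batched matrix product over the last two axes. A minimal sketch on a hypothetical stand-in array `arr` (not the tutorial data), assuming the axis order (batch, experiment, dim, dim2):

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((10, 3, 4, 4))  # (batch, experiment, dim, dim2)

# Matrix product over the last two axes, batched over the leading ones
product = arr @ arr

# The same operation spelled out with einsum: j is the contracted index
same_product = np.einsum("beij,bejk->beik", arr, arr)

print(product.shape)  # (10, 3, 4, 4)
print(np.allclose(product, same_product))  # True
```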

List of three elements¶

However, the input matrices for matrix multiplication might not be square or might not have the exact same dimension names. As we have seen, what is necessary is for the 2nd dimension of the 1st matrix to match the 1st dimension of the 2nd matrix. This 3 element list of dimensions is arguably the most common way to specify matrix multiplications.

You could interpret the DataArray as a collection of matrices whose matrix dimensions are batch and experiment, or as a collection whose matrix dimensions are experiment and dim2. Those two collections of matrices are valid inputs for matrix multiplication.

As there is still one dimension that needs to match, matmul can also take a list of 3 dimensions:

linalg.matmul(da, da, dims=["batch", "experiment", "dim2"], out_append="_bis")

<xarray.DataArray (dim: 4, dim2_bis: 4, batch_bis: 10, batch: 10, dim2: 4)> Size: 51kB
10.79 3.926 1.503 3.986 0.1886 0.1844 ... 1.289 4.187 5.251 3.372 2.81 13.1
Dimensions without coordinates: dim, dim2_bis, batch_bis, batch, dim2

Here, batch and dim2 were matrix dimensions in one of the matrices and batch dimensions in the other. While this might not be very common, xarray-einstats checks for dimensions that would end up duplicated in the output and, if necessary, renames them using out_append to avoid collisions.
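
One way to read the output dimensions above is as an einsum over a plain numpy array. The sketch below is an interpretation inferred from the printed output shape, not the library's implementation. It assumes that experiment is contracted, that the shared dim is broadcast, and that the remaining dimensions are kept separately for each input. The array `arr` is a hypothetical stand-in, not the tutorial data.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((10, 3, 4, 4))  # (batch, experiment, dim, dim2)

# Index letters for the first input:  b=batch, e=experiment, d=dim, f=dim2
# Index letters for the second input: a=batch, e=experiment, d=dim, g=dim2
# Output order: dim, dim2_bis (f), batch_bis (a), batch (b), dim2 (g)
out = np.einsum("bedf,aedg->dfabg", arr, arr)
print(out.shape)  # (4, 4, 10, 10, 4)
```

In this reading, the indices f and a play the role of the renamed dim2_bis and batch_bis dimensions.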

A similar thing happens when both dim1 and dim3 have the same name:

linalg.matmul(da, da, dims=["batch", "experiment", "batch"])

<xarray.DataArray (dim: 4, dim2: 4, batch: 10, batch2: 10)> Size: 13kB
10.79 0.1886 5.402 1.471 1.243 5.348 2.639 ... 3.462 3.618 11.21 9.47 4.187 13.1
Dimensions without coordinates: dim, dim2, batch, batch2
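
Because both inputs are the same array here, each output matrix has the form \(A A^T\) and is therefore symmetric. A small numpy sketch of this reading, on a hypothetical stand-in array `arr` and assuming that experiment is the contracted dimension while dim and dim2 are broadcast:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((10, 3, 4, 4))  # (batch, experiment, dim, dim2)

# b=batch of the first input, a=batch of the second input (batch2 in the output)
# e=experiment is contracted; d=dim and f=dim2 are shared and broadcast
gram = np.einsum("bedf,aedf->dfba", arr, arr)
print(gram.shape)  # (4, 4, 10, 10)

# Every (batch, batch2) matrix equals its own transpose
print(np.allclose(gram, np.swapaxes(gram, -1, -2)))  # True
```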

List of 2 element lists¶

The 3rd option is the most verbose and explicit, but it is still necessary in order to multiply some matrices without having to rename dimensions manually beforehand.

To see how it works, you’ll need a db object, with the same shape but different dimension names:

db = da.rename(dim="different_dim", dim2="different_dim2")
db

<xarray.DataArray (batch: 10, experiment: 3, different_dim: 4, different_dim2: 4)> Size: 4kB
0.7075 1.025 0.5685 0.8951 0.2065 3.384 ... 1.239 0.4527 0.5749 0.4766 0.859
Dimensions without coordinates: batch, experiment, different_dim, different_dim2

Now da and db are compatible and you might want to multiply them. After all, it’s the same operation we did in the first matmul example (you can check the result if running the notebook). Given the name mismatch, however, neither the first nor the second option can be used, so the dimensions of each matrix are given explicitly:

linalg.matmul(da, db, dims=[["dim", "dim2"], ["different_dim", "different_dim2"]])

<xarray.DataArray (batch: 10, experiment: 3, dim: 4, different_dim2: 4)> Size: 4kB
1.845 5.326 2.407 3.89 3.378 14.68 5.449 ... 5.586 6.55 1.279 1.373 1.791 2.658
Dimensions without coordinates: batch, experiment, dim, different_dim2

Whenever the dimension being multiplied/reduced doesn’t have the same name in both matrices, you’ll need to use this 2+2 dims specification. Like in the list of 3 elements case, matmul avoids name clashes:

dc = da.rename(batch="batch_bis")
linalg.matmul(da, dc, dims=[["experiment", "batch"], ["batch_bis", "experiment"]])

<xarray.DataArray (dim: 4, dim2: 4, experiment: 3, experiment2: 3)> Size: 1kB
9.727 6.68 3.595 6.68 18.66 6.065 3.595 ... 10.81 36.08 8.181 3.233 8.181 14.77
Dimensions without coordinates: dim, dim2, experiment, experiment2
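
As a rough numpy reading of this last call, the contracted index is batch in the first input and batch_bis in the second, which hold the same values under different names. The sketch below is inferred from the printed output dimensions and uses a hypothetical stand-in array `arr`. It is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((10, 3, 4, 4))  # (batch, experiment, dim, dim2)

# b is contracted: it is batch in the first input and batch_bis in the second
# e=experiment of the first input, g=experiment of the second (experiment2)
# d=dim and f=dim2 are shared and broadcast
out = np.einsum("bedf,bgdf->dfeg", arr, arr)
print(out.shape)  # (4, 4, 3, 3)
```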

einsum: 2nd and most notable exception¶