Intro to the stats module

from scipy import stats
import numpy as np
from xarray_einstats.stats import XrContinuousRV, rankdata, hmean, skew, median_abs_deviation
from xarray_einstats.tutorial import generate_mcmc_like_dataset

ds = generate_mcmc_like_dataset(11)

Probability distributions

Initialization

norm = XrContinuousRV(stats.norm, ds["mu"], ds["sigma"])

Using its methods

Once initialized, you can use its methods exactly as you would with scipy distributions. There are only two differences:

The methods take scalars or DataArrays as inputs. Plain arrays are accepted only as the argument on which the method is evaluated (x, k or q in the scipy docs, depending on the method).
The size argument of the rvs method behaves differently: it only describes the new dimensions to add, so you don’t need to care about broadcasting or alignment of arrays. xarray_einstats does this for you.

You can generate 10 random draws from the initialized distribution. Unlike with scipy, the output won’t have shape (10,); it will have shape (10, *broadcasted_input_shape). xarray works out broadcasted_input_shape, and size is independent of it, so you don’t have to handle broadcasting yourself.

norm.rvs(size=(10))

<xarray.DataArray (rv_dim0: 10, chain: 4, draw: 10, team: 6)>
-0.02342 0.4384 1.526 0.2427 -0.3344 ... 0.04699 -0.5903 -0.2142 0.6762 1.661
Coordinates:
  * team     (team) <U1 'a' 'b' 'c' 'd' 'e' 'f'
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: rv_dim0
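
For comparison, here is a minimal sketch of what the same request looks like in plain scipy. The arrays mu and sigma are hypothetical stand-ins with the shapes of ds["mu"] and ds["sigma"], not the tutorial data. In scipy, size is the full output shape, so the parameter shape has to be repeated by hand:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-ins for ds["mu"] (chain, draw, team) and ds["sigma"] (chain, draw).
mu = rng.normal(size=(4, 10, 6))
sigma = rng.uniform(0.5, 2.0, size=(4, 10, 1))  # trailing axis added by hand to align with team

frozen = stats.norm(mu, sigma)

# In scipy, size=10 means "the output has shape (10,)", which clashes with the parameters.
try:
    frozen.rvs(size=10, random_state=rng)
    scalar_size_failed = False
except ValueError:
    scalar_size_failed = True

# The broadcast parameter shape has to be spelled out after the number of draws.
param_shape = np.broadcast_shapes(mu.shape, sigma.shape)
draws = frozen.rvs(size=(10, *param_shape), random_state=rng)
print(scalar_size_failed, draws.shape)
```

With the wrapper, both the manual trailing axis on sigma and the explicit param_shape are unnecessary, because dimensions are matched by name.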

If the dimension names are not provided, xarray_einstats assigns rv_dim# as the dimension name, as many times as necessary. To define the names manually, use the dims argument:

norm.rvs(size=(5, 3), dims=["subject", "batch"])

<xarray.DataArray (subject: 5, batch: 3, chain: 4, draw: 10, team: 6)>
0.4488 0.4036 0.3949 0.708 0.7141 3.845 ... 0.1895 -0.03576 0.3021 0.5014 1.839
Coordinates:
  * team     (team) <U1 'a' 'b' 'c' 'd' 'e' 'f'
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: subject, batch
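
Naming the new dimensions pays off in later steps, since reductions and selections can refer to them by name instead of by position. The following is a sketch with plain xarray on hypothetical stand-in data (the array named draws is random noise, not the output of norm.rvs):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(3)

# Hypothetical stand-in with the same dimension layout as the output above.
draws = rng.normal(size=(5, 3, 4, 10, 6))
da = xr.DataArray(
    draws,
    dims=["subject", "batch", "chain", "draw", "team"],
    coords={"team": list("abcdef")},
)

# Reduce everything except subject, referring to dimensions by name.
per_subject = da.mean(dim=("batch", "chain", "draw", "team"))

# Select one team by label rather than by integer position.
team_d = da.sel(team="d")
print(per_subject.dims, team_d.dims)
```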

The behaviour for other methods is similar:

norm.logcdf(ds["x_plot"])

<xarray.DataArray (plot_dim: 20, chain: 4, draw: 10, team: 6)>
-1.318 -2.617 -6.71 -0.7968 ... -1.248e-263 -7.905e-247 -1.896e-242 -1.636e-170
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9
  * team     (team) <U1 'a' 'b' 'c' 'd' 'e' 'f'
Dimensions without coordinates: plot_dim

For convenience, you can also pass array_like input, which is converted to a DataArray under the hood. In that case, the dimension name is quantile for ppf and isf, and point otherwise. Either way, the values passed as input are preserved as coordinate values.

norm.ppf([.25, .5, .75])

<xarray.DataArray (quantile: 3, chain: 4, draw: 10, team: 6)>
-0.02018 0.2885 0.8726 -0.204 -0.1332 ... 0.2786 0.2264 0.5523 0.6391 2.198
Coordinates:
  * quantile  (quantile) float64 0.25 0.5 0.75
  * chain     (chain) int64 0 1 2 3
  * draw      (draw) int64 0 1 2 3 4 5 6 7 8 9
  * team      (team) <U1 'a' 'b' 'c' 'd' 'e' 'f'
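
To see what the wrapper saves you, here is a rough plain-scipy equivalent, assuming hypothetical mu and sigma arrays shaped like the tutorial variables. The quantiles need three extra axes added by hand before they broadcast against the parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical stand-ins for ds["mu"] (chain, draw, team) and ds["sigma"] (chain, draw).
mu = rng.normal(size=(4, 10, 6))
sigma = rng.uniform(0.5, 2.0, size=(4, 10, 1))

q = np.array([0.25, 0.5, 0.75])

# Without the three trailing axes, shape (3,) would not broadcast against (4, 10, 6).
quantiles = stats.norm(mu, sigma).ppf(q[:, None, None, None])
print(quantiles.shape)
```

The leading axis of the result plays the role of the quantile dimension, but in the numpy version nothing records which slice belongs to which probability.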

pdf = norm.pdf(np.linspace(-5, 5))
pdf

<xarray.DataArray (point: 50, chain: 4, draw: 10, team: 6)>
5.321e-44 2.898e-49 4.753e-60 5.206e-41 ... 3.563e-57 4.449e-55 3.664e-24
Coordinates:
  * point    (point) float64 -5.0 -4.796 -4.592 -4.388 ... 4.388 4.592 4.796 5.0
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9
  * team     (team) <U1 'a' 'b' 'c' 'd' 'e' 'f'

Plot a subset of the pdf we just calculated with matplotlib.

import matplotlib.pyplot as plt
plt.rcParams["figure.facecolor"] = "white"

fig, ax = plt.subplots()
ax.plot(pdf.point, pdf.sel(team="d", chain=2), color="C0", alpha=.5)
ax.set(xlabel="x", ylabel="pdf of normal distribution");

[Figure: pdf of the normal distribution against x for team "d" and chain 2, one line per draw]

Other functions

The rest of the functions in the module have an API very similar to that of their scipy counterparts. The only differences are:

They take dims instead of axis. Moreover, dims can be a str or a sequence of str, instead of the single integer that axis supports.
Arguments that take array_like values in scipy take DataArray inputs instead, for example the scale argument of median_abs_deviation.
They accept arbitrary extra kwargs, which are passed to xarray.apply_ufunc.

Here are some examples of using functions in the stats module of xarray_einstats with the dims argument instead of axis.

hmean(ds["mu"], dims="team")

<xarray.DataArray 'mu' (chain: 4, draw: 10)>
0.1588 0.2123 0.5543 0.7826 0.1913 0.6035 ... 0.1269 0.712 0.3044 0.1936 0.1223
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9
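
For reference, the scipy counterpart of this call identifies the reduced dimension by position. The sketch below uses a hypothetical positive array in place of ds["mu"] and checks the result against the definition of the harmonic mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical positive stand-in for ds["mu"] with dims (chain, draw, team).
mu = rng.uniform(0.1, 2.0, size=(4, 10, 6))

team_axis = 2  # you have to remember that team is the last axis
hm = stats.hmean(mu, axis=team_axis)

# Harmonic mean by definition: n divided by the sum of reciprocals.
manual = mu.shape[team_axis] / (1.0 / mu).sum(axis=team_axis)
print(hm.shape)
```

Passing dims="team" expresses the same reduction without depending on the order in which the dimensions happen to be stored.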

rankdata(ds["score"], dims=("chain", "draw"), method="min")

<__array_function__ internals>:200: RuntimeWarning: invalid value encountered in cast

<xarray.DataArray 'score' (match: 12, chain: 4, draw: 10)>
14 14 14 14 14 31 14 1 31 14 31 1 14 1 ... 15 15 15 15 15 1 34 15 15 1 34 34 34
Dimensions without coordinates: match, chain, draw
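
Ranking over two dimensions at once has no direct single-axis equivalent in scipy. One way to get a comparable result by hand, sketched here on a hypothetical integer array shaped like ds["score"], is to merge the two axes, rank along the merged axis and restore the shape. This illustrates the idea only and is not a description of how xarray_einstats implements it:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical stand-in for ds["score"] with dims (match, chain, draw).
score = rng.poisson(1.5, size=(12, 4, 10))

# Merge chain and draw into one axis so each match is ranked over all 40 values.
flat = score.reshape(score.shape[0], -1)
ranks = stats.rankdata(flat, method="min", axis=1).reshape(score.shape)
print(ranks.shape)
```

With method="min", tied values share the lowest rank of their group, which is why the output above contains long runs of repeated ranks.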

Important

The statistical summaries and other statistical functions accept both DataArray and Dataset inputs. The methods of the probability distributions and the functions in the linear algebra module are tested only on DataArrays.

When using Dataset inputs, you must make sure that all the dimensions in dims are present in all the DataArrays within the Dataset.

skew(ds[["score", "mu", "sigma"]], dims=("chain", "draw"))

<xarray.Dataset>
Dimensions:  (match: 12, team: 6)
Coordinates:
  * team     (team) <U1 'a' 'b' 'c' 'd' 'e' 'f'
Dimensions without coordinates: match
Data variables:
    score    (match) float64 1.466 0.2149 0.6788 1.361 ... 1.099 1.156 1.265
    mu       (team) float64 0.8152 1.84 2.102 1.806 1.091 0.9678
    sigma    float64 1.314
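
The call above subsets the Dataset before reducing, which is consistent with the requirement that every variable contains all of dims. A small sketch of how such a subset could be chosen programmatically with plain xarray follows; the Dataset named toy is hypothetical and only mirrors the dimension layout seen in this tutorial:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(5)

# Hypothetical Dataset mirroring the dimension layout of ds in this tutorial.
toy = xr.Dataset(
    {
        "x_plot": (("plot_dim",), np.linspace(0, 10, 20)),
        "mu": (("chain", "draw", "team"), rng.normal(size=(4, 10, 6))),
        "sigma": (("chain", "draw"), rng.uniform(0.5, 2.0, size=(4, 10))),
        "score": (("match", "chain", "draw"), rng.poisson(1.5, size=(12, 4, 10))),
    }
)

dims = ("chain", "draw")

# Keep only the variables that contain every dimension we want to reduce over.
usable = [name for name, da in toy.data_vars.items() if set(dims) <= set(da.dims)]
subset = toy[usable]
print(usable)
```

Here x_plot is left out because it only has the plot_dim dimension.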

median_abs_deviation(ds)

<xarray.Dataset>
Dimensions:  ()
Data variables:
    x_plot   float64 2.632
    mu       float64 0.4878
    sigma    float64 0.39
    score    float64 1.0
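
Every variable in the output above is reduced to a scalar, which suggests that omitting dims reduces over all dimensions. The closest scipy analogue is axis=None, sketched here on a hypothetical array and checked against the definition of the median absolute deviation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Hypothetical stand-in for one variable of the Dataset.
mu = rng.normal(size=(4, 10, 6))

# axis=None flattens the array, so the result is a single number.
mad = stats.median_abs_deviation(mu, axis=None)

# Definition: median of the absolute deviations from the median.
manual = np.median(np.abs(mu - np.median(mu)))
print(np.ndim(mad))
```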

%load_ext watermark
%watermark -n -u -v -iv -w -p xarray_einstats,xarray

Last updated: Tue Jul 11 2023

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 8.14.0

xarray_einstats: 0.6.0
xarray         : 2023.6.0

scipy     : 1.11.1
numpy     : 1.24.4
matplotlib: 3.7.1

Watermark: 2.4.3