Data structures - basic overview and demo

07/06/22

This notebook extends the basic overview, including some updated functionality and file IO.

Note that the focus here is on low-level functions and base data structure handling; see the class demo for more general usage and class data structures.

Firstly, load some demo data to play with…

Load data

[1]:
from pathlib import Path

import epsproc as ep

# Set data path
# Note this is set here from ep.__path__, but may not be correct in all cases; it depends on where the GitHub repo is located.
epDemoDataPath = Path(ep.__path__[0]).parent/'data'
OMP: Info #273: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
* sparse not found, sparse matrix forms not available.
* natsort not found, some sorting functions not available.
* Setting plotter defaults with epsproc.basicPlotters.setPlotters(). Run directly to modify, or change options in local env.
* Set Holoviews with bokeh.
* pyevtk not found, VTK export not available.
[2]:
# Load data from modPath\data
dataPath = Path(epDemoDataPath, 'photoionization')
dataFile = Path(dataPath, 'n2_3sg_0.1-50.1eV_A2.inp.out')  # Set for sample N2 data for testing

# Scan data file
dataSet = ep.readMatEle(fileIn = dataFile.as_posix())
data = dataSet[0]
# dataXS = ep.readMatEle(fileIn = dataFile.as_posix(), recordType = 'CrossSection')  # XS info currently not set in N2 sample file.
*** ePSproc readMatEle(): scanning files for DumpIdy segments.

*** Scanning file(s)
['/home/jovyan/github/epsproc/data/photoionization/n2_3sg_0.1-50.1eV_A2.inp.out']

*** FileListSort
  Prefix: /home/jovyan/github/epsproc/data/photoionization/n2_3sg_0.1-50.1eV_A2.inp.out
  1 groups.

*** Reading ePS output file:  /home/jovyan/github/epsproc/data/photoionization/n2_3sg_0.1-50.1eV_A2.inp.out
*** IO.fileParse() found 1 segments with
        Start: ScatEng
        End: ['#'].
Expecting 51 energy points.
*** IO.fileParse() found 2 segments with
        Start: ScatSym
        End: ['FileName', '\n'].
Expecting 2 symmetries.
*** IO.fileParse() found 102 segments with
        Start: DumpIdy - dump
        End: ['+ Command', 'Time Now'].
Found 102 dumpIdy segments (sets of matrix elements).

Processing segments to Xarrays...
Processed 102 sets of DumpIdy file segments, (0 blank)

Xarray for ND data handling

All the core data is handled as Xarrays. This provides a wrapper for NumPy ND arrays, including labelled coordinates and various computational functions. For more details, see the Xarray Data Structures documentation.
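
As a rough illustration of what the wrapper provides, the following sketch builds a small labelled array by hand. The data is synthetic (not ePSproc output), and the dimension names l and Eke are simply borrowed from the data set loaded above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for matrix elements: complex values on an (l, Eke) grid.
# The numbers are made up for illustration only.
l_vals = [1, 3, 5]
eke = np.array([0.1, 1.1, 2.1, 3.1])
vals = np.arange(12).reshape(3, 4) * (1.0 + 0.5j)

da = xr.DataArray(
    vals,
    dims=("l", "Eke"),
    coords={"l": l_vals, "Eke": eke},
    name="demo",
    attrs={"dataType": "matE"},
)

# Label-based access instead of positional indexing
print(da.sel(l=3, Eke=1.1).item())
print(da.dims, da.shape)
```

The labels travel with the array, so later selections and reductions can refer to dimension names rather than axis numbers.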

[3]:
# Calling the array will provide some readable summary output
data
[3]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' (LM: 18, Eke: 51, Sym: 2,
                                                  mu: 3, it: 1, Type: 2)>
array([[[[[[           nan          +nanj,
                       nan          +nanj]],

          [[           nan          +nanj,
                       nan          +nanj]],

          [[           nan          +nanj,
                       nan          +nanj]]],


         [[[           nan          +nanj,
                       nan          +nanj]],

          [[           nan          +nanj,
                       nan          +nanj]],

          [[-1.7757076e+00+6.3474768e-01j,
            -1.9403462e+00+6.9465999e-01j]]]],


...


        [[[[           nan          +nanj,
                       nan          +nanj]],

          [[           nan          +nanj,
                       nan          +nanj]],

          [[           nan          +nanj,
                       nan          +nanj]]],


         [[[ 8.9213389e-06-4.6971505e-06j,
                       nan          +nanj]],

          [[           nan          +nanj,
                       nan          +nanj]],

          [[           nan          +nanj,
                       nan          +nanj]]]]]])
Coordinates:
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * it       (it) int64 1
  * Sym      (Sym) MultiIndex
  - Cont     (Sym) object 'SU' 'PU'
  - Targ     (Sym) object 'SG' 'SG'
  - Total    (Sym) object 'SU' 'PU'
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
Attributes:
    dataType:  matE
    file:      n2_3sg_0.1-50.1eV_A2.inp.out
    fileBase:  /home/jovyan/github/epsproc/data/photoionization
    fileList:  n2_3sg_0.1-50.1eV_A2.inp.out

Xarray functionality

Various low-level functions are available…

Data can be subselected by label; see https://docs.xarray.dev/en/latest/user-guide/indexing.html#indexing-and-selecting-data

[4]:
inds = {'Type':'L','Cont':'PU','mu':1}  # Set a dictionary of indexes (dimensions & coordinate labels) for selection
data.sel(inds).squeeze(drop=True).dropna(dim='LM',how='all')  # Select & drop redundant coords
[4]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' (LM: 6, Eke: 51)>
array([[-1.7757076e+00+6.3474768e-01j,  1.1629411e+00-1.3536696e+00j,
        -6.7206736e-01-1.5510867e+00j, -1.1575324e+00-1.1070670e+00j,
        -1.2756814e+00-8.2226340e-01j, -1.2812193e+00-6.5345031e-01j,
        -1.2458736e+00-5.5363257e-01j, -1.1945210e+00-4.9578904e-01j,
        -1.1372859e+00-4.6458490e-01j, -1.0788283e+00-4.5093199e-01j,
        -1.0215038e+00-4.4917215e-01j, -9.6658241e-01-4.5561598e-01j,
        -9.1476800e-01-4.6775578e-01j, -8.6644130e-01-4.8382351e-01j,
        -8.2178344e-01-5.0253208e-01j, -7.8084396e-01-5.2291827e-01j,
        -7.4358153e-01-5.4424449e-01j, -7.0989018e-01-5.6593542e-01j,
        -6.7961724e-01-5.8753627e-01j, -6.5257642e-01-6.0868475e-01j,
        -6.2855780e-01-6.2909188e-01j, -6.0733584e-01-6.4852853e-01j,
        -5.8867600e-01-6.6681581e-01j, -5.7234039e-01-6.8381767e-01j,
        -5.5809256e-01-6.9943523e-01j, -5.4570155e-01-7.1360190e-01j,
        -5.3494519e-01-7.2627923e-01j, -5.2561275e-01-7.3745299e-01j,
        -5.1750696e-01-7.4712960e-01j, -5.1044546e-01-7.5533287e-01j,
        -5.0426146e-01-7.6210091e-01j, -4.9880421e-01-7.6748321e-01j,
        -4.9393887e-01-7.7153823e-01j, -4.8954597e-01-7.7433102e-01j,
        -4.8552088e-01-7.7593124e-01j, -4.8177277e-01-7.7641155e-01j,
        -4.7822364e-01-7.7584598e-01j, -4.7480729e-01-7.7430896e-01j,
        -4.7146809e-01-7.7187415e-01j, -4.6816000e-01-7.6861380e-01j,
...
        -9.0462446e-07-8.0723014e-06j, -3.5692078e-06-8.0823829e-06j,
        -6.0595598e-06-7.2648315e-06j, -8.1505140e-06-5.7875648e-06j,
        -9.7102828e-06-3.8498801e-06j, -1.0685741e-05-1.6486584e-06j,
        -1.1082956e-05+6.4149356e-07j, -1.0948224e-05+2.8775353e-06j,
        -1.0352085e-05+4.9502470e-06j, -9.3770535e-06+6.7823468e-06j,
        -8.1088449e-06+8.3245490e-06j, -6.6304985e-06+9.5509003e-06j,
        -5.0186922e-06+1.0454159e-05j, -3.3416219e-06+1.1041596e-05j,
        -1.6579511e-06+1.1331354e-05j, -1.6474347e-08+1.1349342e-05j,
         1.5437551e-06+1.1126623e-05j,  2.9929859e-06+1.0697236e-05j,
         4.3100846e-06+1.0096395e-05j,  5.4816976e-06+9.3589937e-06j,
         6.5013386e-06+8.5184202e-06j,  7.3683722e-06+7.6056698e-06j,
         8.0869941e-06+6.6486520e-06j,  8.6652467e-06+5.6717845e-06j,
         9.1140134e-06+4.6957574e-06j,  9.4461595e-06+3.7374684e-06j,
         9.6756893e-06+2.8101589e-06j,  9.8170597e-06+1.9235909e-06j,
         9.8845861e-06+1.0844167e-06j,  9.8919648e-06+2.9650008e-07j,
         9.8519366e-06-4.3861456e-07j,  9.7760218e-06-1.1213393e-06j,
         9.6743985e-06-1.7535722e-06j,  9.5558217e-06-2.3383770e-06j,
         9.4276453e-06-2.8795705e-06j,  9.2958830e-06-3.3814926e-06j,
         9.1652969e-06-3.8486844e-06j,  9.0395586e-06-4.2857498e-06j,
         8.9213389e-06-4.6971505e-06j]])
Coordinates:
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 3 5 7 9 11
  - m        (LM) int64 -1 -1 -1 -1 -1 -1
    mu       int64 1
    Type     <U1 'L'
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
Attributes:
    dataType:  matE
    file:      n2_3sg_0.1-50.1eV_A2.inp.out
    fileBase:  /home/jovyan/github/epsproc/data/photoionization
    fileList:  n2_3sg_0.1-50.1eV_A2.inp.out
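
The same select, squeeze and dropna chain can be tried on a small synthetic array, which makes it easier to see what each step does. This is a minimal sketch with made-up values; only the dimension names are taken from the data above.

```python
import numpy as np
import xarray as xr

# Synthetic array with dims (l, Type, mu, it); NaN marks elements that are not set.
vals = np.full((3, 2, 3, 1), np.nan)
vals[0, :, 2, 0] = [1.0, 1.5]   # l=1, mu=+1 populated for both types
vals[1, :, 2, 0] = [2.0, 2.5]   # l=3, mu=+1 populated for both types
vals[2, :, 1, 0] = [3.0, 3.5]   # l=5 populated for mu=0 only

da = xr.DataArray(
    vals,
    dims=("l", "Type", "mu", "it"),
    coords={"l": [1, 3, 5], "Type": ["L", "V"], "mu": [-1, 0, 1], "it": [1]},
)

inds = {"Type": "L", "mu": 1}          # dimension -> coordinate label
sub = da.sel(inds)                     # label-based selection, dims are now (l, it)
sub = sub.squeeze(drop=True)           # remove the length-1 `it` dim and its coord
sub = sub.dropna(dim="l", how="all")   # drop l entries with no valid data

print(sub.dims)
print(sub.l.values)
print(sub.values)
```

Note that the selected labels (Type and mu here) remain attached as scalar coordinates, as in the output above.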

Standard max, min etc. functionality…

[5]:
data.max()
[5]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' ()>
array(2.9815816-0.10062913j)
[6]:
data.min()
[6]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' ()>
array(-2.4961077+1.463158j)
[7]:
data.mean()
[7]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' ()>
array(0.01696046-0.10577857j)

… and dimension labels can be used here …

[8]:
# Get max value over Eke dim, then sum over other dims.
# data.max(dim='Eke').sum(['Type','Sym','mu'])

# Sum over some dims, then drop any singleton dims
data.sum(dim=['Type','Sym','mu']).squeeze()
[8]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' (LM: 18, Eke: 51)>
array([[-3.71605380e+00+1.32940767e+00j,  2.42148110e+00-2.82071840e+00j,
        -1.39552581e+00-3.21588310e+00j, -2.39259090e+00-2.28447720e+00j,
        -2.62720830e+00-1.68888930e+00j, -2.63057950e+00-1.33587412e+00j,
        -2.55159550e+00-1.12641525e+00j, -2.44146610e+00-1.00385635e+00j,
        -2.32072700e+00-9.36180970e-01j, -2.19861110e+00-9.04510200e-01j,
        -2.07963460e+00-8.97161920e-01j, -1.96613193e+00-9.06571550e-01j,
        -1.85933330e+00-9.27634150e-01j, -1.75986083e+00-9.56773680e-01j,
        -1.66797610e+00-9.91399180e-01j, -1.58371120e+00-1.02957565e+00j,
        -1.50694321e+00-1.06981966e+00j, -1.43743948e+00-1.11096889e+00j,
        -1.37488699e+00-1.15209766e+00j, -1.31891298e+00-1.19246128e+00j,
        -1.26910067e+00-1.23145911e+00j, -1.22500224e+00-1.26860928e+00j,
        -1.18615009e+00-1.30353122e+00j, -1.15206713e+00-1.33593242e+00j,
        -1.12227600e+00-1.36559818e+00j, -1.09630746e+00-1.39238276e+00j,
        -1.07370752e+00-1.41620140e+00j, -1.05404346e+00-1.43702252e+00j,
        -1.03690869e+00-1.45486026e+00j, -1.02192637e+00-1.46976751e+00j,
        -1.00875140e+00-1.48182895e+00j, -9.97071810e-01-1.49115448e+00j,
        -9.86608890e-01-1.49787360e+00j, -9.77116260e-01-1.50212986e+00j,
        -9.68378960e-01-1.50407621e+00j, -9.60211240e-01-1.50387117e+00j,
        -9.52454600e-01-1.50167506e+00j, -9.44975460e-01-1.49764769e+00j,
        -9.37662550e-01-1.49194570e+00j, -9.30424780e-01-1.48472117e+00j,
...
        -1.64037943e-06-1.57815704e-05j, -6.83546690e-06-1.58138539e-05j,
        -1.16917115e-05-1.42356673e-05j, -1.57732610e-05-1.13745017e-05j,
        -1.88253180e-05-7.61757260e-06j, -2.07452290e-05-3.34698710e-06j,
        -2.15448500e-05+1.09827796e-06j, -2.13140560e-05+5.44022550e-06j,
        -2.01899884e-05+9.46627930e-06j, -1.83333687e-05+1.30258207e-05j,
        -1.59114607e-05+1.60227457e-05j, -1.30865485e-05+1.84065656e-05j,
        -1.00086230e-05+2.01635133e-05j, -6.81110040e-06+2.13083880e-05j,
        -3.60864550e-06+2.18774150e-05j, -4.96425237e-07+2.19221130e-05j,
         2.44968629e-06+2.15040930e-05j,  5.17224680e-06+2.06907102e-05j,
         7.63087600e-06+1.95514597e-05j,  9.80056160e-06+1.81549921e-05j,
         1.16697959e-05+1.65667691e-05j,  1.32385190e-05+1.48473112e-05j,
         1.45160560e-05+1.30508715e-05j,  1.55191261e-05+1.12246656e-05j,
         1.62698488e-05+9.40846170e-06j,  1.67940218e-05+7.63451000e-06j,
         1.71194797e-05+5.92785060e-06j,  1.72747458e-05+4.30672650e-06j,
         1.72878942e-05+2.78332630e-06j,  1.71856378e-05+1.36446978e-06j,
         1.69926997e-05+5.25220200e-08j,  1.67313283e-05-1.15385555e-06j,
         1.64210862e-05-2.25880970e-06j,  1.60787198e-05-3.26864064e-06j,
         1.57182289e-05-4.19103410e-06j,  1.53509945e-05-5.03459120e-06j,
         1.49859830e-05-5.80824140e-06j,  1.46300450e-05-6.52096300e-06j,
         8.92133890e-06-4.69715050e-06j]])
Coordinates:
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
    it       int64 1
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
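
For a version where the results are easy to check by hand, the sketch below applies the same kind of labelled reductions to a small synthetic real-valued array (the values are arbitrary).

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(24, dtype=float).reshape(2, 3, 4),
    dims=("Type", "mu", "Eke"),
    coords={"Type": ["L", "V"], "mu": [-1, 0, 1], "Eke": [0.1, 1.1, 2.1, 3.1]},
)

total = da.sum(dim=["Type", "mu"])   # reduce two named dims, keep Eke
peak = da.max(dim="Eke")             # max along Eke for each (Type, mu) pair

print(total.dims, total.values)
print(peak.sel(Type="V", mu=1).item())
```

For complex arrays such as the matrix elements, NumPy compares the real part first when ordering values, so taking magnitudes (e.g. with np.abs) before a max or min may be closer to what is wanted.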

The underlying data is available as a NumPy ND array via .values

[9]:
type(data.values)
[9]:
numpy.ndarray

As well as the core Xarray functionality, data can be piped directly to any NumPy universal function.

[10]:
import numpy as np
data.sum(dim=['Type','Sym','mu']).squeeze().pipe(np.abs)
[10]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' (LM: 18, Eke: 51)>
array([[3.94669236e+00, 3.71752915e+00, 3.50562354e+00, 3.30807003e+00,
        3.12323078e+00, 2.95034035e+00, 2.78916671e+00, 2.63978868e+00,
        2.50244053e+00, 2.37739973e+00, 2.26490167e+00, 2.16507430e+00,
        2.07788966e+00, 2.00312906e+00, 1.94036507e+00, 1.88895929e+00,
        1.84807791e+00, 1.81672346e+00, 1.79377904e+00, 1.77805943e+00,
        1.76836310e+00, 1.76351921e+00, 1.76242608e+00, 1.76407883e+00,
        1.76758644e+00, 1.77217939e+00, 1.77720968e+00, 1.78214515e+00,
        1.78656038e+00, 1.79012570e+00, 1.79259489e+00, 1.79379315e+00,
        1.79360598e+00, 1.79196828e+00, 1.78885524e+00, 1.78427412e+00,
        1.77825694e+00, 1.77085494e+00, 1.76213309e+00, 1.75216644e+00,
        1.74103633e+00, 1.72882810e+00, 1.71562858e+00, 1.70152477e+00,
        1.68660214e+00, 1.67094413e+00, 1.65463092e+00, 1.63773946e+00,
        1.62034267e+00, 1.60250954e+00, 1.58430490e+00],
       [5.72116951e+00, 5.57939668e+00, 5.45829928e+00, 5.35340899e+00,
        5.26190276e+00, 5.18211756e+00, 5.11281842e+00, 5.05250484e+00,
        4.99868223e+00, 4.94704790e+00, 4.89062586e+00, 4.81906686e+00,
        4.71870249e+00, 4.57426078e+00, 4.37301149e+00, 4.11057510e+00,
        3.79520959e+00, 3.44681659e+00, 3.09047259e+00, 2.74869545e+00,
        2.43681208e+00, 2.16237401e+00, 1.92696355e+00, 1.72851684e+00,
        1.56319004e+00, 1.42655906e+00, 1.31428001e+00, 1.22239986e+00,
...
        1.62093083e-05, 1.43798945e-05, 1.27561361e-05, 1.13159186e-05,
        1.00357026e-05, 8.89281634e-06, 7.86676012e-06, 6.93998526e-06,
        6.09852142e-06, 5.33258243e-06, 4.63749943e-06, 4.01522690e-06,
        3.47662758e-06, 3.04425233e-06, 2.75248954e-06, 2.63815273e-06,
        2.71837342e-06, 2.97557324e-06, 3.36960890e-06, 3.86001802e-06,
        4.41654201e-06, 5.01916494e-06, 5.65512665e-06, 6.31630124e-06,
        6.99737323e-06, 7.69485423e-06, 8.40630276e-06],
       [0.00000000e+00, 5.38836106e-08, 3.50508997e-07, 1.04088752e-06,
        2.14708757e-06, 3.60133414e-06, 5.30035321e-06, 7.13944302e-06,
        9.02762595e-06, 1.08920494e-05, 1.26775597e-05, 1.43445631e-05,
        1.58665941e-05, 1.72279303e-05, 1.84214641e-05, 1.94467234e-05,
        2.03081267e-05, 2.10134921e-05, 2.15728249e-05, 2.19973870e-05,
        2.22990151e-05, 2.24896512e-05, 2.25810310e-05, 2.25844949e-05,
        2.25108819e-05, 2.23704825e-05, 2.21730379e-05, 2.19277330e-05,
        2.16431740e-05, 2.13273914e-05, 2.09878499e-05, 2.06314019e-05,
        2.02643030e-05, 1.98922356e-05, 1.95202748e-05, 1.91529735e-05,
        1.87943378e-05, 1.84478972e-05, 1.81167325e-05, 1.78035035e-05,
        1.75105166e-05, 1.72397194e-05, 1.69927809e-05, 1.67710682e-05,
        1.65757140e-05, 1.64075971e-05, 1.62673749e-05, 1.61554988e-05,
        1.60721920e-05, 1.60175271e-05, 1.00823366e-05]])
Coordinates:
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
    it       int64 1
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
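
A minimal sketch of the same idea on synthetic data, showing that the piped form and the direct ufunc call give the same labelled result:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[3 + 4j, 1j], [-2 + 0j, 0j]]),
    dims=("l", "Eke"),
    coords={"l": [1, 3], "Eke": [0.1, 1.1]},
)

mag = da.pipe(np.abs)      # method-chaining form
mag_direct = np.abs(da)    # direct ufunc call

print(type(mag).__name__)  # the result is still a labelled array
print(mag.values)
print(mag.equals(mag_direct))
```

The piped form is mainly a convenience for method chaining; coordinates are preserved either way.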

Coordinates

Coordinates are also Xarrays, accessible individually via dot notation, or all together via .coords.

See the Xarray coords documentation for more details.

Note that multiindex coordinates are also supported, as are “non-dimensional” coordinates, which can be used to provide alternative labels for existing dimensions.

[11]:
# Single index numerical coordinate
data.Eke
[11]:
<xarray.DataArray 'Eke' (Eke: 51)>
array([ 0.1,  1.1,  2.1,  3.1,  4.1,  5.1,  6.1,  7.1,  8.1,  9.1, 10.1, 11.1,
       12.1, 13.1, 14.1, 15.1, 16.1, 17.1, 18.1, 19.1, 20.1, 21.1, 22.1, 23.1,
       24.1, 25.1, 26.1, 27.1, 28.1, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1, 35.1,
       36.1, 37.1, 38.1, 39.1, 40.1, 41.1, 42.1, 43.1, 44.1, 45.1, 46.1, 47.1,
       48.1, 49.1, 50.1])
Coordinates:
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
[12]:
# Multiindex coordinate
data.Sym
[12]:
<xarray.DataArray 'Sym' (Sym: 2)>
array([('SU', 'SG', 'SU'), ('PU', 'SG', 'PU')], dtype=object)
Coordinates:
  * Sym      (Sym) MultiIndex
  - Cont     (Sym) object 'SU' 'PU'
  - Targ     (Sym) object 'SG' 'SG'
  - Total    (Sym) object 'SU' 'PU'
[13]:
# All coordinates.
# Here dimension coordinates are marked `*`, MultiIndex levels are marked `-`, and non-dimensional coordinates are unmarked.
data.coords
[13]:
Coordinates:
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * it       (it) int64 1
  * Sym      (Sym) MultiIndex
  - Cont     (Sym) object 'SU' 'PU'
  - Targ     (Sym) object 'SG' 'SG'
  - Total    (Sym) object 'SU' 'PU'
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
[14]:
# For low-level manipulation, these can be returned as Pandas index objects.
data.Eke.to_index()
[14]:
Float64Index([ 0.1,  1.1,  2.1,  3.1,  4.1,  5.1,  6.1,  7.1,  8.1,  9.1, 10.1,
              11.1, 12.1, 13.1, 14.1, 15.1, 16.1, 17.1, 18.1, 19.1, 20.1, 21.1,
              22.1, 23.1, 24.1, 25.1, 26.1, 27.1, 28.1, 29.1, 30.1, 31.1, 32.1,
              33.1, 34.1, 35.1, 36.1, 37.1, 38.1, 39.1, 40.1, 41.1, 42.1, 43.1,
              44.1, 45.1, 46.1, 47.1, 48.1, 49.1, 50.1],
             dtype='float64', name='Eke')
[15]:
data.Sym.to_index()
[15]:
MultiIndex([('SU', 'SG', 'SU'),
            ('PU', 'SG', 'PU')],
           names=['Cont', 'Targ', 'Total'])
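
To show how such coordinates can be set up, the sketch below adds a non-dimensional coordinate and a MultiIndex to a synthetic array. The 15.58 eV offset is an assumption, chosen only so that Ehv matches the values displayed above; the array contents are placeholders.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((2, 3, 4)),
    dims=("l", "m", "Eke"),
    coords={"l": [1, 3], "m": [-1, 0, 1], "Eke": [0.1, 1.1, 2.1, 3.1]},
)

# Non-dimensional coordinate: an alternative label along the existing Eke dim.
# The offset is an assumed value, for illustration only.
da = da.assign_coords(Ehv=("Eke", da.Eke.values + 15.58))

# MultiIndex coordinate: stack l and m into a single LM dimension
stacked = da.stack(LM=("l", "m"))

print(stacked.dims)
print(stacked.LM.to_index()[:3])   # first few (l, m) pairs as a pandas MultiIndex
print(stacked.sel(l=3).size)       # selection on a single level of the MultiIndex
```

Ehv does not define a dimension of its own, so it appears in .coords without a marker, while l and m become levels of LM.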

Wrapped functionality

For specific functionality, ePSproc has higher-level wrappers for various data-manipulation tasks, built on Xarray, NumPy and Pandas.

[16]:
# Matrix element selector wraps Xarray selection routines & thresholding
ep.matEleSelector(data, thres=1e-2, inds = inds, sq = True)
[16]:
<xarray.DataArray 'n2_3sg_0.1-50.1eV_A2.inp.out' (LM: 3, Eke: 51)>
array([[-1.7757076 +0.63474768j,  1.1629411 -1.3536696j ,
        -0.67206736-1.5510867j , -1.1575324 -1.107067j  ,
        -1.2756814 -0.8222634j , -1.2812193 -0.65345031j,
        -1.2458736 -0.55363257j, -1.194521  -0.49578904j,
        -1.1372859 -0.4645849j , -1.0788283 -0.45093199j,
        -1.0215038 -0.44917215j, -0.96658241-0.45561598j,
        -0.914768  -0.46775578j, -0.8664413 -0.48382351j,
        -0.82178344-0.50253208j, -0.78084396-0.52291827j,
        -0.74358153-0.54424449j, -0.70989018-0.56593542j,
        -0.67961724-0.58753627j, -0.65257642-0.60868475j,
        -0.6285578 -0.62909188j, -0.60733584-0.64852853j,
        -0.588676  -0.66681581j, -0.57234039-0.68381767j,
        -0.55809256-0.69943523j, -0.54570155-0.7136019j ,
        -0.53494519-0.72627923j, -0.52561275-0.73745299j,
        -0.51750696-0.7471296j , -0.51044546-0.75533287j,
        -0.50426146-0.76210091j, -0.49880421-0.76748321j,
        -0.49393887-0.77153823j, -0.48954597-0.77433102j,
        -0.48552088-0.77593124j, -0.48177277-0.77641155j,
        -0.47822364-0.77584598j, -0.47480729-0.77430896j,
        -0.47146809-0.77187415j, -0.46816   -0.7686138j ,
...
        -0.03285559+0.01987271j, -0.03700358+0.02452761j,
        -0.04085651+0.02973065j, -0.04429991+0.03538024j,
        -0.04725264+0.0413641j , -0.04966431+0.04756824j,
        -0.05151118+0.0538834j , -0.05279158+0.06020905j,
        -0.05352147+0.06645575j, -0.05373045+0.07254632j,
        -0.0534582 +0.07841612j, -0.05275147+0.08401286j,
        -0.05166143+0.08929599j, -0.0502415 +0.09423595j,
        -0.04854557+0.09881319j, -0.04662654+0.10301696j,
        -0.044535  +0.10684421j, -0.0423184 +0.11029852j,
        -0.04002053+0.11338895j, -0.03768099+0.11612877j,
        -0.03533487+0.11853475j, -0.03301302+0.12062609j,
        -0.03074179+0.12242359j, -0.02854336+0.12394918j,
        -0.02643608+0.12522506j, -0.02443447+0.12627351j,
        -0.02254992+0.12711632j, -0.02079065+0.12777457j,
        -0.01916237+0.12826846j, -0.01766832+0.12861702j,
        -0.01630984+0.1288382j , -0.01508639+0.12894859j,
        -0.01399609+0.12896359j, -0.01303569+0.12889729j,
        -0.01220109+0.12876253j, -0.01148719+0.12857095j,
        -0.01088842+0.12833299j, -0.01039861+0.12805807j,
        -0.01001131+0.12775445j]])
Coordinates:
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 3 5
  - m        (LM) int64 -1 -1 -1
    mu       int64 1
    Type     <U1 'L'
    it       int64 1
    Sym      object ('SG', 'PU')
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
Attributes:
    dataType:  matE
    file:      n2_3sg_0.1-50.1eV_A2.inp.out
    fileBase:  /home/jovyan/github/epsproc/data/photoionization
    fileList:  n2_3sg_0.1-50.1eV_A2.inp.out
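
For reference, a similar select-and-threshold pattern can be sketched in plain Xarray. This is not the ePSproc implementation: select_and_threshold and its arguments are hypothetical names, and the exact thresholding rule used by ep.matEleSelector may differ.

```python
import numpy as np
import xarray as xr


def select_and_threshold(da, thres=None, inds=None, dim="l", sq=False):
    """Hypothetical helper: select by label, mask small magnitudes, tidy up."""
    out = da.sel(inds) if inds else da
    if thres is not None:
        # Keep values with |x| >= thres; everything else becomes NaN
        out = out.where(np.abs(out) >= thres)
    # Drop entries along `dim` where nothing survived the threshold
    out = out.dropna(dim=dim, how="all")
    return out.squeeze(drop=True) if sq else out


# Synthetic complex data: the l=3 row is entirely below the threshold
da = xr.DataArray(
    np.array([[1.0 + 1.0j, 0.5j], [1e-4 + 0j, 1e-5j], [0.2 + 0j, 1e-6 + 0j]]),
    dims=("l", "Eke"),
    coords={"l": [1, 3, 5], "Eke": [0.1, 1.1]},
)

sel = select_and_threshold(da, thres=1e-2)
print(sel.l.values)
print(sel.values)
```

Individual sub-threshold values inside a surviving row are left as NaN rather than removed, which is consistent with the NaN entries visible in the thresholded output above.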

For quick tabulation, any ND array can be restacked and pushed to a Pandas DataFrame.

[17]:
dataRed = ep.matEleSelector(data, thres=1e-1, sq = True)  # Threshold
dataPD, _ = ep.multiDimXrToPD(dataRed, colDims = 'Eke')  # Convert to PD
dataPD
[17]:
Eke 0.1 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 ... 41.1 42.1 43.1 44.1 45.1 46.1 47.1 48.1 49.1 50.1
Cont Targ Total Type l m mu
PU SG PU L 1 -1 1 -1.775708+0.634748j 1.162941-1.353670j -0.672067-1.551087j -1.157532-1.107067j -1.275681-0.822263j -1.281219-0.653450j -1.245874-0.553633j -1.194521-0.495789j -1.137286-0.464585j -1.078828-0.450932j ... -0.461494-0.759895j -0.458083-0.754568j -0.454593-0.748681j -0.451012-0.742291j -0.447330-0.735453j -0.443542-0.728221j -0.439646-0.720642j -0.435642-0.712762j -0.431531-0.704623j -0.427317-0.696266j
1 -1 -1.775708+0.634748j 1.162941-1.353670j -0.672067-1.551087j -1.157532-1.107067j -1.275681-0.822263j -1.281219-0.653450j -1.245874-0.553633j -1.194521-0.495789j -1.137286-0.464585j -1.078828-0.450932j ... -0.461494-0.759895j -0.458083-0.754568j -0.454593-0.748681j -0.451012-0.742291j -0.447330-0.735453j -0.443542-0.728221j -0.439646-0.720642j -0.435642-0.712762j -0.431531-0.704623j -0.427317-0.696266j
3 -1 1 0.075363+0.602171j -0.802725-0.016979j -0.072799+0.966132j 0.621785+0.923677j 1.038312+0.681287j 1.293225+0.411102j 1.452447+0.151245j 1.549842-0.090010j 1.603740-0.311672j 1.624887-0.514366j ... -0.238529-1.106837j -0.250888-1.064943j -0.260839-1.023926j -0.268590-0.983874j -0.274339-0.944857j -0.278272-0.906928j -0.280566-0.870124j -0.281386-0.834471j -0.280885-0.799984j -0.279206-0.766670j
1 -1 0.075363+0.602171j -0.802725-0.016979j -0.072799+0.966132j 0.621785+0.923677j 1.038312+0.681287j 1.293225+0.411102j 1.452447+0.151245j 1.549842-0.090010j 1.603740-0.311672j 1.624887-0.514366j ... -0.238529-1.106837j -0.250888-1.064943j -0.260839-1.023926j -0.268590-0.983874j -0.274339-0.944857j -0.278272-0.906928j -0.280566-0.870124j -0.281386-0.834471j -0.280885-0.799984j -0.279206-0.766670j
5 -1 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... -0.017668+0.128617j -0.016310+0.128838j -0.015086+0.128949j -0.013996+0.128964j -0.013036+0.128897j -0.012201+0.128763j -0.011487+0.128571j -0.010888+0.128333j -0.010399+0.128058j -0.010011+0.127754j
1 -1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... -0.017668+0.128617j -0.016310+0.128838j -0.015086+0.128949j -0.013996+0.128964j -0.013036+0.128897j -0.012201+0.128763j -0.011487+0.128571j -0.010888+0.128333j -0.010399+0.128058j -0.010011+0.127754j
V 1 -1 1 -1.940346+0.694660j 1.258540-1.467049j -0.723458-1.664796j -1.235059-1.177410j -1.351527-0.866626j -1.349360-0.682424j -1.305722-0.572783j -1.246945-0.508067j -1.183441-0.471596j -1.119783-0.453578j ... -0.454402-0.706387j -0.450419-0.700769j -0.446380-0.694730j -0.442274-0.688328j -0.438094-0.681614j -0.433838-0.674636j -0.429504-0.667439j -0.425096-0.660061j -0.420614-0.652538j -0.416065-0.644901j
1 -1 -1.940346+0.694660j 1.258540-1.467049j -0.723458-1.664796j -1.235059-1.177410j -1.351527-0.866626j -1.349360-0.682424j -1.305722-0.572783j -1.246945-0.508067j -1.183441-0.471596j -1.119783-0.453578j ... -0.454402-0.706387j -0.450419-0.700769j -0.446380-0.694730j -0.442274-0.688328j -0.438094-0.681614j -0.433838-0.674636j -0.429504-0.667439j -0.425096-0.660061j -0.420614-0.652538j -0.416065-0.644901j
3 -1 1 0.068357+0.571167j -0.773225-0.019394j -0.073996+0.936982j 0.600334+0.898848j 1.002170+0.663088j 1.243120+0.400846j 1.388841+0.150941j 1.474054-0.078382j 1.517927-0.286693j 1.531724-0.475379j ... -0.198046-1.000173j -0.208607-0.960881j -0.217039-0.922640j -0.223549-0.885520j -0.228328-0.849569j -0.231558-0.814820j -0.233405-0.781289j -0.234023-0.748981j -0.233553-0.717891j -0.232123-0.688004j
1 -1 0.068357+0.571167j -0.773225-0.019394j -0.073996+0.936982j 0.600334+0.898848j 1.002170+0.663088j 1.243120+0.400846j 1.388841+0.150941j 1.474054-0.078382j 1.517927-0.286693j 1.531724-0.475379j ... -0.198046-1.000173j -0.208607-0.960881j -0.217039-0.922640j -0.223549-0.885520j -0.228328-0.849569j -0.231558-0.814820j -0.233405-0.781289j -0.234023-0.748981j -0.233553-0.717891j -0.232123-0.688004j
5 -1 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... -0.015309+0.114632j -0.014113+0.114629j -0.013048+0.114550j -0.012108+0.114408j -0.011289+0.114217j -0.010583+0.113988j -0.009985+0.113730j -0.009487+0.113451j -0.009084+0.113160j -0.008767+0.112863j
1 -1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... -0.015309+0.114632j -0.014113+0.114629j -0.013048+0.114550j -0.012108+0.114408j -0.011289+0.114217j -0.010583+0.113988j -0.009985+0.113730j -0.009487+0.113451j -0.009084+0.113160j -0.008767+0.112863j
SU SG SU L 1 0 0 2.736321-0.092697j -2.317060+1.358736j 0.168885+2.638956j 1.145145+2.344438j 1.545305+2.063919j 1.734959+1.869703j 1.835277+1.734989j 1.894457+1.634959j 1.934656+1.551998j 1.966673+1.473091j ... 0.003980+0.442685j -0.018265+0.443131j -0.039535+0.442919j -0.059880+0.442099j -0.079345+0.440715j -0.097967+0.438807j -0.115778+0.436415j -0.132807+0.433574j -0.149080+0.430319j -0.164618+0.426681j
3 0 0 -0.171588-0.795292j 1.105722-0.087176j -0.050239-1.375970j -1.071308-1.224861j -1.691984-0.789288j -2.079262-0.294961j -2.318548+0.212576j -2.446422+0.725808j -2.474172+1.243974j -2.398817+1.763415j ... -0.148369-0.160137j -0.177046-0.149018j -0.202882-0.137853j -0.226128-0.126750j -0.247008-0.115796j -0.265720-0.105062j -0.282444-0.094601j -0.297340-0.084456j -0.310553-0.074661j -0.322214-0.065240j
5 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.088802-0.064267j 0.091587-0.068078j 0.094163-0.071898j 0.096537-0.075716j 0.098717-0.079522j 0.100709-0.083306j 0.102521-0.087059j 0.104158-0.090774j 0.105627-0.094443j 0.106932-0.098059j
V 1 0 0 2.981582-0.100629j -2.496108+1.463158j 0.179285+2.808228j 1.204096+2.465973j 1.608059+2.148434j 1.789438+1.929371j 1.879352+1.778193j 1.929329+1.667535j 1.962587+1.578229j 1.990124+1.496260j ... 0.015975+0.420042j -0.005443+0.418377j -0.025715+0.416186j -0.044915+0.413527j -0.063112+0.410452j -0.080365+0.407005j -0.096731+0.403230j -0.112258+0.399164j -0.126993+0.394841j -0.140979+0.390292j
3 0 0 -0.181065-0.833214j 1.165772-0.093119j -0.053699-1.447490j -1.119518-1.279028j -1.751213-0.816219j -2.130835-0.301405j -2.354209+0.217159j -2.464400+0.733184j -2.476613+1.248347j -2.390015+1.761736j ... -0.116277-0.160735j -0.143786-0.149539j -0.168320-0.138401j -0.190173-0.127428j -0.209605-0.116699j -0.226851-0.106274j -0.242118-0.096198j -0.255596-0.086501j -0.267451-0.077206j -0.277837-0.068325j
5 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN 0.080639-0.061209j 0.082625-0.064518j 0.084440-0.067811j 0.086093-0.071082j 0.087590-0.074325j 0.088940-0.077533j 0.090149-0.080702j

18 rows × 51 columns
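
A comparable table can be produced with core Xarray and Pandas calls alone. The sketch below is one possible route (flatten to a Series, then pivot one dimension into columns) on synthetic data; it is not necessarily how ep.multiDimXrToPD works internally.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12, dtype=float).reshape(2, 2, 3),
    dims=("Type", "l", "Eke"),
    coords={"Type": ["L", "V"], "l": [1, 3], "Eke": [0.1, 1.1, 2.1]},
    name="demo",
)

# Flatten to a Series indexed by (Type, l, Eke), then pivot Eke into columns
df = da.to_series().unstack("Eke")

print(df.shape)           # rows: (Type, l) pairs; columns: Eke values
print(df.loc[("V", 3)])   # one row, selected by its MultiIndex labels
```

The remaining dimensions become levels of the row index, mirroring the layout of the table above.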

Data models & types

Some core data types are defined in ep.util.listFuncs.dataTypesList(), and provide a reference for other functionality. In general, the use of Xarrays should make most routines agnostic to dimension names and ordering, although some routines do look for specific dimensions.

[18]:
# Calling directly returns a full dictionary, or a specific dataType can be requested by key
ep.util.listFuncs.dataTypesList()['matE']
[18]:
{'source': 'epsproc.IO.readMatEle',
 'desc': 'Raw photoionization matrix elements from ePS, DumpIdy command and file segments.',
 'recordType': 'DumpIdy',
 'dims': {'LM': ['l', 'm'], 'Sym': ['Cont', 'Targ', 'Total']},
 'def': <function epsproc.util.listFuncs.matEdimList(sType='stacked')>}
[19]:
# The 'def' field references the base dataType definition for more options, e.g. to check stacked (multiindex) vs. unstacked definitions
print(ep.util.listFuncs.dataTypesList()['matE']['def'](sType = 'stacked'))
print(ep.util.listFuncs.dataTypesList()['matE']['def'](sType = 'unstacked'))
['LM', 'Eke', 'Sym', 'mu', 'it', 'Type']
['l', 'm', 'Eke', 'Cont', 'Targ', 'Total', 'mu', 'it', 'Type']
[20]:
# The `sDict` type returns the stacked dim mappings
ep.util.listFuncs.dataTypesList()['matE']['def'](sType = 'sDict')
[20]:
{'LM': ['l', 'm'], 'Sym': ['Cont', 'Targ', 'Total']}

Based on this, it is easy to swap and rearrange dimensions (again, see the Xarray documentation for more details), and some wrappers are provided for higher-level use.

[21]:
# Unstack all dims with Xarray functionality
dataUnstacked = data.unstack()
dataUnstacked.coords
[21]:
Coordinates:
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * it       (it) int64 1
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
  * l        (l) int64 1 3 5 7 9 11
  * m        (m) int64 -1 0 1
  * Cont     (Cont) object 'PU' 'SU'
  * Targ     (Targ) object 'SG'
  * Total    (Total) object 'PU' 'SU'

The ep.util.misc.restack() routine wraps some core functionality to restack arrays according to defined ePSproc dataTypes, and will try to restack according to self.attrs['dataType'] by default.

[22]:
# "Safe" dimension restacker
# This will always skip missing dims rather than throwing errors.
dataRestacked, dims = ep.util.misc.restack(dataUnstacked)
dataRestacked.coords
[22]:
Coordinates:
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * it       (it) int64 1
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
  * Sym      (Sym) MultiIndex
  - Cont     (Sym) object 'PU' 'PU' 'SU' 'SU'
  - Targ     (Sym) object 'SG' 'SG' 'SG' 'SG'
  - Total    (Sym) object 'PU' 'SU' 'PU' 'SU'
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
[23]:
# Try restacking to a different dataType
# The default behaviour here is to skip/ignore missing dimensions
dataRestacked, dims = ep.util.misc.restack(dataUnstacked, refDims = 'BLM')
dataRestacked.coords
[23]:
Coordinates:
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * it       (it) int64 1
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
  * Cont     (Cont) object 'PU' 'SU'
  * Targ     (Targ) object 'SG'
  * Total    (Total) object 'PU' 'SU'
  * BLM      (BLM) MultiIndex
  - l        (BLM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (BLM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
[24]:
# The returned dims variable is a dictionary with various dim lists, differences and intersections, which is used by the restacker
# TODO: naming and definitions here!
# See ep.util.misc.checkDims()
dims
[24]:
{'dataDims': ('Eke', 'mu', 'it', 'Type', 'l', 'm', 'Cont', 'Targ', 'Total'),
 'dataDimsUS': ('Eke', 'mu', 'it', 'Type', 'l', 'm', 'Cont', 'Targ', 'Total'),
 'refDims': {'BLM': ['l', 'm'], 'Euler': ['P', 'T', 'C']},
 'refDimsUS': ['l', 'm', 'P', 'T', 'C'],
 'shared': [],
 'extra': ['it', 'Total', 'Targ', 'Eke', 'm', 'mu', 'Type', 'Cont', 'l'],
 'extraUS': ['it', 'Total', 'Targ', 'Eke', 'mu', 'Type', 'Cont'],
 'invalid': ['BLM', 'Euler'],
 'invalidUS': ['C', 'T', 'P'],
 'stacked': [],
 'stackedMap': {},
 'stackedShared': [],
 'stackedExtra': [],
 'stackedInvalid': ['Euler', 'BLM'],
 'missing': ['Euler', 'BLM'],
 'safeStack': {'BLM': ['l', 'm']}}
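
The kind of comparison behind this dictionary can be sketched with plain list and set operations. This is an illustrative sketch only, not the ep.util.misc.checkDims() implementation; the function name `compareDims` and the keys `present`, `absent` and `unused` are hypothetical, and only `safeStack` is named after the output above.

```python
# Hypothetical sketch of a data dims vs. reference dims comparison
# (not ep.util.misc.checkDims()).
def compareDims(dataDims, refDims):
    dataSet = set(dataDims)
    # Flatten the reference mapping to its unstacked component dims.
    refUS = [d for comps in refDims.values() for d in comps]
    return {
        # Unstacked reference dims that exist in the data.
        'present': [d for d in refUS if d in dataSet],
        # Unstacked reference dims that are absent from the data.
        'absent': [d for d in refUS if d not in dataSet],
        # Data dims not used by any reference stack.
        'unused': [d for d in dataDims if d not in refUS],
        # Stacks whose components are all available, so stacking is safe.
        'safeStack': {k: v for k, v in refDims.items() if set(v) <= dataSet},
    }

# Inputs copied from the dictionary above.
dataDims = ('Eke', 'mu', 'it', 'Type', 'l', 'm', 'Cont', 'Targ', 'Total')
refDims = {'BLM': ['l', 'm'], 'Euler': ['P', 'T', 'C']}

result = compareDims(dataDims, refDims)
print(result['safeStack'])
```

With these inputs, `safeStack` matches the entry shown above, and the `absent` and `unused` lists contain the same members as `invalidUS` and `extraUS` respectively (ordering aside).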
[25]:
# Restack with extra stacked dims - forceUnstack = False
# This will leave extra stacked dims intact
# Note this will fail if the "extra" stacked dims include dims which should be stacked elsewhere according to refDims.
daR, _ = ep.util.misc.restack(data.stack({'Test':['Eke','it']}), forceUnstack = False)  # OK!
# FAILS - 'Targ' in both 'Test' stack, and refDims. This case is currently not defined in checkDims()
# daR, _ = ep.util.misc.restack(dataTest.stack({'Test':['Eke','it','Targ']}), dataTypesList()[dataType]['def'](sType='sDict'), forceUnstack = False)
daR.coords
[25]:
Coordinates:
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * Sym      (Sym) MultiIndex
  - Cont     (Sym) object 'SU' 'PU'
  - Targ     (Sym) object 'SG' 'SG'
  - Total    (Sym) object 'SU' 'PU'
    Ehv      (Test) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Test) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
  * Test     (Test) MultiIndex
  - Eke      (Test) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
  - it       (Test) int64 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1 1
[26]:
# Restack with missing dims

# Copy data & drop a dimension
dataTest = data.copy()
dataTest['Sym'] = dataTest.indexes['Sym'].droplevel('Cont')

daR, _ = ep.util.misc.restack(dataTest)  # With missing dims
daR.coords
[26]:
Coordinates:
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * it       (it) int64 1
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
  * Sym      (Sym) MultiIndex
  - Targ     (Sym) object 'SG' 'SG'
  - Total    (Sym) object 'PU' 'SU'
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
[27]:
# Restack with missing dims & add dims
# Note added dims have unset coord values, but will be correctly dimensioned
daR, _ = ep.util.misc.restack(dataTest, conformDims = True)  # With missing dims
daR.coords
Added dim Cont
[27]:
Coordinates:
  * mu       (mu) int64 -1 0 1
  * Type     (Type) <U1 'L' 'V'
  * it       (it) int64 1
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
  * Sym      (Sym) MultiIndex
  - Cont     (Sym) object 'U' 'U'
  - Targ     (Sym) object 'SG' 'SG'
  - Total    (Sym) object 'PU' 'SU'
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1

Basic data IO (Xarray data file read/write)

Xarrays can be written to disk and read back with ep.IO.writeXarray() and ep.IO.readXarray(). These make use of Xarray’s netCDF writer, and wrap some additional complex number and dim handling for ePSproc cases.

For other low-level data IO options, see the Xarray documentation. Higher-level routines are currently in development for ePSproc & PEMtk; see further notes at https://github.com/phockett/ePSproc/issues/8 and https://github.com/phockett/PEMtk/issues/6.

[28]:
dataPath = Path(epDemoDataPath, 'photoionization')
dataFile = Path(dataPath, 'n2_3sg_0.1-50.1eV_A2.nc')  # Set for sample N2 data for testing

ep.IO.writeXarray(data, fileName = dataFile.as_posix(), filePath = dataPath.as_posix())   # Default case set as: engine = 'h5netcdf', forceComplex = False
['Written to h5netcdf format', '/home/jovyan/github/epsproc/data/photoionization/n2_3sg_0.1-50.1eV_A2.nc.nc']
[28]:
['Written to h5netcdf format',
 '/home/jovyan/github/epsproc/data/photoionization/n2_3sg_0.1-50.1eV_A2.nc.nc']
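
Note that the returned path ends in `.nc.nc`, which suggests the writer appends the extension to the supplied name. A minimal sketch of avoiding the doubled extension, assuming the writer always appends `.nc`; `appendExt` is a hypothetical stand-in for that behaviour, not an ePSproc function, and the relative path is shortened for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for a writer that always appends '.nc' to the name.
def appendExt(fileName):
    return fileName + '.nc'

dataFile = PurePosixPath('photoionization', 'n2_3sg_0.1-50.1eV_A2.nc')

# Passing the full name doubles the extension.
doubled = appendExt(dataFile.as_posix())

# Stripping the final suffix first gives a single extension.
single = appendExt(dataFile.with_suffix('').as_posix())

print(doubled)
print(single)
```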
[29]:
dataIn = ep.IO.readXarray(fileName = dataFile.as_posix() + '.nc', filePath = dataPath.as_posix())  #, forceComplex=forceComplex, forceArray=False)
[30]:
dataIn
[30]:
<xarray.DataArray (Eke: 51, mu: 3, it: 1, Type: 2, Sym: 4, LM: 18)>
array([[[[[[           nan          +nanj,
                       nan          +nanj,
            -1.7757076e+00+6.3474768e-01j, ...,
                       nan          +nanj,
                       nan          +nanj,
                       nan          +nanj],
           [           nan          +nanj,
                       nan          +nanj,
                       nan          +nanj, ...,
                       nan          +nanj,
                       nan          +nanj,
                       nan          +nanj],
           [           nan          +nanj,
                       nan          +nanj,
                       nan          +nanj, ...,
                       nan          +nanj,
                       nan          +nanj,
                       nan          +nanj],
           [           nan          +nanj,
                       nan          +nanj,
...
                       nan          +nanj,
                       nan          +nanj],
           [           nan          +nanj,
                       nan          +nanj,
                       nan          +nanj, ...,
                       nan          +nanj,
                       nan          +nanj,
                       nan          +nanj],
           [           nan          +nanj,
                       nan          +nanj,
                       nan          +nanj, ...,
                       nan          +nanj,
                       nan          +nanj,
                       nan          +nanj],
           [           nan          +nanj,
                       nan          +nanj,
                       nan          +nanj, ...,
                       nan          +nanj,
                       nan          +nanj,
                       nan          +nanj]]]]]])
Coordinates:
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
  * Type     (Type) object 'L' 'V'
  * it       (it) int64 1
  * mu       (mu) int64 -1 0 1
    SF       (Eke) complex128 (2.1560627+3.741674j) ... (4.4127053+1.8281945j)
  * Sym      (Sym) MultiIndex
  - Cont     (Sym) object 'PU' 'PU' 'SU' 'SU'
  - Targ     (Sym) object 'SG' 'SG' 'SG' 'SG'
  - Total    (Sym) object 'PU' 'SU' 'PU' 'SU'
  * LM       (LM) MultiIndex
  - l        (LM) int64 1 1 1 3 3 3 5 5 5 7 7 7 9 9 9 11 11 11
  - m        (LM) int64 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1
Attributes:
    dataType:  matE
    file:      n2_3sg_0.1-50.1eV_A2.inp.out
    fileBase:  /home/jovyan/github/epsproc/data/photoionization
    fileList:  n2_3sg_0.1-50.1eV_A2.inp.out
[31]:
# Testing for equality with the original currently returns False - dim ordering might be the issue here?
dataIn.equals(data)
[31]:
False
[32]:
# Subtraction indicates identical data however.
(dataIn - data).max()
[32]:
<xarray.DataArray ()>
array(0.+0.j)
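
One way the two checks above can disagree is sketched below with synthetic data (not the ePSproc arrays). In Xarray, equals() compares dimension order as well as values and coordinates, while arithmetic aligns dimensions by name. Whether dimension order is the actual cause in the case above is an assumption, in line with the comment in cell [31].

```python
import numpy as np
import xarray as xr

# Synthetic data (not ePSproc output), including a NaN entry.
a = xr.DataArray(
    np.array([[1.0, np.nan, 3.0], [4.0, 5.0, 6.0]]),
    coords={'x': [0, 1], 'y': [10, 20, 30]},
    dims=['x', 'y'],
)
b = a.transpose('y', 'x')   # Same values and coords, different dim order

# equals() is sensitive to dim order.
orderSensitive = a.equals(b)

# Arithmetic aligns dims by name; max() skips NaN by default for floats.
diffMax = float(abs(a - b).max())

# Restoring the dim order before testing.
reordered = a.equals(b.transpose(*a.dims))

print(orderSensitive, diffMax, reordered)
```

If dimension order is the only difference, transposing one array to the other's dims before calling equals() should be enough to recover equality.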

Low-level routines

The Xarray routines can be used directly, although they may require additional arguments. Note that the netCDF writer always writes to Dataset format, although there are readers for both Datasets and DataArrays.

[33]:
# Read file with base open_dataset
import xarray as xr
dataIn = xr.open_dataset(dataFile.as_posix() + '.nc', engine = 'h5netcdf')
dataIn
[33]:
<xarray.Dataset>
Dimensions:  (Cont: 2, Eke: 51, mu: 3, it: 1, Type: 2, l: 6, m: 3, Targ: 1,
              Total: 2)
Coordinates:
  * Cont     (Cont) object 'PU' 'SU'
    Ehv      (Eke) float64 15.68 16.68 17.68 18.68 ... 62.68 63.68 64.68 65.68
  * Eke      (Eke) float64 0.1 1.1 2.1 3.1 4.1 5.1 ... 46.1 47.1 48.1 49.1 50.1
    SFi      (Eke) float64 3.742 3.628 3.524 3.428 ... 1.871 1.857 1.842 1.828
    SFr      (Eke) float64 2.156 2.224 2.289 2.353 ... 4.311 4.345 4.379 4.413
  * Targ     (Targ) object 'SG'
  * Total    (Total) object 'PU' 'SU'
  * Type     (Type) object 'L' 'V'
  * it       (it) int64 1
  * l        (l) int64 1 3 5 7 9 11
  * m        (m) int64 -1 0 1
  * mu       (mu) int64 -1 0 1
Data variables:
    Im       (Eke, mu, it, Type, l, m, Cont, Targ, Total) float64 nan ... nan
    Re       (Eke, mu, it, Type, l, m, Cont, Targ, Total) float64 nan ... nan
[34]:
# Read file with base open_dataarray - this may fail for datasets
dataIn = xr.open_dataarray(dataFile.as_posix() + '.nc', engine = 'h5netcdf')
dataIn
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [34], in <cell line: 2>()
      1 # Read file with base open_dataarray - this may fail for datasets
----> 2 dataIn = xr.open_dataarray(dataFile.as_posix() + '.nc', engine = 'h5netcdf')
      3 dataIn

File /opt/conda/lib/python3.9/site-packages/xarray/backends/api.py:670, in open_dataarray(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    652 dataset = open_dataset(
    653     filename_or_obj,
    654     decode_cf=decode_cf,
   (...)
    666     **kwargs,
    667 )
    669 if len(dataset.data_vars) != 1:
--> 670     raise ValueError(
    671         "Given file dataset contains more than one data "
    672         "variable. Please read with xarray.open_dataset and "
    673         "then select the variable you want."
    674     )
    675 else:
    676     (data_array,) = dataset.data_vars.values()

ValueError: Given file dataset contains more than one data variable. Please read with xarray.open_dataset and then select the variable you want.
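
Following the advice in the error message, the Dataset form can be read and the variables recombined manually. A minimal sketch using a synthetic in-memory Dataset (no file IO), with the variable names Re and Im taken from the output of cell [33]; whether ep.IO.readXarray() recombines the data in exactly this way is an assumption.

```python
import numpy as np
import xarray as xr

# Synthetic dataset (not read from file) mimicking the Re/Im layout shown above.
ds = xr.Dataset(
    {
        'Re': (('l', 'm'), np.arange(6.0).reshape(2, 3)),
        'Im': (('l', 'm'), -np.arange(6.0).reshape(2, 3)),
    },
    coords={'l': [1, 3], 'm': [-1, 0, 1]},
)

# Recombine to a single complex-valued DataArray, then restack to 'LM'.
daComplex = (ds['Re'] + 1j * ds['Im']).stack(LM=['l', 'm'])

print(daComplex.dtype)
print(daComplex.dims)
```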

Complex number and attribute handling

Xarray methods for complex data and attributes have some limitations…

  1. Complex data is not supported by netCDF. Either convert to Re+Im format, or use the h5netcdf backend with forceComplex=True as a workaround (this must also be set on file read).
  2. Issues with nested dictionary attributes.
  3. Issues with tuples, especially in Euler dim coords.

With h5netcdf and the invalid_netcdf option, (1) is OK (see the Xarray docs for details), although the data still needs to be unstacked. It may also be necessary to set ‘to_dataset’ for more control, otherwise arbitrarily named items can appear in the file (if the DataArray name is missing).

For general attribute handling, ep.IO.sanitizeAttrsNetCDF() attempts to quickly clean this up at file IO if an exception is raised, although it may be lossy in some cases.
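
To illustrate the type of clean-up involved, nested dictionaries can be serialised to strings before writing. This is a hypothetical sketch only and not the ep.IO.sanitizeAttrsNetCDF() implementation; the function name `sanitizeAttrs` and the example attributes are made up for illustration.

```python
import json

# Hypothetical sketch (not ep.IO.sanitizeAttrsNetCDF()): serialise nested
# dict attributes to JSON strings, leaving other values unchanged.
def sanitizeAttrs(attrs):
    clean = {}
    for key, value in attrs.items():
        if isinstance(value, dict):
            # default=str makes this lossy for non-JSON types (stored as strings).
            clean[key] = json.dumps(value, default=str)
        else:
            clean[key] = value
    return clean

# Made-up attributes for illustration.
attrs = {'dataType': 'matE', 'nested': {'a': 1}}
cleanAttrs = sanitizeAttrs(attrs)

# The nested dict can be recovered with json.loads() after reading.
restored = json.loads(cleanAttrs['nested'])
print(cleanAttrs)
```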

TODO:

Versions

[35]:
import scooby
scooby.Report(additional=['epsproc', 'holoviews', 'hvplot', 'xarray', 'matplotlib', 'bokeh'])
[35]:
Tue Jun 07 21:17:13 2022 UTC
OS Linux CPU(s) 32 Machine x86_64 Architecture 64bit
RAM 50.1 GiB Environment Jupyter
Python 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0]
epsproc 1.3.1-dev holoviews 1.14.8 hvplot 0.8.0 xarray 2022.3.0
matplotlib 3.5.1 bokeh 2.4.2 numpy 1.21.5 scipy 1.8.0
IPython 8.1.1 scooby 0.5.12
[36]:
# Check current Git commit for local ePSproc version
from pathlib import Path
!git -C {Path(ep.__file__).parent} branch
!git -C {Path(ep.__file__).parent} log --format="%H" -n 1
* dev
  master
  numba-tests
  pkgUpdates
d87199802bc7f64cde181fedb08a614be9de6a24
[38]:
# Check current remote commits
!git ls-remote --heads git://github.com/phockett/ePSproc
fatal: unable to connect to github.com:
github.com[0: 140.82.112.3]: errno=Connection refused

d87199802bc7f64cde181fedb08a614be9de6a24