epsproc.classes.base module

Core classes for ePSproc data.

13/10/20 v1 Started class development, reverse engineering a little from multiJob.py case.

TODO:

  • Centralise subselection function.
  • Error checking on datatype per key for plotters etc.
class epsproc.classes.base.ePSbase(fileBase=None, fileIn=None, prefix=None, ext='.out', Edp=1, verbose=1, thres=0.01, thresDims='Eke', selDims={'Type': 'L'})

Bases: object

Base class for ePSproc.

Define data model for a single ePS job, defined as a specific ionization event (but may involve multiple ePS output files over a range of Ekes and symmetries).

Define methods as wrappers for existing ePSproc functions on self.data.

13/10/20 v1, pulling code from ePSmultiJob().

  • Read datasets from a single dir (uses epsproc.readMatEle()).
  • Sort to dictionaries and Xarray datasets (needs some work).
  • Basic selection, plotting and calculation wrappers in development.
Parameters:
  • fileBase (str or Path object, default = None) – Base directory to scan for ePS files, subdirs will NOT be searched. Use ePSmultiJob class for multi-dir scanning case.
  • prefix (str, optional, default = None) – Set prefix string for file checks (cf. wfPlot class). Only necessary if automated file sorting fails.
  • ext (str, optional, default = '.out') – Set default file extension for dir scanning. This should match the file extension for ePolyScat output files.
  • Edp (int, optional, default = 1) – Set default number of decimal places (dp) for Ehv conversion. May want to set this elsewhere instead… maybe just for plotting? TODO: also consider axis reindex, lookups and interp functions here - useful for differences between datasets.
  • verbose (int, optional, default = 1) – Set verbosity level for printing/error checking. Not yet fully implemented, but, generally:
    • 0: no printed output.
    • 1: basic printed info.
    • 2: print all info, including subfunction outputs.
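
As a rough illustration of what Edp controls: photon energy follows from electron kinetic energy and the ionization potential (IP), Ehv = Eke + IP, and Edp sets how many decimal places are kept after the conversion. This is a minimal sketch assuming the conversion is a simple shift plus rounding; the helper eke_to_ehv and the IP value used are hypothetical, not part of the ePSproc API.

```python
def eke_to_ehv(eke_values, ip, edp=1):
    """Hypothetical helper: convert Eke (eV) to Ehv (eV), rounded to edp decimal places."""
    # Photon energy = electron kinetic energy + ionization potential.
    return [round(eke + ip, edp) for eke in eke_values]


# Illustrative IP of 15.58 eV on a short Eke grid.
print(eke_to_ehv([0.1, 1.1, 2.1], ip=15.58, edp=1))
```

Rounding to a fixed number of decimal places keeps the converted axis free of floating-point noise, which matters when Ehv values are later used as coordinate labels for selection or comparison between datasets.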

TODO:

  • verbosity levels, subtract for subfunctions? Or use a dict to handle multiple levels?
  • Stick with .data for all data, or just promote to top-level keys per dataset? This might be neater overall.
  • Change to decorators for basic function wrappers - should be cleaner, and enforce method/style.
  • Check file IO logic, some of this is already handled in lower level codes.
ADMplot(dataType='ADM', xDim='t', Etype='t', col=None, **kwargs)

Wrap BLMplot() for ADMs. Thin wrapper with some ADM-specific defaults.

TODO: make this good.

AFBLM(keys=None, **kwargs)

Basic wrapper for epsproc.geomFunc.afblmXprod().

Currently set to calculate for all data, with afblmXprod defaults, unless additional kwargs passed.

TODO: add subselection here?

BLMplot(Erange=None, Etype='Eke', dataType='AFBLM', xDim=None, selDims=None, col='Labels', row=None, thres=None, keys=None, verbose=None, backend='xr', overlay=None, **kwargs)

Basic BLM line plots using Xarray plotter.

See https://epsproc.readthedocs.io/en/latest/methods/geometric_method_dev_pt3_AFBLM_090620_010920_dev_bk100920.html

Similar to epsproc.BLMplot(); may change to a simple wrapper, but there are some differences in dim handling here.

For more flexibility, use self.lmPlot.

TODO: update BLMplot to support more datatypes, and implement here instead.

TODO: fix dim handling and subselection, see old plotting code.

24/11/21: quick additions, override printing with “verbose”, and added backend option for XR or Holoviews plotters.
Note this currently uses hvplot functionality, see https://hvplot.holoviz.org/user_guide/Gridded_Data.html. UPDATE: currently not working due to unhandled dims at Holomap stack - see tmo-dev for method.

05/06/21: added **kwargs pass to Xarray line plot.

03/02/21: added col, row arguments for flexibility on calling. Still needs automated dim handling.

Esubset(key=None, dataType=None, Erange=None, Etype=None)

Basic Etype subselection & slice routine for base class and plot wrappers.

Will return view of data only (http://xarray.pydata.org/en/stable/indexing.html#copies-vs-views).
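
The view-versus-copy distinction can be illustrated with plain NumPy: selecting an energy window with a slice (basic indexing) returns a view onto the parent array, whereas boolean-mask selection would return a copy. This is a minimal sketch assuming a sorted 1D energy axis on axis 0; the helper esubset_view is hypothetical and is not how Esubset() is implemented, since that method works on Xarray objects.

```python
import numpy as np


def esubset_view(E, data, Erange):
    """Hypothetical helper: slice data (energy on axis 0) to Erange = [Emin, Emax].

    Assumes E is sorted ascending. searchsorted converts the energy window into
    integer slice bounds, so the result is a NumPy view, not a copy.
    """
    Emin, Emax = Erange
    i0 = np.searchsorted(E, Emin, side='left')   # first index with E >= Emin
    i1 = np.searchsorted(E, Emax, side='right')  # one past last index with E <= Emax
    return E[i0:i1], data[i0:i1]


E = np.arange(0.0, 10.0, 0.5)                           # Eke grid, eV
data = np.arange(E.size * 3, dtype=float).reshape(E.size, 3)
Esub, dsub = esubset_view(E, data, [2.0, 4.0])
print(Esub)                                             # energies within [2, 4]
print(np.shares_memory(dsub, data))                     # view onto the original array
```

Because the result is a view, writing into it also modifies the parent data, which is the caveat the Xarray copies-vs-views documentation warns about.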

TODO: add matEleSelector for thresholding here too?

Q: currently set for single dataset. May want to handle multiple & set consistent Erange, as per original paradigm.

Note: originally written for E-subselection, but should work on any sliceable dataType + dim. NOTE: may have issues using .sel for MultiIndex coord slices if all inds are not specified, see http://xarray.pydata.org/en/stable/indexing.html#multi-level-indexing

MFBLM(keys=None, **kwargs)

Basic wrapper for epsproc.geomFunc.mfblmXprod().

Currently set to calculate for all data, with mfblmXprod defaults, unless additional kwargs passed.

TODO: add subselection here?

afpadNumeric(keys=None, **kwargs)

AFPADs “direct” (numerical), without beta parameter computation.

Wrapper for epsproc.afblmGeom.AFwfExp(), loops over all loaded datasets.

NOTE: this is preliminary and unverified.

Parameters:
  • keys (str, int or list, optional, default = None) – If set, use only these datasets (keys). Otherwise run for all datasets in self.data.
  • **kwargs (optional) – Args passed to epsproc.mfpad().

NOTE: for large datasets and/or high resolution, this can be memory-hungry.

Returns:

None, but sets self.data[key]['TXaf'] and self.data[key]['DeltaKQS'] for each key.
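
Several methods here (afpadNumeric, mfpadNumeric, wignerDelay) share the same keys convention: None means all datasets in self.data, otherwise a single key (str or int) or a list of keys. A minimal sketch of that normalization, using a hypothetical helper resolve_keys and a plain dict standing in for self.data; the actual ePSproc code may handle this differently.

```python
def resolve_keys(data, keys=None):
    """Hypothetical helper: normalise a keys argument against a data dict."""
    if keys is None:
        return list(data.keys())        # default: every loaded dataset
    if not isinstance(keys, (list, tuple)):
        keys = [keys]                   # single str or int key -> one-element list
    missing = [k for k in keys if k not in data]
    if missing:
        raise KeyError(f"Keys not found in data: {missing}")
    return list(keys)


data = {'orb5': {}, 'orb6': {}}
for key in resolve_keys(data, 'orb5'):
    # A wrapper would compute results here and store them per key.
    data[key]['TXaf'] = None
print(resolve_keys(data))
```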

jobsSummary()

Print some general info.

TODO: add some info!

lmPlot(Erange=None, Etype='Eke', dataType='matE', xDim=None, keys=None, refDataKey=None, reindexTol=0.5, reindexFill=np.nan, setPD=True, **kwargs)

Wrapper for epsproc.lmPlot() for multijob class. Runs lmPlot() for each dataset.

Parameters:
  • Erange (list of int or float, optional, default = None) – Set plot range [Emin, Emax]. Defaults to full data range if not set.
  • Etype (str, optional, default = 'Eke') – Set plot dimension, either ‘Eke’ (electron kinetic energy) or ‘Ehv’ (photon energy).
  • dataType (str, optional, default = 'matE') – Set data type to plot, corresponding to label in self.data:
    • ‘matE’: raw matrix elements.
    • ‘AFBLM’: computed AF BLMs.
  • xDim (str, optional, default = None) – Settings for x-axis, if None plot vs. Etype. See epsproc.lmPlot() for more details.
  • keys (list, optional, default = None) – Keys for datasets to plot. If None, all datasets will be plotted.
  • refDataKey (str or int, optional, default = None) – If set, calculate difference plots against reference dataset. This must be a key in self.data. TODO: implement difference plots. TODO: implement testing logic, may fail without E-axis forcing, and sym summation?
  • reindexTol (float, optional, default = 0.5) – If computing difference data, the reference data is reindexed to ensure E grid matching. This specifies tolerance (in E units, usually eV) for reindexing. If this fails, difference plot may be null.
  • reindexFill (int or float, optional, default = NaN) – Value to use for missing values upon reindexing. Default matches [Xarray.reindex default](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.reindex.html), i.e. NaN, but this may give issues in some cases.
  • setPD (bool, optional, default = True) – Set Pandas array in main dataset?
  • kwargs (dict, optional, default = {}) – Plotting options to pass to epsproc.lmPlot(). These will also be set in self.lmPlotOpts for further use. Note that any existing options in self.lmPlotOpts will also be used, or overwritten if matching keys are found.
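
The reindexTol / reindexFill behaviour can be pictured as nearest-neighbour matching with a tolerance: each point on the target E grid takes the nearest reference value if it lies within the tolerance, otherwise the fill value. The sketch below reproduces that rule in plain NumPy for a 1D case, to show why a too-small tolerance gives an all-NaN (null) difference plot. reindex_nearest is a hypothetical helper, not the lmPlot implementation, and it assumes nearest-neighbour matching is the method used.

```python
import numpy as np


def reindex_nearest(E_ref, values_ref, E_target, tol=0.5, fill=np.nan):
    """Hypothetical helper: nearest-neighbour reindex of 1D data with a tolerance."""
    E_ref = np.asarray(E_ref, dtype=float)
    values_ref = np.asarray(values_ref, dtype=float)
    E_target = np.asarray(E_target, dtype=float)
    # Distance from every target energy to every reference energy.
    dist = np.abs(E_target[:, None] - E_ref[None, :])
    nearest = dist.argmin(axis=1)
    within = dist[np.arange(E_target.size), nearest] <= tol
    # Keep the matched reference value only where it is close enough.
    return np.where(within, values_ref[nearest], fill)


E_ref = [1.0, 2.0, 3.0]
ref = [10.0, 20.0, 30.0]
E_new = [1.1, 2.3, 4.0]                                  # offset grid, last point beyond the data
loose = reindex_nearest(E_ref, ref, E_new, tol=0.5)      # last point -> fill value
tight = reindex_nearest(E_ref, ref, E_new, tol=0.05)     # every point -> fill value
print(loose, tight)
```

With Xarray itself, a comparable operation would be DataArray.reindex(Eke=new_grid, method='nearest', tolerance=reindexTol, fill_value=reindexFill); whether lmPlot uses exactly these options is not stated here.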

Notes

Basic scheme from ePSmultijob.plotGetCro, which loops and switches on Eke/Ehv. Should tidy up at some point.

matEtoPD(keys=None, xDim='Eke', Erange=None, dataType='matE', printTable=True, selDims=None, pType=None, thres=None, drop=True, fillna=False, squeeze=True, setPD=True)

Convert Xarray to PD for nice tabular display.

Basically code as per basicPlotters.lmPlot(), but looped over datasets.

30/10/20 Added & reworked from multiJob test code.
Changed output to nest in existing Xarray & allow multiple datatypes
mfpadNumeric(keys=None, **kwargs)

MFPADs “direct” (numerical), without beta parameter computation.

Wrapper for epsproc.mfpad(), loops over all loaded datasets.

Parameters:
  • keys (str, int or list, optional, default = None) – If set, use only these datasets (keys). Otherwise run for all datasets in self.data.
  • **kwargs (optional) – Args passed to epsproc.mfpad().

NOTE: for large datasets and/or high resolution, this can be memory-hungry.

molSummary(dataKey=None, tolConv=0.01)
padPlot(selDims={}, sumDims={'Sym', 'it'}, Erange=None, Etype='Eke', keys=None, dataType='TX', facetDims=None, squeeze=False, reducePhi=None, pType='a', pStyle='polar', returnFlag=False, plotDict='plots', backend='mpl')

Plot I(theta,phi) data from BLMs or gridded datasets.

reducePhi : optional, default = None. Allows phi selection or summation for parameters which require Ylm expansion before plotting.

Pass ‘sum’ to sum over phi before plotting. Pass a value to select.
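
The two reducePhi modes can be pictured on a gridded I(theta, phi) array: 'sum' collapses the phi axis, while a numeric value picks out a single phi slice. A minimal NumPy sketch, assuming phi is the last axis and that selecting a value means taking the nearest grid point; reduce_phi is a hypothetical helper, whereas padPlot itself works on Xarray dims.

```python
import numpy as np


def reduce_phi(I, phi, reducePhi=None):
    """Hypothetical helper: reduce I[theta, phi] over phi before plotting."""
    if reducePhi is None:
        return I                                # no reduction, keep the full 2D grid
    if isinstance(reducePhi, str) and reducePhi == 'sum':
        return I.sum(axis=-1)                   # sum over phi -> I(theta)
    idx = np.abs(phi - reducePhi).argmin()      # nearest phi grid point to the requested value
    return I[..., idx]


theta = np.linspace(0, np.pi, 5)
phi = np.linspace(0, 2 * np.pi, 8, endpoint=False)
I = np.cos(theta)[:, None] ** 2 * np.ones(phi.size)[None, :]   # phi-independent test distribution
print(reduce_phi(I, phi, 'sum').shape, reduce_phi(I, phi, 0.0).shape)
```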

TODO: fix dim handling for pl case, need to pass facetDim != None. TODO: return plot objects. Probably to self.data[key][pStyle], or as dictionary of plots per run with data? (Cf. PEMtk plotters.)

23/04/22: added plot data and object returns to self.data[key][plotDict][pStyle]
plotGetCro(pType='SIGMA', Erange=None, Etype='Eke', selDims=None, keys=None, backend='mpl')

Basic GetCro (cross-section) data plotting for multijob class. Run self.plot.line(x=Etype, col='Type') for each dataset. (See epsproc.classes.ePSmultiJob.plotGetCroComp() for comparative plots over datasets.)

Note this is for LF-averaged parameters; for more details, see the ePS starter notes.

Parameters:
  • pType (str, optional, default = 'SIGMA') – Set data for plotting, either ‘SIGMA’ (cross-section) or ‘BETA’ (B2 parameter). If backend = ‘hv’ this parameter is not used.
  • Erange (list of int or float, optional, default = None) – Set plot range [Emin, Emax]. Defaults to full data range if not set
  • Etype (str, optional, default = 'Eke') – Set plot dimension, either ‘Eke’ (electron kinetic energy) or ‘Ehv’ (photon energy).
  • selDims (dict, optional, default = None) – Subselect dims, as a dictionary. E.g. to select a specific continuum symmetry, set selDims = {'Cont': 'Ag'}.
  • keys (list, optional, default = None) – Keys for datasets to plot. If None, all datasets will be plotted.
  • backend (str, optional, default = 'mpl') –

    Set plotter to use.

    • ‘mpl’: use Matplotlib/native Xarray plotter
    • ‘hv’: use Holoviews via epsproc.plotters.hvPlotters.XCplot()
plotGetCroComp(pType='SIGMA', pGauge='L', pSym=('All', 'All'), Erange=None, Etype='Eke', Eshift=None, keys=None, backend='mpl', returnHandles=False)

Basic GetCro (cross-section) data plotting for multijob class, comparative plots. Run self.plot.line(x=Etype) for each dataset after subselection on Gauge and Symmetry, and plot on a single axis. (See epsproc.classes.ePSmultiJob.plotGetCro() for plots per dataset.)

Note this is for LF-averaged parameters; for more details, see the ePS starter notes.

Parameters:
  • pType (str, optional, default = 'SIGMA') – Set data for plotting, either ‘SIGMA’ (cross-section) or ‘BETA’ (B2 parameter).
  • pGauge (str, optional, default = 'L') – Set gauge, either ‘L’ (Length), ‘V’ (Velocity) or ‘M’ (Mixed)
  • pSym (tuple of strs, optional, default = ('All','All')) – Select symmetry, (Cont, Targ). Default value will plot all allowed symmetries.
  • Erange (list of int or float, optional, default = None) – Set plot range [Emin, Emax]. Defaults to full data range if not set
  • Etype (str, optional, default = 'Eke') – Set plot dimension, either ‘Eke’ (electron kinetic energy) or ‘Ehv’ (photon energy).
  • Eshift (int or float, optional, default = None) – Apply energy shift to results if set.
  • keys (list, optional, default = None) – Keys for datasets to plot. If None, all datasets will be plotted.
  • backend (str, optional, default = 'mpl') –

    Set plotter to use.

    • ‘mpl’: use Matplotlib/native Xarray plotter
    • ‘hv’: use Holoviews via epsproc.plotters.hvPlotters.XCplot()
  • returnHandles (bool, optional, default = False) – If True, return the plot object and legend text list.

NOTE: added backend options 27/10/20. CURRENTLY NOT WORKING for hv, due to data structure assumed in hvPlotters.XCplot().

UPDATE 06/04/21:

scanFiles(dataPath=None, fileIn=None, reset=False, keyType='orb')

Scan ePS output files from a dir for multiple data types. Sort data, and set to list/dict/Xarray structures.

Adapted from https://phockett.github.io/ePSdata/XeF2-preliminary/XeF2_multi-orb_comparisons_270320-dist.html

Current implementation:

  • Read XS and matrix elements from source files and sort to Xarrays (one per file and data type); uses epsproc.readMatEle().
  • Stack by Eke for multi-file E-chunked jobs.
  • Read additional data for jobs (uses epsproc.headerFileParse() and epsproc.molInfoParse()).
  • Sort data to lists by data type, and a dict with keys per file/job (self.data).
  • The dict should be the final data to use (note: can’t get heterogeneous data types & dims to work well for an Xarray Dataset, but this may change).
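
The "stack by Eke" step for E-chunked jobs amounts to concatenating per-file results along the energy axis and sorting by Eke. A minimal sketch with NumPy, assuming each chunk is an (Eke array, data array) pair with energy on axis 0 and no overlapping energies; stack_eke_chunks is a hypothetical helper. In ePSproc the chunks are Xarray objects, for which xarray.concat(..., dim='Eke') followed by .sortby('Eke') would be an analogous operation.

```python
import numpy as np


def stack_eke_chunks(chunks):
    """Hypothetical helper: merge (Eke, data) chunks from several files into one Eke-sorted set."""
    E = np.concatenate([c[0] for c in chunks])
    data = np.concatenate([c[1] for c in chunks], axis=0)
    order = np.argsort(E, kind='stable')   # files may be read out of energy order
    return E[order], data[order]


# Two chunks, read in reverse energy order (e.g. due to file name sorting).
chunk_hi = (np.array([10.0, 15.0]), np.array([[3.0], [4.0]]))
chunk_lo = (np.array([0.0, 5.0]), np.array([[1.0], [2.0]]))
E, data = stack_eke_chunks([chunk_hi, chunk_lo])
print(E, data[:, 0])
```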

TODO:

  • Convert outputs to Xarray dataset. Did this before, but currently missing file (on AntonJr)! CHECK BACKUPS - NOPE.
  • Confirm HV scaling - may be better to redo this, rather than correct existing values?
  • Fix xr.dataset: currently aligns data, so will set much to NaN if, e.g., there are different symmetries etc.

Change to structure as ds('XS', 'matE') per orb, rather than ds('XS') and ds('matE') for all orbs? This should also be in line with a hypothetical base dataclass, which will be per orb by definition.

Parameters:
  • dataPath (str or Path object, optional, default = None) – Set dir to scan. Default is to use self.job[‘fileBase’] as set at init.
  • reset (bool, optional, default = False) – If False, new data will be appended to any existing data. If True, existing data will be removed. This allows for persistence over multiple calls, e.g. reading multiple dirs.
  • keyType (str, optional, default = 'orb') – Set the type of dataset key:
    • ‘orb’: use orbital labels as dataset keys.
    • ‘int’: use integer labels as dataset keys (will be ordered by file read).
    • Any other setting will result in key = keyType, which can be used to explicitly pass a key (e.g. in multijob wrapper case). This should be tidied up.
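
The keyType rules can be summarized as a small dispatch. This is a minimal sketch of the documented behaviour only; dataset_key and its arguments (orbLabel, fileIndex) are hypothetical names, and the real scanFiles logic may differ in detail.

```python
def dataset_key(keyType, orbLabel, fileIndex):
    """Hypothetical helper mirroring the documented keyType rules."""
    if keyType == 'orb':
        return orbLabel        # orbital label parsed from the file
    if keyType == 'int':
        return fileIndex       # integer label, ordered by file read
    return keyType             # any other value is used verbatim as the key


print(dataset_key('orb', 'orb5', 0), dataset_key('int', 'orb5', 3), dataset_key('N2O', 'orb5', 3))
```

The pass-through case is what makes an explicit key possible: a multi-directory wrapper can call scanFiles once per directory with its own label as keyType.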
wignerDelay(keys=None, pType='phaseUW', **kwargs)

Wigner delay computation as phase derivative of TX grid.

Multi data-set wrapper for numerics; uses epsproc.MFPAD.mfpad() and epsproc.MFPAD.mfWignerDelay().

27/10/20 initial version added. Looks OK for N2 & N2O test cases, but not carefully tested as yet. http://localhost:8888/lab/tree/dev/ePSproc/classDev/ePSproc_class_demo_161020_Wigner_271020.ipynb

Parameters:
  • keys (str, int or list, optional, default = None) – If set, use only these datasets (keys). Otherwise run for all datasets in self.data.
  • pType (str, optional, default = 'phaseUW') – Used to set data conversion options, as implemented in epsproc.plotTypeSelector():
    • ‘phase’: use np.angle()
    • ‘phaseUW’: unwrapped phase, using np.unwrap(np.angle())
  • **kwargs (optional) – Args passed to epsproc.mfpad().
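
The Wigner delay is the energy derivative of the phase, tau = hbar dphi/dE, which is why the unwrapped phase ('phaseUW') is the default: np.angle() wraps phases into (-pi, pi], and differentiating a wrapped phase produces spurious spikes at every 2pi jump. A minimal NumPy sketch on a synthetic 1D amplitude TX(E), in atomic units (hbar = 1); wigner_delay_1d is a hypothetical helper, not epsproc.MFPAD.mfWignerDelay().

```python
import numpy as np


def wigner_delay_1d(E, TX, unwrap=True):
    """Hypothetical helper: tau(E) = d(phase)/dE for complex amplitudes TX(E), atomic units."""
    phase = np.angle(TX)              # wrapped to (-pi, pi]
    if unwrap:
        phase = np.unwrap(phase)      # remove 2*pi jumps along E
    return np.gradient(phase, E)      # finite-difference derivative with respect to E


# Synthetic amplitude with a linear phase, so the exact delay is the constant tau0.
tau0 = 5.0
E = np.linspace(0.0, 4.0, 401)        # energy grid, Hartree
TX = np.exp(1j * tau0 * E)
tau_uw = wigner_delay_1d(E, TX)                  # unwrapped: recovers tau0 everywhere
tau_w = wigner_delay_1d(E, TX, unwrap=False)     # wrapped: large spikes at the 2*pi jumps
print(np.allclose(tau_uw, tau0), np.allclose(tau_w, tau0))
```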