epsproc.classes.multiJob_v1_131020 module

Core classes for ePSproc data to handle multiple job filesets (energy and/or orbitals etc.)

23/09/20 Added ePSmultiJob class, currently very rough.

14/09/20 v1 Started class development. See ePSproc_multijob_class_dev_140920_bemo.ipynb

class epsproc.classes.multiJob_v1_131020.ePSmultiJob(fileBase=None, jobDirs=None, jobStructure=None, prefix=None, ext='.out', Edp=1, verbose=1)

Bases: object

Class for handling multiple ePS jobs/datasets.

Read datasets from a single dir, or set of dirs.
Sort to dictionaries and Xarray datasets (needs some work).
Comparative plotting (in development).

Parameters:

fileBase (str or Path object, default = None) – Base directory to scan for ePS files; subdirs will also be searched. Required unless jobDirs is set instead.
jobDirs (list of str or Path objects, default = None) – List of dirs containing ePS output files to read; subdirs will NOT be searched.
jobStructure (str, optional, default = None) – This will be set automatically by self.scanDirs(), but can also be passed to override. Allowed values are “subDirs” or “rootDir”; in the former case multiple files will be stacked by Eke.
prefix (str, optional, default = None) – Set prefix string for file checks (cf. wfPlot class). NOT YET USED.
ext (str, optional, default = '.out') – Set default file extension for dir scanning. This should match the file extension for ePolyScat output files.
Edp (int, optional, default = 1) – Set default number of decimal places (dp) for Ehv conversion. May want to set this elsewhere instead… maybe just for plotting? TODO: also consider axis reindex, lookups and interp functions here - useful for differences between datasets.
verbose (int, optional, default = 1) –
Set verbosity level for printing/error checking. Not yet fully implemented, but, generally:
- 0 : no printed output.
- 1 : basic printed info.
- 2 : print all info, including subfunction outputs.
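As a rough illustration of what Edp controls, the sketch below rounds converted photon energies to a fixed number of decimal places. It assumes the conversion is Ehv = Eke + ionization potential; the function name eke_to_ehv and all numerical values are hypothetical and are not taken from ePSproc.

```python
def eke_to_ehv(eke_values, ionization_potential, Edp=1):
    """Convert electron kinetic energies to photon energies, rounded to Edp decimal places.

    Assumption: Ehv = Eke + ionization potential (all in eV).
    """
    return [round(eke + ionization_potential, Edp) for eke in eke_values]


# Hypothetical values: two kinetic energies and an ionization potential of 12.3456 eV.
ehv = eke_to_ehv([0.1, 1.1], 12.3456, Edp=1)
print(ehv)
```

Rounding in this way keeps the Ehv axis labels short, at the cost of merging energy points that differ by less than the chosen precision.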

TODO:

verbosity levels, subtract for subfunctions? Or use a dict to handle multiple levels?
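One possible reading of the "subtract for subfunctions" idea is sketched below: each call passes verbose - 1 to the functions it calls, so level 1 reports only top-level info and level 2 also reports subfunction output. The function names are hypothetical, and this is not how the class currently behaves.

```python
def sub_function(verbose=0):
    """Hypothetical subfunction: reports only if its own verbosity is above 0."""
    messages = []
    if verbose > 0:
        messages.append("sub: detailed info")
    return messages


def main_function(verbose=1):
    """Hypothetical top-level method: passes a reduced verbosity level downwards."""
    messages = []
    if verbose > 0:
        messages.append("main: basic info")
    # Subtract one level for subfunctions, never going below zero.
    messages += sub_function(verbose=max(verbose - 1, 0))
    return messages


for level in (0, 1, 2):
    print(level, main_function(verbose=level))
```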

jobLabel(key=None, lString=None, append=True)

Reset or append text to jobLabel. Very basic.

TODO: consistency over [m], jobLabel vs. orbLabel rationalisation.

jobsSummary()

Print some general info.

TODO: add some info!

lmPlot(Erange=None, Etype='Eke', keys=None, refDataKey=None, reindexTol=0.5, reindexFill=NaN, setPD=True, **kwargs)

Wrapper for epsproc.lmPlot() for multijob class. Run lmPlot() for each dataset.

Parameters:

Erange (list of int or float, optional, default = None) – Set plot range [Emin, Emax]. Defaults to full data range if not set
Etype (str, optional, default = 'Eke') – Set plot dimension, either ‘Eke’ (electron kinetic energy) or ‘Ehv’ (photon energy).
keys (list, optional, default = None) – Keys for datasets to plot. If None, all datasets will be plotted.
refDataKey (tuple (key,m), optional, default = None) – If set, calculate difference plots against reference dataset. TODO: implement difference plots. TODO: implement testing logic, may fail without E-axis forcing, and sym summation?
reindexTol (float, optional, default = 0.5) – If computing difference data, the reference data is reindexed to ensure E grid matching. This specifies the tolerance (in E units, usually eV) for reindexing. If this fails, the difference plot may be null.
reindexFill (int or float, optional, default = NaN) – Value to use for missing values upon reindexing. Default matches [Xarray.reindex default](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.reindex.html), i.e. NaN, but this may give issues in some cases.
setPD (bool, optional, default = True) – Set Pandas array in main dataset?
kwargs (dict, optional, default = {}) – Plotting options to pass to epsproc.lmPlot(). These will also be set in self.lmPlotOpts for further use. Note that any existing options in self.lmPlotOpts will also be used, or overwritten if matching keys are found.
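The option handling described for kwargs can be pictured as a plain dictionary update. This is a minimal sketch only: stored_opts stands in for self.lmPlotOpts, and the helper name and option names are hypothetical.

```python
# Hypothetical stored options, standing in for self.lmPlotOpts.
stored_opts = {"thres": 0.01, "logFlag": False}


def merge_plot_opts(stored, **kwargs):
    """Return the stored options updated with any newly passed kwargs."""
    merged = dict(stored)   # existing options are kept...
    merged.update(kwargs)   # ...matching keys are overwritten, new keys are added
    return merged


# A later call overrides 'thres' and adds 'xDim'; 'logFlag' is reused unchanged.
stored_opts = merge_plot_opts(stored_opts, thres=0.1, xDim="Eke")
print(stored_opts)
```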

Notes

Basic scheme from ePSmultiJob.plotGetCro, which loops and switches on Eke/Ehv. Should tidy up at some point.
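The roles of reindexTol and reindexFill can be illustrated with Xarray's own reindex method. The sketch below uses made-up data on two slightly different energy grids and is an assumption about how the reference data might be matched, not the class's actual code.

```python
import numpy as np
import xarray as xr

# Hypothetical reference and comparison data on slightly different Eke grids (eV).
ref = xr.DataArray([1.0, 2.0, 3.0], coords={"Eke": [1.0, 2.0, 3.0]}, dims="Eke")
data = xr.DataArray([1.5, 2.5, 3.5], coords={"Eke": [1.02, 2.3, 3.6]}, dims="Eke")

reindexTol = 0.5       # maximum allowed E mismatch
reindexFill = np.nan   # value used where no reference point lies within tolerance

# Put the reference on the data's grid: each target E takes the nearest reference
# point if it lies within reindexTol, otherwise the fill value.
ref_on_grid = ref.reindex(
    Eke=data.Eke.values, method="nearest", tolerance=reindexTol, fill_value=reindexFill
)

# Difference data; the last point has no reference within tolerance (3.6 vs 3.0).
diff = data - ref_on_grid
print(diff.values)
```

If most points fall outside the tolerance, the difference array is mostly fill values, which is consistent with the note above that a failed reindex may give a null difference plot.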

matEtoPD(keys=None, xDim='Eke', Erange=None, printTable=True, selDims=None, dType=None, thres=None, drop=True, fillna=False, squeeze=True, setPD=True)

Convert Xarray to Pandas (PD) for tabular display.

Basically code as per basicPlotters.lmPlot(), but looped over datasets.

mfpadNumeric(selDims={'Type': 'L', 'it': 1}, keys=None, res=50)

MFPADs “direct” (numerical), without beta parameter computation.

Wrapper for epsproc.mfpad(), loops over all loaded datasets.

NOTE: for large datasets and/or large res, this can be memory-hungry.

mfpadPlot(selDims={}, sumDims={}, Erange=None, Etype='Eke', keys=None, pType='a', pStyle='polar', backend='mpl')

molSummary(dataKey=None, tolConv=0.01)

plotGetCro(pType='SIGMA', Erange=None, Etype='Eke', keys=None, backend='mpl')

Basic GetCro (cross-section) data plotting for multijob class. Run self.plot.line(x=Etype, col='Type') for each dataset. (See epsproc.classes.ePSmultiJob.plotGetCroComp() for comparative plots over datasets.)

Note this is for LF averaged parameters; for more details see the ePS starter notes.

Parameters:

pType (str, optional, default = 'SIGMA') – Set data for plotting, either ‘SIGMA’ (cross-section) or ‘BETA’ (B2 parameter). If backend = ‘hv’ this parameter is not used.
Erange (list of int or float, optional, default = None) – Set plot range [Emin, Emax]. Defaults to full data range if not set
Etype (str, optional, default = 'Eke') – Set plot dimension, either ‘Eke’ (electron kinetic energy) or ‘Ehv’ (photon energy).
keys (list, optional, default = None) – Keys for datasets to plot. If None, all datasets will be plotted.
backend (str, optional, default = 'mpl') –
Set plotter to use.
- ’mpl’ : Use Matplotlib/native Xarray plotter
- ’hv’ : use Holoviews via epsproc.plotters.hvPlotters.XCplot()

plotGetCroComp(pType='SIGMA', pGauge='L', pSym=('All', 'All'), Erange=None, Etype='Eke', keys=None)

Basic GetCro (cross-section) data plotting for multijob class, comparative plots. Run self.plot.line(x=Etype) for each dataset after subselection on Gauge and Symmetry, and plot on a single axis. (See epsproc.classes.ePSmultiJob.plotGetCro() for plots per dataset.)

Note this is for LF averaged parameters; for more details see the ePS starter notes.

Parameters:

pType (str, optional, default = 'SIGMA') – Set data for plotting, either ‘SIGMA’ (cross-section) or ‘BETA’ (B2 parameter).
pGauge (str, optional, default = 'L') – Set gauge, either ‘L’ (Length), ‘V’ (Velocity) or ‘M’ (Mixed)
pSym (tuple of strs, optional, default = ('All','All')) – Select symmetry, (Cont, Targ). Default value will plot all allowed symmetries.
Erange (list of int or float, optional, default = None) – Set plot range [Emin, Emax]. Defaults to full data range if not set
Etype (str, optional, default = 'Eke') – Set plot dimension, either ‘Eke’ (electron kinetic energy) or ‘Ehv’ (photon energy).
keys (list, optional, default = None) – Keys for datasets to plot. If None, all datasets will be plotted.

scanDirs()

Scan dir structure for folders containing ePS files.

Compatibility… this assumes one of two dir structures:

Old structure, with multiple job output files per dir (but not per E range).
New structure, with multiple output files per E range, subdirs per job/orb.

This is set in jobStructure variable, as “rootDir” or “subDirs” respectively.
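The distinction between the two layouts can be sketched with pathlib. This is a minimal illustration under the assumption that the layout is decided by where files with the given extension are found; guess_job_structure and the file names are hypothetical, and scanDirs() may use different logic.

```python
from pathlib import Path
import tempfile


def guess_job_structure(file_base, ext=".out"):
    """Guess the layout: files in subdirs ('subDirs') or in the root dir only ('rootDir')."""
    base = Path(file_base)
    # Directories one level down that contain at least one matching file.
    sub_dirs = sorted({p.parent for p in base.glob(f"*/*{ext}")})
    if sub_dirs:
        return "subDirs", sub_dirs
    if any(base.glob(f"*{ext}")):
        return "rootDir", [base]
    return None, []


# Build a throwaway "new structure" layout: one subdir per orbital, one file per E range.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for orb in ("orb1", "orb2"):
        (root / orb).mkdir()
        (root / orb / "job_E1.0.out").touch()
        (root / orb / "job_E2.0.out").touch()
    structure, job_dirs = guess_job_structure(root)
    job_names = [d.name for d in job_dirs]

print(structure, job_names)
```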

scanFiles(keys=None)

Scan ePS output files from dir list.

Adapted from https://phockett.github.io/ePSdata/XeF2-preliminary/XeF2_multi-orb_comparisons_270320-dist.html

Currently outputting complicated dataSets dictionary/list - need to sort this out!
- Entry per dir scanned.
- Sub-entries per file, but collapsed in some cases.
- Sending to Xarray dataset with orb labels should be cleaner.

TODO:

Flatten data structure to remove unnecessary nesting.
Fix keys to apply to single dir case (above should also fix this).
Convert outputs to Xarray dataset. Did this before, but currently missing file (on AntonJr)! CHECK BACKUPS - NOPE.
Confirm HV scaling - may be better to redo this, rather than correct existing values?
Fix xr.dataset: currently aligns data, so will set much to NaN if, e.g., different symmetries etc.

Change to structure as ds(‘XS’,’matE’) per orb, rather than ds(‘XS’) and ds(‘matE’) for all orbs? This should also be in line with hypothetical base dataclass, which will be per orb by defn.