epsproc.util package

Module contents

ePSproc utility functions.

Set of tools for assignment, sorting, normalisation and conversion.

16/03/20 Converted to submodule, mainly split out from old util.py, plus some new functions.
Imports may be buggy…

14/10/19 Added string replacement function (generic).
11/08/19 Added matEleSelector.

epsproc.util.ADMdimList(sType='stacked')[source]

Return standard list of dimensions for frame definitions, from epsproc.sphCalc.setADMs().

Parameters:sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:set of dimension labels.
Return type:list
epsproc.util.BLMdimList(sType='stacked')[source]

Return standard list of dimensions for calculated BLM.

Parameters:sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:set of dimension labels.
Return type:list
epsproc.util.arraySort2D(a, col)[source]

Sort np.array a by specified column col. From https://thispointer.com/sorting-2d-numpy-array-by-column-or-row-in-python/
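The column sort described at the linked page can be sketched as below. This is a minimal illustration assuming a 2D NumPy array; the name sort_by_column is hypothetical and this is not necessarily the ePSproc implementation.

```python
import numpy as np

def sort_by_column(a, col):
    """Return the rows of 2D array `a` reordered by ascending values in column `col`."""
    order = a[:, col].argsort()   # indices that would sort the chosen column
    return a[order]               # index with them to reorder whole rows

a = np.array([[3, 10], [1, 30], [2, 20]])
sorted_a = sort_by_column(a, 0)   # rows ordered by the first column
```

Rows are moved as a unit, so values in the other columns stay attached to their original row.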

epsproc.util.conv_ev_atm(data, to='ev')[source]

Convert eV <> Hartree (atomic units)

Parameters:
  • data (int, float, np.array) – Values to convert.
  • to (str, default = 'ev') –
    • ‘ev’ to convert H > eV
    • ’H’ to convert eV > H
Returns:

Data in converted units.
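As an illustration of the conversion only, a minimal sketch follows. The function name ev_hartree and the constant (CODATA 2018 Hartree energy in eV) are assumptions made here; the value used inside ePSproc may differ in its trailing digits.

```python
import numpy as np

# Hartree energy in eV (CODATA 2018). Assumption: not read from ePSproc.
HARTREE_IN_EV = 27.211386245988

def ev_hartree(data, to='ev'):
    """Convert Hartree -> eV (to='ev') or eV -> Hartree (to='H')."""
    data = np.asarray(data, dtype=float)
    if to == 'ev':
        return data * HARTREE_IN_EV
    if to == 'H':
        return data / HARTREE_IN_EV
    raise ValueError("to must be 'ev' or 'H'")

e_ev = ev_hartree(1.0, to='ev')          # 1 Hartree expressed in eV
e_h = ev_hartree([13.6, 27.2], to='H')   # array input is handled element-wise
```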

epsproc.util.conv_ev_nm(data)[source]

Convert E(eV) <> lambda(nm).
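Both directions use the same relation, E = hc/lambda, so a single expression covers them. A minimal sketch, assuming hc = 1239.841984 eV nm; the name ev_nm is hypothetical and the constant used by ePSproc may be given to a different precision.

```python
import numpy as np

HC_EV_NM = 1239.841984   # h*c in eV nm (assumed value, see lead-in)

def ev_nm(data):
    """Convert photon energy in eV to wavelength in nm, or vice versa."""
    return HC_EV_NM / np.asarray(data, dtype=float)

e_800 = ev_nm(800.0)     # energy in eV of an 800 nm photon
lam = ev_nm(e_800)       # applying the function twice recovers the input
```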

epsproc.util.dataGroupSel(data, dInd)[source]
epsproc.util.dataTypesList()[source]

Return a dict of allowed dataTypes, corresponding to epsproc processed data.

Each dataType lists ‘source’, ‘desc’ and ‘recordType’ fields.

  • ‘source’ fields correspond to ePS functions which get or generate the data.
  • ‘desc’ brief description of the dataType.
  • ‘recordType’ gives the required segment in ePS files (and associated parser). If the segment is not present in the source file, then the dataType will not be available.

TODO: best choice of data structure here? Currently nested dictionary.
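The nested-dictionary layout described above might look like the sketch below. The entry name and all field values are hypothetical placeholders, and available_types is an illustrative helper rather than part of ePSproc.

```python
# Hypothetical example of the nested structure: one entry per dataType,
# each with 'source', 'desc' and 'recordType' fields.
data_types = {
    'exampleType': {
        'source': 'exampleEPSfunction',         # placeholder, not a real ePS function
        'desc': 'Short description of the data.',
        'recordType': 'ExampleSegment',         # placeholder segment name
    },
}

def available_types(data_types, segments_present):
    """Return the dataType names whose required segment is present in a file."""
    return [name for name, info in data_types.items()
            if info['recordType'] in segments_present]

found = available_types(data_types, {'ExampleSegment'})
```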

epsproc.util.eulerDimList(sType='stacked')[source]

Return standard list of dimensions for frame definitions, from epsproc.sphCalc.setPolGeoms().

Parameters:sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:set of dimension labels.
Return type:list
epsproc.util.genLM(Lmax)[source]

Return array of (L,M) up to supplied Lmax

TODO: add return type options, include conversion to SHtools types.
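A minimal sketch of generating the (L,M) pairs, assuming L runs from 0 to Lmax inclusive and M from -L to +L in ascending order. The ordering and the name gen_lm are assumptions; the actual function may order or type its output differently.

```python
import numpy as np

def gen_lm(Lmax):
    """Return an array of (L, M) pairs with 0 <= L <= Lmax and -L <= M <= L."""
    return np.array([(L, M) for L in range(Lmax + 1) for M in range(-L, L + 1)])

lm = gen_lm(2)   # (Lmax + 1)**2 rows, two columns
```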

epsproc.util.jobSummary(jobInfo=None, molInfo=None, tolConv=0.01)[source]

Print some jobInfo stuff & plot molecular structure. (Currently very basic.)

Parameters:
  • jobInfo (dict, default = None) – Dictionary of job data, as generated by epsproc.IO.headerFileParse() from source ePS output file.
  • molInfo (dict, default = None) – Dictionary of molecule data, as generated by epsproc.IO.molInfoParse() from source ePS output file.
  • tolConv (float, default = 1e-2) – Used to check for convergence in ExpOrb outputs, which defines single-center expansion of orbitals.
Returns:

  • JobInfo (list)
  • orbInfo (dict) – Properties of ionizing orbital, as determined from (jobInfo, molInfo).

20/09/20 v2 Added orbInfo dict, used to hold all orbital-related outputs for return. May break old code (pre v1.2.6-dev).
Moved orbInfo to a separate function.
epsproc.util.lmSymSummary(data)[source]

Display summary info data tables.

Works nicely in a notebook cell, with Pandas formatted table… but not from function?

For a more sophisticated Pandas conversion, see epsproc.util.conversion.multiDimXrToPD()

epsproc.util.matEdimList(sType='stacked')[source]

Return standard list of dimensions for matrix elements.

Parameters:sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:set of dimension labels.
Return type:list
epsproc.util.matEleSelector(da, thres=None, inds=None, dims=None, sq=False, drop=True)[source]

Select & threshold raw matrix elements in an Xarray. Wraps Xarray.sel(), plus some additional options.

See Xarray docs for more: http://xarray.pydata.org/en/stable/user-guide/indexing.html

Parameters:
  • da (Xarray) – Set of matrix elements to sub-select
  • thres (float, optional, default None) – Threshold value for abs(matElement), keep only elements > thres. This is element-wise.
  • inds (dict, optional, default None) – Dictionary of additional selection criteria, in name:value format. These correspond to parameter dimensions in the Xarray structure. E.g. inds = {'Type':'L','Cont':'A2'}
  • dims (str or list of strs, default None) – Dimensions to check for max value & threshold. Set for dimension-wise thresholding; if set, this is used instead of element-wise thresholding. Each listed dimension is checked against the threshold by max value, according to abs(dim.max) > threshold. This allows for consistent selection of continuous parameters over a dimension, by a threshold.
  • sq (bool, optional, default False) – Squeeze output singleton dimensions.
  • drop (bool, optional, default True) – Passed to da.where() for thresholding, drop coord labels for values below threshold.
Returns:

Xarray structure of selected matrix elements. Note that NaNs are dropped if possible.

Return type:

daOut

Example

>>> daOut = matEleSelector(da, inds = {'Type':'L','Cont':'A2'})

Notes

xr.sel(inds) is used here. For single values, xr.sel({name:[value]}) and xr.sel({name:value}) behave differently: the dimension is automatically squeezed out in the latter case. (Tested on xr v0.15.)

E.g., for selecting a single Eke value:

>>> da.sel({'Eke':[1.1]})  # Keeps Eke dim
>>> da.sel({'Eke':1.1})  # Drops Eke to non-dimension coord.
>>> da.sel({'Eke':1.1}, drop=True)  # Drops Eke completely
>>> da.sel({'Eke':[1.1]}, drop=True)  # Keeps Eke
>>> da.sel({'Eke':[1.1]}, drop=True).squeeze()  # Drops Eke to non-dim coord
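The selection behaviour noted above, plus element-wise thresholding with where(), can be checked on a small test array. This is a sketch using plain xarray calls on made-up data (the coordinate values and the 2.5 threshold are illustrative); it does not call matEleSelector itself.

```python
import numpy as np
import xarray as xr

# Small illustrative array with an energy dimension and a label dimension.
da = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    coords={'Eke': [0.1, 1.1, 2.1], 'Type': ['L', 'V']},
    dims=['Eke', 'Type'],
)

kept = da.sel({'Eke': [1.1]})                            # list value: Eke stays a length-1 dim
scalar = da.sel({'Eke': 1.1})                            # scalar value: Eke becomes a non-dim coord
dropped = da.sel({'Eke': 1.1}, drop=True)                # scalar value + drop: Eke removed
squeezed = da.sel({'Eke': [1.1]}, drop=True).squeeze()   # squeeze: Eke back to a non-dim coord

# Element-wise threshold: values failing the test become NaN, and
# coordinate labels with no surviving values are dropped.
thresholded = da.where(np.abs(da) > 2.5, drop=True)
```

Here the Eke = 0.1 label is dropped by the threshold because none of its values pass, while the single failing value at Eke = 1.1 is kept as NaN.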

epsproc.util.multiDimXrToPD(da, colDims=None, rowDims=None, thres=None, squeeze=True, dropna=True, fillna=False, colRound=2, verbose=False)[source]

Convert multidim Xarray to stacked Pandas 2D array, (rowDims, colDims)

Parameters:
  • da (Xarray) – Array for conversion.
  • colDims (list, default = None) – Dims for columns.
  • rowDims (list, default = None) – Dims for rows.
    NOTE: if xDim is a MultiIndex, pass as a dictionary mapping, otherwise it may be unstacked during data prep. For full control over dim stack ordering, specify both colDims and rowDims.
    NOTE: e.g. for plotting stacked (L,M), set xDim = {'LM' …
  • thres (float, optional, default = None) – Threshold values in output (pd table only). TODO: generalise this and use matEleSelector() for input?
  • squeeze (bool, optional, default = True) – Drop singleton dimensions.
  • dropna (bool, optional, default = True) – Drop all-NaN dimensions from output pd data frame (column-wise and row-wise).
  • fillna (bool, optional, default = False) – Fill any NaN values with 0.0. Useful for plotting/making data contiguous.
  • colRound (int, optional, default = 2) – Round column values to colRound dp. Only applied for Eke, Ehv, Euler or t dimensions.
Returns:

  • daRestackpd (pandas data frame (2D) with sorted data.)
  • daRestack (Xarray with restacked data.)

Restack Xarray by specified dims, including basic dims checking, then use da.to_pandas().

12/03/20 Function adapted from lmPlot() code.

Note

This might cause epsproc.lmPlot() to fail for singleton x-dimensions if squeeze = True. TODO: add work-around, see lines 114-122.

epsproc.util.orb3DCoordConv(fileIn, coordMaxLen=50)[source]

Basic coord parse & conversion for volumetric wavefunction files from ePS.

Parameters:
  • fileIn (data from a single file) – List of values from a wavefunction file, as returned by epsproc.readOrb3D(). (Note this currently assumes a single file/set of values.)
  • coordMaxLen (int, optional, default=50) – Max coord grid size, assumed to demarcate native Cartesian (<coordMaxLen) from spherical (>coordMaxLen) coords.
Returns:

x,y,z

Return type:

np.arrays of Cartesian coords (x,y,z)
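For the spherical case, the conversion to Cartesian coordinates can be sketched as below. This assumes 1D grid axes (r, theta, phi) with theta the polar angle and phi the azimuthal angle, both in radians; the actual layout and units of the ePS wavefunction files may differ, and sph_to_cart is a hypothetical name.

```python
import numpy as np

def sph_to_cart(r, theta, phi):
    """Convert 1D spherical grid axes to 3D arrays of Cartesian coords (x, y, z).

    Assumes theta is the polar angle and phi the azimuthal angle, in radians.
    """
    R, T, P = np.meshgrid(r, theta, phi, indexing='ij')   # full 3D grid
    x = R * np.sin(T) * np.cos(P)
    y = R * np.sin(T) * np.sin(P)
    z = R * np.cos(T)
    return x, y, z

r = np.linspace(0.0, 2.0, 3)
theta = np.linspace(0.0, np.pi, 5)
phi = np.linspace(0.0, 2 * np.pi, 4)
x, y, z = sph_to_cart(r, theta, phi)
```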

epsproc.util.stringRepMap(string, replacements, ignore_case=False)[source]

Given a string and a replacement map, return the replaced string.

Parameters:
  • string (str) – String to execute replacements on.
  • replacements (dict) – Replacement dictionary {value to find: value to replace}.
  • ignore_case (bool) – Whether the match should be case insensitive.
Return type:str

CODE from: https://gist.github.com/bgusach/a967e0587d6e01e889fd1d776c5f3729 https://stackoverflow.com/questions/6116978/how-to-replace-multiple-substrings-of-a-string … more or less verbatim.

Thanks to bgusach for the Gist.
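The general technique, a single regex pass over all keys with the longest keys tried first, can be sketched as follows. This is an illustrative re-write under the name multi_replace, not the verbatim Gist or ePSproc code.

```python
import re

def multi_replace(string, replacements, ignore_case=False):
    """Replace every key of `replacements` found in `string` in one pass."""
    if not replacements:
        return string
    norm = (lambda s: s.lower()) if ignore_case else (lambda s: s)
    lookup = {norm(key): val for key, val in replacements.items()}
    # Try longer keys first, so that 'abc' takes priority over 'ab'.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)),
                         re.IGNORECASE if ignore_case else 0)
    return pattern.sub(lambda m: lookup[norm(m.group(0))], string)

out = multi_replace('abc ab', {'ab': 'X', 'abc': 'Y'})
swapped = multi_replace('a b', {'a': 'b', 'b': 'a'})
```

Because all replacements happen in one pass, a replaced value is never itself replaced again, which is what makes the swap in the second call work.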