epsproc.util package¶

Submodules¶

Module contents¶

ePSproc utility functions.

Set of tools for assignment, sorting, normalisation and conversion.

16/03/20 Converted to submodule, mainly split out from old util.py, plus some new functions. Imports may be buggy…

14/10/19 Added string replacement function (generic).

11/08/19 Added matEleSelector.

epsproc.util.ADMdimList(sType='stacked')[source]¶

Return standard list of dimensions for frame definitions, from epsproc.sphCalc.setADMs().

Parameters:	sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:	Set of dimension labels.
Return type:	list (dict if sType = ‘sDict’)
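
As a rough illustration of how an ‘sDict’ mapping relates the stacked and unstacked lists, the sketch below derives both lists from a single mapping in plain Python. The dimension names used here (ADM, K, Q, S, t) are assumptions for illustration and are not confirmed as the values this function returns.

```python
# Hypothetical stacked -> unstacked mapping, in the style of an 'sDict' return.
# The dimension names are assumed for illustration only.
s_dict = {"ADM": ["K", "Q", "S"], "t": ["t"]}

# Stacked dimension labels are the mapping keys.
stacked = list(s_dict.keys())

# Unstacked dimension labels are the flattened mapping values.
unstacked = [dim for dims in s_dict.values() for dim in dims]

print(stacked)    # ['ADM', 't']
print(unstacked)  # ['K', 'Q', 'S', 't']
```

Keeping one mapping as the reference and deriving the two lists from it avoids the lists drifting out of sync.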

epsproc.util.BLMdimList(sType='stacked')[source]¶

Return standard list of dimensions for calculated BLM.

Parameters:	sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:	Set of dimension labels.
Return type:	list (dict if sType = ‘sDict’)

epsproc.util.arraySort2D(a, col)[source]¶

Sort np.array a by specified column col. From https://thispointer.com/sorting-2d-numpy-array-by-column-or-row-in-python/
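
A minimal sketch of column-wise row sorting with NumPy, using the common argsort idiom. The function name sort_rows_by_column is a stand-in chosen for this example, and the body is an assumption about the technique rather than the package source.

```python
import numpy as np


def sort_rows_by_column(a, col):
    """Return the rows of 2D array `a`, ordered by the values in column `col`."""
    # argsort gives the row order that sorts the chosen column;
    # indexing with it reorders whole rows at once.
    return a[a[:, col].argsort()]


a = np.array([[3, 10], [1, 30], [2, 20]])
by_first = sort_rows_by_column(a, 0)
by_second = sort_rows_by_column(a, 1)
print(by_first)
print(by_second)
```

Indexing with the argsort result returns a new array, so the input is left unchanged.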

epsproc.util.conv_ev_atm(data, to='ev')[source]¶

Convert eV <> Hartree (atomic units)

Parameters:

data (int, float, np.array) – Values to convert.
to (str, default = 'ev') – ‘ev’ to convert H > eV, ‘H’ to convert eV > H.

Returns:	data converted to the selected units.
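
The conversion itself is a single scale factor. The sketch below assumes the CODATA 2018 value for the Hartree energy in eV; the constant used inside the package may be rounded differently, and the function name convert_ev_hartree is hypothetical.

```python
# CODATA 2018: 1 Hartree = 27.211386245988 eV (assumed here; the package
# may use a different rounding of this constant).
HARTREE_TO_EV = 27.211386245988


def convert_ev_hartree(value, to="ev"):
    """Convert Hartree -> eV when to='ev', or eV -> Hartree when to='H'."""
    if to == "ev":
        return value * HARTREE_TO_EV
    if to == "H":
        return value / HARTREE_TO_EV
    raise ValueError("to must be 'ev' or 'H'")


one_hartree_in_ev = convert_ev_hartree(1.0, to="ev")
round_trip = convert_ev_hartree(one_hartree_in_ev, to="H")
print(one_hartree_in_ev, round_trip)
```

Because only multiplication and division are used, the same function works unchanged on NumPy arrays.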

epsproc.util.conv_ev_nm(data)[source]¶

Convert E(eV) <> lambda(nm).
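
Photon energy and wavelength are related by E = hc/lambda, so the same expression converts in both directions, which is consistent with the function taking no direction argument. A minimal sketch, building hc in eV nm from the exact SI constants (the name ev_nm is hypothetical, and the package may use a rounded constant instead):

```python
# Exact SI defining constants (2019 redefinition).
PLANCK_J_S = 6.62607015e-34
LIGHT_SPEED_M_S = 299792458.0
ELEMENTARY_CHARGE_C = 1.602176634e-19

# hc expressed in eV nm (approximately 1239.84).
HC_EV_NM = PLANCK_J_S * LIGHT_SPEED_M_S / ELEMENTARY_CHARGE_C * 1e9


def ev_nm(value):
    """Convert a photon energy in eV to a wavelength in nm, or the reverse."""
    return HC_EV_NM / value


wavelength_nm = ev_nm(2.5)         # energy (eV) -> wavelength (nm)
energy_ev = ev_nm(wavelength_nm)   # wavelength (nm) -> energy (eV)
print(wavelength_nm, energy_ev)
```

Applying the function twice returns the starting value, up to floating-point rounding.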

epsproc.util.dataGroupSel(data, dInd)[source]¶

epsproc.util.dataTypesList()[source]¶

Return a dict of allowed dataTypes, corresponding to epsproc processed data.

Each dataType lists ‘source’, ‘desc’ and ‘recordType’ fields.

‘source’ fields correspond to ePS functions which get or generate the data.
‘desc’ brief description of the dataType.
‘recordType’ gives the required segment in ePS files (and associated parser). If the segment is not present in the source file, then the dataType will not be available.
‘def’ provides definition function handle (if applicable).
‘dims’ lists results from def(sType = ‘sDict’)

TODO: best choice of data structure here? Currently nested dictionary.
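
To make the nested-dictionary layout concrete, the sketch below builds one entry with the five fields listed above. The entry name exampleType, the field values, and the helper example_dim_list are all placeholders invented for this illustration; they are not actual dataTypes from the package.

```python
def example_dim_list(sType="stacked"):
    """Placeholder definition function, standing in for a *dimList() helper."""
    mapping = {"LM": ["L", "M"]}  # hypothetical stacked -> unstacked mapping
    if sType == "sDict":
        return mapping
    if sType == "stacked":
        return list(mapping)
    return [dim for dims in mapping.values() for dim in dims]


# Outer keys are dataType names, inner keys are the documented fields.
data_types = {
    "exampleType": {
        "source": "name of the function that gets or generates the data",
        "desc": "brief description of the dataType",
        "recordType": "name of the required ePS file segment",
        "def": example_dim_list,
        "dims": example_dim_list(sType="sDict"),
    }
}

entry = data_types["exampleType"]
print(entry["dims"])              # {'LM': ['L', 'M']}
print(entry["def"]("unstacked"))  # ['L', 'M']
```

Storing the function handle under ‘def’ lets a caller regenerate any of the dimension lists on demand, while ‘dims’ caches the mapping for quick lookup.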

epsproc.util.eulerDimList(sType='stacked')[source]¶

Return standard list of dimensions for frame definitions, from epsproc.sphCalc.setPolGeoms().

Parameters:	sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:	Set of dimension labels.
Return type:	list (dict if sType = ‘sDict’)

epsproc.util.genLM(Lmax)[source]¶

Return array of (L,M) up to supplied Lmax

TODO: add return type options, include conversion to SHtools types.
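
A plain-Python sketch of the enumeration, assuming the usual spherical-harmonic convention of L running from 0 to Lmax and M from -L to L. The name gen_lm is hypothetical, and the real function returns an array rather than a list of tuples.

```python
def gen_lm(l_max):
    """List all (L, M) pairs with 0 <= L <= l_max and -L <= M <= L."""
    return [(l, m) for l in range(l_max + 1) for m in range(-l, l + 1)]


pairs = gen_lm(1)
print(pairs)           # [(0, 0), (1, -1), (1, 0), (1, 1)]
print(len(gen_lm(2)))  # 9
```

Under this convention the number of pairs is (Lmax + 1)², which gives a quick sanity check on the output size.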

epsproc.util.jobSummary(jobInfo=None, molInfo=None, tolConv=0.01)[source]¶

Print some jobInfo stuff & plot molecular structure. (Currently very basic.)

Parameters:

jobInfo (dict, default = None) – Dictionary of job data, as generated by epsproc.IO.headerFileParse() from source ePS output file.
molInfo (dict, default = None) – Dictionary of molecule data, as generated by epsproc.IO.molInfoParse() from source ePS output file.
tolConv (float, default = 1e-2) – Used to check for convergence in ExpOrb outputs, which defines single-center expansion of orbitals.

Returns:

JobInfo (list)
orbInfo (dict) – Properties of ionizing orbital, as determined from (jobInfo, molInfo).

20/09/20 v2 Added orbInfo dict, and use this to hold all orbital related outputs for return. May break old codes (pre v1.2.6-dev). Moved orbInfo to a separate function.

epsproc.util.lmSymSummary(data)[source]¶

Display summary info data tables.

Works nicely in a notebook cell, with Pandas formatted table… but not from function?

For a more sophisticated Pandas conversion, see epsproc.util.conversion.multiDimXrToPD()

epsproc.util.matEdimList(sType='stacked')[source]¶

Return standard list of dimensions for matrix elements.

Parameters:	sType (string, optional, default = 'stacked') – Select ‘stacked’ or ‘unstacked’ dimensions. Set ‘sDict’ to return a dictionary of unstacked <> stacked dims mappings for use with xr.stack({dim mapping}).
Returns:	Set of dimension labels.
Return type:	list (dict if sType = ‘sDict’)

epsproc.util.matEleSelector(da, thres=None, inds=None, dims=None, sq=False, drop=True)[source]¶

Select & threshold raw matrix elements in an Xarray. Wraps Xarray.sel(), plus some additional options.

See Xarray docs for more: http://xarray.pydata.org/en/stable/user-guide/indexing.html

Parameters:

da (Xarray) – Set of matrix elements to sub-select.
thres (float, optional, default None) – Threshold value for abs(matElement), keep only elements > thres. This is element-wise.
inds (dict, optional, default None) – Dictionary of additional selection criteria, in name:value format. These correspond to parameter dimensions in the Xarray structure. E.g. inds = {'Type':'L','Cont':'A2'}
dims (str or list of strs, optional, default None) – Dimensions to check for max & threshold. Set for dimension-wise thresholding. If set, this is used instead of element-wise thresholding. Each listed dimension is checked against the threshold for its max value, according to abs(dim.max) > threshold. This allows for consistent selection of continuous parameters over a dimension, by a threshold.
sq (bool, optional, default False) – Squeeze output singleton dimensions.
drop (bool, optional, default True) – Passed to da.where() for thresholding, drop coord labels for values below threshold.

Returns:

daOut (Xarray) – Structure of selected matrix elements. Note that NaNs are dropped if possible.

Example

>>> daOut = matEleSelector(da, inds = {'Type':'L','Cont':'A2'})

Notes

xr.sel(inds) is used here. Note that for single values, xr.sel({name:[value]}) and xr.sel({name:value}) behave differently: the latter automatically squeezes out the dimension. (Tested on xr v0.15)

E.g., for selecting a single Eke value:

da.sel({'Eke':[1.1]}) # Keeps Eke dim
da.sel({'Eke':1.1}) # Drops Eke to non-dimension coord
da.sel({'Eke':1.1}, drop=True) # Drops Eke completely
da.sel({'Eke':[1.1]}, drop=True) # Keeps Eke
da.sel({'Eke':[1.1]}, drop=True).squeeze() # Drops Eke to non-dim coord

epsproc.util.multiDimXrToPD(da, colDims=None, rowDims=None, thres=None, squeeze=True, dropna=True, fillna=False, colRound=2, verbose=False)[source]¶

Convert multidim Xarray to stacked Pandas 2D array, (rowDims, colDims)

Parameters:

da (Xarray) – Array for conversion.
colDims (list of dims for columns, default = None)
rowDims (list of dims for rows, default = None)
NOTE: if xDim is a MultiIndex, pass as a dictionary mapping, otherwise it may be unstacked during data prep. E.g. for plotting stacked (L,M), set xDim = {'LM': …}
NOTE: For full control over dim stack ordering, specify both colDims and rowDims.
thres (float, optional, default = None) – Threshold values in output (pd table only) TODO: generalise this and use matEleSelector() for input?
squeeze (bool, optional, default = True) – Drop singleton dimensions.
dropna (bool, optional, default = True) – Drop all NaN dimensions from output pd data frame (column-wise and row-wise).
fillna (bool, optional, default = False) – Fill any NaN values with 0.0. Useful for plotting/making data contiguous.
colRound (int, optional, default = 2) – Round column values to colRound dp. Only applied for Eke, Ehv, Euler or t dimensions.

Returns:

daRestackpd (pandas data frame) – 2D data frame with sorted data.
daRestack (Xarray) – Restacked data.

Restack Xarray by specified dims, including basic dims checking, then use da.to_pandas().

12/03/20 Function adapted from lmPlot() code.

Note

This might cause epsproc.lmPlot() to fail for singleton x-dimensions if squeeze = True. TODO: add work-around, see lines 114-122.

epsproc.util.orb3DCoordConv(fileIn, coordMaxLen=50)[source]¶

Basic coord parse & conversion for volumetric wavefunction files from ePS.

Parameters:

fileIn (data from a single file) – List of values from a wavefunction file, as returned by epsproc.readOrb3D(). (Note this currently assumes a single file/set of values.)
coordMaxLen (int, optional, default = 50) – Max coord grid size, assumed to demarcate native Cartesian (<coordMaxLen) from spherical (>coordMaxLen) coords.

Returns:	x,y,z
Return type:	np.arrays of Cartesian coords (x,y,z)

epsproc.util.stringRepMap(string, replacements, ignore_case=False)[source]¶

Given a string and a replacement map, it returns the replaced string.

Parameters:

string (str) – String to execute replacements on.
replacements (dict) – Replacement dictionary {value to find: value to replace}.
ignore_case (bool) – Whether the match should be case insensitive.

Return type:	str

CODE from: https://gist.github.com/bgusach/a967e0587d6e01e889fd1d776c5f3729 https://stackoverflow.com/questions/6116978/how-to-replace-multiple-substrings-of-a-string … more or less verbatim.

Thanks to bgusach for the Gist.
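
For orientation, a single-pass multi-substring replacement of this kind can be sketched with the standard re module as below. This is an independent illustration of the general technique, not a copy of the package or Gist code, and the name replace_many is hypothetical.

```python
import re


def replace_many(string, replacements, ignore_case=False):
    """Replace every key of `replacements` found in `string` in one pass."""
    if not replacements:
        return string

    # Normalise keys so lookups still work for case-insensitive matches.
    normalize = (lambda s: s.lower()) if ignore_case else (lambda s: s)
    lookup = {normalize(old): new for old, new in replacements.items()}

    # Try longer keys first, so "abc" wins over "ab" at the same position.
    ordered = sorted(lookup, key=len, reverse=True)
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile("|".join(re.escape(key) for key in ordered), flags)

    # Each match is replaced once; replaced text is never rescanned.
    return pattern.sub(lambda m: lookup[normalize(m.group(0))], string)


print(replace_many("Eke and Ehv", {"Eke": "E_kin", "Ehv": "E_photon"}))
print(replace_many("ab", {"a": "b", "b": "a"}))  # swaps, giving 'ba'
```

Doing all replacements in one regex pass means the output of one replacement is never fed into another, which chained str.replace() calls cannot guarantee.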