epsproc.util.misc module

ePSproc convenience functions

Collection of small functions for sorting etc.

epsproc.util.misc.arraySort2D(a, col)[source]

Sort np.array a by specified column col. From https://thispointer.com/sorting-2d-numpy-array-by-column-or-row-in-python/
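
As an illustration of the column-sort technique described at the linked page, here is a minimal sketch. The helper name sort_by_column is hypothetical and this is not necessarily the body of arraySort2D; it only assumes a 2D NumPy array and an integer column index.

```python
import numpy as np

def sort_by_column(a, col):
    """Return the rows of 2D array `a` ordered by the values in column `col`."""
    # argsort gives the row order that sorts the chosen column ascending;
    # indexing with it reorders whole rows.
    return a[a[:, col].argsort()]

a = np.array([[3, 10], [1, 30], [2, 20]])
sorted_a = sort_by_column(a, 0)  # rows ordered by the first column
```

Sorting on a different column only changes the index passed as col; rows always move as a unit.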

epsproc.util.misc.checkDims(data, refDims=[], method='fast')[source]

Check dimensions for a data array (Xarray) vs. a reference list (or dict).

Parameters:
  • data (Xarray) – Data array to check.
  • refDims (str, list, dict, optional, default = []) – Dims to check vs. input data array. If dict is passed only keys (==stacked dims) are tested. Update 06/06/22: now also checks unstacked refDims case & returns safe ref mapping.
  • method (str, optional, default = 'fast') – Set which properties to check.
      - ‘fast’: basic stacked/unstacked dim checks, suitable for selectors.
      - ‘full’: includes ND dim checks for dictionary conversion/unwrapping. This may be brittle.
Returns:

A dictionary containing:

  • stacked and unstacked dims
  • stacked dim mappings
  • intersection and differences vs. refDims
  • safeStack for use with restack() function even if dims are missing.

Return type:

dictionary

Examples

>>> # Return dim lists
>>> ep.util.misc.checkDims(dataTest)
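
The "intersection and differences vs. refDims" part of the output can be pictured with plain Python sets. This is only a sketch of the idea, not the checkDims() implementation: the function name compare_dims and the keys 'shared', 'extra' and 'missing' are hypothetical, and stacked dims and Xarray objects are not handled here.

```python
def compare_dims(dims, ref_dims):
    """Compare a list of dim names against reference dims (str, list or dict)."""
    if isinstance(ref_dims, str):
        ref_dims = [ref_dims]      # a single dim name
    ref_set = set(ref_dims)        # for a dict, only the keys are used
    dims_set = set(dims)
    return {
        "shared": sorted(dims_set & ref_set),    # in both data and ref
        "extra": sorted(dims_set - ref_set),     # in data only
        "missing": sorted(ref_set - dims_set),   # in ref only
    }

result = compare_dims(["Eke", "l", "m", "mu"], ["l", "m", "Sym"])
```

Because sets are unordered, the sketch sorts each result, which gives a deterministic order based on the dim names rather than on dim size.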

TODO: check and order dims by size? Otherwise the set return is alphabetical.

23/06/22 Added support for non-dimensional coords.
These are treated separately from dimensional coords, and mappings pushed to dict (for MultiIndex) or list (for Index).
06/06/22 Added better support for stacked refDims & remapping with refDims.
Now also tests refDims items (unstacked dims), as well as keys (stacked dims). Added outputs ‘extraUS’, ‘invalidUS’, ‘remap’. There may now be some repetition here, but existing items were left untouched to ensure backward compatibility.
28/09/21 Added basic support for Pandas DataFrames, still needs some work.
See https://stackoverflow.com/questions/21081042/detect-whether-a-dataframe-has-a-multiindex for some thoughts.
26/08/21 Added additional tests for stacked dims vs. ref.
Bit messy, but left as-is to avoid breaking old code. In future should amalgamate stacked & unstacked tests & tidy output.

23/08/21 Added stacked dim mapping output.
11/05/21 Added handling for stacked dims.

epsproc.util.misc.deconstructDims(data)[source]

Deconstruction (unstack) for Xarray, including non-dimensional dims.

Existing dim map is set to data.attrs['dimMaps'].

epsproc.util.misc.fileListSort(fList, groupByPrefix=True, prefixStr=None, verbose=1)[source]

Sort a list of file names, and group by prefix.

Note: this currently assumes a file name schema whereby split('_')[0] picks the grouping string.

Note: os.path.commonprefix() is used to determine the prefix; this may fail in some cases (e.g. when a single file is passed, or for files from different dirs). Pass the prefix manually in these cases.

Returns:

fListSorted, groupedList, prefixStr

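
The sort-and-group behaviour described in the notes above can be sketched with the standard library alone. This is an illustrative approximation, not the fileListSort() source: the name sort_and_group is hypothetical, the groups are returned as a dict rather than whatever structure groupedList uses, and the grouping token is assumed to be the text between the common prefix and the next underscore.

```python
import os

def sort_and_group(f_list, prefix_str=None):
    """Sort file names, then group them by the token following a shared prefix."""
    f_sorted = sorted(f_list)
    if prefix_str is None:
        # Character-wise common prefix; it need not be a valid path.
        prefix_str = os.path.commonprefix(f_sorted)
    grouped = {}
    for name in f_sorted:
        # Strip the prefix when present, then take the text up to the first '_'.
        rest = name[len(prefix_str):] if name.startswith(prefix_str) else name
        grouped.setdefault(rest.split('_')[0], []).append(name)
    return f_sorted, grouped, prefix_str

files = ["data/scan_B_001.out", "data/scan_A_002.out", "data/scan_A_001.out"]
f_sorted, grouped, prefix = sort_and_group(files)
```

In this sketch a single-file list makes os.path.commonprefix() return the whole file name, so the grouping token comes out empty; that is one reason to pass the prefix explicitly.
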
epsproc.util.misc.reconstructDims(data, dropna=True)[source]

Reconstruction (stack) for Xarray, including non-dimensional dims.

Uses dim map as set in data.attrs['dimMaps'].

See also epsproc.util.misc.restack() for restacking to specific data types without a dimMap.

epsproc.util.misc.restack(data, refDims=None, conformDims=False, forceUnstack=True, addMissing=False, unstackExtra=False, dimMap=None, strDims=['Cont', 'Targ', 'Total', 'Type'], verbose=True)[source]

Restack Xarray to conform to refDims.

Wraps checkDims() and data.stack() for “safe” restacking even with missing dims.

Parameters:
  • data (Xarray) – Data to restack.
  • refDims (optional, str, list or dict. Default = None.) – Reference dimensions.
      - If None, use self.attrs['dataType'].
      - If string, use epsproc.util.listFuncs.dataTypesList()[refDims]['def'](sType='sDict').
      - If list, treat as unstacked ref dims (will do dim check and missing dims only).
      - If dict, treat as stacked dim mappings.
  • conformDims (bool, default = False) – If True, conform stacked dims to ref as closely as possible, including adding missing dims and unstacking extra dims. This sets forceUnstack = True, addMissing = True and unstackExtra = True. Note that extra dims will not be deleted.
  • forceUnstack (bool, default = True) – Unstack input DataArray before further manipulation if True.
  • addMissing (bool, default = False) – Add missing (unstacked) dims if True. Note these are added as string or numerical coords, as defined by strDims. (Setting incorrect coord type can affect some selection functions, specifically lmplot())
  • unstackExtra (bool, default = False) – Unstack any extra stacked dims. Note that the dims are not removed, just unstacked.
  • dimMap (dict, optional, default = None) – NOT YET IMPLEMENTED. Map for dim renaming.
  • strDims (list, default = ['Cont','Targ','Total','Type']) – Settings for addMissing for string coord dims.
  • verbose (bool, default = True) – Show extra output if True.
Returns:

  • Xarray (with restacked dims.)
  • Dict (output from checkDims() used to define the restacking.)

TODO:

  • Fix coord type issues, maybe define in dataTypesList?
  • Logging for before & after dims?

epsproc.util.misc.sortGroupFn(fListSorted, prefixStr)[source]

epsproc.util.misc.stringRepMap(string, replacements, ignore_case=False)[source]

Given a string and a replacement map, it returns the replaced string.

Parameters:
  • string (str) – String to execute replacements on.
  • replacements (dict) – Replacement dictionary {value to find: value to replace}.
  • ignore_case (bool) – Whether the match should be case insensitive.
Return type:

str

CODE from: https://gist.github.com/bgusach/a967e0587d6e01e889fd1d776c5f3729 https://stackoverflow.com/questions/6116978/how-to-replace-multiple-substrings-of-a-string … more or less verbatim.

Thanks to bgusach for the Gist.
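
For readers who want to see how a single-pass, multi-substring replacement of this kind can work, here is a minimal sketch built on one compiled regular expression. It follows the general approach of the linked Gist but is written from scratch for illustration; the name multi_replace is hypothetical and the details may differ from stringRepMap().

```python
import re

def multi_replace(string, replacements, ignore_case=False):
    """Replace every key of `replacements` found in `string` in a single pass."""
    if not replacements:
        return string
    # Normalise keys so a case-insensitive match can be looked up again.
    norm = (lambda s: s.lower()) if ignore_case else (lambda s: s)
    lookup = {norm(old): new for old, new in replacements.items()}
    # Longest keys first, so a longer key wins over a shorter key it starts with.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys),
                         re.IGNORECASE if ignore_case else 0)
    return pattern.sub(lambda m: lookup[norm(m.group(0))], string)

swapped = multi_replace("ab", {"a": "b", "b": "a"})
```

Because all keys are matched in one pass over the input, replaced text is never scanned again, so swapping two values does not undo itself as it would with chained str.replace() calls.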

epsproc.util.misc.subselectDims(data, refDims=[])[source]

Subselect dims from shared dim dict. Check dimensions for a data array (Xarray) vs. a reference list.

Used to set safe selection criteria in matEleSelector.

epsproc.util.misc.timeStamp()[source]

Get local time and return formatted string “%d-%m-%y_%H-%M-%S” for time-stamping file names.
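
A string in the format quoted above can be produced with the standard library as sketched here. The helper name time_stamp and the example file name are hypothetical; only the format string is taken from the description.

```python
from datetime import datetime

def time_stamp():
    """Return the local time as day-month-year_hour-minute-second."""
    return datetime.now().strftime("%d-%m-%y_%H-%M-%S")

stamp = time_stamp()
out_name = f"results_{stamp}.txt"  # hypothetical output file name
```

Note that with day-first ordering, file names stamped this way do not sort chronologically by name.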