epsproc.util.misc module

ePSproc convenience functions

Collection of small functions for sorting etc.

epsproc.util.misc.arraySort2D(a, col)[source]

Sort np.array a by specified column col. From https://thispointer.com/sorting-2d-numpy-array-by-column-or-row-in-python/
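
The column sort described above can be sketched with NumPy's argsort. This is a minimal illustration of the technique from the linked page, not necessarily the library's exact code; the function name array_sort_2d_sketch is hypothetical.

```python
import numpy as np


def array_sort_2d_sketch(a, col):
    """Return the rows of 2D array `a` reordered by ascending values in column `col`."""
    # argsort on the chosen column gives the row order that sorts it
    return a[a[:, col].argsort()]


a = np.array([[3, 10], [1, 30], [2, 20]])
sorted_a = array_sort_2d_sketch(a, 0)
```

Whole rows move together, so values in the other columns stay attached to their original row.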

epsproc.util.misc.checkDims(data, refDims=[], method='fast', forceStacked=False)[source]

Check dimensions for a data array (Xarray) vs. a reference list (or dict).

Parameters
  • data (Xarray) – Data array to check.

  • refDims (str, list, dict, optional, default = []) – Dims to check vs. input data array. If dict is passed only keys (==stacked dims) are tested. Update 06/06/22: now also checks unstacked refDims case & returns safe ref mapping.

  • method (str, optional, default = 'fast') – Set which properties to check.

    • ‘fast’: basic stacked/unstacked dim checks, suitable for selectors.

    • ‘full’: includes ND dim checks for dictionary conversion/unwrapping. This may be brittle.

  • forceStacked (bool, optional, default = False) – Force stacked/mixed dim behaviour (pre-2022 style output) from list refDims input. Required for some routines (e.g. density.dimRestack()) to function. May break others (e.g. paramPlot())!

Returns

Containing:

  • stacked and unstacked dims

  • stacked dim mappings

  • intersection and differences vs. refDims

  • safeStack for use with restack() function even if dims are missing.

Return type

dictionary

Examples

>>> # Return dim lists
>>> ep.util.misc.checkDims(dataTest)
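
The intersection and difference tests against refDims can be pictured with plain Python sets. The sketch below is only an illustration of that idea for unstacked dims; the function name and the output keys ('shared', 'extra', 'missing') are hypothetical and do not reproduce the dictionary returned by checkDims().

```python
def compare_dims_sketch(data_dims, ref_dims):
    """Compare a data array's dim names against a list of reference dims."""
    data_set, ref_set = set(data_dims), set(ref_dims)
    return {
        "shared": sorted(data_set & ref_set),   # dims present in both
        "extra": sorted(data_set - ref_set),    # dims only in the data
        "missing": sorted(ref_set - data_set),  # ref dims absent from the data
    }


result = compare_dims_sketch(["Eke", "LM", "Sym"], ["LM", "Type"])
```

The lists are sorted here only to make the output reproducible, since Python sets are unordered.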

TODO: check and order dims by size? Otherwise set return is alphabetical.

TODO: tidy up mixed stacked/unstacked dim handling with refDims passed as list.

25/07/22 - additional checking on dims and to_index added - this previously failed for “0-dimensional” cases, which can appear after xr.squeeze(drop=False), AND for MultiIndex coords.

This may be XR/PD version dependent too, tested in XR 0.19, Pandas 1.2.4 only. See notes in code for more details.

21/07/22 Removed refDims[k] = [v] to force ref dims to list - this breaks original IO code due to xr.sel() for singleton MultiIndex case.

Update now reinstated, fixed issue with .copy() instead.

20/07/22 added forceStacked as optional flag to preserve old behaviour (pre-2022) for mixed cases.

Otherwise sharedDimsStacked is missing. However, forceStacked=True also breaks some other cases (e.g. paramPlot), so this may need more careful reworking.

23/06/22 Added support for non-dimensional coords.

These are treated separately from dimensional coords, and mappings pushed to dict (for MultiIndex) or list (for Index).

06/06/22 Added better support for stacked refDims & remapping with refDims.

Now also tests refDims items (unstacked dims), as well as keys (stacked dims). Added outputs ‘extraUS’, ‘invalidUS’, ‘remap’. There may now be some repetition here, but existing items were left untouched to ensure backward compatibility.

28/09/21 Added basic support for Pandas DataFrames, still needs some work.

See https://stackoverflow.com/questions/21081042/detect-whether-a-dataframe-has-a-multiindex for some thoughts.

26/08/21 Added additional tests for stacked dims vs. ref.

Bit messy, but left as-is to avoid breaking old code. In future should amalgamate stacked & unstacked tests & tidy output.

23/08/21 Added stacked dim mapping output.

11/05/21 Added handling for stacked dims.

epsproc.util.misc.deconstructDims(data, returnType='xr', splitComplex=False)[source]

Deconstruction (unstack) for Xarray, including non-dimensional dims.

Existing dim map is set to data.attrs[‘dimMaps’].

Parameters
  • data (Xarray) – Object to deconstruct (flatten)

  • returnType (str, optional, default = 'xr') – Set return object type.

    • ‘xr’: return Xarray object.

    • ‘dict’: return as dictionary.

    • ‘all’: return both types.

Return type

Xarray and/or dictionary object according to returnType

Examples

>>> # Decon to dict
>>> safeDict = deconstructDims(array2).to_dict()
>>> # Rebuild
>>> xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

Notes

See discussion at https://github.com/pydata/xarray/issues/4073

epsproc.util.misc.fileListSort(fList, groupByPrefix=True, prefixStr=None, verbose=1)[source]

Sort a list of file names, and group by prefix.

Note: this currently assumes a file name schema whereby split('_')[0] picks the grouping string.

Note: os.path.commonprefix() is used to determine the prefix; this may fail in some cases (e.g. when a single file is passed, or when files come from different dirs). Pass the prefix manually in these cases.

Return type

fListSorted, groupedList, prefixStr
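
The two notes above can be illustrated with a short sketch. It assumes that the grouping string is the token between the common prefix and the next underscore; that detail, the function name file_list_sort_sketch and the example file names are assumptions for illustration and are not taken from the library source.

```python
import os
from itertools import groupby


def file_list_sort_sketch(f_list, prefix_str=None):
    """Sort file names, then group them by the token that follows a shared prefix."""
    f_sorted = sorted(f_list)
    if prefix_str is None:
        # Character-wise common prefix of all names (not path-aware)
        prefix_str = os.path.commonprefix(f_sorted)

    def group_key(name):
        # Text between the prefix and the next underscore picks the group
        return name[len(prefix_str):].split("_")[0]

    # groupby only merges adjacent items, so the list is sorted first
    grouped = [list(items) for _, items in groupby(f_sorted, key=group_key)]
    return f_sorted, grouped, prefix_str


files = [
    "/data/N2_1pu_E2.out",
    "/data/N2_3sg_E1.out",
    "/data/N2_1pu_E1.out",
    "/data/N2_3sg_E2.out",
]
f_sorted, grouped, prefix = file_list_sort_sketch(files)
```

With a single file, os.path.commonprefix() returns the whole file name, so nothing is left to split on; this is one of the cases where passing the prefix explicitly helps.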

epsproc.util.misc.reconstructDims(data, dropna=True)[source]

Reconstruction (stack) for Xarray or dictionary, including non-dimensional dims.

Uses dim map as set in data.attrs[‘dimMaps’], e.g. from array.attrs[‘dimMaps’] = checkDims(array, method = ‘full’). This is also set by the epsproc.util.misc.deconstructDims() routine.

See also epsproc.util.misc.restack() for restacking to specific data types without a dimMap.

Parameters

data (Xarray or dict) – Object to reconstruct (stack)

Return type

Xarray

Examples

>>> # Decon to dict, including dim unstacking
>>> safeDict = deconstructDims(array, returnType='dict')
>>> # Rebuild
>>> xrFromDict = reconstructDims(safeDict)

epsproc.util.misc.restack(data, refDims=None, conformDims=False, forceUnstack=True, addMissing=False, unstackExtra=False, dimMap=None, strDims=['Cont', 'Targ', 'Total', 'Type'], verbose=True)[source]

Restack Xarray to conform to refDims.

Wraps checkDims() and data.stack() for “safe” restacking even with missing dims.

See also epsproc.density.dimRestack(), which handles remapping stacked dims.

Parameters
  • data (Xarray) – Data to restack.

  • refDims (optional, str, list or dict. Default = None.) – Reference dimensions.

    • If None, use self.attrs[‘dataType’].

    • If string, use epsproc.util.listFuncs.dataTypesList()[refDims][‘def’](sType=‘sDict’).

    • If list, treat as unstacked ref dims (will do dim check and missing dims only).

    • If dict, treat as stacked dim mappings.

  • conformDims (bool, default = False) – If True, conform stacked dims to ref as closely as possible, including adding missing dims and unstacking extra dims. This sets forceUnstack = True, addMissing = True and unstackExtra = True. Note that extra dims will not be deleted.

  • forceUnstack (bool, default = True) – Unstack input DataArray before further manipulation if True.

  • addMissing (bool, default = False) – Add missing (unstacked) dims if True. Note these are added as string or numerical coords, as defined by strDims. (Setting incorrect coord type can affect some selection functions, specifically lmplot())

  • unstackExtra (bool, default = False) – Unstack any extra stacked dims. Note that the dims are not removed, just unstacked.

  • dimMap (dict, optional, default = None) – NOT YET IMPLEMENTED. Map for dim renaming.

  • strDims (list, default = ['Cont','Targ','Total','Type']) – Settings for addMissing for string coord dims.

  • verbose (bool, default = True) – Show extra output if True.

Returns

  • Xarray (with restacked dims.)

  • Dict (output from checkDims() used to define the restacking.)

TODO:

  • Fix coord type issues, maybe define in dataTypesList?

  • Logging for before & after dims?

epsproc.util.misc.setDefaultArgs(defaults={}, presetDict={}, method='defaultKeys', **kwargs)[source]

Set default function arguments from dictionaries.

Set and update defaults dict from presetDict and any other passed kwargs.

Parameters
  • defaults (dict, optional, default = {}) – Default values for function (which must be set).

  • presetDict (dict, optional, default = {}) – Preset values, e.g. from another function or object.

  • method (str, optional, default = 'defaultKeys') –

    • defaultKeys: set outputs according to keys found in defaults.keys() only.

    • presetDictKeys: set outputs including all keys found in presetDict.keys().

Returns

defaults – Updated according to method. Note this is modified in-place unless a copy is explicitly passed to the function.

Return type

dict
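
The two methods can be sketched as below. This is a hedged illustration only: the function name is hypothetical, and the precedence used here (kwargs override presetDict, which overrides defaults) is an assumption rather than documented behaviour.

```python
def set_default_args_sketch(defaults=None, preset_dict=None, method="defaultKeys", **kwargs):
    """Update `defaults` in place from `preset_dict` and `kwargs`."""
    defaults = {} if defaults is None else defaults
    preset_dict = {} if preset_dict is None else preset_dict

    if method == "defaultKeys":
        # Only keys already present in defaults are considered
        keys = list(defaults)
    elif method == "presetDictKeys":
        # Keys from presetDict are included as well
        keys = list(defaults) + [k for k in preset_dict if k not in defaults]
    else:
        raise ValueError(f"Unknown method: {method}")

    for key in keys:
        if key in kwargs:            # assumed precedence: kwargs first
            defaults[key] = kwargs[key]
        elif key in preset_dict:     # then presets
            defaults[key] = preset_dict[key]
    return defaults


base = {"thres": 1e-2, "plot": True}
preset = {"thres": 1e-4, "extra": 5}

# Pass copies so that `base` itself is not modified
out_default = set_default_args_sketch(dict(base), preset, plot=False)
out_preset = set_default_args_sketch(dict(base), preset, method="presetDictKeys")
```

Passing dict(base) rather than base mirrors the note above about in-place modification.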

epsproc.util.misc.sortGroupFn(fListSorted, prefixStr)[source]

epsproc.util.misc.stringRepMap(string, replacements, ignore_case=False)[source]

Given a string and a replacement map, return the replaced string.

Parameters
  • string (str) – String to execute replacements on.

  • replacements (dict) – Replacement dictionary {value to find: value to replace}.

  • ignore_case (bool) – Whether the match should be case insensitive.

Return type

str

Code from https://gist.github.com/bgusach/a967e0587d6e01e889fd1d776c5f3729 and https://stackoverflow.com/questions/6116978/how-to-replace-multiple-substrings-of-a-string, more or less verbatim.

Thanks to bgusach for the Gist.
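
A single-pass, regex-based replacement of this kind can be sketched as follows. The sketch follows the general approach of the linked Gist but is written from scratch for illustration; the function name string_rep_map_sketch is hypothetical and details may differ from the library code.

```python
import re


def string_rep_map_sketch(string, replacements, ignore_case=False):
    """Replace every key of `replacements` found in `string` in a single pass."""
    if not replacements:
        return string

    flags = re.IGNORECASE if ignore_case else 0

    def norm(s):
        # Keys and matches are lower-cased for case-insensitive lookup
        return s.lower() if ignore_case else s

    lookup = {norm(key): val for key, val in replacements.items()}
    # Longest keys first, so that "abc" is preferred over "ab"
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives), flags)
    return pattern.sub(lambda m: lookup[norm(m.group(0))], string)


swapped = string_rep_map_sketch(
    "Do you like cafe? No, I prefer tea.", {"cafe": "tea", "tea": "cafe"}
)
```

Because all keys are matched in one pass, replaced text is never re-scanned, which is what allows the swap in the example to work.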

epsproc.util.misc.subselectDims(data, refDims=[], ignoreItems=False)[source]

Subselect dims from shared dim dict. Check dimensions for a data array (Xarray) vs. a reference list.

Used to set safe selection criteria in matEleSelector.

Parameters
  • data (Xarray) – Object to use for dims to check.

  • refDims (dict or list) – Ref dims to compare against data.

  • ignoreItems (bool, optional, default = False) – If True, only pass refDims.keys() to checkDims()

Returns

Safe selection criteria in format to match input refDims.

Return type

Dict or list

20/10/22: added ignoreItems option. If true, only refDims.keys() is tested. This is better for selectors which don’t need to be tested vs. dims. Default behaviour is False, which matches original code.
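
The idea of a safe selection that matches the input format can be sketched with plain Python containers. This is an illustration under stated assumptions: the function name is hypothetical, dims are passed as a simple list of names rather than an Xarray object, and stacked dims are not handled.

```python
def subselect_dims_sketch(data_dims, ref_dims):
    """Keep only the entries of `ref_dims` whose dim name exists in `data_dims`."""
    if isinstance(ref_dims, dict):
        # Dict in, dict out: selection criteria for existing dims only
        return {k: v for k, v in ref_dims.items() if k in data_dims}
    # List in, list out
    return [k for k in ref_dims if k in data_dims]


dims = ["Eke", "Type", "it"]
safe_sel = subselect_dims_sketch(dims, {"Type": "L", "Cont": "A2"})
safe_list = subselect_dims_sketch(dims, ["Eke", "Cont"])
```

Dropping selectors for dims that are not present avoids errors when the result is passed on to a selection call.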

epsproc.util.misc.timeStamp()[source]

Get local time and return formatted string “%d-%m-%y_%H-%M-%S” for time-stamping file names.
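
A minimal sketch of such a time stamp, using the format string given above. Building it with datetime.now().strftime() is an assumption about the approach, and the function name and output file name are hypothetical.

```python
from datetime import datetime


def time_stamp_sketch():
    """Return the local time as a 'dd-mm-yy_HH-MM-SS' string."""
    return datetime.now().strftime("%d-%m-%y_%H-%M-%S")


stamp = time_stamp_sketch()
file_name = f"results_{stamp}.pkl"  # hypothetical output file name
```

The format contains no colons or spaces, so the string can be used directly in file names.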