epsproc.IO module

ePSproc IO functions.

Module for file IO and data parsing.

Main function: epsproc.IO.readMatEle():

readMatEle(fileIn = None, fileBase = None, fType = '.out'):

Read ePS file(s) and return results as Xarray data structures containing matrix elements. File endings specified by fType, default .out.

History

13/10/20 Adapted main function readMatEle() to use grouped lists for multi-file jobs; should be backward-compatible if stackE = False is set.

06/11/19 Added jobInfo and molInfo data structures, from ePS file via epsproc.IO.headerFileParse() and epsproc.IO.molInfoParse().
Still needs a bit of work, and may want to implement other (comp chem) libraries here.

14/10/19 Added/debugged read functions for CrossSection segments.

27/09/19 Added read functions for EDCS segments.

17/09/19 Added read/write to/from netCDF files for Xarrays.
Use built-in methods, with work-arounds for complex number format issues.

29/08/19 Updating docs to rst.

26/08/19 Added parsing for E, sym parameters from head of ePS file.
Added error checking by comparing read matrix elements to expected list. Changed & fixed Xarray indexing: matrix elements are now output with dims (LM, Eke, Sym, mu, it, Type). Current code is rather ugly, however.

19/08/19 Add functions for reading wavefunction files (3D data)

07/08/19 Naming convention tweaks, and some changes to comments, following basic tests with Sphinx.

05/08/19 v1 Initial python version.
Working, but little error checking as yet. Needs some tidying.

To do

  • Add IO for other file segments (only DumpIdy supported so far).
  • Better logic & flexibility for file scanning.
  • Restructure as class for brevity…?
  • More sophisticated methods/data structures for job & molecule info handling.

epsproc.IO.EDCSFileParse(fileName, verbose=1)

Parse an ePS file for EDCS segments.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • verbose (bool or int, optional) – If true, print segment info.
Returns:

  • list – [lineStart, lineStop], ints for line #s found from start and end phrases.
  • list – dumpSegs, list of lines read from file.
  • Lists contain entries for each EDCS segment found in the file.

epsproc.IO.EDCSSegParse(dumpSeg)

Extract values from EDCS file segments.

Parameters: dumpSeg (list) – One EDCS segment, from dumpSegs[], as returned by epsproc.IO.EDCSFileParse()
Returns:
  • np.array – EDCS, array of scattering XS, [theta, Cross Section (Angstrom^2)]
  • list – attribs, list [Label, value, units]

Notes

Currently this is a bit messy, and relies on fixed EDCS format. No error checking as yet. Not yet reading all attribs.

Example

>>> EDCS, attribs = EDCSSegParse(dumpSegs[0])

epsproc.IO.EDCSSegsParseX(dumpSegs)

Extract data from ePS EDCS segments into usable form.

Parameters: dumpSegs (list) – Set of EDCS segments, i.e. dumpSegs, as returned by epsproc.IO.EDCSFileParse()
Returns:
  • xr.array – Xarray data array, containing cross sections. Dimensions (Eke, theta)
  • int – Number of blank segments found. (Currently not implemented.)

Example

>>> data = EDCSSegsParseX(dumpSegs)

Notes

A rather cut-down version of epsproc.IO.dumpIdySegsParseX(); no error checking is currently implemented.

epsproc.IO.dumpIdyFileParse(fileName, verbose=1)

Parse an ePS file for dumpIdy segments.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • verbose (bool or int, optional) – If true, print segment info.
Returns:

  • list – [lineStart, lineStop], ints for line #s found from start and end phrases.
  • list – dumpSegs, list of lines read from file.
  • Lists contain entries for each dumpIdy segment found in the file.

epsproc.IO.dumpIdySegParse(dumpSeg)

Extract values from dumpIdy file segments.

Parameters: dumpSeg (list) – One dumpIdy segment, from dumpSegs[], as returned by epsproc.IO.dumpIdyFileParse()
Returns:
  • np.array – rawIdy, array of matrix elements, [m,l,mu,ip,it,Re,Im]
  • list – attribs, list [Label, value, units]

Notes

Currently this is a bit messy, and relies on a fixed DumpIdy format. No error checking as yet. Not yet reading all attribs.

Example

>>> matEle, attribs = dumpIdySegParse(dumpSegs[0])
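
As a rough illustration of the rawIdy column layout listed above, the sketch below parses whitespace-separated rows of the form [m, l, mu, ip, it, Re, Im] and combines the last two columns into a complex value. The sample rows and the helper name parse_idy_rows are made up for illustration. This is not the parsing code used by dumpIdySegParse(), and it ignores the header attributes that real DumpIdy segments carry.

```python
def parse_idy_rows(lines):
    """Parse rows of 'm l mu ip it Re Im' into (labels, complex value) pairs.

    Hypothetical helper for illustration only, not part of epsproc.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 7:
            continue  # Skip anything that is not a 7-column data row
        labels = tuple(int(p) for p in parts[:5])           # (m, l, mu, ip, it)
        value = complex(float(parts[5]), float(parts[6]))   # Re + i*Im
        rows.append((labels, value))
    return rows


# Made-up sample rows in the column order [m, l, mu, ip, it, Re, Im]
sample = [
    "0 1 0 1 1 0.5 -0.25",
    "1 1 -1 2 1 1.0E-01 2.0E-01",
]
mat_ele = parse_idy_rows(sample)
print(mat_ele[0])
```

Keeping the integer labels separate from the complex value makes it straightforward to regroup the rows later by ip, it or mu.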

epsproc.IO.dumpIdySegsParseX(dumpSegs, ekeListUn, symSegs, verbose=1)

Extract data from ePS dumpIdy segments into usable form.

Parameters:
  • dumpSegs (list) – Set of dumpIdy segments, i.e. dumpSegs, as returned by epsproc.IO.dumpIdyFileParse()
  • ekeListUn (list) – List of energies, used for error-checking and Xarray rearranging, as returned by epsproc.IO.scatEngFileParse()
  • verbose (bool, default True) – Print job info from file header if true.
Returns:

  • xr.array – Xarray data array, containing matrix elements etc. Dimensions (LM, Eke, Sym, mu, it, Type)
  • int – Number of blank segments found.

Example

>>> data, blankSegs = dumpIdySegsParseX(dumpSegs, ekeListUn, symSegs)

epsproc.IO.fileParse(fileName, startPhrase=None, endPhrase=None, comment=None, verbose=0)

Parse a file, return segment(s) from startPhrase:endPhrase, excluding comments.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • startPhrase (str, optional) – Phrase denoting start of section to read. Default = None
  • endPhrase (str or list, optional) – Phrase denoting end of section to read. Default = None
  • comment (str, optional) – Phrase denoting comment lines, which are skipped. Default = None
  • verbose (int, optional, default = 0) – Level of verbosity in output: 0 = no printed output; 1 = print summary info only; 2 = print detailed info.
Returns:

  • list – [lineStart, lineStop], ints for line #s found from start and end phrases.
  • list – segments, list of lines read from file.
  • All lists can contain multiple entries, if more than one segment matches the search criteria.
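
The start/end phrase scan described above can be sketched in a few lines of plain Python. This is a minimal stand-in under stated assumptions, not the fileParse() implementation: the helper name scan_segments, the 0-indexed line numbers, and the choice to include the start and end lines in each segment are all illustrative guesses.

```python
def scan_segments(lines, start_phrase, end_phrase, comment=None):
    """Return ([[start, stop], ...], [segment_lines, ...]) for each match.

    Hypothetical stand-in for illustration; line numbers are 0-indexed here.
    """
    line_ranges, segments = [], []
    current, start = None, None
    for i, line in enumerate(lines):
        if current is None:
            if start_phrase in line:
                current, start = [line], i      # Open a new segment
            continue
        if comment is not None and line.lstrip().startswith(comment):
            continue                            # Drop comment lines
        current.append(line)
        if end_phrase in line:
            line_ranges.append([start, i])      # Close the segment
            segments.append(current)
            current = None
    return line_ranges, segments


# Made-up file content with two segments and one comment line
text = """header
BEGIN A
# a comment
value 1
END
noise
BEGIN B
value 2
END""".splitlines()

line_ranges, segments = scan_segments(text, "BEGIN", "END", comment="#")
print(line_ranges)
print(segments[0])
```

Both returned lists have one entry per matched segment, which mirrors the "multiple entries" behaviour noted in the Returns description.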

epsproc.IO.getCroFileParse(fileName, verbose=1)

Parse an ePS file for GetCro/CrossSection segments.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • verbose (bool or int, optional) – If true, print segment info.
Returns:

  • list – [lineStart, lineStop], ints for line #s found from start and end phrases.
  • list – dumpSegs, list of lines read from file.
  • Lists contain entries for each GetCro/CrossSection segment found in the file.

epsproc.IO.getCroSegParse(dumpSeg)

Extract values from GetCro/CrossSection file segments.

Parameters: dumpSeg (list) – One CrossSection segment, from dumpSegs[], as returned by epsproc.IO.getCroFileParse()
Returns:
  • np.array – CrossSections, table of results vs. energy.
  • list – attribs, list [Label, value, units]

Notes

Currently this is a bit messy, and relies on fixed CrossSection output format. No error checking as yet. Not yet reading all attribs.

Example

>>> XS, attribs = getCroSegParse(dumpSegs[0])

epsproc.IO.getCroSegsParseX(dumpSegs, symSegs, ekeList)

Extract data from ePS GetCro/CrossSection segments into usable form.

Parameters: dumpSegs (list) – Set of GetCro/CrossSection segments, i.e. dumpSegs, as returned by epsproc.IO.getCroFileParse()
Returns:
  • xr.array – Xarray data array, containing cross sections. Dimensions (Eke, theta)
  • int – Number of blank segments found. (Currently not implemented.)

Example

>>> data = getCroSegsParseX(dumpSegs, symSegs, ekeList)

Notes

A rather cut-down version of epsproc.IO.dumpIdySegsParseX(); no error checking is currently implemented.

epsproc.IO.getFiles(fileIn=None, fileBase=None, fType='.out', verbose=True)

Read ePS file(s) and return results as Xarray data structures. File endings specified by fType, default .out.

Parameters:
  • fileIn (str, list of strs, optional.) – File(s) to read (file in working dir, or full path). Defaults to current working dir if only a file name is supplied. For consistent results, pass raw strings, e.g. fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
  • fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
  • fType (str, optional) – File ending for ePS output files, default '.out'
  • verbose (bool, optional) – Print output details, default True.
Returns:

List of Xarray data arrays, containing matrix elements etc. from each file scanned.

Return type:

list
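
The directory-scan behaviour described for fileBase and fType can be approximated with the standard library. The sketch below is an assumption-laden illustration rather than the getFiles() implementation: the helper name list_files and its argument names are hypothetical, and only a single directory is scanned, in line with the restriction noted above.

```python
import os
import tempfile


def list_files(file_base=None, f_type=".out"):
    """Return sorted full paths of files in file_base ending with f_type.

    Hypothetical helper; defaults to the current working directory.
    """
    if file_base is None:
        file_base = os.getcwd()
    return sorted(
        os.path.join(file_base, name)
        for name in os.listdir(file_base)
        if name.endswith(f_type) and os.path.isfile(os.path.join(file_base, name))
    )


# Demo in a temporary directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("job_a.out", "job_b.out", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in list_files(tmp)]

print(found)
```

Sorting the result keeps the file order reproducible between runs, which matters when results are returned as a list indexed by file.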

epsproc.IO.headerFileParse(fileName, verbose=True)

Parse an ePS file for header & input job info.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • verbose (bool, default True) – Print job info from file header if true.
Returns:

jobInfo – Dictionary generated from job details.

Return type:

dict

To do

  • Tidy up methods, maybe with parseDigits?
  • Tidy up dict output.

epsproc.IO.matEleGroupDim(data, dimGroups=[3, 4, 2])

Group ePS matrix elements by redundant labels.

Default is to group by ['ip', 'it', 'mu'] terms, which all have only a few values.

TODO: better ways to do this? Should be possible at Xarray level.

Parameters: data (list) – Sections from dumpIdy segment, as created in dumpIdySegsParseX(). Ordering is [labels, matElements, attribs].
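
To make the grouping idea concrete, here is a small sketch that groups rows by the values found in selected columns. It assumes that the default dimGroups=[3, 4, 2] are column indices into the [m, l, mu, ip, it, Re, Im] layout documented for dumpIdySegParse(), which would select (ip, it, mu). That reading is an inference, and the helper group_rows is hypothetical, not the matEleGroupDim() code.

```python
from collections import defaultdict


def group_rows(rows, dim_groups=(3, 4, 2)):
    """Group rows by the values found in the columns listed in dim_groups.

    Hypothetical helper: with columns [m, l, mu, ip, it, Re, Im], the default
    indices (3, 4, 2) pick out (ip, it, mu).
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in dim_groups)
        groups[key].append(row)
    return dict(groups)


# Made-up rows: [m, l, mu, ip, it, Re, Im]
rows = [
    [0, 1, 0, 1, 1, 0.5, 0.1],
    [1, 1, 0, 1, 1, 0.4, 0.2],
    [0, 1, 0, 2, 1, 0.3, 0.3],
]
grouped = group_rows(rows)
for key, members in grouped.items():
    print(key, len(members))
```

Because ip, it and mu take only a few distinct values, the number of groups stays small even when there are many (l, m) rows.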

epsproc.IO.matEleGroupDimX(daIn)

Group ePS matrix elements by redundant labels (Xarray version).

Group by ['ip', 'it', 'mu'] terms, which all have only a few values. Rename 'ip': 1, 2 as 'Type': 'L', 'V'.

TODO: better ways to do this? Via Stack/Unstack? http://xarray.pydata.org/en/stable/api.html#id16 See also tests in funcTests_210819.py for more versions/tests.

Parameters: daIn (Xarray) – Data array with matrix elements to be split and recombined by dims.
Returns: data – Data array with reordered matrix elements (dimensions).
Return type: Xarray

epsproc.IO.matEleGroupDimXnested(da)

Group ePS matrix elements by redundant labels (Xarray version).

Group by ['ip', 'it', 'mu'] terms, which all have only a few values.

TODO: better ways to do this? See also tests in funcTests_210819.py for more versions/tests.

Parameters: da (Xarray) – Data array with matrix elements to be split and recombined by dims.

epsproc.IO.molInfoParse(fileName, verbose=True)

Parse an ePS file for input molecule info.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • verbose (bool, default True) – Print job info from file header if true.
Returns:

molInfo – Dictionary with atom & orbital details.

Return type:

dict

Notes

Only tested for Molden input (MoldenCnv2006).

epsproc.IO.parseLineDigits(testLine)

Use regular expressions to extract digits from a string. https://stackoverflow.com/questions/4289331/how-to-extract-numbers-from-a-string-in-python
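
For reference, one common regular-expression approach to pulling numbers out of a line of text looks like the sketch below. The pattern and the helper name extract_numbers are illustrative assumptions. The actual pattern and return type used by parseLineDigits() are not documented here and may differ, for example by returning strings or integers only.

```python
import re

# Matches optional sign, digits with optional decimal part, optional exponent
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def extract_numbers(text):
    """Return every number found in text as a float (hypothetical helper)."""
    return [float(tok) for tok in NUMBER.findall(text)]


values = extract_numbers("E = 1.5 eV, l = 3, scale -2.0E-01")
print(values)
```

Note that the exponent part only matches when digits follow the "e" or "E", so a unit such as "eV" after a number is not swallowed into the match.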

epsproc.IO.readMatEle(fileIn=None, fileBase=None, fType='.out', recordType='DumpIdy', verbose=1, stackE=True)

Read ePS file(s) and return results as Xarray data structures. File endings specified by fType, default *.out.

Parameters:
  • fileIn (str, list of strs, optional.) – File(s) to read (file in working dir, or full path). Defaults to current working dir if only a file name is supplied. For consistent results, pass raw strings, e.g. fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
  • fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
  • fType (str, optional) – File ending for ePS output files, default '.out'
  • recordType (str, optional, default 'DumpIdy') – Type of record to scan for, currently set for 'DumpIdy', 'EDCS' or 'CrossSection'. For a full list of descriptions, types and sources, run epsproc.util.dataTypesList().
  • verbose (int, optional, default = 1) – Level of verbosity in output: 0 = no printed output; 1 = print summary info only; 2 = print detailed info.
  • stackE (bool, optional, default = True) – Identify and stack multi-part jobs to single array (by E) if True.
Returns:

List of Xarray data arrays, containing matrix elements etc. from each file scanned.

Return type:

list

To do

  • Change to pathlib paths.
  • Implement outputType options…?

History

13/10/20 Adapted to use grouped lists for multi-file jobs; should be backward-compatible if stackE = False is set.

Examples

>>> dataSet = readMatEle()  # Scan current dir
>>> fileIn = r'C:\share\code\ePSproc\python_dev\no2_demo_ePS.out'
>>> dataSet = readMatEle(fileIn)  # Scan single file
>>> dataSet = readMatEle(fileBase = r'C:\share\code\ePSproc\python_dev') # Scan dir

Note

  • Files are scanned for matrix element output flagged by “DumpIdy” headers.
  • Each segment found is parsed for attributes and data (set of matrix elements).
  • Matrix elements and attributes are combined and output as an Xarray array.
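
The advice to pass raw strings for fileIn is easy to check in plain Python. In the sketch below, a shortened fragment of the example path is written both as a normal and as a raw string literal. The fragment is used only to avoid other backslash sequences, and the variable names are illustrative.

```python
# The same path fragment as a normal and as a raw string literal
plain = "python_dev\no2_demo_ePS.out"   # "\n" is read as a newline character
raw = r"python_dev\no2_demo_ePS.out"    # Backslash and "n" are kept as typed

print("\n" in plain)            # The normal literal now contains a newline
print("\\" in raw)              # The raw literal still contains the backslash
print(len(raw) - len(plain))    # Two characters collapsed into one
```

With the normal literal the file name effectively becomes "o2_demo_ePS.out" on a new line, so the path no longer points at the intended file.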

epsproc.IO.readOrb3D(fileIn=None, fileBase=None, fType='_Orb.dat', verbose=True)

Read ePS 3D data file(s) and return results. File endings specified by fType, default *_Orb.dat.

Parameters:
  • fileIn (str, list of strs, optional.) – File(s) to read (file in working dir, or full path). Defaults to current working dir if only a file name is supplied. For consistent results, pass raw strings, e.g. fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
  • fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
  • fType (str, optional) – File ending for ePS output files, default '_Orb.dat'
  • verbose (bool, optional) – Print output details, default True.
Returns:

List of data arrays, containing matrix elements etc. from each file scanned.

Return type:

list

TODO: Change output to Xarray?

Examples

>>> dataSet = readOrb3D()  # Scan current dir
>>> fileIn = r'C:\share\code\ePSproc\python_dev\DABCOSA2PPCA2PP_10.5eV_Orb.dat'
>>> dataSet = readOrb3D(fileIn)  # Scan single file
>>> dataSet = readOrb3D(fileBase = r'C:\share\code\ePSproc\python_dev') # Scan dir

epsproc.IO.readOrbCoords(f, headerLines)
epsproc.IO.readOrbData(f, headerLines)
epsproc.IO.readOrbElements(f, n)
epsproc.IO.readOrbHeader(f)

epsproc.IO.readXarray(fileName, filePath=None, engine='scipy')

Read file from netCDF format via Xarray method.

Parameters:
  • fileName (str) – File to read.
  • filePath (str, optional, default = None) – Full path to file. If set to None (default) the file will be read from the current working directory (as returned by os.getcwd()).
Returns:

Data from file. May be in serialized format.

Return type:

Xarray

Notes

The default option for Xarray is to use Scipy netCDF writer, which does not support complex datatypes. In this case, the data array is written as a dataset with a real and imag component.

Multi-level indexing is also not supported, and must be serialized first. Ugh.

TODO: generalize multi-level indexing here.

epsproc.IO.scatEngFileParse(fileName, verbose=1)

Parse an ePS file for ScatEng list.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • verbose (bool or int, optional) – If true, print segment info.
Returns:

  • list – ekeList, np array of energies set in the ePS file.
  • Lists contain entries for each dumpIdy segment found in the file.

epsproc.IO.symFileParse(fileName, verbose=1)

Parse an ePS file for scattering symmetries.

Parameters:
  • fileName (str) – File to read (file in working dir, or full path)
  • verbose (bool or int, optional) – If true, print segment info.
Returns:

  • list – symSegs, raw lines from the ePS file.
  • Lists contain entries for each ScatSym setting found in file header (job input).

epsproc.IO.writeOrb3Dvtk(dataSet)

Write ePS 3D data file(s) to vtk format. This can be opened in, e.g., Paraview.

Parameters:
  • dataSet (list) – List of data arrays, containing matrix elements etc. from each file scanned. Assumes format as output by readOrb3D(), [fileName, headerLines, coords, data]
Returns:

List of output files.

Return type:

list

epsproc.IO.writeXarray(dataIn, fileName=None, filePath=None, engine='h5netcdf')

Write file to netCDF format via Xarray method.

Parameters:
  • dataIn (Xarray) – Data array to write to disk.
  • fileName (str, optional, default = None) – Filename to use. If set to None (default) the file will be written with a datestamp.
  • filePath (str, optional, default = None) – Full path to file. If set to None (default) the file will be written in the current working directory (as returned by os.getcwd()).
  • engine (str, optional, default = 'h5netcdf') – netCDF engine for Xarray to_netcdf method. Some libraries may not support multidim data formats.
Returns:

Indicates save type and file path.

Return type:

str

Notes

The default option for Xarray is to use Scipy netCDF writer, which does not support complex datatypes. In this case, the data array is written as a dataset with a real and imag component.

TODO: implement try/except to handle various cases here, and test other netCDF writers (see http://xarray.pydata.org/en/stable/io.html#netcdf).

Multi-level indexing is also not supported, and must be serialized first. Ugh.
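
The real/imag workaround mentioned in the notes can be illustrated without any netCDF library. The sketch below is a plain-Python analogy only: the helper names and the "re"/"im" keys are assumptions, and the actual writeXarray() code operates on Xarray objects rather than lists.

```python
def split_complex(values):
    """Split a list of complex numbers into separate real and imag lists."""
    return {"re": [v.real for v in values], "im": [v.imag for v in values]}


def join_complex(parts):
    """Rebuild the complex values from the 're' and 'im' lists."""
    return [complex(re, im) for re, im in zip(parts["re"], parts["im"])]


mat_ele = [0.5 - 0.25j, 1.0 + 2.0j]
stored = split_complex(mat_ele)      # Two real-valued components, safe to write
restored = join_complex(stored)      # Recombine after reading back
print(stored)
print(restored == mat_ele)
```

The same round trip applies on read: a file written this way holds two real-valued components that need to be recombined before use.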