epsproc.IO module
ePSproc IO functions.
Module for file IO and data parsing.
Main function: epsproc.IO.readMatEle():
readMatEle(fileIn = None, fileBase = None, fType = '.out'):
Read ePS file(s) and return results as Xarray data structures containing matrix elements. File endings specified by fType, default .out.
History
- 03/10/22 Added basic R-matrix code dipoles IO.
- 27/06/22 Moved some backend functionality for read/write data to submodule ioBackends.
Also adding additional backends for data/class file IO therein.
- 07/06/22 Added various improvements to writeXarray and readXarray functionality.
See https://github.com/phockett/ePSproc/issues/8 for ongoing notes.
- 13/10/20 Adapted main function readMatEle() to use grouped lists for multi-file jobs; should be back-compatible if stackE = False is set.
- 06/11/19 Added jobInfo and molInfo data structures, from ePS file via epsproc.IO.headerFileParse() and epsproc.IO.molInfoParse(). Still needs a bit of work, and may want to implement other (comp chem) libraries here.
- 14/10/19 Added/debugged read functions for CrossSection segments.
- 27/09/19 Added read functions for EDCS segments.
- 17/09/19 Added read/write to/from netCDF files for Xarrays.
Use built-in methods, with work-arounds for complex number format issues.
- 29/08/19 Updating docs to rst.
- 26/08/19 Added parsing for E, sym parameters from head of ePS file.
Added error checking by comparing read mat elements to expected list. Changed & fixed Xarray indexing - matrix elements now output with dims (LM, Eke, Sym, mu, it, Type). Current code rather ugly, however.
- 19/08/19 Added functions for reading wavefunction files (3D data).
- 07/08/19 Naming convention tweaks, and some changes to comments, following basic tests with Sphinx.
- 05/08/19 v1 Initial python version.
Working, but little error checking as yet. Needs some tidying.
To do
Add IO for other file segments (only DumpIdy supported so far).
Better logic & flexibility for file scanning.
Restructure as class for brevity…?
More sophisticated methods/data structures for job & molecule info handling.
- epsproc.IO.EDCSFileParse(fileName, verbose=1)[source]
Parse an ePS file for EDCS segments.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
verbose (bool or int, optional) – If true, print segment info.
- Returns
list – [lineStart, lineStop], ints for line #s found from start and end phrases.
list – dumpSegs, list of lines read from file.
Lists contain entries for each EDCS segment found in the file.
- epsproc.IO.EDCSSegParse(dumpSeg, verbose=False)[source]
Extract values from EDCS file segments.
- Parameters
dumpSeg (list) – One EDCS segment, from dumpSegs[], as returned by
epsproc.IO.EDCSFileParse()
verbose (bool, int, default = False) – Print additional info during run.
- Returns
np.array – EDCS, array of scattering XS, [theta, Cross Section (Angstrom^2)]
list – attribs, list [Label, value, units]
Notes
Only reads (theta,I) information from an “EDCS - differential cross section program” segment. Currently this is a bit messy, and relies on fixed EDCS format. No error checking as yet. Not yet reading all attribs.
Example
>>> EDCS, attribs = EDCSSegParse(dumpSegs[0])
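The (theta, I) extraction described in the Notes can be illustrated with a minimal sketch. This is not the library's parser: the helper name parse_two_columns and the demo lines are hypothetical, and the whitespace-separated two-column layout is an assumption rather than the documented EDCS format.

```python
from typing import List


def parse_two_columns(lines: List[str]) -> List[List[float]]:
    """Keep only lines that hold exactly two numeric fields.

    Hypothetical helper; the real EDCS segment layout may differ.
    """
    table = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank lines or lines with other layouts
        try:
            table.append([float(parts[0]), float(parts[1])])
        except ValueError:
            continue  # two fields, but not numbers (e.g. a header row)
    return table


# Made-up lines standing in for one segment.
demo_seg = ["theta  XS", "0.0  1.25E+01", "10.0  1.10E+01", ""]
edcs_table = parse_two_columns(demo_seg)
```

Skipping anything that does not parse keeps the sketch tolerant of header rows, at the cost of silently dropping malformed data lines.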
- epsproc.IO.EDCSSegsParseX(dumpSegs)[source]
Extract data from ePS EDCS segments into usable form.
- Parameters
dumpSegs (list) – Set of EDCS segments, i.e. dumpSegs, as returned by epsproc.IO.EDCSFileParse()
- Returns
xr.array – Xarray data array, containing cross sections. Dimensions (Eke, theta)
int – Number of blank segments found. (CURRENTLY not implemented.)
Example
>>> data = EDCSSegsParseX(dumpSegs)
Notes
A rather cut-down version of epsproc.IO.dumpIdySegsParseX(); no error checking currently implemented.
- epsproc.IO.dumpIdyFileParse(fileName, verbose=1)[source]
Parse an ePS file for dumpIdy segments.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
verbose (bool or int, optional) – If true, print segment info.
- Returns
list – [lineStart, lineStop], ints for line #s found from start and end phrases.
list – dumpSegs, list of lines read from file.
Lists contain entries for each dumpIdy segment found in the file.
- epsproc.IO.dumpIdySegParse(dumpSeg)[source]
Extract values from dumpIdy file segments.
- Parameters
dumpSeg (list) – One dumpIdy segment, from dumpSegs[], as returned by
epsproc.IO.dumpIdyFileParse()
- Returns
np.array – rawIdy, array of matrix elements, [m,l,mu,ip,it,Re,Im]
list – attribs, list [Label, value, units]
Notes
Currently this is a bit messy, and relies on fixed DumpIDY format. No error checking as yet. Not yet reading all attribs.
Example
>>> matEle, attribs = dumpIdySegParse(dumpSegs[0])
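As an illustration of working with rows in the [m,l,mu,ip,it,Re,Im] layout listed under Returns, the sketch below combines the Re and Im columns into complex values keyed by the five labels. The function name rows_to_complex and the numbers are made up for the example; the library itself returns an np.array, not a dict.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int, int, int, int]  # (m, l, mu, ip, it)


def rows_to_complex(raw_idy: List[List[float]]) -> Dict[Label, complex]:
    """Map each (m, l, mu, ip, it) label set to Re + 1j*Im.

    Hypothetical helper, assuming the column order given in the docs.
    """
    mat_ele = {}
    for m, l, mu, ip, it, re_part, im_part in raw_idy:
        key = (int(m), int(l), int(mu), int(ip), int(it))
        mat_ele[key] = complex(re_part, im_part)
    return mat_ele


# Made-up rows, not real ePS output.
raw_idy = [
    [0, 1, 0, 1, 1, 0.5, -0.25],
    [0, 1, 0, 2, 1, 0.4, -0.20],
]
mat_ele = rows_to_complex(raw_idy)
```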
- epsproc.IO.dumpIdySegsParseX(dumpSegs, ekeListUn, symSegs, verbose=1)[source]
Extract data from ePS dumpIdy segments into usable form.
- Parameters
dumpSegs (list) – Set of dumpIdy segments, i.e. dumpSegs, as returned by
epsproc.IO.dumpIdyFileParse()
ekeListUn (list) – List of energies, used for error-checking and Xarray rearranging, as returned by epsproc.IO.scatEngFileParse()
verbose (bool, default True) – Print job info from file header if true.
- Returns
xr.array – Xarray data array, containing matrix elements etc. Dimensions (LM, Eke, Sym, mu, it, Type)
int – Number of blank segments found.
Example
>>> data = dumpIdySegsParseX(dumpSegs, ekeListUn, symSegs)
- epsproc.IO.fileParse(fileName, startPhrase=None, endPhrase=None, comment=None, verbose=0)[source]
Parse a file, return segment(s) from startPhrase:endPhrase, excluding comments.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
startPhrase (str, optional) – Phrase denoting start of section to read. Default = None
endPhrase (str or list, optional) – Phrase denoting end of section to read. Default = None
comment (str, optional) – Phrase denoting comment lines, which are skipped. Default = None
verbose (int, optional, default = 0) – Level of verbosity in output: 0, no printed output; 1, print summary info only; 2, print detailed info.
- Returns
list – [lineStart, lineStop], ints for line #s found from start and end phrases.
list – segments, list of lines read from file.
All lists can contain multiple entries, if more than one segment matches the search criteria.
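The start/end phrase scan described for fileParse can be sketched as follows. This is a minimal illustration under stated assumptions, not the library code: parse_segments is a hypothetical name, it takes a list of lines rather than a file name so that it runs anywhere, and keeping the start line while dropping the end line is an arbitrary choice made here.

```python
from typing import List, Optional, Tuple


def parse_segments(
    lines: List[str],
    start_phrase: str,
    end_phrase: str,
    comment: Optional[str] = None,
) -> Tuple[List[List[int]], List[List[str]]]:
    """Collect runs of lines from start_phrase up to end_phrase.

    Hypothetical stand-in for the behaviour described above.
    """
    line_ranges = []  # [lineStart, lineStop] per segment, 1-based
    segments = []     # lines collected for each segment
    current = None    # segment being filled, or None when outside one
    start_line = 0

    for n, line in enumerate(lines, start=1):
        if current is None:
            if start_phrase in line:
                current = [line]  # the start line is kept
                start_line = n
        elif end_phrase in line:
            line_ranges.append([start_line, n])
            segments.append(current)  # the end line is not kept
            current = None
        elif comment is None or not line.lstrip().startswith(comment):
            current.append(line)  # comment lines are skipped

    return line_ranges, segments


demo = [
    "header",
    "BEGIN block A",
    "# a comment",
    "1 2 3",
    "END",
    "BEGIN block B",
    "4 5 6",
    "END",
]
line_ranges, segments = parse_segments(demo, "BEGIN", "END", comment="#")
```

As in the Returns description, both outputs hold one entry per matching segment, so a file with several matches gives parallel lists.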
- epsproc.IO.getCroFileParse(fileName, verbose=1)[source]
Parse an ePS file for GetCro/CrossSection segments.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
verbose (bool or int, optional) – If true, print segment info.
- Returns
list – [lineStart, lineStop], ints for line #s found from start and end phrases.
list – dumpSegs, list of lines read from file.
Lists contain entries for each GetCro/CrossSection segment found in the file.
- epsproc.IO.getCroSegParse(dumpSeg)[source]
Extract values from GetCro/CrossSection file segments.
- Parameters
dumpSeg (list) – One CrossSection segment, from dumpSegs[], as returned by
epsproc.IO.getCroFileParse()
- Returns
np.array – CrossSections, table of results vs. energy.
list – attribs, list [Label, value, units]
Notes
Currently this is a bit messy, and relies on fixed CrossSection output format. No error checking as yet. Not yet reading all attribs.
Example
>>> XS, attribs = getCroSegParse(dumpSegs[0])
- epsproc.IO.getCroSegsParseX(dumpSegs, symSegs, ekeList)[source]
Extract data from ePS GetCro/CrossSection segments into usable form.
- Parameters
dumpSegs (list) – Set of GetCro/CrossSection segments, i.e. dumpSegs, as returned by epsproc.IO.getCroFileParse()
- Returns
xr.array – Xarray data array, containing cross sections. Dimensions (Eke, theta)
int – Number of blank segments found. (CURRENTLY not implemented.)
Example
>>> data = getCroSegsParseX(dumpSegs, symSegs, ekeList)
Notes
A rather cut-down version of epsproc.IO.dumpIdySegsParseX(); no error checking currently implemented.
- epsproc.IO.getFiles(fileIn=None, fileBase=None, fType='.out', verbose=True)[source]
Scan dir for ePS (or other) file(s) and return results as a list. File endings specified by fType, default .out. Files are checked for existence with
- Parameters
fileIn (str, list of strs, optional, default = None) – File(s) to read (file in working dir, or full path). Defaults to current working dir if only a file name is supplied. If None, fileBase dir will be scanned for files. If a list, items will be tested for validity. For consistent results, pass raw strings, e.g.
fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
fType (str, optional) – File ending for ePS output files, default ‘.out’
verbose (bool, optional) – Print output details, default True.
- Returns
List of files found by the scan.
- Return type
list
Note: scans only a single dir, no subdirs, using os.listdir. See also classes.multiJob.ePSmultiJob.scanDirs() for alternative with Glob & subdir checks.
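The single-directory scan mentioned in the note can be sketched with os.listdir. The helper name list_files and the demo file names are hypothetical, and the os.path.isfile check is an assumption about how existence might be tested, not a statement about the library.

```python
import os
import tempfile
from typing import List, Optional


def list_files(file_base: Optional[str] = None, f_type: str = ".out") -> List[str]:
    """Return sorted full paths of files in one directory ending with f_type."""
    file_base = os.getcwd() if file_base is None else file_base
    found = []
    for name in os.listdir(file_base):  # single dir only, no subdirs
        full_path = os.path.join(file_base, name)
        if name.endswith(f_type) and os.path.isfile(full_path):
            found.append(full_path)
    return sorted(found)


# Demo in a throwaway directory: two matching files, one other file,
# and a sub-directory whose name happens to end in ".out".
with tempfile.TemporaryDirectory() as tmp:
    for name in ("job1.out", "job2.out", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "subdir.out"))
    found_names = [os.path.basename(p) for p in list_files(tmp)]
```

Sorting the result makes the order reproducible, since os.listdir returns entries in arbitrary order.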
- epsproc.IO.headerFileParse(fileName, verbose=True)[source]
Parse an ePS file for header & input job info.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
verbose (bool, default True) – Print job info from file header if true.
- Returns
jobInfo (dict) – Dictionary generated from job details.
To do
- Tidy up methods - maybe with parseDigits?
- Tidy up dict output.
- epsproc.IO.matEleGroupDim(data, dimGroups=[3, 4, 2])[source]
Group ePS matrix elements by redundant labels.
Default is to group by [‘ip’, ‘it’, ‘mu’] terms, all have only a few values.
TODO: better ways to do this? Should be possible at Xarray level.
- Parameters
data (list) – Sections from dumpIdy segment, as created in dumpIdySegsParseX() Ordering is [labels, matElements, attribs].
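Grouping by label columns can be illustrated in plain Python. The sketch assumes the default dimGroups=[3, 4, 2] are column indices into rows ordered (m, l, mu, ip, it), which matches the ['ip', 'it', 'mu'] ordering given above; the name group_rows and the sample rows are made up, and the flat dict output is a simplification rather than the library's return format.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_rows(
    rows: Sequence[Tuple[int, ...]],
    dim_groups: Sequence[int] = (3, 4, 2),
) -> Dict[Tuple[int, ...], List[Tuple[int, ...]]]:
    """Bucket rows by the values found in the dim_groups columns.

    Hypothetical helper; keys are (ip, it, mu) for the default columns.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in dim_groups)
        groups[key].append(row)
    return dict(groups)


# Made-up label rows in (m, l, mu, ip, it) order.
rows = [
    (0, 1, 0, 1, 1),
    (1, 1, 0, 1, 1),
    (0, 1, 0, 2, 1),
]
grouped = group_rows(rows)
```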
- epsproc.IO.matEleGroupDimX(daIn)[source]
Group ePS matrix elements by redundant labels (Xarray version).
Group by ['ip', 'it', 'mu'] terms, all have only a few values. Rename 'ip': 1, 2 as 'Type': 'L', 'V'.
TODO: better ways to do this? Via Stack/Unstack? http://xarray.pydata.org/en/stable/api.html#id16 See also tests in funcTests_210819.py for more versions/tests.
- Parameters
daIn (Xarray) – Data array with matrix elements to be split and recombined by dims.
- Returns
data (Xarray) – Data array with reordered matrix elements (dimensions).
NOTE Oct 2022 (this is currently failing at ‘it’ restack in XR >2022.3, likely due to change in selectors?)
See https://github.com/phockett/ePSproc/issues/64
Should rewrite in any case!
- epsproc.IO.matEleGroupDimXnested(da)[source]
Group ePS matrix elements by redundant labels (Xarray version).
Group by [‘ip’, ‘it’, ‘mu’] terms, all have only a few values.
TODO: better ways to do this? See also tests in funcTests_210819.py for more versions/tests.
- Parameters
da (Xarray) – Data array with matrix elements to be split and recombined by dims.
- epsproc.IO.molInfoParse(fileName, verbose=True)[source]
Parse an ePS file for input molecule info.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
verbose (bool, default True) – Print job info from file header if true.
- Returns
molInfo – Dictionary with atom & orbital details.
- Return type
dict
Notes
Only tested for Molden input (MoldenCnv2006).
- epsproc.IO.parseLineDigits(testLine)[source]
Use regular expressions to extract digits from a string. https://stackoverflow.com/questions/4289331/how-to-extract-numbers-from-a-string-in-python
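A regular-expression approach of this kind can be sketched as below. The pattern and the name extract_numbers are assumptions chosen for illustration (signed integers, decimals and exponent notation); the pattern actually used by parseLineDigits is not given here and may differ.

```python
import re
from typing import List

# Optional sign, then digits with an optional fractional part (or a bare
# fractional part), then an optional exponent. All groups are non-capturing
# so that findall returns whole matches.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def extract_numbers(text: str) -> List[str]:
    """Return every numeric substring found in text, as strings."""
    return NUMBER.findall(text)


values = extract_numbers("E = 0.81 eV, l = 3, amp = -1.5e-02")
```

The matches are returned as strings, so the caller decides whether to convert them with int() or float().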
- epsproc.IO.readMatEle(fileIn=None, fileBase=None, fType='.out', recordType='DumpIdy', verbose=1, stackE=True, stackDim='Eke')[source]
Read ePS file(s) and return results as Xarray data structures. File endings specified by fType, default *.out.
- Parameters
fileIn (str, list of strs, optional.) – File(s) to read (file in working dir, or full path). Defaults to current working dir if only a file name is supplied. For consistent results, pass raw strings, e.g.
fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
fType (str, optional) – File ending for ePS output files, default ‘.out’
recordType (str, optional, default 'DumpIdy') – Type of record to scan for, currently set for ‘DumpIdy’, ‘EDCS’ or ‘CrossSection’. For a full list of descriptions, types and sources, run: >>> epsproc.util.dataTypesList()
verbose (int, optional, default = 1) – Level of verbosity in output: 0, no printed output; 1, print summary info only; 2, print detailed info.
stackE (bool, optional, default = True) – Identify and stack multi-part jobs to single array (by E) if True.
stackDim (str, optional, default = 'Eke') – Dim to stack. Note if stackE=True, any dim can be set here (not just E).
- Returns
list – List of Xarray data arrays, containing matrix elements etc. from each file scanned.
To do
- Change to pathlib paths.
- Implement outputType options…?
20/07/22 Added dict return for dumpIdy methods for troubleshooting if getCroSegsParseX fails. – Note this only returns if stackE = False is also set.
13/10/20 Adapted to use grouped lists for multi-file jobs, should be back-compatible if stackE = False set.
Examples
>>> dataSet = readMatEle() # Scan current dir
>>> fileIn = r'C:\share\code\ePSproc\python_dev\no2_demo_ePS.out'
>>> dataSet = readMatEle(fileIn) # Scan single file
>>> dataSet = readMatEle(fileBase = r'C:\share\code\ePSproc\python_dev') # Scan dir
Note
Files are scanned for matrix element output flagged by “DumpIdy” headers.
Each segment found is parsed for attributes and data (set of matrix elements).
Matrix elements and attributes are combined and output as an Xarray array.
- epsproc.IO.readOrb3D(fileIn=None, fileBase=None, fType='_Orb.dat', verbose=True)[source]
Read ePS 3D data file(s) and return results. File endings specified by fType, default *_Orb.dat.
- Parameters
fileIn (str, list of strs, optional.) – File(s) to read (file in working dir, or full path). Defaults to current working dir if only a file name is supplied. For consistent results, pass raw strings, e.g.
fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
fType (str, optional) – File ending for ePS output files, default ‘_Orb.dat’
verbose (bool, optional) – Print output details, default True.
- Returns
list – List of data arrays, containing matrix elements etc. from each file scanned.
TODO: Change output to Xarray?
Examples
>>> dataSet = readOrb3D() # Scan current dir
>>> fileIn = r'C:\share\code\ePSproc\python_dev\DABCOSA2PPCA2PP_10.5eV_Orb.dat'
>>> dataSet = readOrb3D(fileIn) # Scan single file
>>> dataSet = readOrb3D(fileBase = r'C:\share\code\ePSproc\python_dev') # Scan dir
- epsproc.IO.readXarray(fileName, filePath=None, engine='h5netcdf', forceComplex=False, forceArray=True, **kwargs)[source]
Wrapper for backend Xarray file readers.
- epsproc.IO.scatEngFileParse(fileName, verbose=1)[source]
Parse an ePS file for ScatEng list.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
verbose (bool or int, optional) – If true, print segment info.
- Returns
list – ekeList, np array of energies set in the ePS file.
Lists contain entries for each dumpIdy segment found in the file.
- epsproc.IO.symFileParse(fileName, verbose=1)[source]
Parse an ePS file for scattering symmetries.
- Parameters
fileName (str) – File to read (file in working dir, or full path)
verbose (bool or int, optional) – If true, print segment info.
- Returns
list – symSegs, raw lines from the ePS file.
Lists contain entries for each ScatSym setting found in file header (job input).
- epsproc.IO.writeOrb3Dvtk(dataSet)[source]
Write ePS 3D data file(s) to vtk format. This can be opened in, e.g., Paraview.
- Parameters
dataSet (list) – List of data arrays, containing matrix elements etc. from each file scanned. Assumes format as output by readOrb3D(), [fileName, headerLines, coords, data]
- Returns
List of output files.
- Return type
list
Note
Uses Paulo Herrera’s eVTK, see:
pip install pyevtk to install.