epsproc.IO module
ePSproc IO functions.
Module for file IO and data parsing.
Main function: epsproc.IO.readMatEle()
readMatEle(fileIn = None, fileBase = None, fType = '.out')
Read ePS file(s) and return results as Xarray data structures containing matrix elements. File endings are specified by fType, default .out.
History
13/10/20 Adapted main function readMatEle() to use grouped lists for multi-file jobs; should be back-compatible if stackE = False is set.
06/11/19 Added jobInfo and molInfo data structures, from ePS file via epsproc.IO.headerFileParse() and epsproc.IO.molInfoParse().
  - Still needs a bit of work, and may want to implement other (comp chem) libraries here.
14/10/19 Added/debugged read functions for CrossSection segments.
27/09/19 Added read functions for EDCS segments.
17/09/19 Added read/write to/from netCDF files for Xarrays.
  - Uses built-in methods, with work-arounds for complex number format issues.
29/08/19 Updated docs to rst.
26/08/19 Added parsing for E, sym parameters from head of ePS file.
  - Added error checking by comparing read mat elements to expected list.
  - Changed & fixed Xarray indexing: matrix elements are now output with dims (LM, Eke, Sym, mu, it, Type). Current code rather ugly, however.
19/08/19 Added functions for reading wavefunction files (3D data).
07/08/19 Naming convention tweaks, and some changes to comments, following basic tests with Sphinx.
05/08/19 v1 Initial python version.
  - Working, but little error checking as yet. Needs some tidying.
To do
- Add IO for other file segments (only DumpIdy supported so far).
- Better logic & flexibility for file scanning.
- Restructure as class for brevity…?
- More sophisticated methods/data structures for job & molecule info handling.
epsproc.IO.EDCSFileParse(fileName, verbose=1)
Parse an ePS file for EDCS segments.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- verbose (bool or int, optional) – If true, print segment info.
Returns: - list – [lineStart, lineStop], ints for line #s found from start and end phrases.
- list – dumpSegs, list of lines read from file.
- Lists contain entries for each EDCS segment found in the file.
epsproc.IO.EDCSSegParse(dumpSeg)
Extract values from EDCS file segments.
Parameters: dumpSeg (list) – One EDCS segment, from dumpSegs[], as returned by epsproc.IO.EDCSFileParse()
Returns: - np.array – EDCS, array of scattering XS, [theta, Cross Section (Angstrom^2)]
- list – attribs, list [Label, value, units]
Notes
Currently this is a bit messy, and relies on fixed EDCS format. No error checking as yet. Not yet reading all attribs.
Example
>>> EDCS, attribs = EDCSSegParse(dumpSegs[0])
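A minimal sketch of this kind of fixed-format parsing, assuming (hypothetically) that each data line of an EDCS segment holds two whitespace-separated numbers, theta and the cross section. The helper parse_two_column_table and the sample lines are illustrative only, not epsproc code or verbatim ePS output.

```python
import numpy as np

def parse_two_column_table(lines):
    """Collect (theta, XS) pairs from lines holding exactly two numbers.

    Blank lines, text lines and lines with any other field count are
    skipped, so header/label lines drop out without special handling.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            rows.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # e.g. a two-word label line
    # reshape keeps a (0, 2) shape when nothing matched
    return np.array(rows, dtype=float).reshape(-1, 2)

# Hypothetical segment text, for illustration only
seg = ["Differential cross section (illustrative header)",
       "   0.0   1.523E+00",
       "  10.0   1.201E+00",
       "  20.0   9.870E-01",
       ""]
edcs = parse_two_column_table(seg)
```

Because the sketch relies on field counts rather than fixed column positions, it shares the "no error checking" caveat above: a stray two-number line elsewhere in the segment would be picked up too.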
epsproc.IO.EDCSSegsParseX(dumpSegs)
Extract data from ePS EDCS segments into usable form.
Parameters: dumpSegs (list) – Set of EDCS segments, i.e. dumpSegs, as returned by epsproc.IO.EDCSFileParse()
Returns: - xr.DataArray – Xarray data array, containing cross sections. Dimensions (Eke, theta)
- int – Number of blank segments found. (CURRENTLY not implemented.)
Example
>>> data = EDCSSegsParseX(dumpSegs)
Notes
A rather cut-down version of epsproc.IO.dumpIdySegsParseX(); no error checking is currently implemented.
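Once each segment is parsed, the (Eke, theta) layout amounts to stacking per-energy tables on a shared theta grid. The sketch below does this in plain NumPy; stack_by_energy and its dict input are hypothetical, and the real function builds an Xarray array directly.

```python
import numpy as np

def stack_by_energy(segments):
    """Stack {Eke: array of [theta, XS] rows} into an (Eke, theta) grid.

    All segments must share the same theta grid; a ValueError is raised
    otherwise. Returns (eke, theta, xs) with xs.shape == (len(eke), len(theta)).
    """
    keys = sorted(segments)
    theta = np.asarray(segments[keys[0]], dtype=float)[:, 0]
    xs = np.empty((len(keys), len(theta)))
    for i, e in enumerate(keys):
        seg = np.asarray(segments[e], dtype=float)
        # Check length first so a short table cannot break the comparison
        if seg.shape[0] != len(theta) or not np.allclose(seg[:, 0], theta):
            raise ValueError(f"theta grid differs at Eke={e}")
        xs[i] = seg[:, 1]
    return np.array(keys, dtype=float), theta, xs
```

If an Xarray object is wanted, the three outputs map onto xr.DataArray(xs, coords={'Eke': eke, 'theta': theta}, dims=['Eke', 'theta']).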
epsproc.IO.dumpIdyFileParse(fileName, verbose=1)
Parse an ePS file for dumpIdy segments.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- verbose (bool or int, optional) – If true, print segment info.
Returns: - list – [lineStart, lineStop], ints for line #s found from start and end phrases.
- list – dumpSegs, list of lines read from file.
- Lists contain entries for each dumpIdy segment found in the file.
epsproc.IO.dumpIdySegParse(dumpSeg)
Extract values from dumpIdy file segments.
Parameters: dumpSeg (list) – One dumpIdy segment, from dumpSegs[], as returned by epsproc.IO.dumpIdyFileParse()
Returns: - np.array – rawIdy, array of matrix elements, [m,l,mu,ip,it,Re,Im]
- list – attribs, list [Label, value, units]
Notes
Currently this is a bit messy, and relies on fixed DumpIDY format. No error checking as yet. Not yet reading all attribs.
Example
>>> matEle, attribs = dumpIdySegParse(dumpSegs[0])
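Given the documented rawIdy column order [m,l,mu,ip,it,Re,Im], recombining the last two columns into complex matrix elements is straightforward. This sketch (split_raw_idy is a hypothetical name) assumes the rows have already been read into numbers.

```python
import numpy as np

# Column layout as documented for rawIdy: [m, l, mu, ip, it, Re, Im]
N_LABELS = 5
RE_COL, IM_COL = 5, 6

def split_raw_idy(raw):
    """Split rawIdy rows into integer labels and complex matrix elements."""
    raw = np.asarray(raw, dtype=float)
    # Round before casting so values parsed as e.g. 0.9999999 map to 1
    labels = np.rint(raw[:, :N_LABELS]).astype(int)
    values = raw[:, RE_COL] + 1j * raw[:, IM_COL]
    return labels, values

raw = [[0, 1, 0, 1, 1, 0.25, -0.5],
       [1, 1, -1, 2, 1, 1.0, 0.0]]
labels, values = split_raw_idy(raw)
```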
epsproc.IO.dumpIdySegsParseX(dumpSegs, ekeListUn, symSegs, verbose=1)
Extract data from ePS dumpIdy segments into usable form.
Parameters: - dumpSegs (list) – Set of dumpIdy segments, i.e. dumpSegs, as returned by epsproc.IO.dumpIdyFileParse()
- ekeListUn (list) – List of energies, used for error-checking and Xarray rearranging, as returned by epsproc.IO.scatEngFileParse()
- symSegs (list) – Scattering symmetry settings, as returned by epsproc.IO.symFileParse()
- verbose (bool or int, optional, default = 1) – Print job info from file header if true.
Returns: - xr.DataArray – Xarray data array, containing matrix elements etc. Dimensions (LM, Eke, Sym, mu, it, Type)
- int – Number of blank segments found.
Example
>>> data = dumpIdySegsParseX(dumpSegs, ekeListUn, symSegs)
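The error check against ekeListUn can be pictured as comparing the energies found in non-empty segments with the expected ScatEng list. The sketch assumes each segment's energy and row count are already known; check_segments and its inputs are hypothetical, and how epsproc pairs segments with energies is not shown here.

```python
def check_segments(seg_energies, seg_row_counts, eke_expected, tol=1e-6):
    """Compare energies found in segments with an expected energy list.

    seg_energies   : energy associated with each segment (hypothetical input)
    seg_row_counts : number of matrix-element rows parsed from each segment
    eke_expected   : expected energies, e.g. as from scatEngFileParse()
    Returns (missing, n_blank): expected energies with no non-empty segment,
    and the number of segments that contained no rows.
    """
    n_blank = sum(1 for n in seg_row_counts if n == 0)
    found = [e for e, n in zip(seg_energies, seg_row_counts) if n > 0]
    missing = [e for e in eke_expected
               if not any(abs(e - f) <= tol for f in found)]
    return missing, n_blank
```

A tolerance is used instead of exact float equality, since energies printed in the file and energies from the job input may differ in formatting precision.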
epsproc.IO.fileParse(fileName, startPhrase=None, endPhrase=None, comment=None, verbose=0)
Parse a file, return segment(s) from startPhrase:endPhrase, excluding comments.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- startPhrase (str, optional) – Phrase denoting start of section to read. Default = None
- endPhrase (str or list, optional) – Phrase denoting end of section to read. Default = None
- comment (str, optional) – Phrase denoting comment lines, which are skipped. Default = None
- verbose (int, optional, default = 0) – Level of verbosity in output: 0, no printed output; 1, print summary info only; 2, print detailed info.
Returns: - list – [lineStart, lineStop], ints for line #s found from start and end phrases.
- list – segments, list of lines read from file.
- All lists can contain multiple entries, if more than one segment matches the search criteria.
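A minimal sketch of start/end phrase scanning as described above. It assumes a segment runs from the first line containing startPhrase to the next line containing any endPhrase, with 0-based line numbers; parse_segments and file_parse are hypothetical names, and the phrases in the test are placeholders rather than real ePS markers.

```python
def parse_segments(lines, start_phrase, end_phrase, comment=None):
    """Return ([[lineStart, lineStop], ...], [segment_lines, ...]).

    A segment opens on a line containing start_phrase and closes on the next
    line containing end_phrase (a str, or a list of alternatives). Lines
    containing the comment phrase are skipped. Line numbers are 0-based here.
    A segment still open at end of input is dropped.
    """
    ends = [end_phrase] if isinstance(end_phrase, str) else list(end_phrase)
    spans, segments = [], []
    current, start = None, None
    for n, line in enumerate(lines):
        line = line.rstrip("\n")
        if comment is not None and comment in line:
            continue
        if current is None:
            if start_phrase in line:
                current, start = [line], n
        else:
            current.append(line)
            if any(p in line for p in ends):
                spans.append([start, n])
                segments.append(current)
                current = None
    return spans, segments


def file_parse(file_name, start_phrase, end_phrase, comment=None):
    """File-based wrapper: iterate the open file line by line."""
    with open(file_name) as f:
        return parse_segments(f, start_phrase, end_phrase, comment)
```

Splitting the line logic from the file handling keeps the scanner testable on in-memory lists; the epsproc behaviour for an unterminated segment is not documented here.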
epsproc.IO.getCroFileParse(fileName, verbose=1)
Parse an ePS file for GetCro/CrossSection segments.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- verbose (bool or int, optional) – If true, print segment info.
Returns: - list – [lineStart, lineStop], ints for line #s found from start and end phrases.
- list – dumpSegs, list of lines read from file.
- Lists contain entries for each CrossSection segment found in the file.
epsproc.IO.getCroSegParse(dumpSeg)
Extract values from GetCro/CrossSection file segments.
Parameters: dumpSeg (list) – One CrossSection segment, from dumpSegs[], as returned by epsproc.IO.getCroFileParse()
Returns: - np.array – CrossSections, table of results vs. energy.
- list – attribs, list [Label, value, units]
Notes
Currently this is a bit messy, and relies on fixed CrossSection output format. No error checking as yet. Not yet reading all attribs.
Example
>>> XS, attribs = getCroSegParse(dumpSegs[0])
epsproc.IO.getCroSegsParseX(dumpSegs, symSegs, ekeList)
Extract data from ePS GetCro/CrossSection segments into usable form.
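Parameters: - dumpSegs (list) – Set of CrossSection segments, i.e. dumpSegs, as returned by epsproc.IO.getCroFileParse()
- symSegs (list) – Scattering symmetry settings, as returned by epsproc.IO.symFileParse()
- ekeList (list) – List of energies, as returned by epsproc.IO.scatEngFileParse()
Returns: - xr.DataArray – Xarray data array, containing cross sections. Dimensions (Eke, theta)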
- int – Number of blank segments found. (CURRENTLY not implemented.)
Example
>>> data = getCroSegsParseX(dumpSegs, symSegs, ekeList)
Notes
A rather cut-down version of epsproc.IO.dumpIdySegsParseX(); no error checking is currently implemented.
epsproc.IO.getFiles(fileIn=None, fileBase=None, fType='.out', verbose=True)
Read ePS file(s) and return results as Xarray data structures. File endings specified by fType, default .out.
Parameters: - fileIn (str, list of strs, optional.) – File(s) to read (file in working dir, or full path).
Defaults to current working dir if only a file name is supplied.
For consistent results, pass raw strings, e.g.
fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
- fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
- fType (str, optional) – File ending for ePS output files, default ‘.out’
- verbose (bool, optional) – Print output details, default True.
Returns: List of Xarray data arrays, containing matrix elements etc. from each file scanned.
Return type: list
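For the pathlib change listed under readMatEle's to-do items, a file-resolution step like getFiles might look as follows. This is an illustrative sketch under assumed semantics (bare names resolve against the current working dir; a directory scan matches on the filename ending), not the epsproc implementation; get_files is a hypothetical name.

```python
from pathlib import Path

def get_files(file_in=None, file_base=None, f_type=".out"):
    """Resolve input files in the spirit of getFiles (sketch, not epsproc code).

    - file_in given: return those paths; bare names resolve against the cwd.
    - otherwise: list files in file_base (default cwd) whose names end in f_type.
    """
    if file_in is not None:
        names = [file_in] if isinstance(file_in, (str, Path)) else list(file_in)
        # Path.cwd() / p returns p unchanged when p is already absolute
        return [Path.cwd() / Path(p) for p in names]
    base = Path(file_base) if file_base is not None else Path.cwd()
    # endswith() rather than .suffix, so multi-part endings like '_Orb.dat' work
    return sorted(p for p in base.iterdir()
                  if p.is_file() and p.name.endswith(f_type))
```

Using pathlib also removes the need for raw strings on Windows paths when they are built programmatically, although literal paths typed with backslashes still need them.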
epsproc.IO.headerFileParse(fileName, verbose=True)
Parse an ePS file for header & input job info.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- verbose (bool, default True) – Print job info from file header if true.
Returns: - jobInfo (dict) – Dictionary generated from job details.
To do
- Tidy up methods, maybe with parseDigits?
- Tidy up dict output.
epsproc.IO.matEleGroupDim(data, dimGroups=[3, 4, 2])
Group ePS matrix elements by redundant labels.
Default is to group by ['ip', 'it', 'mu'] terms, which all have only a few values.
TODO: better ways to do this? Should be possible at Xarray level.
Parameters: data (list) – Sections from dumpIdy segment, as created in dumpIdySegsParseX(). Ordering is [labels, matElements, attribs].
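The default dimGroups=[3, 4, 2] lines up with the ip, it and mu columns of the rawIdy layout [m,l,mu,ip,it,Re,Im] documented for dumpIdySegParse(); this is an inference from the documented layout, not confirmed from source. A plain-Python sketch of grouping rows by those columns, with group_rows as a hypothetical helper:

```python
from collections import defaultdict

def group_rows(rows, dim_groups=(3, 4, 2)):
    """Group rawIdy-style rows by the label columns listed in dim_groups.

    With the column order [m, l, mu, ip, it, Re, Im], the default
    (3, 4, 2) keys each group by (ip, it, mu).
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in dim_groups)
        groups[key].append(row)
    return dict(groups)
```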
epsproc.IO.matEleGroupDimX(daIn)
Group ePS matrix elements by redundant labels (Xarray version).
Group by ['ip', 'it', 'mu'] terms, which all have only a few values. Rename 'ip' values 1, 2 as 'Type' values 'L', 'V'.
TODO: better ways to do this? Via Stack/Unstack? http://xarray.pydata.org/en/stable/api.html#id16 See also tests in funcTests_210819.py for more versions/tests.
Parameters: daIn (Xarray) – Data array with matrix elements to be split and recombined by dims.
Returns: data – Data array with reordered matrix elements (dimensions).
Return type: Xarray
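The stack/unstack idea in the TODO can be illustrated without Xarray: scatter flat labelled values into a dense array with one axis per label, then apply the 'ip' to 'Type' relabelling. A NumPy sketch with hypothetical helpers unstack and relabel_ip; in Xarray itself, DataArray.unstack() and DataArray.rename() cover these steps.

```python
import numpy as np

# Relabelling described above: 'ip' = 1, 2 becomes 'Type' = 'L', 'V'
TYPE_LABELS = {1: "L", 2: "V"}

def unstack(labels, values, names):
    """Scatter flat (labels, values) pairs into a dense array, one axis per label.

    labels : (N, k) integer array, column i holds label names[i]
    values : (N,) matrix elements
    Returns (coords, dense); label combinations never seen are NaN.
    """
    labels = np.asarray(labels)
    coords = {n: np.unique(labels[:, i]) for i, n in enumerate(names)}
    dense = np.full(tuple(len(c) for c in coords.values()), np.nan, dtype=complex)
    # np.unique returns sorted values, so searchsorted gives each label's index
    idx = tuple(np.searchsorted(coords[n], labels[:, i]) for i, n in enumerate(names))
    dense[idx] = values
    return coords, dense

def relabel_ip(coords):
    """Rename the 'ip' axis to 'Type' and map 1, 2 to 'L', 'V', keeping axis order."""
    return {("Type" if k == "ip" else k):
            ([TYPE_LABELS[int(v)] for v in c] if k == "ip" else c)
            for k, c in coords.items()}
```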
epsproc.IO.matEleGroupDimXnested(da)
Group ePS matrix elements by redundant labels (Xarray version).
Group by ['ip', 'it', 'mu'] terms, which all have only a few values.
TODO: better ways to do this? See also tests in funcTests_210819.py for more versions/tests.
Parameters: da (Xarray) – Data array with matrix elements to be split and recombined by dims.
epsproc.IO.molInfoParse(fileName, verbose=True)
Parse an ePS file for input molecule info.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- verbose (bool, default True) – Print job info from file header if true.
Returns: molInfo – Dictionary with atom & orbital details.
Return type: dict
Notes
Only tested for Molden input (MoldenCnv2006).
epsproc.IO.parseLineDigits(testLine)
Use regular expressions to extract digits from a string. See https://stackoverflow.com/questions/4289331/how-to-extract-numbers-from-a-string-in-python
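A regex in the same spirit, shown as an illustrative pattern rather than the one epsproc uses: it picks out signed integers, decimals and E-notation values, returned as strings. parse_line_digits is a hypothetical stand-in name.

```python
import re

# Optional sign, then digits with optional decimal part (or a leading '.'),
# then an optional E-notation exponent: matches "-2", "0.5", ".5", "3.0E-02"
NUM_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def parse_line_digits(test_line):
    """Return all numeric substrings of test_line, in order, as strings."""
    return NUM_RE.findall(test_line)
```

Digits embedded in words are also matched (e.g. the 2006 in MoldenCnv2006), so callers may need to filter by position.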
epsproc.IO.readMatEle(fileIn=None, fileBase=None, fType='.out', recordType='DumpIdy', verbose=1, stackE=True)
Read ePS file(s) and return results as Xarray data structures. File endings specified by fType, default *.out.
Parameters: - fileIn (str, list of strs, optional.) – File(s) to read (file in working dir, or full path).
Defaults to current working dir if only a file name is supplied.
For consistent results, pass raw strings, e.g.
fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
- fileBase (str, optional.) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
- fType (str, optional) – File ending for ePS output files, default ‘.out’
- recordType (str, optional, default 'DumpIdy') – Type of record to scan for, currently set for 'DumpIdy', 'EDCS' or 'CrossSection'. For a full list of descriptions, types and sources, run: >>> epsproc.util.dataTypesList()
- verbose (int, optional, default = 1) – Level of verbosity in output: 0, no printed output; 1, print summary info only; 2, print detailed info.
- stackE (bool, optional, default = True) – Identify and stack multi-part jobs to single array (by E) if True.
Returns: - list – List of Xarray data arrays, containing matrix elements etc. from each file scanned.
To do
- Change to pathlib paths.
- Implement outputType options…?
13/10/20 Adapted to use grouped lists for multi-file jobs; should be back-compatible if stackE = False is set.
Examples
>>> dataSet = readMatEle() # Scan current dir
>>> fileIn = r'C:\share\code\ePSproc\python_dev\no2_demo_ePS.out'
>>> dataSet = readMatEle(fileIn) # Scan single file
>>> dataSet = readMatEle(fileBase = r'C:\share\code\ePSproc\python_dev') # Scan dir
Note
- Files are scanned for matrix element output flagged by “DumpIdy” headers.
- Each segment found is parsed for attributes and data (set of matrix elements).
- Matrix elements and attributes are combined and output as an Xarray array.
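With stackE=True, multi-part jobs end up as a single array over E. Assuming the parts have already been identified and read into (Eke, data) pairs, the stacking step reduces to concatenate, sort and de-duplicate along energy. stack_by_e is a hypothetical NumPy sketch; with Xarray, xr.concat(..., dim='Eke') followed by .sortby('Eke') is the analogous route.

```python
import numpy as np

def stack_by_e(parts):
    """Concatenate multi-part job results along energy, sorted by Eke.

    parts : list of (eke, data) pairs with len(eke) == data.shape[0]
    A repeated energy keeps its first occurrence in `parts` order.
    """
    eke = np.concatenate([np.asarray(e, dtype=float) for e, _ in parts])
    data = np.concatenate([np.asarray(d) for _, d in parts], axis=0)
    # np.unique sorts, and return_index gives each value's first position
    eke_sorted, first = np.unique(eke, return_index=True)
    return eke_sorted, data[first]
```

Keeping the first occurrence of a repeated energy is a choice made for this sketch; how epsproc resolves overlapping energies between files is not stated here.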
epsproc.IO.readOrb3D(fileIn=None, fileBase=None, fType='_Orb.dat', verbose=True)
Read ePS 3D data file(s) and return results. File endings specified by fType, default *_Orb.dat.
Parameters: - fileIn (str, list of strs, optional) – File(s) to read (file in working dir, or full path).
Defaults to current working dir if only a file name is supplied.
For consistent results, pass raw strings, e.g.
fileIn = r"C:\share\code\ePSproc\python_dev\no2_demo_ePS.out"
- fileBase (str, optional) – Dir to scan for files. Currently only accepts a single dir. Defaults to current working dir if no other parameters are passed.
- fType (str, optional) – File ending for ePS output files, default '_Orb.dat'
- verbose (bool, optional) – Print output details, default True.
Returns: List of data arrays, containing matrix elements etc. from each file scanned.
Return type: list
TODO: Change output to Xarray?
Examples
>>> dataSet = readOrb3D() # Scan current dir
>>> fileIn = r'C:\share\code\ePSproc\python_dev\DABCOSA2PPCA2PP_10.5eV_Orb.dat'
>>> dataSet = readOrb3D(fileIn) # Scan single file
>>> dataSet = readOrb3D(fileBase = r'C:\share\code\ePSproc\python_dev') # Scan dir
epsproc.IO.readXarray(fileName, filePath=None, engine='scipy')
Read file from netCDF format via Xarray method.
Parameters: - fileName (str) – File to read.
- filePath (str, optional, default = None) – Full path to file. If set to None (default) the file will be read from the current working directory (as returned by os.getcwd()).
- engine (str, optional, default = 'scipy') – netCDF engine used by Xarray to read the file.
Returns: Data from file. May be in serialized format.
Return type: Xarray
Notes
The Scipy netCDF backend (the default engine here) does not support complex datatypes. In this case, the data array is written as a dataset with real and imag components.
Multi-level indexing is also not supported, and must be serialized first. Ugh.
TODO: generalize multi-level indexing here.
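The real/imag work-around for the Scipy backend amounts to storing two float arrays and recombining them on read. A NumPy sketch (split_complex and join_complex are hypothetical names); with Xarray the same split could be written as xr.Dataset({'Re': da.real, 'Im': da.imag}).

```python
import numpy as np

def split_complex(arr):
    """Return {'Re': ..., 'Im': ...} float arrays, storable without complex dtype."""
    arr = np.asarray(arr)
    return {"Re": arr.real.copy(), "Im": arr.imag.copy()}

def join_complex(parts):
    """Rebuild the complex array from the split form."""
    return parts["Re"] + 1j * parts["Im"]
```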
epsproc.IO.scatEngFileParse(fileName, verbose=1)
Parse an ePS file for ScatEng list.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- verbose (bool or int, optional) – If true, print segment info.
Returns: - list – ekeList, np array of energies set in the ePS file.
- Lists contain entries for each dumpIdy segment found in the file.
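A sketch of reading energies from ScatEng lines, assuming the job-input echo contains lines of the form `ScatEng e1 e2 ...` with '#' starting a comment. The line format and parse_scat_eng are assumptions for illustration, not taken from epsproc source.

```python
def parse_scat_eng(lines, keyword="ScatEng"):
    """Collect energies from lines whose first field is keyword.

    Assumes (hypothetically) lines like 'ScatEng 0.5 1.0 10.0', with '#'
    starting a comment; energies from repeated lines are concatenated.
    """
    energies = []
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == keyword:
            energies.extend(float(v) for v in fields[1:])
    return energies
```

Wrapping the result in np.array() gives the np array form described in Returns.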
epsproc.IO.symFileParse(fileName, verbose=1)
Parse an ePS file for scattering symmetries.
Parameters: - fileName (str) – File to read (file in working dir, or full path)
- verbose (bool or int, optional) – If true, print segment info.
Returns: - list – symSegs, raw lines from the ePS file.
- Lists contain entries for each ScatSym setting found in file header (job input).
epsproc.IO.writeOrb3Dvtk(dataSet)
Write ePS 3D data file(s) to vtk format. This can be opened in, e.g., Paraview.
Parameters: - dataSet (list) – List of data arrays, containing matrix elements etc. from each file scanned. Assumes format as output by readOrb3D(), [fileName, headerLines, coords, data]
Returns: List of output files.
Return type: list
Note
Uses Paulo Herrera's PyEVTK (pyevtk), see:
- https://pyscience.wordpress.com/2014/09/06/numpy-to-vtk-converting-your-numpy-arrays-to-vtk-arrays-and-files/
- https://bitbucket.org/pauloh/pyevtk/src/default/
pip install pyevtk to install.
epsproc.IO.writeXarray(dataIn, fileName=None, filePath=None, engine='h5netcdf')
Write file to netCDF format via Xarray method.
Parameters: - dataIn (Xarray) – Data array to write to disk.
- fileName (str, optional, default = None) – Filename to use. If set to None (default) the file will be written with a datestamp.
- filePath (str, optional, default = None) – Full path to file. If set to None (default) the file will be written in the current working directory (as returned by os.getcwd()).
- engine (str, optional, default = 'h5netcdf') – netCDF engine for Xarray to_netcdf method. Some libraries may not support multidim data formats.
Returns: Indicates save type and file path.
Return type: str
Notes
Xarray's Scipy netCDF backend does not support complex datatypes. In this case, the data array is written as a dataset with real and imag components.
TODO: implement try/except to handle various cases here, and test other netCDF writers (see http://xarray.pydata.org/en/stable/io.html#netcdf).
Multi-level indexing is also not supported, and must be serialized first. Ugh.
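A sketch of the documented path defaults (a datestamped name when fileName is None, os.getcwd() when filePath is None). The datestamp format and the .nc extension are assumptions; default_nc_path is a hypothetical helper. The resulting path would then be passed to e.g. da.to_netcdf(path, engine='h5netcdf').

```python
import os
from datetime import datetime

def default_nc_path(file_name=None, file_path=None, now=None):
    """Build an output path following writeXarray's documented defaults.

    file_name None -> datestamped name (format assumed here, not epsproc's)
    file_path None -> current working directory, via os.getcwd()
    """
    if file_name is None:
        if now is None:
            now = datetime.now()
        file_name = now.strftime("%Y-%m-%d_%H-%M-%S") + ".nc"
    if file_path is None:
        file_path = os.getcwd()
    return os.path.join(file_path, file_name)
```

Accepting `now` as a parameter keeps the datestamp deterministic for testing.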