ePSdata interface demo

30/07/20

This notebook provides a short demo for getting data from ePSdata repositories (via Zenodo). These contain sets of ePolyScat (ePS) computations, including all source files, processed outputs and, in some cases, wavefunctions; see the ePSdata webpages for more details.

Setup

All that is required is the ePSdata class from epsproc.util.epsdata.

[1]:
import sys
# ePSproc test codebase (local)
if sys.platform == "win32":
    modPath = r'D:\code\github\ePSproc'  # Win test machine
else:
    modPath = r'/home/femtolab/github/ePSproc/'  # Linux test machine

sys.path.append(modPath)
# import epsproc as ep

from epsproc.util.epsdata import ePSdata
* plotly not found, plotly plots not available.
* pyevtk not found, VTK export not available.

Select a dataset for download

Currently, this supports passing a full URL, a DOI or a Zenodo ID corresponding to the record. These can be found on the ePSdata pages or on Zenodo.
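To illustrate how these three forms relate, here's a minimal sketch that reduces each one to the numeric Zenodo record ID and builds the API URL in the same form ePSdata reports in its recordID dict. parse_zenodo_id and zenodo_api_url are hypothetical helpers written for this example, not the ePSdata implementation, and the URL patterns they recognise are assumptions based on the DOI and URLs shown in this notebook.

```python
import re


def parse_zenodo_id(record):
    """Return the numeric Zenodo record ID from an int, a DOI or a URL (hypothetical helper)."""
    if isinstance(record, int):
        return record
    text = str(record).strip()
    if text.isdigit():  # bare record ID passed as a string
        return int(text)
    # Zenodo DOIs (and doi.org URLs) end in 'zenodo.<ID>';
    # record pages are assumed to use '/record/<ID>' or '/records/<ID>'.
    for pattern in (r"zenodo\.(\d+)", r"/records?/(\d+)"):
        match = re.search(pattern, text)
        if match:
            return int(match.group(1))
    raise ValueError(f"No Zenodo record ID found in {record!r}")


def zenodo_api_url(zen_id):
    """Build the REST API URL, in the same form as recordID['url']['get']."""
    return f"https://zenodo.org/api/records/{zen_id}"
```

For example, parse_zenodo_id('10.5281/zenodo.3660708') and parse_zenodo_id('http://dx.doi.org/10.5281/zenodo.3660708') both return 3660708.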

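As an example, let’s grab the data for CH3I (orb 20) ionization. The corresponding web pages are the ePSdata page, https://phockett.github.io/ePSdata/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.html, and the Zenodo record, http://dx.doi.org/10.5281/zenodo.3660708.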

[2]:
# Create data object.
# This will check for the Zenodo record, pull some details & create a download directory.
# By default the download dir is created in the current working directory; pass downloadDir to set a different base dir (this must already exist).
CH3Idata = ePSdata(doi='10.5281/zenodo.3660708', downloadDir='~/Downloads')
*** Download dir set to: /home/femtolab/Downloads/3660708

*** Found Zenodo record 3660708: ePSproc: CH3I wavefn run, orb 20 ionization (Iodine 4d, A1), 1 - 60 eV
Zenodo URL: http://dx.doi.org/10.5281/zenodo.3660708
Record 3660708: 5 files, 81.1 MiB
CH3I wavefn run, orb 20 ionization (Iodine 4d, A1), 1 - 60 eV - photoionization calculations with ePolyScat (ePS) + ePSproc.

*Web version*: https://phockett.github.io/ePSdata/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.html

For more details of the calculations, see readme.txt, or:
Citation details: https://phockett.github.io/ePSdata/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.html#Cite-this-dataset

*** Created /home/femtolab/Downloads/3660708
[3]:
# All relevant IDs are stored in the recordID dict
CH3Idata.recordID
[3]:
{'doi': '10.5281/zenodo.3660708',
 'url': {'doi': 'http://dx.doi.org/10.5281/zenodo.3660708',
  'get': 'https://zenodo.org/api/records/3660708',
  'epsdata': 'https://phockett.github.io/ePSdata/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.html'},
 'zenID': 3660708,
 'downloadBase': PosixPath('/home/femtolab/Downloads'),
 'downloadDir': PosixPath('/home/femtolab/Downloads/3660708')}
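The recordID dict above shows the directory layout used: a base directory (downloadBase, with '~' expanded) plus a subdirectory named after the Zenodo ID (downloadDir). The sketch below reproduces that convention for reference; make_download_dir is a hypothetical helper, not an ePSdata method, and it simply assumes the "base must already exist" rule noted in the code comments above.

```python
from pathlib import Path


def make_download_dir(download_base, zen_id):
    """Create <download_base>/<zen_id>, mirroring downloadBase/downloadDir (hypothetical helper)."""
    base = Path(download_base).expanduser().resolve()  # expand '~' as in the example above
    if not base.is_dir():
        # The base directory is assumed to exist already, as for downloadDir.
        raise FileNotFoundError(f"Download base dir does not exist: {base}")
    target = base / str(zen_id)
    target.mkdir(exist_ok=True)  # re-running is harmless
    return base, target
```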

Download record

If all looks good, let’s pull the files with the downloadFiles() method.

[4]:
CH3Idata.downloadFiles()

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/readme.txt
Pulled to file: /home/femtolab/Downloads/3660708/readme.txt

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.ipynb
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.ipynb

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.md
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.md

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.json
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.json

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.zip
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip
[5]:
# Note: if the files already exist they will not be re-downloaded, unless the file sizes don't match
# or overwriteFlag=True is passed.

CH3Idata.downloadFiles()

# TODO: also add hash checking here.

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/readme.txt
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/readme.txt

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.ipynb
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.ipynb

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.md
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.md

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.json
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.json

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.zip
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip
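The skip logic described in the comment above (size check, or a forced overwrite), plus the hash check noted as a TODO, can be sketched as follows. This is a hypothetical standalone helper, not the ePSdata implementation; it assumes the expected size and checksum come from the Zenodo record's file metadata, with checksums written as "md5:<hexdigest>".

```python
import hashlib
from pathlib import Path


def file_is_ok(path, expected_size, expected_checksum=None):
    """Check a local file against an expected size and, optionally, a checksum.

    expected_checksum is assumed to look like "md5:<hexdigest>";
    a bare hex digest is treated as MD5.
    """
    path = Path(path)
    if not path.is_file() or path.stat().st_size != expected_size:
        return False
    if expected_checksum is None:
        return True  # size-only check, as currently described
    algorithm, sep, digest = expected_checksum.partition(":")
    if not sep:
        algorithm, digest = "md5", expected_checksum
    hasher = hashlib.new(algorithm)
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest.lower()


def needs_download(path, expected_size, overwrite=False, expected_checksum=None):
    """Download if forced, if the file is missing, or if it fails the checks."""
    return overwrite or not file_is_ok(path, expected_size, expected_checksum)
```

Hashing the full file is slower than a size check, particularly for large archives, which is one reason to keep the checksum optional.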

Unzipping items

In addition to a few key files at the top level, ePSdata records contain archives of source files. Basic unzipping support is also included, although it is currently limited and may not be robust.

Note: this currently only supports extraction of the full archive. If you only want the ePolyScat output data file, you can extract it manually. Better archive support, with file selection, is to follow!
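For manual extraction, Python's standard zipfile module is sufficient. The sketch below (extract_matching is a hypothetical helper, not part of ePSdata) reports the uncompressed size of the selected members, then extracts only those whose names end in '.inp.out', keeping the archive's internal directory layout.

```python
import zipfile
from pathlib import Path


def extract_matching(zip_path, dest_dir, suffix=".inp.out"):
    """Extract only archive members whose names end with `suffix`; return the extracted paths."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        members = [info for info in zf.infolist()
                   if not info.is_dir() and info.filename.endswith(suffix)]
        total = sum(info.file_size for info in members)  # uncompressed bytes
        print(f"Extracting {len(members)} file(s), {total / 2**20:.1f} MiB uncompressed.")
        for info in members:
            # ZipFile.extract recreates the member's internal path under dest_dir.
            extracted.append(Path(zf.extract(info, path=str(dest_dir))))
    return extracted
```

Assuming the archive name shown in the download log above, a call might look like extract_matching(CH3Idata.recordID['downloadDir'] / 'CH3I_1-60eV_orb20_A1.zip', CH3Idata.recordID['downloadDir']).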

[6]:
CH3Idata.unzipFiles()

*** Found 1 archive(s).

*** Unzipping archive: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip
Unzipped archive size will be 714.8 MiB.
Unzip? (y/n): y
Unzipped file /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip to directory /home/femtolab/Downloads/3660708

***Record summary
Record 10.5281/zenodo.3660708, title: ePSproc: CH3I wavefn run, orb 20 ionization (Iodine 4d, A1), 1 - 60 eV
Base dir: /home/femtolab/Downloads/3660708
Found 6 directories:
        generators
        generators/CH3I_1-60eV
        CH3I_1-60eV
        CH3I_1-60eV/orb20_A1_idy
        CH3I_1-60eV/orb20_A1_waveFn
        electronic_structure

Found 256 items, with file types:
Counter({'.dat': 240,
         '.nc': 4,
         '': 2,
         '.inp': 2,
         '.idy': 2,
         '.err': 1,
         '.md': 1,
         '.json': 1,
         '.out': 1,
         '.molden': 1,
         '.log': 1})
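A summary like the one above (a list of directories plus a Counter of file types) can be reproduced for any local directory with os.walk and collections.Counter. This is a sketch that assumes "file type" means the last filename extension, as given by pathlib's Path.suffix; the ePSdata summary may be computed differently, and summarize_tree is a hypothetical name.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_tree(base_dir):
    """Count sub-directories and file extensions under base_dir (hypothetical helper)."""
    base_dir = Path(base_dir)
    dirs, suffixes = [], Counter()
    for dirpath, dirnames, filenames in os.walk(base_dir):
        # Sub-directories relative to the base, e.g. 'CH3I_1-60eV/orb20_A1_idy'.
        dirs.extend((Path(dirpath) / name).relative_to(base_dir).as_posix()
                    for name in dirnames)
        # Path.suffix keeps only the last extension, so 'x.inp.out' counts as '.out'
        # and files with no extension count as ''.
        suffixes.update(Path(name).suffix for name in filenames)
    return dirs, suffixes
```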

Working with the ePS output data

Any ePS output files found in the record are listed in the .ePSout attribute, and can be used as normal for further computations.

[7]:
CH3Idata.ePSout
[7]:
[PosixPath('/home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out')]
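If you have ePS outputs in a local directory without an ePSdata object, a similar list can be built with pathlib. This sketch assumes, hypothetically, that ePS output files are those whose names end in '.out' (as the '.inp.out' file here does); ePSdata's own detection rules may differ, and find_eps_outputs is an illustrative name only.

```python
from pathlib import Path


def find_eps_outputs(base_dir, pattern="*.out"):
    """Return sorted paths of likely ePS output files under base_dir (hypothetical helper).

    Files such as '*.inp.out_BLM-L_<date>.nc' do not match, since the glob
    requires the name to end in '.out'.
    """
    return sorted(p for p in Path(base_dir).rglob(pattern) if p.is_file())
```

The result is a list of Path objects, the same form as .ePSout, so it could be passed as fileIn to ep.readMatEle() in the same way.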

The data can be read with the usual ePSproc methods…

[8]:
import epsproc as ep

dataSet = ep.readMatEle(fileIn = CH3Idata.ePSout)
dataXS = ep.readMatEle(fileIn = CH3Idata.ePSout, recordType = 'CrossSection')
*** ePSproc readMatEle(): scanning files for DumpIdy segments.

*** Scanning file(s)
[PosixPath('/home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out')]

*** Reading ePS output file:  /home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out
Expecting 24 energy points.
Expecting 2 symmetries.
Scanning CrossSection segments.
Expecting 48 DumpIdy segments.
Found 48 dumpIdy segments (sets of matrix elements).

Processing segments to Xarrays...
Processed 48 sets of DumpIdy file segments, (0 blank)
*** ePSproc readMatEle(): scanning files for CrossSection segments.

*** Scanning file(s)
[PosixPath('/home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out')]

*** Reading ePS output file:  /home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out
Expecting 24 energy points.
Expecting 2 symmetries.
Scanning CrossSection segments.
Expecting 3 CrossSection segments.
Found 3 CrossSection segments (sets of results).
Processed 3 sets of CrossSection file segments, (0 blank)

… and then manipulated as usual. Here’s an example of interactive cross-section plots using Holoviews (for more details, see the XC plotting notebook).

[9]:
# Import plotting code for HV-based plots,
# See https://epsproc.readthedocs.io/en/dev/tests/hvPlotters_fn_tests_150720.html
from epsproc.plot import hvPlotters

# Set options
hvPlotters.setPlotters()

# Basic use to produce two plot layout
layout, *_ = hvPlotters.XCplot(dataXS[0])

layout

More information

[10]:
# The file list for each archive is now attached to the object in .zip, a list with one dict per zip archive.
# Each dict includes the archive path ('zipfile') and its contents ('files').
# Here's a taster:
print(CH3Idata.zip[0]['zipfile'])
print(*CH3Idata.zip[0]['files'][0:10], sep='\n')
/home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.err
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-01-28_09-39-23.nc
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-02-10_09-08-01.nc
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.md
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-02-10_09-08-01.nc
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-01-28_09-39-23.nc
CH3I_1-60eV/orb20_A1_idy/
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.json
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out
CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp
[11]:
# There's also a full directory and file listing, along with other useful info, in .record.
# record['files'] holds the output of os.walk() as a list, with one (path, dirs, files) tuple per directory.

# Root dir
CH3Idata.record['files'][0]
[11]:
('/home/femtolab/Downloads/3660708',
 ['generators', 'CH3I_1-60eV', 'electronic_structure'],
 ['CH3I_1-60eV_orb20_A1.json',
  'CH3I_1-60eV_orb20_A1.md',
  'readme.txt',
  'CH3I_1-60eV_orb20_A1.zip',
  'CH3I_1-60eV_orb20_A1.ipynb'])
[12]:
# Main data sub dir
CH3Idata.record['files'][3]
[12]:
('/home/femtolab/Downloads/3660708/CH3I_1-60eV',
 ['orb20_A1_idy', 'orb20_A1_waveFn'],
 ['CH3I_1-60eV_orb20_A1.json',
  'CH3I_1-60eV_orb20_A1.md',
  'CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-01-28_09-39-23.nc',
  'CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-01-28_09-39-23.nc',
  'CH3I_1-60eV_orb20_A1.inp',
  'CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-02-10_09-08-01.nc',
  'CH3I_1-60eV_orb20_A1.inp.err',
  'CH3I_1-60eV_orb20_A1.inp.out',
  'CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-02-10_09-08-01.nc'])
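Indexing .record['files'] by position, as above, depends on the traversal order of os.walk(). A hypothetical helper that keys the same tuples by path relative to the root makes lookups independent of that order. It assumes, as in the outputs above, that the first entry is the record's root directory (os.walk() is top-down by default).

```python
import os


def index_walk(walk_entries):
    """Key os.walk()-style (dirpath, dirnames, filenames) tuples by path relative to the root."""
    entries = list(walk_entries)
    if not entries:
        return {}
    root = entries[0][0]  # top-down walk: the first entry is the root directory
    return {
        os.path.relpath(dirpath, root): (dirnames, filenames)
        for dirpath, dirnames, filenames in entries
    }
```

With tree = index_walk(CH3Idata.record['files']), tree['CH3I_1-60eV'] would return the (dirs, files) pair for the main data subdirectory, and tree['.'] the root.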

Version info

[13]:
import scooby
scooby.Report(additional=['epsproc', 'holoviews'])
[13]:
Fri Jul 31 13:30:15 2020 EDT
OS Linux CPU(s) 4 Machine x86_64
Architecture 64bit RAM 7.7 GB Environment Jupyter
Python 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
epsproc 1.2.5-dev holoviews 1.12.6 numpy 1.18.1
scipy 1.3.1 IPython 7.13.0 matplotlib 3.2.0
scooby 0.5.5
Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications