1. Quickstart#

First let import the modules to work with. Some functionalities of aleimi may not be available on PyPi or Conda yet. This could be because the new version is not yet release, but in brief it is going to be. You could install directly from the repo if some problem pops up. Just paste the following in a code cell:

! pip install git+https://github.com/ale94mleon/aleimi.git@main

1.1. As a command line#

The command line interface of aleimi should be straightforward. It only needs the path were the molecules are and (if needed) a configuration yaml file with the parameters definition.

! aleimi-run -h
usage: aleimi-run [-h] [-p PARAMS] [-v] suppl

positional arguments:
  suppl                 The path to the directory were the molecule(s) is(are)

optional arguments:
  -h, --help            show this help message and exit
  -p PARAMS, --params PARAMS
                        Parameters to run ALEIMI
  -v, --version         show program's version number and exit

Import modules to work

import tempfile, os, yaml
import pandas as pd
import glob
# If you wnat to keep the files and explore the directory, set make_tmp_file = False
make_tmp_file = True

if make_tmp_file:
    tmp_path = tempfile.TemporaryDirectory()
    os.chdir(tmp_path.name)
else:
    os.makedirs('wd', exist_ok=True)
    os.chdir('wd')

We will use a .smi input. In reality we could use as input any molecule with extension: psb, mol, mol2 or smi. In this file we will define three molecules based on their SMILES representation.

suppl = 'suppl.smi'

with open(suppl, 'w') as s:
    s.write("COC(=O)C=1C=CC(=CC1)S(=O)(=O)N\nCCO\nCNCO")
! cat suppl.smi
COC(=O)C=1C=CC(=CC1)S(=O)(=O)N
CCO
CNCO
params = {
# aleimi.confgen.main
# Number of randoms conformations to generate, usually more than 1000 is recommended.
'numConfs': 10,
'rdkit_d_RMSD': 0.2,
# We will not optimize at classical level, only use the distance geometry method of RDKit
'UFF': False,
# The num,ber of threads for the conformer generation and/or classical optimization
'rdkit_numThreads': 0,
# Definition of the semi-empirical job, here we are using implicit solvent (EPS=78.4)
'mopac_keywords': 'PM7 precise ef xyz geo-ok t=3h EPS=78.4',

# aleimi.boltzmann.main
# Cutoff to consider redundant conformation, respect to the geometry
'Bd_rmsd': 1.0,
# Cutoff to consider redundant conformation, respect to the energy
'Bd_E': 0.0,

# aleimi.extractor.main
# How many conformation will pass for the refinement step at DF-MP2/6-31+G(d)/heavy-aug-cc-pvdz-jkfit
"energy_cut": 2,
# Engine to prepare the file for QM refinement, a frequency calculation will also be carry out in order to ensure that the critical point is a minimum and not a seattle point.
# in the Potential Energy Surface.
"engine": 'psi4',
}
with open('params.yml', 'w') as p:
    yaml.dump(params, p)

! cat params.yml
BOutPath: true
Bd_E: 0.0001
Bd_rmsd: 2.0
UFF: false
energy_cut: 2
engine: psi4
mopac_keywords: PM7 precise ef xyz geo-ok t=3h EPS=78.4
numConfs: 10
rdkit_d_RMSD: 0.2
rdkit_numThreads: 0

Now we just need to run aleimi from the command line

! aleimi-run suppl.smi --params params.yml
Resumen

Se están generando 10 conformaciones para la molécula con Id = 1...
Calculando RMSD ...
conf_mol_1:  10/10
Archivos de salida: conf_mol_1.sdf, conf_mol_1.mop
Número de átomos: 23

Se están generando 10 conformaciones para la molécula con Id = 2...
Calculando RMSD ...
conf_mol_2:  10/10
Archivos de salida: conf_mol_2.sdf, conf_mol_2.mop
Número de átomos: 9

Se están generando 10 conformaciones para la molécula con Id = 3...
Calculando RMSD ...
conf_mol_3:  10/10
Archivos de salida: conf_mol_3.sdf, conf_mol_3.mop
Número de átomos: 11

conf_mol_1
Mopac is running ...
Done!
'mopac'  13099.86 ms
conf_mol_2
Mopac is running ...
Done!
'mopac'  1446.40 ms
conf_mol_3
Mopac is running ...
Done!
'mopac'  6105.99 ms

In the working directory we will have for every input molecule:

  • *.sdf the stochastic conformation generated through RDKit.

  • *_boltzmann.csv . Here we have the information of the occupancy of each conformation

  • *.arc and *.out outputs of MOPAC.

Then it will be the directories for the selected conformation with the input files to perform the QM calculation:

  • job.sh

  • Input file for the selected engine: Psi4, Gaussian or Orca.

os.listdir()
['conf_mol_1_Cell_9',
 'params.yml',
 'conf_mol_2.arc',
 'conf_mol_3.sdf',
 'conf_mol_3_Cell_5',
 'conf_mol_1.mop',
 'conf_mol_2_boltzmann.csv',
 'conf_mol_2_Cell_8',
 'conf_mol_2_Cell_10',
 'conf_mol_2.out',
 'suppl.smi',
 'conf_mol_3.arc',
 'conf_mol_3.out',
 'conf_mol_2_Cell_5',
 'conf_mol_3_boltzmann.csv',
 'conf_mol_2_Cell_6',
 'conf_mol_2.sdf',
 'conf_mol_2_Cell_4',
 'conf_mol_1_Cell_8',
 'conf_mol_2.mop',
 'conf_mol_1_Cell_3',
 'conf_mol_1.sdf',
 'conf_mol_1_boltzmann.csv',
 'conf_mol_2_Cell_9',
 'conf_mol_2_Cell_2',
 'conf_mol_3.mop',
 'conf_mol_1_Cell_7',
 'conf_mol_1.out',
 'conf_mol_3_Cell_6',
 'Molecules used.png',
 'conf_mol_2_Cell_7',
 'conf_mol_2_Cell_1',
 'conf_mol_3_Cell_9',
 'conf_mol_3_Cell_4',
 'conf_mol_1.arc',
 'outparams.yml']
# Let's look to the content of the Boltzmann fle
pd.read_csv('conf_mol_1_boltzmann.csv')
Unnamed: 0 cell Class_E HeatOfFormation_kcal/mol Emin_Ei qi__Pi/Pmin__e^(Emin_Ei)/KbT Fraction_%__100*qi/q
0 0 8 NaN -159.98868 0.00000 1.000000 68.482786
1 1 3 NaN -158.92079 -1.06789 0.164904 11.293086
2 2 7 NaN -158.81017 -1.17851 0.136819 9.369741
3 3 9 NaN -158.56569 -1.42299 0.090561 6.201868
4 4 6 NaN -157.65133 -2.33735 0.019351 1.325232
5 5 1 NaN -157.45372 -2.53496 0.013863 0.949386
6 6 10 NaN -157.39704 -2.59164 0.012598 0.862772
7 7 5 NaN -157.37828 -2.61040 0.012206 0.835882
8 8 2 NaN -157.15556 -2.83312 0.008381 0.573971
9 9 4 NaN -156.15071 -3.83797 0.001537 0.105276
  • cell: conformer identifier

  • Class_E: Classic energy from the RDKit optimization. in this example is NaN because we did not perform this optimization.

  • HeatOfFormation_kcal: self-explanatory [kcal/mol].

  • Emin_Ei: Difference in energy between the lower and the i-th conformer in [kcal/mol].

  • qi__Pi/Pmin__e^(Emin_Ei)/KbT : Boltzmann factors. $ q_i = \frac{P_i}{Pmin} = e^{E_{min}-E{i}} $

  • Fraction_%__100*qi/q: Occupancy of each conformer: $ 100 \frac{q_i}{\sum{q_i}} $

In this example, with a cutoff of 2 kcal/mol is enough to have more than 95 % of the occupancy.

After you run the QM simulations. You can process the result using the function aleimi.processed.main. Only Psi4 output are currently accepted (in the future the other engines will be added as well a CLI for this step). However, you can generate your own scripts. You just need to:

  • Get the energies

  • Check if all the vibrational frequencies are correct (reals and positives)

  • Take the lower energy conformer for your specific study.

1.2. Post-processing#

After the QM calculation you can call the processed CLI to see if all the simulations ended well and to select the conformer with lower energy. For now this only works if Psi4 was used for the QM simulations.

! aleimi-processed -h
usage: aleimi-processed [-h] [--no_sub_dirs [NO_SUB_DIRS]]
                        [-e, --engine ENGINE] [--xyz_out [XYZ_OUT]]

optional arguments:
  -h, --help            show this help message and exit
  --no_sub_dirs [NO_SUB_DIRS]
                        Should be True if :meth:`aleimi.extractor.main` was used with ``mkdir = True``, by default True
  -e, --engine ENGINE   psi4, gaussian or orca. It depends on the engine defined on :meth:`aleimi.extractor.main` was used with ``engine`` keyword, by default 'psi4'
  --xyz_out [XYZ_OUT]   If True, it will write the xyz coordinates of the conformer with the lowest energy, by default False