{"id":15347392,"url":"https://github.com/vtlim/off_psi4","last_synced_at":"2026-01-06T18:05:20.710Z","repository":{"id":182914669,"uuid":"70878596","full_name":"vtlim/off_psi4","owner":"vtlim","description":"Quanformer is a Python-based pipeline for generating conformers, preparing quantum mechanical (QM) calculations, and processing QM results for a set of molecules and their conformers. *** This repo has a new location here:","archived":false,"fork":false,"pushed_at":"2019-01-09T19:34:17.000Z","size":3779,"stargazers_count":0,"open_issues_count":7,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-01T21:44:45.030Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/MobleyLab/quanformer","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vtlim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-10-14T05:58:02.000Z","updated_at":"2019-03-26T19:55:21.000Z","dependencies_parsed_at":"2023-07-22T04:10:02.689Z","dependency_job_id":null,"html_url":"https://github.com/vtlim/off_psi4","commit_stats":null,"previous_names":["vtlim/off_psi4"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vtlim%2Foff_psi4","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vtlim%2Foff_psi4/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vtlim%2Foff_psi4/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vtlim%2Foff_psi4/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/vtlim","download_url":"https://codeload.github.com/vtlim/off_psi4/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245905445,"owners_count":20691782,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-01T11:33:07.482Z","updated_at":"2026-01-06T18:05:20.704Z","avatar_url":"https://github.com/vtlim.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Quanformer\nREADME last updated: Nov 28 2018  \n\nQuanformer is a Python-based pipeline for generating conformers, preparing quantum mechanical (QM) calculations, and processing QM results for a set of input molecules. \nThis pipeline is robust enough to use with hundreds of conformers per molecule and tens or hundreds of molecules.\nYou will need access to either Psi4 or Turbomole for running QM calculations.  \n\nFor each molecule, conformers are generated and optimized with the MMFF94S force field. 
\nThen input files for QM calculations are prepared for geometry optimizations, single point energy (SPE) calculations, or Hessian calculations.\nThe user can specify any QM method and basis set that is supported in the QM software package.\nAfter the calculations have finished, this pipeline will extract final energies and geometries as well as collect job-related details such as calculation time and number of optimization steps.\nAnalysis scripts are provided for comparing conformer energies from different QM methods, comparing calculation times from different methods, and generating nicely-formatted plots.\n\n*Example application*: \n * Input five molecules and generate conformations for each one.\n * Then run QM geometry optimizations using the MP2/def2-SV(P) level of theory as a relatively quick fine-tuning of the geometries.\n * Take those QM results and run a second geometry optimization stage using the more intensive B3LYP-D3MBJ/def2-TZVP method.\n * Consider questions such as, \"What is the spread of the conformer energies for molecule _x_?\", \"How does method _a_ compare to method _b_ for this molecule?\", etc.\n\nIn concept, this example would look like:   \n`initialize_confs.py` \u0026rarr; `confs_to_psi.py` \u0026rarr; `filter_confs.py` \u0026rarr; \\[QM jobs\\] \u0026rarr; `filter_confs.py` \u0026rarr; analysis\n\nIn practice, the `executor.py` code provides the interface for the various stages and components. \nThat being said, each component was written to be able to run independently of the others so variations of this pipeline can be conducted. \nInstructions are provided below for following this example workflow.\n\n\n## I. 
Python Dependencies\n\n* Anaconda or Miniconda Python\n* [OEChem Python Toolkit](https://docs.eyesopen.com/toolkits/python/quickstart-python/install.html)\n* [Psi4 QM software package](http://www.psicode.org/)\n   * [Conda install of Psi4](http://www.psicode.org/psi4manual/master/conda.html#detailed-installation-of-psifour)\n   * [Conda install of dftd3](http://www.psicode.org/psi4manual/master/dftd3.html)\n   * [(optional) Conda install of gcp](http://www.psicode.org/psi4manual/master/gcp.html)\n\n\n## II. Repository contents\n\nPipeline components and description:\n\n| Script               | Stage         | Brief description                                                          |\n| ---------------------|---------------|----------------------------------------------------------------------------|\n| `avgTimeEne.py`      | analysis      | analyze calculation stats and relative energies for a single batch of mols |\n| `confs_to_psi.py`    | setup         | generate Psi4 input files for each conformer/molecule                      |\n| `confs2turb.py`      | setup         | generate Turbomole input files for each conformer/molecule                 |\n| `opt_vs_spe.py`      | analysis      | compare how diff OPT energy is from pre-OPT single point energy            |\n| `executor.py`        | N/A           | main interface connecting \"setup\" and \"results\" scripts for Psi4           |\n| `filter_confs.py`    | setup/results | remove conformers of molecules that may be the same structure              |\n| `get_psi_results.py` | results       | get job results from Psi4                                                  |\n| `getTurbResults.py`  | results       | get job results from Turbomole                                             |\n| `match_minima.py`    | analysis      | match conformers from sets of different optimizations                      |\n| `match_plot.py`      | analysis      | additional plots that can be used from `match_minima.py` results      
      |\n| `plotTimes.py`       | analysis      | plot calculation time averaged over the conformers for each molecule       |\n| `proc_tags.py`       | results       | store QM energies \u0026 conformer details as data tags in SDF molecule files   |\n| `quan2modsem.py`     | analysis      | interface with modified Seminario Python code                              |\n| `initialize_confs.py` | setup         | generate molecular structures and conformers for input SMILES string       |\n| `stitchSpe.py`       | analysis      | calculate relative conformer energies from sets of different SPEs          |\n\nThere are other scripts in this repository that are not integral to the pipeline. These are found in the `tools` directory. See the README file there.\n\n\n## III. Files that are generated throughout the pipeline\n\n### A. Directory setup for QM calculations\nThe input (SMILES or SDF) file must be in the main directory.  \nThe layout is `mainDirectory/moleculeName/conformerNumber/[qm_job_here]`.\n\n### B. Molecule files generated by Quanformer\n\nSDF files are numbered with the following code system. Let's say the pipeline starts with a file called `basename.smi`  \nthat contains the list of SMILES strings.\n1. The first file generated will be `basename.sdf`. This contains all molecules and all conformers of each molecule.\n2. The next file will be `basename-100.sdf`, where `-100` means all molecules have been MM-optimized.\n3. Then comes `basename-200.sdf`, in which the MM-optimized molecules are filtered to remove any redundant structures (i.e., duplicate minima).\n4. After that is `basename-210.sdf`, which contains the QM-calculated molecules of the `-200` file.\n5. The QM molecules are filtered analogously to step 3 to yield `basename-220.sdf`.\n\nThis process can go through a second round of QM calculations. QM calculations can be either geometry optimizations or  \nsingle point energy calculations. 
If the `basename-200.sdf` is fed into both routes, then each route will have its own  \n`basename-210.sdf` file. Don't do this in the same directory, or one file will overwrite the other. The endmost  \nproduct will be `basename-222.sdf` though one could certainly stop before QM stage 2.\n\nWhy bother keeping the `-221` files? They can be used to compare relative energies of single point energy calculations,  \nor geometry optimizations, since (mol1,confA) starts from the same structure in each of the compared files. After filtering,  \nthe number of conformers may be reduced, so it can be hard to compare one to one.\n\nAn `-f` prefix means that the Omega-generated conformers were filtered based on their structures, but that these have not  \nbeen MM-optimized. For example, `basename-f020.sdf` means filtered from OpenEye Omega, no MM opt/filter, yes QM opt/filter, no QM stage 2.\n\nIn summary,\n\n * no suffix = original file with all Omega conformers\n * `1xx` = MM opt but no filter\n * `2xx` = MM opt and filter\n * `x1x` = QM opt but no filter\n * `x2x` = QM opt and filter\n * `xx1` = either QM second opt or SPE and no filter\n * `xx2` = either QM second opt or SPE and filter\n\n\n## IV. Instructions\nThe instructions below describe how to take a set of molecules from their starting SMILES strings to:\n * Generate conformers\n * MM minimize those conformers using the MMFF94S force field\n * Filter out potentially redundant structures\n    * Output: `file-200.sdf`\n * Create Psi4 input files for MP2/def2-SV(P) geometry optimizations\n * Extract results from completed Psi4 jobs\n    * Output: `file-210.sdf`\n * Filter out potentially redundant structures\n    * Output: `file-220.sdf`\n * Create new Psi4 input files for B3LYP-D3MBJ/def2-TZVP geometry optimizations\n * Extract results from completed Psi4 jobs\n    * Output: `file-221.sdf`\n * Filter out potentially redundant structures\n    * Output: `file-222.sdf`\n\n-----\n\n 1. 
Create an input file with SMILES strings and names for each molecule. \n    See subsections below on \"Naming molecules in the input SMILES file\" and \"File name limitations\".\n\n 2. Generate conformers, perform quick MM optimization, and create Psi4 input files.\n    * `python executor.py -f file.smi --setup -m 'mp2' -b 'def2-sv(p)'`\n\n 3. Run Psi4 QM calculations.\n    * The `jobcount.sh` script in the tools directory can be helpful for counting number of total/remaining jobs.\n    * You can check the geometry for some optimization with the `xyzByStep.sh` script in the tools directory.  \n      E.g., `xyzByStep.sh 10 output.dat view.xyz`\n\n 4. Get Psi4 results.\n    * `python executor.py -f file-200.sdf --results`\n\n 5. In a **different directory** (e.g., subdirectory), set up Psi4 OPT2 calculations from the last results.\n    * [for stage 2 OPT]  \n      `python executor.py -f file-220.sdf --setup -t 'opt' -m 'b3lyp-d3mbj' -b 'def2-tzvp'`\n    * [for stage 2 SPE]   \n      `python executor.py -f file-220.sdf --setup -t 'spe' -m 'b3lyp-d3mbj' -b 'def2-tzvp'`\n\n 6. Run Psi4 jobs. (See notes on step 3.)\n\n 7. Get Psi4 results from second-level calculations.\n    * [for stage 2 OPT]   \n      `python executor.py -f file-220.sdf --results -t 'opt'`\n    * [for stage 2 SPE]   \n      `python executor.py -f file-220.sdf --results -t 'spe'`\n\n 8. (optional) Get wall clock times, number of optimization steps, and relative energies. \n    * `python avgTimeEne.py --relene -f file.sdf -m 'b3lyp-d3mbj' -b 'def2-tzvp'` -- [TODO recheck]\n\n 9. Combine results from various job types to calculate model uncertainty.\n    * See file `analysis.md`\n\n\n### A. Quanformer assumptions\n* Input molecules have spin multiplicity equal to one (i.e., no unpaired electrons)\n* SCF algorithm is density-fitted (DF)\n    * There exists -JKFIT auxiliary set for your orbital basis/atom type\n\n### B. Limitations for file names\n\nBase names (e.g. 
`basename.smi`, `basename.sdf`) can contain underscores but *no dash (-) and no pound sign (#)*.\n  * Dashes should not be used in the base filename because this is a delimiter for the SDF numbering code (see above).\n  * Pound signs should not be used in the base filename because this character is used to extract the file information such as file extension.\n  * Examples:\n    * Bad:  `basename-set1.smi`\n    * Bad:  `basename#set1.smi`\n    * Good: `basename_set1.smi`\n\n### C. Limitations on molecule names\n\nThe SMILES file should contain, on each line, `SMILES_STRING molecule_title` and be named in the format `basename.smi`.\n  * Molecule titles are required, as these are used to create subdirectories for the QM jobs. So do not use spaces or unusual characters in your molecule names.\n  * Molecule title should have *no dashes*, as Psi4 will raise an error.\n  * Molecule title should *NOT start with a number*, as Psi4 will raise an error.\n  * Example:\n```\nCC(C(C(C)O)O)O AlkEthOH_c42\nCCCC AlkEthOH_c1008\nCCOC(C)(C)C(C)(C)O AlkEthOH_c1178\n```\n\n### D. File types supported by Quanformer\n\nThis pipeline is meant to be used with SDF files because they can store multiple molecules as well as data tags associated with each molecule.\nThat being said, it has been applied in a few scenarios with MOL2 files (one molecule and all its conformers).\nIf you try a non-SDF file, do check that the *molecule name* and the *total charge* are listed correctly in the Psi4 input files.\n\n### E. Preset parameters in Quanformer\n\nThis pipeline uses some preset parameters, which can be modified in the function calls of `executor.py` or in the parent code.\nDescriptions coming soon. [TODO]\n * `initialize_confs.py`: `resolve_clash=True`, for resolving steric clashes\n * `initialize_confs.py`: `do_opt=True`, for performing quick steepest descent optimization\n\n\n## V. 
Some terms and references\n\nPertaining to software packages:\n * [Psi4](http://www.psicode.org/)\n * [Turbomole](http://www.turbomole.com/)\n\nPertaining to files and formatting:\n  * SMILES - simplified molecular input line entry system, ([more info](http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html))\n  * SDF - structure data file, ([more info](http://link.fyicenter.com/out.php?ID=571), [example](http://biotech.fyicenter.com/resource/sdf_format.html))\n\nPertaining to QM method:\n  * `MP2` - second-order Moller-Plesset perturbation theory (adds electron correlation effects upon Hartree-Fock)\n  * `B3LYP` - DFT hybrid functional, (Becke, three-parameter, Lee-Yang-Parr) exchange-correlation functional\n  * `PBE0` - DFT hybrid functional, (Perdew-Burke-Ernzerhof)\n  * `D3` - Grimme et al. dispersion correction method, ([ref](http://aip.scitation.org/doi/full/10.1063/1.3382344))\n  * `D3BJ` - D3 with Becke-Johnson damping, ([ref](http://onlinelibrary.wiley.com/doi/10.1002/jcc.21759/abstract))\n  * `D3MBJ` - Sherrill et al. modifications to D3BJ approach, ([ref](http://pubs.acs.org/doi/abs/10.1021/acs.jpclett.6b00780))\n\nPertaining to basis set:\n  * `def2` - 'default' basis sets with additional polarization functions compared to 'def-'\n  * `SV(P)` - double zeta valence with polarization on all non-hydrogen atoms\n  * `TZVP` - triple zeta valence with polarization on all atoms\n\n## VI. Contributors\n\n* Victoria Lim (UCI)\n* Chris Bayly (OpenEye)\n* Caitlin Bannan (UCI)\n* Jessica Maat (UCI)\n* David Mobley (UCI)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvtlim%2Foff_psi4","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvtlim%2Foff_psi4","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvtlim%2Foff_psi4/lists"}