{"id":17255226,"url":"https://github.com/unixjunkie/chemoinfo_recipes","last_synced_at":"2026-01-05T09:50:26.687Z","repository":{"id":144782288,"uuid":"76246458","full_name":"UnixJunkie/chemoinfo_recipes","owner":"UnixJunkie","description":"Command line recipes for the working chemoinformatician","archived":false,"fork":false,"pushed_at":"2022-04-06T01:57:48.000Z","size":34,"stargazers_count":7,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-31T09:31:32.231Z","etag":null,"topics":["chemaxon","chemoinformatics","howto","ligand","openbabel","rdkit","rmsd"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UnixJunkie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-12-12T10:31:35.000Z","updated_at":"2024-03-30T11:30:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"4b1caa45-8c14-4019-a36e-57dbdcda31a4","html_url":"https://github.com/UnixJunkie/chemoinfo_recipes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2Fchemoinfo_recipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2Fchemoinfo_recipes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2Fchemoinfo_recipes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2Fchemoinfo_recipes/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UnixJunkie","download_url":"https://codeload.github.com/UnixJunkie/chemoinfo_recipes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245615010,"owners_count":20644376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemaxon","chemoinformatics","howto","ligand","openbabel","rdkit","rmsd"],"created_at":"2024-10-15T07:10:59.850Z","updated_at":"2026-01-05T09:50:26.651Z","avatar_url":"https://github.com/UnixJunkie.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Chemoinformatics recipes\nCommand line recipes for the working chemoinformatician\n\n# Unique InChI filtering of molecules (remove duplicates)\n\n    obabel INFILE -O OUTFILE --unique\n\n# Assign MMFF94 partial charges\n\n    obabel INFILE -O OUTFILE --partialcharge mmff94\n    \n# Assign MMFF94 partial charges and hydrogens (protonate) at a given pH (7.4) using Open Babel\n\n    obabel in.smi -O out.mol2 --partialcharge mmff94 -p 7.4\n    \n# Major tautomer at pH 7.4 using ChemAxon cxcalc\n# (-g: ignore errors; -H: pH; -f sdf: force output format to sdf, to preserve molecule names)\n# (an input file without the .smiles extension might crash the tool!)\n\n    cxcalc -g majortautomer -H 7.4 -f sdf input.smiles \u003e output_taut74.sdf\n\n# Compute MACCS 166-bit fingerprints and output them as strings\n# (will create a .csv file named after the input file)\n\n    mayachemtools/bin/MACCSKeysFingerprints.pl --size 166 [INFILE] --CompoundIDMode MolName\n\n# 3D conformer generation using 
Corina classic\n# (one low energy conformer per molecule)\n# the optional [-d wh] adds and writes out hydrogens (makes them explicit)\n\n    corina [-d wh] \u003c INPUT.sdf \u003e OUTPUT.sdf\n\n# lowest energy conformer generation using cxcalc from ChemAxon\n\n    cxcalc conformers in.smi -m 1 \u003e out.sdf\n\n# lowest energy conformer generation using omega from OpenEye Scientific\n\n    omega -in in.smi -out out.sdf -maxconfs 1\n\n# RMSD between two ligands (current will be superposed onto reference)\n\n    fconv -rmsd current.mol2 --s=reference.mol2\n\n# compute Bemis-Murcko scaffolds of molecules\n\n    stripper --in molecules.smi --out scaffolds.txt\n\n# print a molecule in EPS format (for LaTeX manuscripts); obabel then inkscape\n# SMILES to EPS, MOL2 to EPS or SVG to EPS would work the same\n\n    obabel molecule.smi -O molecule.svg\n    inkscape molecule.svg -E molecule.eps --export-ignore-filters --export-ps-level=3\n\n# smi2eps in Bash (smi -\u003e svg -\u003e pdf -\u003e cropped-pdf -\u003e ps -\u003e eps)\n\n```\n# librsvg2-bin provides rsvg-convert\n# texlive-extra-utils provides pdfcrop\n# ghostscript provides pdf2ps\n# ps2eps provides ps2eps\nfunction svg2eps () {\n    svg=\"$1\"\n    # derive the intermediate and final file names from the input name\n    tmp_pdf_out=\"${svg%.svg}_tmp.pdf\"\n    pdf_out=\"${svg%.svg}.pdf\"\n    ps_out=\"${svg%.svg}.ps\"\n    eps_out=\"${svg%.svg}.eps\"\n    rsvg-convert -f pdf \"$svg\" -o \"$tmp_pdf_out\"\n    pdfcrop \"$tmp_pdf_out\" \"$pdf_out\"\n    pdf2ps \"$pdf_out\" \"$ps_out\"\n    ps2eps \u003c \"$ps_out\" \u003e \"$eps_out\"\n}\n```\n\n```\n# openbabel provides obabel\nfunction smi2eps () {\n    smi=\"$1\"\n    svg_out=\"${smi%.smi}.svg\"\n    obabel \"$smi\" -O \"$svg_out\" -xC -xd\n    svg2eps \"$svg_out\"\n}\n```\n\n# Install Open Babel from sources\n\n    wget https://github.com/openbabel/openbabel/archive/openbabel-2-4-1.tar.gz\n    tar xzf openbabel-2-4-1.tar.gz\n    cd openbabel-openbabel-2-4-1/\n    mkdir build\n    cd build\n    cat \u003c\u003cEOF 
\u003e build.sh\n    mkdir -p ~/usr\n    cmake -DPYTHON_BINDINGS=true -DCMAKE_INSTALL_PREFIX:PATH=$HOME/usr ../\n    EOF\n    chmod 755 build.sh\n    ./build.sh\n    make -j4\n    make install\n\n# Install RDKit on Mac OS X:\n\n    brew tap rdkit/rdkit\n    brew install rdkit --with-python3 --with-inchi\n\nIf this does not work, try the conda way (RDKit must then be used from within a conda environment):\n\n    wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh\n    sh Miniconda3-latest-MacOSX-x86_64.sh -p ~/usr/miniconda3\n    ~/usr/miniconda3/bin/conda install -q -y -c conda-forge rdkit\n    \nNow check that you can really use it from Python:\n\n    python3\n    import rdkit\n    from rdkit import Chem\n    m = Chem.MolFromSmiles('n1ccccc1')\n\n# Count molecules, works for various file formats\n\nStore this in a 'molcount' script, somewhere on your PATH.\n\n```\n#!/bin/bash\n\n# print the number of molecules in each file given on the command line\nfor f in \"$@\"; do\n    filename=$(basename \"$f\")\n    extension=\"${filename##*.}\"\n    case \"$extension\" in\n        mol2) grep -c MOLECULE \"$f\"\n              ;;\n        plr) grep -c '^END$' \"$f\" # position and contrib per atom to cLogP\n             ;;\n        pqr) grep -c '^COMPND' \"$f\"\n             ;;\n        sdf) grep -c '$$$$' \"$f\"\n             ;;\n        mol) grep -c '$$$$' \"$f\"\n             ;;\n        phar) grep -c '$$$$' \"$f\" # Pharao DB\n             ;;\n        smi) wc -l \u003c \"$f\"\n             ;;\n        *) echo \"molcount: unsupported file format: .$extension\"\n           ;;\n    esac\ndone\n```\n\n# Get molecules by name, for various file formats\n\nWorks even with a \"database\" file with millions of molecules.\n\nlbvs_consent_mol_get from https://github.com/UnixJunkie/consent\n\n```\nlbvs_consent_mol_get -i molecules.{sdf|mol2|smi} {-names \"mol1,mol2,...\"|-f names_file}\n```\n\n# Sayle hashing of a molecule\n\nA kind of canonicalization of molecular representations, consisting of the pair:\n\nSayle_hash(m) = 
(Canonical_smile_forcing_only_single_bonds_and_noH(m), number_of_Hydrogens_on_non_carbons(m) - sum_of_formal_charges(m))\n\nwhere m is the molecule to hash.\n\n# Install DeepChem on Mac or Linux\n\nNo GPU support, but at least it's an automatic and simple install procedure.\nDeepChem is pinned to a version that works for what I currently do.\n\n```\npip3 install joblib pandas scikit-learn tensorflow pillow simdna deepchem==2.1.1.dev353\n```\n\n# standardize molecules in parallel with pardi and standardiser\n\n    pip3 install chemo-standardizer\n\n    opam install pardi\n\n```\n#!/bin/bash\n\n# both the input and the output file are required\nif [ $# -lt 2 ]; then\n    echo \"usage: $0 input.smi output_std.smi\"\n    exit 1\nfi\n\nINPUT=\"$1\"\nOUTPUT=\"$2\"\n\npardi -i \"$INPUT\" -o \"$OUTPUT\" -c 400 -d l -ie '.smi' -oe '.smi' \\\n      -w 'standardiser -i %IN -o %OUT 2\u003e/dev/null'\n```\n\n# Links / Bibliography\n\n[1] http://www.mayachemtools.org/\n\n[2] http://openbabel.org/wiki/Main_Page\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funixjunkie%2Fchemoinfo_recipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funixjunkie%2Fchemoinfo_recipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funixjunkie%2Fchemoinfo_recipes/lists"}