Helpers

ehreact.helpers contains a variety of helper functions.

RDKit

The module ehreact.helpers.rdkit (rdkit.py) contains helper functions that rely on RDKit.

ehreact.helpers.rdkit.canonicalize(mol)[source]

Outputs the canonical SMILES string of a molecule.

Parameters:

mol (rdkit.Chem.Mol) – RDKit molecule.

Returns:

canonical_smi – Canonical SMILES string.

Return type:

str

ehreact.helpers.rdkit.canonicalize_with_h(mol, reaction=False, old2new_mapno={})[source]

Outputs the canonical SMILES with hydrogens.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule.

  • reaction (bool, default False) – Whether the input SMILES stems from a reaction and therefore map numbers need to be included

  • old2new_mapno (dict, default {}) – Dictionary of custom map numbers to use. If not supplied, map numbers will be calculated from atom indices.

Returns:

  • canonical_smi (str) – Canonical SMILES string.

  • old2new_mapno (dict) – Dictionary of map numbers.

ehreact.helpers.rdkit.check_if_reaction(smi, expected)[source]

Checks whether “>” occurs in a string and raises a ValueError if the result does not match the expected behavior.

Parameters:
  • smi (str) – SMILES string (reaction SMILES or molecule SMILES)

  • expected (bool) – Whether the input string is expected to be a reaction.
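
The check described above can be sketched in a few lines. This is a minimal illustration based only on the description, not the library's implementation; the function name check_if_reaction_sketch and the error messages are hypothetical.

```python
def check_if_reaction_sketch(smi: str, expected: bool) -> None:
    """Raise ValueError if the presence of '>' in smi contradicts `expected`."""
    is_reaction = ">" in smi
    if is_reaction and not expected:
        raise ValueError(f"Expected a molecule SMILES but got a reaction: {smi!r}")
    if expected and not is_reaction:
        raise ValueError(f"Expected a reaction SMILES but got a molecule: {smi!r}")


# Both calls match the expectation and therefore pass silently.
check_if_reaction_sketch("CCO", expected=False)
check_if_reaction_sketch("CCO>>CC=O", expected=True)
```

A mismatch, such as passing "CCO" with expected=True, raises a ValueError in this sketch.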

ehreact.helpers.rdkit.do_charges_fit(mol, current_rule, match)[source]

Determines whether a current rule matches a molecule if formal charges are taken into account.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule object.

  • current_rule (rdkit.Chem.Mol) – RDKit molecule object of rule

  • match (tuple) – Indices of matching atoms in current_rule and mol.

Returns:

do_charges_fit – Whether formal charges are the same.

Return type:

bool

ehreact.helpers.rdkit.find_common(mols)[source]

Function to find the maximum common substructure of a list of RDKit molecules and check whether charges match.

Parameters:

mols (List[rdkit.Chem.Mol]) – List of RDKit mols

Returns:

new_mcs_mol – RDKit molecule of common substructure.

Return type:

rdkit.Chem.Mol

ehreact.helpers.rdkit.find_matching_atoms(train_mode, mol, rule, rule_small, tags_core)[source]

Find the substructure matching of a rule or lowest matching rule to a molecule and confirm the match contains the reaction center.

Parameters:
  • train_mode (Literal[“single_reactant”, “transition_state”]) – Mode in which diagram was constructed.

  • mol (rdkit.Chem.Mol) – RDKit molecule.

  • rule (rdkit.Chem.Mol) – RDKit molecule of substructure/reaction rule.

  • rule_small (rdkit.Chem.Mol) – RDKit molecule of substructure/reaction rule of lowest matching template

  • tags_core (List[str]) – A list of the atom map numbers in the reaction center.

Returns:

to_do_matches – List of valid matches.

Return type:

list

ehreact.helpers.rdkit.force_charge_fit(mol, current_rule, match)[source]

Forces the formal charges of a rule to match the formal charges of a molecule.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule object.

  • current_rule (rdkit.Chem.Mol) – RDKit molecule object of rule

  • match (tuple) – Indices of matching atoms in current_rule and mol.

Returns:

current_rule – RDKit molecule object of rule with updated formal charges.

Return type:

rdkit.Chem.Mol

ehreact.helpers.rdkit.make_fragment_indices(rule)[source]

Fragments a molecule into multiple molecules where it is disconnected. For example, turns ‘CC.C’ into ‘CC’ and ‘C’.

Parameters:

rule (rdkit.Chem.Mol) – RDKit molecule.

Returns:

  • rule_fragment_indices (tuple) – Tuple of tuple of atom indices in each fragment.

  • rule_fragment_mols (tuple) – Tuple of RDKit molecule objects of each fragment.
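
To illustrate the shape of rule_fragment_indices, the following sketch finds connected components from a plain bond list. It is not the library's implementation and does not use RDKit (which offers Chem.GetMolFrags for real molecules); the function name and the (num_atoms, bonds) input format are assumptions made for this example.

```python
def fragment_indices_sketch(num_atoms, bonds):
    """Return a tuple of tuples of atom indices, one per connected component."""
    neighbors = {i: [] for i in range(num_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = set()
    fragments = []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first search collects every atom reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbors[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fragments.append(tuple(sorted(component)))
    return tuple(fragments)


# 'CC.C': atoms 0 and 1 are bonded, atom 2 is isolated.
print(fragment_indices_sketch(3, [(0, 1)]))  # ((0, 1), (2,))
```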

ehreact.helpers.rdkit.make_mol(smi)[source]

Makes an RDKit molecule from a SMILES string and adds hydrogens.

Parameters:

smi (str) – SMILES string.

Returns:

mol – RDKit molecule.

Return type:

rdkit.Chem.Mol

ehreact.helpers.rdkit.make_mol_no_sanit_h(smi)[source]

Makes an RDKit molecule from a SMILES string without sanitizing hydrogens (hydrogens are kept as given).

Parameters:

smi (str) – SMILES string.

Returns:

mol – RDKit molecule.

Return type:

rdkit.Chem.Mol

ehreact.helpers.rdkit.match_and_mol_transition(mol, change_dict, rule, direction='regular')[source]

Applies the changes specified in change_dict to an RDKit mol. The indices in change_dict correspond to rule rather than mol, so an additional conversion step is needed to identify the correct bonds.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule.

  • change_dict (dict) – Changes to be applied to bonds and atoms, indexing corresponding to rule.

  • rule (rdkit.Chem.Mol) – RDKit molecule of substructure/reaction rule.

  • direction (Literal[‘regular’, ‘reversed’], default ‘regular’) – Direction of change (‘regular’ or ‘reversed’).

Returns:

mol2s – List of transformed molecules.

Return type:

List[rdkit.Chem.Mol]

ehreact.helpers.rdkit.match_includes_reaction_center(train_mode, match, atoms_core)[source]

Determines whether a substructure match includes the full reaction center.

Parameters:
  • train_mode (Literal[“single_reactant”, “transition_state”]) – Mode in which diagram was constructed.

  • match (tuple) – Indices of substructure match.

  • atoms_core (List[int]) – Atom indices belonging to the reaction center.

Returns:

includes_rc – Boolean whether match includes the reaction center.

Return type:

bool
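
The idea of "includes the full reaction center" can be pictured as a subset test. The sketch below is an assumption based on the description only: it ignores train_mode, whose effect is not documented here, and the function name is hypothetical.

```python
def includes_reaction_center_sketch(match, atoms_core):
    """True if every reaction-center atom index appears in the match."""
    return set(atoms_core).issubset(match)


print(includes_reaction_center_sketch((0, 1, 2, 5), [1, 2]))  # True
print(includes_reaction_center_sketch((0, 1, 5), [1, 2]))     # False
```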

ehreact.helpers.rdkit.mol_to_rxn_smiles(initial_smiles, initial_mols, d, predict_mode, verbose)[source]

Creates a list of all possible reaction SMILES from a given list of reactant(s).

Parameters:
  • initial_smiles (List[str]) – List of SMILES strings for reactant(s).

  • initial_mols (List[rdkit.Chem.Mol]) – RDKit molecules of reactant(s) list.

  • d (ehreact.diagram.diagram.Diagram) – Hasse Diagram.

  • predict_mode (Literal[“single_reactant”, “multi_reactant”, “transition_state”]) – Mode of prediction.

  • verbose (bool) – Whether to print additional information.

Returns:

  • current_smiles (List[str]) – List of reaction SMILES.

  • belongs_to (List[int]) – For each item in current_smiles, the index of the initial_smiles entry it belongs to.

  • combination (List[str]) – List of combination of reactants.

ehreact.helpers.rdkit.mol_transition(mol, change_dict)[source]

Converts an RDKit mol with the changes specified in change_dict.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule.

  • change_dict (dict) – A dictionary specifying changes to apply to bonds and atoms.

Returns:

mol2 – Transformed RDKit molecule.

Return type:

rdkit.Chem.Mol

ehreact.helpers.rdkit.mols_are_equal(mol1, mol2)[source]

Computes if two molecules are equal.

Parameters:
  • mol1 (rdkit.Chem.Mol) – RDKit molecule.

  • mol2 (rdkit.Chem.Mol) – RDKit molecule.

Returns:

Boolean whether molecules are equal.

Return type:

bool

ehreact.helpers.rdkit.moltosmiles_transition(mol, change_dict)[source]

Converts an RDKit mol with the changes specified in change_dict before converting to SMILES.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule.

  • change_dict (dict) – A dictionary specifying changes to apply to bonds and atoms for reactants and products.

Returns:

Converted SMILES.

Return type:

str

ehreact.helpers.rdkit.morgan_bit_fp_from_mol(mol, chirality=False)[source]

Computes a Morgan Bit Fingerprint for a molecule.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule.

  • chirality (bool) – Whether to use stereoinformation in fingerprint.

Returns:

fp – Morgan fingerprint.

Return type:

rdkit.DataStructs.cDataStructs.ExplicitBitVect

ehreact.helpers.rdkit.preprocess(smi, delete_aam=True, reaction=False, old2new_mapno={})[source]

Preprocess a SMILES string.

Parameters:
  • smi (str) – SMILES string.

  • delete_aam (bool, default True) – Whether to delete atom mappings from the string.

  • reaction (bool, default False) – Whether the input SMILES stems from a reaction and molecules need to be created without sanitizing hydrogens.

  • old2new_mapno (dict, default {}) – Dictionary of custom map numbers to use.

Returns:

  • canonical_smi_no_stereo (str) – Canonical SMILES string without stereoinformation.

  • canonical_smi (str) – Canonical SMILES string with stereoinformation (if provided).

  • mol_no_stereo (rdkit.Chem.Mol) – RDKit molecule without stereoinformation.

  • mol (rdkit.Chem.Mol) – RDKit molecule with stereoinformation.

  • old2new_mapno (dict) – Dictionary of map numbers.

ehreact.helpers.rdkit.preprocess_rxn(rxn_smi)[source]

Preprocess a reaction SMILES string.

Parameters:

rxn_smi (str) – Reaction SMILES string.

Returns:

  • canonical_smi_no_stereo (str) – Canonical SMILES string without stereoinformation.

  • canonical_smi (str) – Canonical SMILES string with stereoinformation (if provided).

  • mol_no_stereo (rdkit.Chem.Mol) – RDKit molecule without stereoinformation.

  • mol (rdkit.Chem.Mol) – RDKit molecule with stereoinformation.

ehreact.helpers.rdkit.preprocess_seeds(seed_list, smiles, smiles_dict)[source]

Preprocesses a list of seeds and matches them to a list of SMILES strings. If given an empty list of seeds, a seed with the maximum common substructure of all molecules is created.

Parameters:
  • seed_list (List[str]) – List of seeds.

  • smiles (List[str]) – List of SMILES strings

  • smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.

Returns:

  • seeds (List[str]) – List of seeds.

  • rule_dict (dict) – A dictionary of all minimal templates of all seeds.

  • num_smiles_seed (List[int]) – List of how many SMILES strings fit each seed.

ehreact.helpers.rdkit.read_in_reactions(rxn_smiles)[source]

Computes canonical SMILES strings and RDKit molecules from a list of reaction SMILES strings.

Parameters:

rxn_smiles (List[str]) – List of reaction SMILES strings

Returns:

  • canonical_smiles (List[str]) – List of canonical SMILES strings

  • smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.

  • tags_core (dict) – Dictionary of atom map numbers of reaction center for each canonical SMILES.

ehreact.helpers.rdkit.read_in_reactions_unique(rxn_smiles)[source]

Computes canonical SMILES strings and RDKit molecules from a list of reaction SMILES strings and only keeps unique entries.

Parameters:

rxn_smiles (List[str]) – List of reaction SMILES strings

Returns:

  • seeds (List[str]) – List of SMILES seeds (of all minimal templates).

  • rule_dict (dict) – A dictionary of all minimal templates of all seeds.

  • canonical_smiles (List[str]) – List of canonical SMILES strings

  • smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.

  • skipped (List[str]) – List of skipped (because duplicate) SMILES strings.

  • tags_core (dict) – Dictionary of atom map numbers of reaction center for each canonical SMILES.

ehreact.helpers.rdkit.read_in_smiles(smiles)[source]

Computes canonical SMILES strings and RDKit molecules from a list of SMILES strings.

Parameters:

smiles (List[str]) – List of SMILES strings

Returns:

  • canonical_smiles (List[str]) – List of canonical SMILES strings

  • smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.

ehreact.helpers.rdkit.read_in_smiles_unique(smiles)[source]

Computes canonical SMILES strings and RDKit molecules from a list of SMILES strings and only keeps unique entries.

Parameters:

smiles (List[str]) – List of SMILES strings

Returns:

  • canonical_smiles (List[str]) – Sorted list of canonical SMILES strings

  • smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.

  • skipped (List[str]) – List of skipped (because duplicate) SMILES strings.
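
The deduplication step can be sketched as follows. This is an illustration, not the library's code: the canonicalize argument is a hypothetical stand-in for RDKit-based canonicalization (here it only strips whitespace), and whether skipped holds the original or the canonical strings is an assumption.

```python
def unique_sorted_sketch(smiles, canonicalize=str.strip):
    """Return (sorted unique canonical strings, skipped duplicates)."""
    seen = set()
    skipped = []
    for smi in smiles:
        key = canonicalize(smi)
        if key in seen:
            skipped.append(smi)  # duplicate of an earlier entry
        else:
            seen.add(key)
    return sorted(seen), skipped


canonical, skipped = unique_sorted_sketch(["CCO", "CC", "CCO "])
print(canonical)  # ['CC', 'CCO']
print(skipped)    # ['CCO ']
```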

ehreact.helpers.rdkit.tanimoto_from_fp(fp1, fp2)[source]

Computes the Tanimoto similarity between fingerprints.

Parameters:
  • fp1 (rdkit.DataStructs.cDataStructs.ExplicitBitVect) – Molecular fingerprint.

  • fp2 (rdkit.DataStructs.cDataStructs.ExplicitBitVect) – Molecular fingerprint.

Returns:

tanimoto – Tanimoto similarity

Return type:

float
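
For reference, the Tanimoto similarity of two bit fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below computes it on plain Python sets of on-bit indices rather than on RDKit ExplicitBitVect objects; the function name and the handling of two empty fingerprints are choices made for this example.

```python
def tanimoto_sketch(on_bits1, on_bits2):
    """Tanimoto similarity |A & B| / |A | B| for two sets of on-bit indices."""
    a, b = set(on_bits1), set(on_bits2)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Two shared bits (1, 5) out of four distinct bits (1, 5, 9, 12).
print(tanimoto_sketch({1, 5, 9}, {1, 5, 12}))  # 0.5
```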

Utils

The module ehreact.helpers.utils (utils.py) contains other helper functions that do not rely on RDKit.

ehreact.helpers.utils.findsubsets(S, m)[source]

Find all subsets of S containing m elements.

Parameters:
  • S (list) – List of objects.

  • m (int) – Number of elements per subset.

Returns:

subset_list – List of subsets.

Return type:

list
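
A function with this behavior can be sketched with itertools.combinations from the standard library. The sketch is not necessarily how the library implements it, and returning each subset as a tuple is an assumption.

```python
from itertools import combinations


def findsubsets_sketch(S, m):
    """Return all m-element subsets of S as a list of tuples."""
    return list(combinations(S, m))


print(findsubsets_sketch([1, 2, 3], 2))  # [(1, 2), (1, 3), (2, 3)]
```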

Transition state

The module ehreact.helpers.transition_state (transition_state.py) contains helper functions to construct imaginary transition states from reaction SMILES.

Some of the functions in this module originate from RDChiral (https://github.com/connorcoley/rdchiral).

ehreact.helpers.transition_state.atoms_are_different(atom1, atom2)[source]

Compares two RDKit atoms based on basic properties.

Parameters:
  • atom1 (rdkit.Chem.Atom) – First RDKit atom.

  • atom2 (rdkit.Chem.Atom) – Second RDKit atom.

Returns:

Boolean whether the atoms differ in the compared properties.

Return type:

bool

ehreact.helpers.transition_state.bond_order_increases(bond1, bond2)[source]

Determines whether bond order gets larger from bond1 to bond2.

Parameters:
  • bond1 (rdkit.Chem.Bond) – RDKit bond object.

  • bond2 (rdkit.Chem.Bond) – RDKit bond object.

Returns:

Whether bond order increases from bond1 to bond2.

Return type:

bool

ehreact.helpers.transition_state.bond_to_label(bond)[source]

This function takes an RDKit bond and creates a label describing the most important attributes.

Parameters:

bond (rdkit.Chem.Bond) – RDKit bond.

Returns:

label – Label of bond.

Return type:

str

ehreact.helpers.transition_state.changed_bonds(included_bonds_reac, included_bonds_prod)[source]

Creates a dictionary of changed bonds from a list of included bonds in the reactants and products.

Parameters:
  • included_bonds_reac (list) – List of included bonds in the reactants.

  • included_bonds_prod (list) – List of included bonds in the products.

Returns:

change_dict_bonds – Dictionary of changed bonds.

Return type:

dict
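
The comparison of reactant and product bonds can be pictured as a dictionary diff. In the sketch below, each bond is a hypothetical (map_number_a, map_number_b, bond_order) tuple and a missing bond counts as order 0; the actual format of the included-bond lists and of change_dict_bonds is not specified here, so this is an assumption for illustration only.

```python
def changed_bonds_sketch(bonds_reac, bonds_prod):
    """Map each atom pair to (order before, order after) where the order changes."""
    # Sort each pair so that (2, 1) and (1, 2) refer to the same bond.
    reac = {tuple(sorted((a, b))): order for a, b, order in bonds_reac}
    prod = {tuple(sorted((a, b))): order for a, b, order in bonds_prod}
    changes = {}
    for pair in sorted(set(reac) | set(prod)):
        before, after = reac.get(pair, 0), prod.get(pair, 0)
        if before != after:
            changes[pair] = (before, after)
    return changes


# Bond 1-2 goes from double to single, bond 2-3 is formed, bond 3-4 is unchanged.
reac = [(1, 2, 2.0), (3, 4, 1.0)]
prod = [(2, 1, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
print(changed_bonds_sketch(reac, prod))
# {(1, 2): (2.0, 1.0), (2, 3): (0, 1.0)}
```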

ehreact.helpers.transition_state.changed_labels(reac_fragments, prod_fragments, reac_fragment_atoms, prod_fragment_atoms, reac_mols, prod_mols)[source]

Given the fragments of reactants and products (strings), the mappings of index to atom-mapping number (dictionaries), and the whole molecules (RDKit mols), this function computes the changed labels, that is, the changed formal charges (e.g. from R-O-H to R-O-, where the charge on the oxygen changes).

Parameters:
  • reac_fragments (str) – Fragment of reactant(s).

  • prod_fragments (str) – Fragment of product(s).

  • reac_fragment_atoms (dict) – Dictionary of index to atom-mapping numbers in reactant fragment.

  • prod_fragment_atoms (dict) – Dictionary of index to atom-mapping numbers in product fragment.

  • reac_mols (rdkit.Chem.Mol) – RDKit reactant(s) molecule.

  • prod_mols (rdkit.Chem.Mol) – RDKit product(s) molecule.

Returns:

change_dict_atoms – List of changed labels (= changed formal charges).

Return type:

list

ehreact.helpers.transition_state.compute_aam_with_h(rxn_smiles)[source]

Helper function to produce an atom mapping for a reaction SMILES via the Reaction Decoder Tool (RDT) by Syed Asad Rahman. The tool maps hydrogens that undergo changes, and this function adds the necessary atom mapping to all other hydrogens.

Parameters:

rxn_smiles (str) – Reaction SMILES.

Returns:

rxn_smiles – Atom-mapped reaction SMILES with hydrogens.

Return type:

str

ehreact.helpers.transition_state.compute_aam_without_h(rxn_smiles)[source]

Helper function to produce an atom mapping for a reaction SMILES via the Reaction Decoder Tool (RDT) by Syed Asad Rahman.

Parameters:

rxn_smiles (str) – Reaction SMILES.

Returns:

rxn_smiles – Atom-mapped reaction SMILES without hydrogens.

Return type:

str

ehreact.helpers.transition_state.get_changed_atoms(reacs, prods)[source]

Looks at mapped atoms in a reaction and determines which ones changed.

Parameters:
  • reacs (rdkit.Chem.Mol) – RDKit molecule of reactant(s).

  • prods (rdkit.Chem.Mol) – RDKit molecule of product(s).

Returns:

  • changed_atoms (List[rdkit.Chem.Atom]) – List of changed atoms

  • changed_atom_tags (List[str]) – List of tag numbers of changed atoms

  • err (int) – Integer indicating an error if not equal 0.

ehreact.helpers.transition_state.get_fragments_for_changed_atoms(mol, changed_atom_tags)[source]

Given an RDKit mol and a list of changed atom tags, this function computes the SMILES string of molecular fragments using MolFragmentToSmiles for all changed fragments.

Parameters:
  • mol (rdkit.Chem.Mol) – RDKit molecule

  • changed_atom_tags (List[str]) – List of changed atom tags.

Returns:

  • this_fragment (str) – SMILES string of molecular fragment of changed atoms

  • fragment_atoms (dict) – Dictionary of atoms in fragment.

ehreact.helpers.transition_state.get_strict_smarts_for_atom(atom)[source]

For an RDKit atom object, generate a simple SMARTS pattern.

Parameters:
  • atom (rdkit.Chem.Atom) – RDKit atom.

Returns:

symbol – SMARTS symbol of the atom.

Return type:

str

ehreact.helpers.transition_state.get_tagged_atoms_from_mol(mol)[source]

Takes an RDKit molecule and returns list of tagged atoms and their corresponding numbers.

Parameters:

mol (rdkit.Chem.Mol) – RDKit molecule.

Returns:

  • atoms (List[rdkit.Chem.Atom]) – List of tagged atoms

  • atom_tags (List[str]) – List of atom-mapping numbers
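
As a string-level illustration of atom tags, the sketch below extracts atom-mapping numbers from a mapped SMILES with a regular expression. It is not the library's implementation, which operates on RDKit molecules (RDKit exposes the map number per atom via Atom.GetAtomMapNum); the function name is hypothetical.

```python
import re


def atom_map_tags_sketch(mapped_smiles):
    """Return atom-mapping numbers (as strings) in order of appearance."""
    # A map number is the ':<digits>' suffix inside a bracket atom, e.g. [CH3:1].
    return re.findall(r":(\d+)\]", mapped_smiles)


print(atom_map_tags_sketch("[CH3:1][CH2:2][OH:3]"))  # ['1', '2', '3']
```

Atoms without a map number, such as the plain "C" in "C[OH:7]", do not appear in the result.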

ehreact.helpers.transition_state.include_bonds(mols, changed_atom_tags, fragment_atoms)[source]

Creates a list of bonds and their indices that have changed atoms attached and should thus be included in the transition state.

Parameters:
  • mols (List[rdkit.Chem.Mol]) – RDKit molecule(s).

  • changed_atom_tags (List[int]) – Indices of changed atoms.

  • fragment_atoms (dict) – Dictionary of index to atom-mapping numbers in fragment.

Returns:

  • included_bonds (list) – List of included bonds.

  • included_bonds_idx (List[int]) – List of included bond indices

ehreact.helpers.transition_state.make_mols(smi, stereoinformation)[source]

Takes a SMILES string as input and creates an RDKit mol object. The molecule cannot be sanitized directly, since that would remove the explicit hydrogens. Instead, the function checks whether the molecule could be sanitized and reorders the atoms to match the sanitized molecule. The reordering of atoms and molecules ensures that the same reaction with different atom ordering gives the same transition state.

Parameters:
  • smi (str) – SMILES string.

  • stereoinformation (bool) – Whether to use stereoinformation.

Returns:

mol_ordered – RDKit molecule object.

Return type:

rdkit.Chem.Mol

ehreact.helpers.transition_state.make_ts(reac_mols, included_bonds_idx, change_dict_bonds, change_dict_atoms, reac_fragment_atoms, ts_seed)[source]

Makes an RDKit mol of the transition state: either the whole transition state (with all atoms and bonds included) or only the seed (if ts_seed==True), which includes only the atoms undergoing changes and the corresponding bonds. Returns the transition state (whole or seed) and dictionaries of how to change bond types and charges to revert back to either the products or reactants.

Parameters:
  • reac_mols (rdkit.Chem.Mol) – RDKit molecule of reactant(s).

  • included_bonds_idx (List[int]) – List of included bond indices in transition state.

  • change_dict_bonds (dict) – Dictionary of bond changes in reaction.

  • change_dict_atoms (dict) – Dictionary of formal charge changes in reaction.

  • reac_fragment_atoms (dict) – Dictionary of atom indices in reactant(s) fragment.

  • ts_seed (bool) – Whether to build only the reaction center (seed) instead of the full transition state.

Returns:

  • change_ts_to_reac (dict) – Dictionary of bond and atom changes from the ts to the reactants

  • change_ts_to_prod (dict) – Dictionary of bond and atom changes from the ts to the products

  • ts (rdkit.Chem.Mol) – RDKit molecule of the transition state

ehreact.helpers.transition_state.mapping_list_dict(mol)[source]

Computes a list of all map numbers and an atom-mapping indices dictionary.

Parameters:

mol (rdkit.Chem.Mol) – RDKit molecule.

Returns:

  • map_numbers (List[int]) – Sorted list of atom map numbers

  • mapping (dict) – Dictionary of mapping
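
The two return values can be illustrated with plain Python. In this sketch the molecule is replaced by a list holding the map number of each atom in index order, unmapped atoms (map number 0) are skipped, and the dictionary maps map number to atom index; all three points are assumptions for illustration, since the direction and contents of mapping are not specified here.

```python
def mapping_list_dict_sketch(map_numbers_by_index):
    """Return (sorted map numbers, {map number: atom index})."""
    mapping = {
        num: idx for idx, num in enumerate(map_numbers_by_index) if num
    }
    return sorted(mapping), mapping


# Atom 0 carries map number 3, atom 1 carries 1, atom 2 carries 2.
print(mapping_list_dict_sketch([3, 1, 2]))  # ([1, 2, 3], {3: 0, 1: 1, 2: 2})
```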

ehreact.helpers.transition_state.process_an_example(rxn_smiles)[source]

Processes one reaction SMILES string and returns RDKit mol objects of the transition state (both the whole transition state and the seed), as well as dictionaries of how to change bond types and charges to revert the transition state back to either the products or reactants.

Parameters:

rxn_smiles (str) – Reaction SMILES.

Returns:

  • change_ts_to_reac (dict) – Dictionary of bond and atom changes from the ts to the reactants

  • change_ts_to_prod (dict) – Dictionary of bond and atom changes from the ts to the products

  • ts (rdkit.Chem.Mol) – RDKit molecule of the whole imaginary transition state

  • ts_seed (rdkit.Chem.Mol) – RDKit molecule of the reaction center