Helpers
ehreact.helpers contains a variety of helper functions.
RDKit
The module ehreact.helpers.rdkit (rdkit.py) contains helper functions that rely on RDKit.
- ehreact.helpers.rdkit.canonicalize(mol)[source]
Outputs the canonical SMILES string of a molecule.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
- Returns:
canonical_smi – Canonical SMILES string.
- Return type:
str
- ehreact.helpers.rdkit.canonicalize_with_h(mol, reaction=False, old2new_mapno={})[source]
Outputs the canonical SMILES with hydrogens.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
reaction (bool, default False) – Whether the input SMILES stems from a reaction, in which case map numbers need to be included.
old2new_mapno (dict, default {}) – Dictionary of custom map numbers to use. If not supplied, map numbers will be calculated from atom indices.
- Returns:
canonical_smi (str) – Canonical SMILES string.
old2new_mapno (dict) – Dictionary of map numbers.
- ehreact.helpers.rdkit.check_if_reaction(smi, expected)[source]
Function to check whether “>” occurs in a string, and raise a ValueError if the expected behavior is not met.
- Parameters:
smi (str) – SMILES string (reaction SMILES or molecule SMILES)
expected (bool) – Whether the input string is expected to be a reaction.
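The check described here amounts to a membership test on the string. The following is a minimal sketch, assuming the helper simply compares the presence of ">" with expected; the name check_if_reaction_sketch and the error message are illustrative and not taken from ehreact.

```python
def check_if_reaction_sketch(smi: str, expected: bool) -> None:
    """Raise ValueError if the presence of '>' in `smi` contradicts `expected`."""
    # Reaction SMILES separate reactants, agents and products with ">".
    is_reaction = ">" in smi
    if is_reaction != expected:
        kind = "a reaction SMILES" if expected else "a molecule SMILES"
        raise ValueError(f"Expected {kind}, got: {smi!r}")


# Both calls agree with `expected`, so neither raises.
check_if_reaction_sketch("CCO", expected=False)
check_if_reaction_sketch("CCO>>CC=O", expected=True)
```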
- ehreact.helpers.rdkit.do_charges_fit(mol, current_rule, match)[source]
Determines whether a current rule matches a molecule if formal charges are taken into account.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule object.
current_rule (rdkit.Chem.Mol) – RDKit molecule object of rule
match (tuple) – Indices of matching atoms in current_rule and mol.
- Returns:
do_charges_fit – Whether formal charges are the same.
- Return type:
bool
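To illustrate the idea of a charge-aware match, the sketch below works on plain lists of formal charges instead of RDKit molecules. It assumes that match[i] is the index in the molecule of the atom matched to atom i of the rule (the convention of RDKit's GetSubstructMatch); the function name and the list-based inputs are hypothetical simplifications, not the ehreact implementation.

```python
from typing import Sequence


def charges_fit_sketch(
    mol_charges: Sequence[int],
    rule_charges: Sequence[int],
    match: Sequence[int],
) -> bool:
    """Return True if every matched atom pair carries the same formal charge.

    Assumption: match[i] is the molecule atom index matched to rule atom i.
    """
    return all(
        rule_charges[rule_idx] == mol_charges[mol_idx]
        for rule_idx, mol_idx in enumerate(match)
    )


# Acetate-like atom list C, C, O, O with formal charges 0, 0, 0, -1.
mol_charges = [0, 0, 0, -1]
neutral_rule = [0, 0]  # rule atoms C, O, both neutral

print(charges_fit_sketch(mol_charges, neutral_rule, match=(1, 2)))  # neutral O matched
print(charges_fit_sketch(mol_charges, neutral_rule, match=(1, 3)))  # charged O matched
```

The first call matches the neutral oxygen and the charges agree; the second matches the charged oxygen, so the neutral rule no longer fits.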
- ehreact.helpers.rdkit.find_common(mols)[source]
Function to find the maximum common substructure of a list of RDKit molecules and check whether charges match.
- Parameters:
mols (List[rdkit.Chem.Mol]) – List of RDKit mols
- Returns:
new_mcs_mol – RDKit molecule of common substructure.
- Return type:
rdkit.Chem.Mol
- ehreact.helpers.rdkit.find_matching_atoms(train_mode, mol, rule, rule_small, tags_core)[source]
Find the substructure matching of a rule or lowest matching rule to a molecule and confirm the match contains the reaction center.
- Parameters:
train_mode (Literal[“single_reactant”, “transition_state”]) – Mode in which diagram was constructed.
mol (rdkit.Chem.Mol) – RDKit molecule.
rule (rdkit.Chem.Mol) – RDKit molecule of substructure/reaction rule.
rule_small (rdkit.Chem.Mol) – RDKit molecule of substructure/reaction rule of lowest matching template
tags_core (List[str]) – A list of the atom map numbers in the reaction center.
- Returns:
to_do_matches – List of valid matches.
- Return type:
list
- ehreact.helpers.rdkit.force_charge_fit(mol, current_rule, match)[source]
Forces the formal charges of a rule to match the formal charges of a molecule.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule object.
current_rule (rdkit.Chem.Mol) – RDKit molecule object of rule
match (tuple) – Indices of matching atoms in current_rule and mol.
- Returns:
current_rule – RDKit molecule object of rule with updated formal charges.
- Return type:
rdkit.Chem.Mol
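Forcing the charges can be pictured as copying the formal charge of each matched molecule atom onto the corresponding rule atom. The sketch below again uses plain lists of formal charges and assumes match[i] is the molecule atom index matched to rule atom i; the function name and data layout are hypothetical, and the real helper operates on RDKit molecules.

```python
from typing import List, Sequence


def force_charges_sketch(
    mol_charges: Sequence[int],
    rule_charges: Sequence[int],
    match: Sequence[int],
) -> List[int]:
    """Return a copy of the rule charges overwritten with the matched molecule charges."""
    updated = list(rule_charges)  # do not mutate the caller's list
    for rule_idx, mol_idx in enumerate(match):
        updated[rule_idx] = mol_charges[mol_idx]
    return updated


# The neutral rule oxygen (rule index 1) is matched to the charged oxygen (mol index 3).
print(force_charges_sketch([0, 0, 0, -1], [0, 0], match=(1, 3)))
```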
- ehreact.helpers.rdkit.make_fragment_indices(rule)[source]
Fragments a molecule into multiple molecules where it is disconnected. For example, turns ‘CC.C’ into ‘CC’ and ‘C’.
- Parameters:
rule (rdkit.Chem.Mol) – RDKit molecule.
- Returns:
rule_fragment_indices (tuple) – Tuple of tuple of atom indices in each fragment.
rule_fragment_mols (tuple) – Tuple of RDKit molecule objects of each fragment.
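Splitting a disconnected molecule into fragments is a connected-components search on the bond graph. The sketch below shows that search on a bare atom count and bond list, returning a tuple of tuples of atom indices as described for rule_fragment_indices. The function name and the graph representation are assumptions made for illustration; the ehreact helper works on RDKit molecules and also returns the fragment molecules.

```python
from typing import List, Tuple


def fragment_indices_sketch(
    num_atoms: int, bonds: List[Tuple[int, int]]
) -> Tuple[Tuple[int, ...], ...]:
    """Group atom indices into connected components of the bond graph."""
    # Build an adjacency list from the bond list.
    neighbors: List[List[int]] = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = set()
    fragments = []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first traversal collecting every atom reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nb in neighbors[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        fragments.append(tuple(sorted(component)))
    return tuple(fragments)


# 'CC.C': atoms 0 and 1 are bonded, atom 2 forms its own fragment.
print(fragment_indices_sketch(3, [(0, 1)]))
```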
- ehreact.helpers.rdkit.make_mol(smi)[source]
Makes an RDKit molecule from a SMILES string and adds hydrogens.
- Parameters:
smi (str) – SMILES string.
- Returns:
mol – RDKit molecule.
- Return type:
rdkit.Chem.Mol
- ehreact.helpers.rdkit.make_mol_no_sanit_h(smi)[source]
Makes an RDKit molecule from a SMILES string without sanitizing hydrogens (hydrogens are kept as given).
- Parameters:
smi (str) – SMILES string.
- Returns:
mol – RDKit molecule.
- Return type:
rdkit.Chem.Mol
- ehreact.helpers.rdkit.match_and_mol_transition(mol, change_dict, rule, direction='regular')[source]
Applies the changes specified in change_dict to an RDKit mol. The bond numbers in change_dict correspond to rule, not mol, so an additional conversion step is needed to identify the correct bonds.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
change_dict (dict) – Changes to be applied to bonds and atoms, indexing corresponding to rule.
rule (rdkit.Chem.Mol) – RDKit molecule of substructure/reaction rule.
direction (Literal[‘regular’, ‘reversed’], default ‘regular’) – Direction of change (‘regular’ or ‘reversed’).
- Returns:
mol2s (List[rdkit.Chem.Mol]) – List of transformed molecules.
- ehreact.helpers.rdkit.match_includes_reaction_center(train_mode, match, atoms_core)[source]
Determines whether a substructure match includes the full reaction center.
- Parameters:
train_mode (Literal[“single_reactant”, “transition_state”]) – Mode in which diagram was constructed.
match (tuple) – Indices of substructure match.
atoms_core (List[int]) – Atom indices belonging to the reaction center.
- Returns:
includes_rc – Boolean whether match includes the reaction center.
- Return type:
bool
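The core of this test is a subset check: every reaction-center atom index must appear in the match. The sketch below shows only that check. How train_mode influences the result is not described here, so it is deliberately left out, and the function name is hypothetical.

```python
from typing import Iterable, Sequence


def includes_reaction_center_sketch(
    match: Sequence[int], atoms_core: Iterable[int]
) -> bool:
    """Return True if all reaction-center atom indices occur in the match."""
    return set(atoms_core).issubset(match)


print(includes_reaction_center_sketch(match=(4, 2, 7, 1), atoms_core=[2, 7]))
print(includes_reaction_center_sketch(match=(4, 2, 7, 1), atoms_core=[2, 9]))
```

The first match covers both core atoms; the second misses atom 9 and would be rejected.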
- ehreact.helpers.rdkit.mol_to_rxn_smiles(initial_smiles, initial_mols, d, predict_mode, verbose)[source]
Creates a list of all possible reaction SMILES from a given list of reactant(s).
- Parameters:
initial_smiles (List[str]) – List of SMILES strings for reactant(s).
initial_mols (List[rdkit.Chem.Mol]) – RDKit molecules of reactant(s) list.
d (ehreact.diagram.diagram.Diagram) – Hasse Diagram.
predict_mode (Literal[“single_reactant”, “multi_reactant”, “transition_state”]) – Mode of prediction.
verbose (bool) – Whether to print additional information.
- Returns:
current_smiles (List[str]) – List of reaction SMILES.
belongs_to (List[int]) – List of indices indicating which entry of initial_smiles each item in current_smiles belongs to.
combination (List[str]) – List of combination of reactants.
- ehreact.helpers.rdkit.mol_transition(mol, change_dict)[source]
Converts an RDKit mol with the changes specified in change_dict.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
change_dict (dict) – A dictionary specifying changes to apply to bonds and atoms.
- Returns:
mol2 – Transformed RDKit molecule.
- Return type:
rdkit.Chem.Mol
- ehreact.helpers.rdkit.mols_are_equal(mol1, mol2)[source]
Computes if two molecules are equal.
- Parameters:
mol1 (rdkit.Chem.Mol) – RDKit molecule.
mol2 (rdkit.Chem.Mol) – RDKit molecule.
- Returns:
Boolean whether molecules are equal.
- Return type:
bool
- ehreact.helpers.rdkit.moltosmiles_transition(mol, change_dict)[source]
Converts an RDKit mol with the changes specified in change_dict before converting to smiles.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
change_dict (dict) – A dictionary specifying changes to apply to bonds and atoms for reactants and products.
- Returns:
Converted SMILES.
- Return type:
str
- ehreact.helpers.rdkit.morgan_bit_fp_from_mol(mol, chirality=False)[source]
Computes a Morgan Bit Fingerprint for a molecule.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
chirality (bool) – Whether to use stereoinformation in fingerprint.
- Returns:
fp – Morgan fingerprint.
- Return type:
rdkit.DataStructs.cDataStructs.ExplicitBitVect
- ehreact.helpers.rdkit.preprocess(smi, delete_aam=True, reaction=False, old2new_mapno={})[source]
Preprocess a SMILES string.
- Parameters:
smi (str) – SMILES string.
delete_aam (bool, default True) – Whether to delete atom mappings from the string.
reaction (bool, default False) – Whether the input SMILES stems from a reaction and molecules need to be created without sanitizing hydrogens.
old2new_mapno (dict, default {}) – Dictionary of custom map numbers to use.
- Returns:
canonical_smi_no_stereo (str) – Canonical SMILES string without stereoinformation.
canonical_smi (str) – Canonical SMILES string with stereoinformation (if provided).
mol_no_stereo (rdkit.Chem.Mol) – RDKit molecule without stereoinformation.
mol (rdkit.Chem.Mol) – RDKit molecule with stereoinformation.
old2new_mapno (dict) – Dictionary of map numbers.
- ehreact.helpers.rdkit.preprocess_rxn(rxn_smi)[source]
Preprocess a reaction SMILES string.
- Parameters:
rxn_smi (str) – Reaction SMILES string.
- Returns:
canonical_smi_no_stereo (str) – Canonical SMILES string without stereoinformation.
canonical_smi (str) – Canonical SMILES string with stereoinformation (if provided).
mol_no_stereo (rdkit.Chem.Mol) – RDKit molecule without stereoinformation.
mol (rdkit.Chem.Mol) – RDKit molecule with stereoinformation.
- ehreact.helpers.rdkit.preprocess_seeds(seed_list, smiles, smiles_dict)[source]
Preprocesses a list of seeds and matches them to a list of SMILES strings. If given an empty list of seeds, a seed with the maximum common substructure of all molecules is created.
- Parameters:
seed_list (List[str]) – List of seeds.
smiles (List[str]) – List of SMILES strings
smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.
- Returns:
seeds (List[str]) – List of seeds.
rule_dict (dict) – A dictionary of all minimal templates of all seeds.
num_smiles_seed (List[int]) – List of how many SMILES strings fit each seed.
- ehreact.helpers.rdkit.read_in_reactions(rxn_smiles)[source]
Computes canonical SMILES strings and RDKit molecules from a list of reaction SMILES strings.
- Parameters:
rxn_smiles (List[str]) – List of reaction SMILES strings
- Returns:
canonical_smiles (List[str]) – List of canonical SMILES strings
smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.
tags_core (dict) – Dictionary of atom map numbers of reaction center for each canonical SMILES.
- ehreact.helpers.rdkit.read_in_reactions_unique(rxn_smiles)[source]
Computes canonical SMILES strings and RDKit molecules from a list of reaction SMILES strings and only keeps unique entries.
- Parameters:
rxn_smiles (List[str]) – List of reaction SMILES strings
- Returns:
seeds (List[str]) – List of SMILES seeds (of all minimal templates).
rule_dict (dict) – A dictionary of all minimal templates of all seeds.
canonical_smiles (List[str]) – List of canonical SMILES strings
smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.
skipped (List[str]) – List of skipped (because duplicate) SMILES strings.
tags_core (dict) – Dictionary of atom map numbers of reaction center for each canonical SMILES.
- ehreact.helpers.rdkit.read_in_smiles(smiles)[source]
Computes canonical SMILES strings and RDKit molecules from a list of SMILES strings.
- Parameters:
smiles (List[str]) – List of SMILES strings
- Returns:
canonical_smiles (List[str]) – List of canonical SMILES strings
smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.
- ehreact.helpers.rdkit.read_in_smiles_unique(smiles)[source]
Computes canonical SMILES strings and RDKit molecules from a list of SMILES strings and only keeps unique entries.
- Parameters:
smiles (List[str]) – List of SMILES strings
- Returns:
canonical_smiles (List[str]) – Sorted list of canonical SMILES strings
smiles_dict (dict) – Dictionary of canonical SMILES strings and their respective SMILES with stereoinformation and RDKit molecules.
skipped (List[str]) – List of skipped (because duplicate) SMILES strings.
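The "keep only unique entries" behavior can be sketched as deduplication keyed on the canonical form. In the sketch below the canonicalizer is passed in as a callable, and a toy lookup table stands in for RDKit canonicalization. The function name, the toy canonicalizer, and the simplified smiles_dict (canonical SMILES mapped to the first input that produced it) are all assumptions for illustration; the actual dictionary also stores RDKit molecules.

```python
from typing import Callable, Dict, List, Tuple


def unique_smiles_sketch(
    smiles: List[str], canonicalize: Callable[[str], str]
) -> Tuple[List[str], Dict[str, str], List[str]]:
    """Deduplicate SMILES by canonical form and record the duplicates."""
    smiles_dict: Dict[str, str] = {}
    skipped: List[str] = []
    for smi in smiles:
        canonical = canonicalize(smi)
        if canonical in smiles_dict:
            skipped.append(smi)  # duplicate of an earlier entry
        else:
            smiles_dict[canonical] = smi
    return sorted(smiles_dict), smiles_dict, skipped


# Toy stand-in for a real canonicalizer: known ethanol spellings map to "CCO".
TOY_CANONICAL = {"OCC": "CCO", "C(O)C": "CCO"}


def toy_canonicalize(smi: str) -> str:
    return TOY_CANONICAL.get(smi, smi)


canonical_smiles, smiles_dict, skipped = unique_smiles_sketch(
    ["OCC", "CCO", "CC"], toy_canonicalize
)
print(canonical_smiles, skipped)
```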
- ehreact.helpers.rdkit.tanimoto_from_fp(fp1, fp2)[source]
Computes the Tanimoto similarity between fingerprints.
- Parameters:
fp1 (rdkit.DataStructs.cDataStructs.ExplicitBitVect) – Molecular fingerprint.
fp2 (rdkit.DataStructs.cDataStructs.ExplicitBitVect) – Molecular fingerprint.
- Returns:
tanimoto – Tanimoto similarity
- Return type:
float
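For bit fingerprints, the Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below computes it on Python sets of on-bit indices rather than on ExplicitBitVect objects. The function name is hypothetical, and returning 0.0 for two empty fingerprints is a choice made for this sketch only.

```python
from typing import Set


def tanimoto_sketch(on_bits_1: Set[int], on_bits_2: Set[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| on sets of on-bit indices."""
    union = len(on_bits_1 | on_bits_2)
    if union == 0:
        return 0.0  # sketch-only convention for two empty fingerprints
    return len(on_bits_1 & on_bits_2) / union


# Two shared bits (2 and 3) out of four distinct bits overall.
print(tanimoto_sketch({1, 2, 3}, {2, 3, 4}))
```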
Utils
The module ehreact.helpers.utils (utils.py) contains other helper functions that do not rely on RDKit.
Transition state
The module ehreact.helpers.transition_state (transition_state.py) contains helper functions to construct imaginary transition states from reaction SMILES.
Some of the functions in this file originate from RDChiral (https://github.com/connorcoley/rdchiral).
- ehreact.helpers.transition_state.atoms_are_different(atom1, atom2)[source]
Compares two RDKit atoms based on basic properties.
- Parameters:
atom1 (rdkit.Chem.Atom) – First RDKit atom.
atom2 (rdkit.Chem.Atom) – Second RDKit atom.
- Returns:
Boolean whether the atoms differ in their properties.
- Return type:
bool
- ehreact.helpers.transition_state.bond_order_increases(bond1, bond2)[source]
Determines whether bond order gets larger from bond1 to bond2.
- Parameters:
bond1 (rdkit.Chem.Bond) – RDKit bond object.
bond2 (rdkit.Chem.Bond) – RDKit bond object.
- Returns:
Whether bond order increases from bond1 to bond2.
- Return type:
bool
- ehreact.helpers.transition_state.bond_to_label(bond)[source]
This function takes an RDKit bond and creates a label describing the most important attributes.
- Parameters:
bond (rdkit.Chem.Bond) – RDKit bond.
- Returns:
label – Label of bond.
- Return type:
str
- ehreact.helpers.transition_state.changed_bonds(included_bonds_reac, included_bonds_prod)[source]
Creates a dictionary of changed bonds from a list of included bonds in the reactants and products.
- Parameters:
included_bonds_reac (list) – List of included bonds in the reactants.
included_bonds_prod (list) – List of included bonds in the products.
- Returns:
change_dict_bonds – Dictionary of changed bonds.
- Return type:
dict
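Conceptually, the changed bonds are those whose order differs between reactants and products, including bonds that exist on only one side. The sketch below shows that comparison under a hypothetical data layout: each bond is a tuple (map number a, map number b, bond order), and the result maps each atom pair to its (reactant order, product order), with None for a missing bond. The format of the lists and the dictionary used by ehreact is not specified here and may differ.

```python
from typing import Dict, List, Optional, Tuple

Bond = Tuple[int, int, float]
Pair = Tuple[int, int]


def changed_bonds_sketch(
    bonds_reac: List[Bond], bonds_prod: List[Bond]
) -> Dict[Pair, Tuple[Optional[float], Optional[float]]]:
    """Map each atom pair whose bond order changes to (order before, order after)."""

    def by_pair(bonds: List[Bond]) -> Dict[Pair, float]:
        # Sort each pair so that (1, 2) and (2, 1) refer to the same bond.
        return {(min(a, b), max(a, b)): order for a, b, order in bonds}

    reac = by_pair(bonds_reac)
    prod = by_pair(bonds_prod)
    changes = {}
    for pair in sorted(set(reac) | set(prod)):
        before, after = reac.get(pair), prod.get(pair)
        if before != after:
            changes[pair] = (before, after)
    return changes


# Bond 1-2 is broken, bond 1-3 is formed, bond 4-5 goes from double to single.
reactant_bonds = [(1, 2, 1.0), (5, 4, 2.0)]
product_bonds = [(3, 1, 1.0), (4, 5, 1.0)]
print(changed_bonds_sketch(reactant_bonds, product_bonds))
```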
- ehreact.helpers.transition_state.changed_labels(reac_fragments, prod_fragments, reac_fragment_atoms, prod_fragment_atoms, reac_mols, prod_mols)[source]
Given the fragments of reactants and products (strings), the mapping of index to atom-mapping number (dictionaries), and the whole molecules (RDKit mols), this function computes the changed labels, that is, the changed formal charges (e.g. R-O-H to R-O-, a change of charge on the oxygen).
- Parameters:
reac_fragments (str) – Fragment of reactant(s).
prod_fragments (str) – Fragment of product(s).
reac_fragment_atoms (dict) – Dictionary of index to atom-mapping numbers in reactant fragment.
prod_fragment_atoms (dict) – Dictionary of index to atom-mapping numbers in product fragment.
reac_mols (rdkit.Chem.Mol) – RDKit reactant(s) molecule.
prod_mols (rdkit.Chem.Mol) – RDKit product(s) molecule.
- Returns:
change_dict_atoms (list) – List of changed labels (=changed formal charges).
- ehreact.helpers.transition_state.compute_aam_with_h(rxn_smiles)[source]
Helper function to produce an atom mapping for a reaction SMILES via the Reaction Decoder Tool (RDT) by Syed Asad Rahman. The tool maps hydrogens that undergo changes, and this function adds the necessary atom mapping to all other hydrogens.
- Parameters:
rxn_smiles (str) – Reaction SMILES.
- Returns:
rxn_smiles – Atom-mapped reaction SMILES with hydrogens.
- Return type:
str
- ehreact.helpers.transition_state.compute_aam_without_h(rxn_smiles)[source]
Helper function to produce an atom mapping for a reaction SMILES via the Reaction Decoder Tool (RDT) by Syed Asad Rahman.
- Parameters:
rxn_smiles (str) – Reaction SMILES.
- Returns:
rxn_smiles – Atom-mapped reaction SMILES without hydrogens.
- Return type:
str
- ehreact.helpers.transition_state.get_changed_atoms(reacs, prods)[source]
Looks at mapped atoms in a reaction and determines which ones changed.
- Parameters:
reacs (rdkit.Chem.Mol) – RDKit molecule of reactant(s).
prods (rdkit.Chem.Mol) – RDKit molecule of product(s).
- Returns:
changed_atoms (List[rdkit.Chem.Atom]) – List of changed atoms
changed_atom_tags (List[str]) – List of tag numbers of changed atoms
err (int) – Integer indicating an error if not equal 0.
- ehreact.helpers.transition_state.get_fragments_for_changed_atoms(mol, changed_atom_tags)[source]
Given an RDKit mol and a list of changed atom tags, this function computes the SMILES string of the molecular fragments for all changed fragments using MolFragmentToSmiles.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule
changed_atom_tags (List[str]) – List of changed atom tags.
- Returns:
this_fragment (str) – SMILES string of molecular fragment of changed atoms
fragment_atoms (dict) – Dictionary of atoms in fragment.
- ehreact.helpers.transition_state.get_strict_smarts_for_atom(atom)[source]
For an RDKit atom object, generates a simple SMARTS pattern.
- Parameters:
atom (rdkit.Chem.Atom) – RDKit atom.
- Returns:
symbol (str) – SMARTS symbol of the atom.
- ehreact.helpers.transition_state.get_tagged_atoms_from_mol(mol)[source]
Takes an RDKit molecule and returns list of tagged atoms and their corresponding numbers.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
- Returns:
atoms (List[rdkit.Chem.Atom]) – List of tagged atoms
atom_tags (List[str]) – List of atom-mapping numbers
- ehreact.helpers.transition_state.include_bonds(mols, changed_atom_tags, fragment_atoms)[source]
Creates a list of bonds and their indices that have changed atoms attached and should thus be included in the transition state.
- Parameters:
mols (List[rdkit.Chem.Mol]) – RDKit molecule(s).
changed_atom_tags (List[int]) – Indices of changed atoms.
fragment_atoms (dict) – Dictionary of index to atom-mapping numbers in fragment.
- Returns:
included_bonds (list) – List of included bonds.
included_bonds_idx (List[int]) – List of included bond indices
- ehreact.helpers.transition_state.make_mols(smi, stereoinformation)[source]
Takes a SMILES string as input and creates an RDKit mol object. The molecule cannot be sanitized, since sanitization would remove the explicit hydrogens. Instead, the function checks whether the molecule could be sanitized and reorders the atoms to correspond to the sanitized molecule. Reordering atoms and molecules ensures that the same reaction with a different atom ordering gives the same transition state.
- Parameters:
smi (str) – SMILES string.
stereoinformation (bool) – Whether to use stereoinformation.
- Returns:
mol_ordered – RDKit molecule object.
- Return type:
rdkit.Chem.Mol
- ehreact.helpers.transition_state.make_ts(reac_mols, included_bonds_idx, change_dict_bonds, change_dict_atoms, reac_fragment_atoms, ts_seed)[source]
Makes an RDKit mol of the transition state: either the whole transition state (with all atoms and bonds included) or only the seed (if ts_seed==True), which includes only the atoms undergoing changes and the corresponding bonds. Returns the transition state (whole or seed) and dictionaries of how to change bond types and charges to revert back to either the products or the reactants.
- Parameters:
reac_mols (rdkit.Chem.Mol) – RDKit molecule of reactant(s).
included_bonds_idx (List[int]) – List of included bond indices in transition state.
change_dict_bonds (dict) – Dictionary of bond changes in reaction.
change_dict_atoms (dict) – Dictionary of formal charge changes in reaction.
reac_fragment_atoms (dict) – Dictionary of atom indices in reactant(s) fragment.
ts_seed (bool) – Whether to build only the seed (reaction center) instead of the full transition state.
- Returns:
change_ts_to_reac (dict) – Dictionary of bond and atom changes from the ts to the reactants
change_ts_to_prod (dict) – Dictionary of bond and atom changes from the ts to the products
ts (rdkit.Chem.Mol) – RDKit molecule of the transition state
- ehreact.helpers.transition_state.mapping_list_dict(mol)[source]
Computes a list of all map numbers and an atom-mapping indices dictionary.
- Parameters:
mol (rdkit.Chem.Mol) – RDKit molecule.
- Returns:
map_numbers (List[int]) – Sorted list of atom map numbers
mapping (dict) – Dictionary of mapping
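A sorted list of map numbers together with a lookup dictionary can be built in one pass over the atoms. The sketch below takes the per-atom map numbers as a plain list, where position i holds the map number of atom i and 0 marks an unmapped atom (the value RDKit's GetAtomMapNum returns when no map number is set). The direction of the dictionary (map number to atom index) and the function name are assumptions; the source only says "dictionary of mapping".

```python
from typing import Dict, List, Tuple


def mapping_list_dict_sketch(
    atom_map_nums: List[int],
) -> Tuple[List[int], Dict[int, int]]:
    """Return the sorted map numbers and a map-number -> atom-index dictionary."""
    mapping = {
        map_num: idx
        for idx, map_num in enumerate(atom_map_nums)
        if map_num != 0  # skip unmapped atoms
    }
    return sorted(mapping), mapping


# Atom 0 carries map number 3, atom 1 is unmapped, atoms 2 and 3 carry 1 and 2.
map_numbers, mapping = mapping_list_dict_sketch([3, 0, 1, 2])
print(map_numbers, mapping)
```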
- ehreact.helpers.transition_state.process_an_example(rxn_smiles)[source]
Processes one rxn_smiles string and returns the transition state (whole or seed depending on ts_seed) RDKit mol object, as well as dictionaries of how to change bond types and charges to revert the transition state back to either the products or reactants.
- Parameters:
rxn_smiles (str) – Reaction SMILES.
- Returns:
change_ts_to_reac (dict) – Dictionary of bond and atom changes from the ts to the reactants
change_ts_to_prod (dict) – Dictionary of bond and atom changes from the ts to the products
ts (rdkit.Chem.Mol) – RDKit molecule of the whole imaginary transition state
ts_seed (rdkit.Chem.Mol) – RDKit molecule of the reaction center