Train
ehreact.train contains functions and classes for calculating a Hasse diagram.
Calculate Diagram
Classes and functions from ehreact.train.calculate_diagram.py.
calculate_diagram.py Training Hasse diagrams.
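For orientation, a Hasse diagram draws only the covering pairs of a partial order: an edge joins two items when one lies directly below the other with nothing in between. The sketch below is a toy illustration, not ehreact's implementation. It uses plain substring containment as a stand-in for substructure matching, and the helper name `hasse_edges` is hypothetical.

```python
from itertools import permutations


def hasse_edges(items, is_below):
    """Return the covering pairs (a, b) of a finite partial order.

    (a, b) is kept when a is strictly below b and no third item
    lies strictly between them.
    """
    items = list(dict.fromkeys(items))  # drop duplicates, keep input order

    def strictly_below(a, b):
        return a != b and is_below(a, b)

    edges = []
    for a, b in permutations(items, 2):
        if not strictly_below(a, b):
            continue
        # Skip pairs that are only related through an intermediate item.
        if any(strictly_below(a, c) and strictly_below(c, b) for c in items):
            continue
        edges.append((a, b))
    return edges


# Assumption: substring containment replaces real substructure matching.
templates = ["CCO", "CCOC", "CCOO", "CCOCO", "CCOOC"]
edges = hasse_edges(templates, lambda a, b: a in b)
```

In this toy order, CCO sits directly below CCOC and CCOO, and each of those sits directly below one full string, which is the kind of branching the diagram is meant to capture.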
- ehreact.train.calculate_diagram.calculate_diagram(smiles, verbose=False, quiet=True, compute_aam=False, save_path=None, save_plot=None, train_mode='transition_state', seed=[], no_props=False, plot_only_branches=False, temp_dir_img=None)[source]
Computes a Hasse diagram of a list of reaction or molecule smiles.
- Parameters:
smiles (List[str]) – List of SMILES or reaction SMILES.
verbose (bool, default False) – Whether to print additional information.
quiet (bool, default True) – Whether to silence all output.
compute_aam (bool, default False) – Whether to compute atom-mappings for reactions.
save_path (str, default None) – File to which diagram is saved.
save_plot (str, default None) – File to which the image of the diagram is saved.
train_mode (Literal["single_reactant", "transition_state"], default "transition_state") – Train mode, either transition states extracted from reaction SMILES or single reactants extracted from SMILES.
seed (List[str], default []) – List of SMILES seeds for the reactant algorithm; usually a single seed is given.
no_props (bool, default False) – Do not compute any properties, just output the diagram.
plot_only_branches (bool, default False) – Plot only substructures that branch off.
temp_dir_img (str, default None) – Directory in which to save temporary image files.
- Returns:
d – The Hasse diagram of the input list of molecules/reactions.
- Return type:
ehreact.diagram.diagram.Diagram
- ehreact.train.calculate_diagram.calculate_diagram_single_reactant(smiles, seed_list, verbose, quiet)[source]
Computes a Hasse diagram of a list of molecule smiles.
- Parameters:
smiles (List[str]) – List of SMILES.
seed_list (List[str]) – List of SMILES seeds.
verbose (bool) – Whether to print additional information.
quiet (bool) – Whether to silence all output.
- Returns:
d (ehreact.diagram.diagram.Diagram) – The Hasse diagram of the input list of molecules.
smiles_dict (dict) – A dictionary of the canonicalized input smiles.
- ehreact.train.calculate_diagram.calculate_diagram_transition_state(smiles, verbose, quiet, compute_aam)[source]
Computes a Hasse diagram of a list of reaction smiles.
- Parameters:
smiles (List[str]) – List of reaction SMILES.
verbose (bool) – Whether to print additional information.
quiet (bool) – Whether to silence all output.
compute_aam (bool) – Whether to compute atom-mappings for reactions.
- Returns:
d (ehreact.diagram.diagram.Diagram) – The Hasse diagram of the input list of reactions.
smiles_dict (dict) – A dictionary of the canonicalized input smiles.
- ehreact.train.calculate_diagram.calculate_diversity(d, node, stereo, train_mode)[source]
Calculates diversity within a branch or tree.
- Parameters:
d (ehreact.diagram.diagram.Diagram) – Hasse diagram.
node (ehreact.diagram.diagram.Node) – Node for which to calculate diversity. All leaf nodes reachable from this node through any number of edges toward children are taken into account.
stereo (bool) – Whether to include stereochemistry in fingerprints.
train_mode (Literal["single_reactant", "transition_state"]) – Train mode, either transition states extracted from reaction SMILES or single reactants extracted from SMILES.
- Returns:
mean_div_reac (float) – Mean pair similarity of reactants.
mean_div_prod (float) – Mean pair similarity of products.
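The idea of a mean pair similarity can be sketched without any chemistry toolkit. The example below is a minimal illustration under stated assumptions: fingerprints are modelled as sets of on-bit indices, the similarity measure is Tanimoto (the source does not name the measure), and both function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def mean_pair_similarity(fingerprints):
    """Average the similarity over all unordered pairs of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        # Assumption: a single leaf has no pairs; return 1.0 by convention.
        return 1.0
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical leaf fingerprints below one node.
leaf_fps = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]
mean_sim = mean_pair_similarity(leaf_fps)
```

A value close to 1 indicates that the leaves below the node are very similar to each other, so the branch is not very diverse.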
- ehreact.train.calculate_diagram.fill_information(d, train_mode, verbose, smiles_dict)[source]
Function to fill topology information and fingerprints into a Hasse diagram (alters diagram in-place).
- Parameters:
d (ehreact.diagram.diagram.Diagram) – Hasse diagram.
train_mode (Literal["single_reactant", "transition_state"]) – Train mode, either transition states extracted from reaction SMILES or single reactants extracted from SMILES.
verbose (bool) – Whether to print additional information.
smiles_dict (dict) – A dictionary of the canonicalized input smiles.
- ehreact.train.calculate_diagram.find_lowest_template(curr_node, d)[source]
Function to find the lowest (most general) substructure/reaction rule in the tree.
- Parameters:
curr_node (ehreact.diagram.diagram.Node) – Node for which to find the lowest template.
d (ehreact.diagram.diagram.Diagram) – Hasse diagram.
- Returns:
lowest_template – Name of the lowest template.
- Return type:
str
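One way to picture this lookup is to walk parent links upward until the node directly below the root is reached. The sketch below assumes that reading of "lowest template". `ToyNode` and `lowest_template_name` are hypothetical stand-ins and do not reflect the attributes of ehreact.diagram.diagram.Node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNode:
    """Hypothetical tree node with a name and a single parent link."""
    name: str
    parent: Optional["ToyNode"] = None


def lowest_template_name(node):
    """Return the name of the ancestor (or the node itself) just below the root."""
    if node.parent is None:
        raise ValueError("the root node has no template")
    # Climb until the parent of the current node is the root.
    while node.parent.parent is not None:
        node = node.parent
    return node.name


root = ToyNode("")
minimal = ToyNode("CCO", parent=root)
middle = ToyNode("CCOC", parent=minimal)
leaf = ToyNode("CCOCO", parent=middle)
found = lowest_template_name(leaf)
```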
- ehreact.train.calculate_diagram.write_fragment_list_to_root(d, train_mode, verbose, smiles_dict)[source]
Function to calculate a list of reactant rule fragments (only atoms in the reaction center). This is needed to transform input molecules into their corresponding transition states. The list is saved to the root node (in-place).
- Parameters:
d (ehreact.diagram.diagram.Diagram) – Hasse diagram.
train_mode (Literal["single_reactant", "transition_state"]) – Train mode, either transition states extracted from reaction SMILES or single reactants extracted from SMILES.
verbose (bool) – Whether to print additional information.
smiles_dict (dict) – A dictionary of the canonicalized input smiles.
Train
Classes and functions from ehreact.train.train.py.
train.py Entry point for training Hasse diagrams.
Hasse
Classes and functions from ehreact.train.hasse.py.
- ehreact.train.hasse.check_one_extension(pivot_rule, max_possible_rules)[source]
Function to iterate over all current molecules/pseudomolecules and check whether each of them allows for an extension resulting in the current pivot_rule.
- Parameters:
pivot_rule (rdkit.Chem.Mol) – A possible new template.
max_possible_rules (dict) – A dictionary of the matching atoms and possible new templates for all molecules/pseudomolecules.
- Returns:
Whether or not the pivot rule has a substructure match with any of the possible extensions.
- Return type:
bool
- ehreact.train.hasse.enlarge_rule(rule, mols, smiles, change_dict, verbose)[source]
Function to look for possible atoms to add to current template, find best combination, and create a new template.
- Parameters:
rule (rdkit.Chem.Mol) – RDKit molecule object of the current template (molecule or pseudomolecule).
mols (List[rdkit.Chem.Mol]) – List of RDKit molecule objects of the molecules or pseudomolecules which are considered in the current branch.
smiles (List[str]) – List of SMILES strings of the molecules or pseudomolecules which are considered in the current branch.
change_dict (dict) – A dictionary of all changes upon going from reactants to products.
verbose (bool) – Whether to print additional information.
- Returns:
new_rules – A dictionary of new templates (children of the current node); there may be one or multiple templates.
- Return type:
dict
- ehreact.train.hasse.extend_by_atom(patt, m, match, idx_list, ring_extension=False)[source]
Extends the current rule (‘patt’) by adding neighbours of the atoms specified in the list of atom indices ‘idx_list’ according to the molecule ‘m’.
- Parameters:
patt (rdkit.Chem.Mol) – RDKit molecule object of the current template (molecule or pseudomolecule).
m (rdkit.Chem.Mol) – RDKit molecule object of the current leaf node.
match (tuple) – Tuple of matching atom indices of the current rule.
idx_list (list) – List of atom indices at which to extend the pattern.
ring_extension (bool) – Whether the current iteration has broken a ring and must therefore be repeated until the full ring is found.
- Returns:
new_patt – RDKit molecule object of the new, extended template (molecule or pseudomolecule).
- Return type:
rdkit.Chem.Mol
Examples
For the molecule CCOCO, the rule CCO matches the first three atoms and can be extended at atom 2 (the oxygen), yielding the new, extended molecule CCOC (written by RDKit as the equivalent SMILES COCC):
>>> from rdkit import Chem
>>> from ehreact.train.hasse import extend_by_atom
>>> rule = Chem.MolFromSmiles("CCO", sanitize=False)
>>> new_rule = extend_by_atom(rule, Chem.MolFromSmiles("CCOCO"), (0, 1, 2), [2])
>>> print(Chem.MolToSmiles(new_rule))
COCC
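The extension step can also be pictured on a bare graph. The following sketch is not the library's code: it represents the molecule as a list of bonds between atom indices, assumes that `idx_list` refers to atom indices of the molecule, and ignores the `ring_extension` logic. The helper name `extend_match` is hypothetical.

```python
def extend_match(bonds, match, idx_list):
    """Enlarge a tuple of matched atom indices by the neighbours of idx_list atoms."""
    # Build an adjacency map from the undirected bond list.
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    extended = list(match)
    for idx in idx_list:
        for nbr in sorted(neighbours.get(idx, ())):
            if nbr not in extended:  # only add atoms outside the current match
                extended.append(nbr)
    return tuple(extended)


# CCOCO as a linear chain: atoms 0-1-2-3-4.
chain_bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
new_match = extend_match(chain_bonds, (0, 1, 2), [2])
```

Extending the match (0, 1, 2) at atom 2 adds atom 3, the carbon next to the oxygen, which corresponds to growing CCO into CCOC.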
- ehreact.train.hasse.extend_by_single_match(possible_extensions, match, idx, mols, smiles, rule, change_dict, verbose, max_possible_rules)[source]
Enlarges the current template by selecting the best extensions.
- Parameters:
possible_extensions (dict) – A dictionary of possible extensions for each input molecule, containing the matching indices, atom indices of atoms to extend, and their possible extension as string.
match (tuple) – Tuple of the matching atom indices for the pivot molecule.
idx (str) – Index of pivot molecule.
mols (List[rdkit.Chem.Mol]) – List of RDKit molecule objects of the molecules or pseudomolecules which are considered in the current branch.
smiles (List[str]) – List of SMILES strings of the molecules or pseudomolecules which are considered in the current branch.
rule (rdkit.Chem.Mol) – RDKit molecule object of the current template (molecule or pseudomolecule).
change_dict (dict) – A dictionary of all changes upon going from reactants to products.
verbose (bool) – Whether to print additional information.
max_possible_rules (dict) – Dictionary of possible extensions, containing the atom indices of the matching atoms, as well as all the possible enlarged templates as string and RDKit molecule.
- Returns:
new_patterns – Dictionary of new templates, to be attached to current node as children, including list of molecules and smiles belonging to each new template.
- Return type:
dict
Examples
>>> from rdkit import Chem
>>> from ehreact.train.hasse import extend_by_single_match, get_max_possible_rules
>>> smis = ["CCOCO", "CCOOC"]
>>> mols = [Chem.MolFromSmiles(smi) for smi in smis]
>>> rule = Chem.MolFromSmiles("CCO", sanitize=False)
>>> possible_extensions = {'CCOCO': {(0, 1, 2): {2: 'OC'}}, 'CCOOC': {(0, 1, 2): {2: 'OO'}}}
>>> change_dict = {"reac": {"atom": {}, "bond": {}}, "prod": {"atom": {}, "bond": {}}}
>>> max_possible_rules = get_max_possible_rules(possible_extensions, rule, mols, change_dict)
>>> extend_by_single_match(possible_extensions, (0, 1, 2), 0, mols, smis, rule, change_dict, False, max_possible_rules)
{'COCC': {'rule': <rdkit.Chem.rdchem.Mol at 0x7ffc4d7592b0>, 'mols': [<rdkit.Chem.rdchem.Mol at 0x7ffc4d75eda0>], 'smiles': ['CCOCO']},
 'CCOO': {'rule': <rdkit.Chem.rdchem.Mol at 0x7ffc4d759f30>, 'mols': [<rdkit.Chem.rdchem.Mol at 0x7ffc4d75e1c0>], 'smiles': ['CCOOC']}}
- ehreact.train.hasse.extended_hasse(smiles_dict, seeds, rule_dict, tags_core, verbose, quiet)[source]
Create an extended Hasse diagram.
- Parameters:
smiles_dict (dict) – A dictionary of the canonicalized input smiles.
seeds (List[str]) – List of SMILES seeds.
rule_dict (dict) – A dictionary of all minimal templates of all seeds.
tags_core (dict) – A dictionary of the atom map numbers for each minimal template. Empty dictionary for single reactant mode.
verbose (bool) – Whether to print additional information.
quiet (bool) – Whether to silence all output.
- Returns:
d – The Hasse diagram of the input list of molecules.
- Return type:
ehreact.diagram.diagram.Diagram
- ehreact.train.hasse.get_max_possible_rules(possible_extensions, rule, mols, change_dict)[source]
Takes a dictionary of possible extensions and computes the corresponding enlarged templates as RDKit molecule objects.
- Parameters:
possible_extensions (dict) – A dictionary of possible extensions for each input molecule, containing the matching indices, atom indices of atoms to extend, and their possible extension as string.
rule (rdkit.Chem.Mol) – RDKit molecule object of the current template (molecule or pseudomolecule).
mols (List[rdkit.Chem.Mol]) – List of RDKit molecule objects of the molecules or pseudomolecules which are considered in the current branch.
change_dict (dict) – A dictionary of all changes upon going from reactants to products.
- Returns:
extension_dict – Dictionary of possible extensions, containing the atom indices of the matching atoms, as well as all the possible enlarged templates as string and RDKit molecule.
- Return type:
dict
Examples
>>> from rdkit import Chem
>>> from ehreact.train.hasse import get_max_possible_rules
>>> smis = ["CCOCO", "CCOOC"]
>>> mols = [Chem.MolFromSmiles(smi) for smi in smis]
>>> rule = Chem.MolFromSmiles("CCO", sanitize=False)
>>> possible_extensions = {'CCOCO': {(0, 1, 2): {2: 'OC'}}, 'CCOOC': {(0, 1, 2): {2: 'OO'}}}
>>> change_dict = {"reac": {"atom": {}, "bond": {}}, "prod": {"atom": {}, "bond": {}}}
>>> get_max_possible_rules(possible_extensions, rule, mols, change_dict)
{'CCOCO': {(0, 1, 2): {'COCC': <rdkit.Chem.rdchem.Mol at 0x7ffc4d75bee0>}},
 'CCOOC': {(0, 1, 2): {'CCOO': <rdkit.Chem.rdchem.Mol at 0x7ffc4d75b4e0>}}}
- ehreact.train.hasse.get_possible_extensions(rule, mols, smiles)[source]
Searches for all possible extensions of a template taking into account a list of molecules.
- Parameters:
rule (rdkit.Chem.Mol) – RDKit molecule object of the current template (molecule or pseudomolecule).
mols (List[rdkit.Chem.Mol]) – List of RDKit molecule objects of the molecules or pseudomolecules which are considered in the current branch.
smiles (List[str]) – List of SMILES strings of the molecules or pseudomolecules which are considered in the current branch.
- Returns:
extension_dict – A dictionary of possible extensions for each input molecule, containing the matching indices, atom indices of atoms to extend, and their possible extension as string.
- Return type:
dict
Examples
For the molecules CCOCO and CCOOC, the rule CCO matches both molecules, and allows for one possible extension at the oxygen, adding a carbon for the first molecule, or an oxygen for the second molecule:
>>> from rdkit import Chem
>>> from ehreact.train.hasse import get_possible_extensions
>>> smis = ["CCOCO", "CCOOC"]
>>> mols = [Chem.MolFromSmiles(smi) for smi in smis]
>>> rule = Chem.MolFromSmiles("CCO", sanitize=False)
>>> get_possible_extensions(rule, mols, smis)
{'CCOCO': {(0, 1, 2): {2: 'OC'}}, 'CCOOC': {(0, 1, 2): {2: 'OO'}}}
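How such extension labels arise can be sketched on a plain graph. The example below is a simplified illustration, not the library's implementation: the match is supplied by hand instead of coming from RDKit substructure matching, bond orders are ignored, and each atom maps to a list of labels rather than a single string. The function name `possible_extensions_for_match` is hypothetical.

```python
def possible_extensions_for_match(symbols, bonds, match):
    """Map each matched atom index to labels for its unmatched neighbours.

    A label joins the element symbol of the matched atom with the
    symbol of the neighbour outside the match, e.g. 'OC'.
    """
    matched = set(match)
    extensions = {}
    for a, b in bonds:
        # Check the bond in both directions: inside -> outside the match.
        for inside, outside in ((a, b), (b, a)):
            if inside in matched and outside not in matched:
                label = symbols[inside] + symbols[outside]
                extensions.setdefault(inside, []).append(label)
    return {idx: sorted(labels) for idx, labels in extensions.items()}


chain_bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]  # five atoms in a row
ext_first = possible_extensions_for_match(list("CCOCO"), chain_bonds, (0, 1, 2))
ext_second = possible_extensions_for_match(list("CCOOC"), chain_bonds, (0, 1, 2))
```

For both toy molecules only atom 2 borders an unmatched atom, so the sketch reports one extension per molecule at the oxygen.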
- ehreact.train.hasse.has_one_extension(pivot_rule, extensions_all_matches)[source]
Function to check whether any current template match and corresponding extension results in the current pivot_rule.
- Parameters:
pivot_rule (rdkit.Chem.Mol) – A possible new template.
extensions_all_matches (dict) – A dictionary of matching atoms and possible new templates.
- Returns:
Whether or not the pivot rule has a substructure match with any of the possible extensions.
- Return type:
bool
- ehreact.train.hasse.iterate_algorithm(d, mols, smiles, rule, rule_smiles, verbose, quiet, change_dict, tags_core)[source]
Iterates the atom extension algorithm: looks for possible atoms to add to the current template, finds the best combination, creates a new template, and attaches it to the current Hasse diagram (edits to the diagram are in-place).
- Parameters:
d (ehreact.diagram.diagram.Diagram) – The current Hasse diagram.
mols (List[rdkit.Chem.Mol]) – List of RDKit molecule objects of the molecules or pseudomolecules which are considered in the current branch.
smiles (List[str]) – List of SMILES strings of the molecules or pseudomolecules which are considered in the current branch.
rule (rdkit.Chem.Mol) – RDKit molecule object of the current template (molecule or pseudomolecule).
rule_smiles (str) – Name of the current template.
verbose (bool) – Whether to print additional information.
quiet (bool) – Whether to silence all output.
change_dict (dict) – A dictionary of all changes upon going from reactants to products.
tags_core (dict) – A dictionary of the atom map numbers for each minimal template. Empty dictionary for single reactant mode.
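The recursive, in-place growth described here can be illustrated with a deliberately simple toy. The sketch below is not the ehreact algorithm: it treats templates as string prefixes, extends by exactly one character per step, and stores the diagram as a plain dictionary from a template to its children. The function name `grow` is hypothetical.

```python
def grow(diagram, rule, members):
    """Attach the children of `rule` to `diagram` in-place, then recurse.

    diagram maps each template string to the list of its child templates.
    members are the strings that still belong to the current branch.
    """
    diagram.setdefault(rule, [])
    # Group the members by their next one-character extension of the rule.
    groups = {}
    for smi in members:
        if len(smi) > len(rule):
            groups.setdefault(smi[: len(rule) + 1], []).append(smi)
    for child, child_members in sorted(groups.items()):
        diagram[rule].append(child)
        grow(diagram, child, child_members)


toy_diagram = {}
grow(toy_diagram, "CCO", ["CCOCO", "CCOOC"])
```

Starting from CCO, the two members split into the branches CCOC and CCOO, and each branch keeps growing until it reproduces its full member string.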
- ehreact.train.hasse.quick_match(mol_small, mol_large)[source]
Computes whether mol_small could possibly be a subgraph of mol_large based on atom type counts.
- Parameters:
mol_small (rdkit.Chem.Mol) – The candidate substructure.
mol_large (rdkit.Chem.Mol) – The molecule that might contain it.
- Returns:
Whether a subgraph match is possible based on atom type counts.
- Return type:
bool
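A count-based pre-check of this kind can be sketched with the standard library alone. The example below is a minimal illustration, not the library's implementation. It assumes that each molecule is given as a list of element symbols, and the function name `counts_allow_subgraph` is hypothetical.

```python
from collections import Counter


def counts_allow_subgraph(small_atoms, large_atoms):
    """Return False when the larger molecule lacks atoms the smaller one needs.

    This is only a necessary condition: True means a subgraph match is
    still possible, not that one exists.
    """
    small = Counter(small_atoms)
    large = Counter(large_atoms)
    return all(large[element] >= count for element, count in small.items())


rule_atoms = ["C", "C", "O"]            # CCO
mol_atoms = ["C", "C", "O", "C", "O"]   # CCOCO
possible = counts_allow_subgraph(rule_atoms, mol_atoms)
impossible = counts_allow_subgraph(["N"], mol_atoms)
```

Such a check is cheap compared with a full substructure search, so it is useful for discarding hopeless candidates early. With RDKit molecules, the symbol lists could be built from `atom.GetSymbol()` over `mol.GetAtoms()`.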