De-novo Chemical Reaction Generation by Means of Temporal Convolutional Neural Networks

Andrei Buin,∗,† Hung Yi Chiang,† S. Andrew Gadsden,∗,‡ and Faraz A. Alderson†

†Department of Mechanical Engineering, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1
‡Department of Mechanical Engineering, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada L8S 4L7

E-mail: phquanta@gmail.com; gadsden@mcmaster.ca

Abstract

We present here a combination of two networks, Recurrent Neural Networks (RNN) and Temporal Convolutional Neural Networks (TCN), for de novo reaction generation using the novel reaction-SMILES-like representation of reactions (CGRSmiles) with atom mapping directly incorporated. Recurrent Neural Networks are known for their autoregressive properties and are frequently used in language modelling, with direct application to SMILES generation. The relatively novel TCNs possess similar properties with a wide receptive field while obeying the causality required for natural language processing (NLP). The combination of both latent representations, expressed through TCN and RNN, results in an overall better performance compared to RNN alone. Additionally, it is shown that different fine-tuning protocols have a profound impact on the generative scope of the model when applied to a dataset of interest via transfer learning.

Introduction

With advances in Deep Learning (DL) generative methods, it is becoming more common to utilize DL's generative properties in a variety of applications. One such application is retrosynthetic planning, where, given the products of a reaction, one tries to predict the reacting precursors that resulted in those products. There are works1,2 that already use DL methods to guide retrosynthetic planning. While it is a great tool for chemists, it still lacks truly generative power when trying to generate novel reactions with unseen reaction centers and precursors. Part of the problem is the lengthy language models describing chemical reactions in textual form, such as SMARTS/SMIRKS atom-mapped reaction representations. Only recently,3 with the introduction of the Condensed Graph of Reaction (CGR),4 has complex reaction information (reactants/products, bond formation/breaking) been successfully encoded into a simple textual representation. In CGR, both reactants and products are combined into one single graph with bond creation and bond breaking incorporated, then expressed via SMILES-like strings. We will refer to CGR as CGRSmiles throughout the paper. One can tackle the task of generating SMILES-like strings via Recurrent Neural Networks (RNN)5 with Long Short-Term Memory (LSTM) cells used to avoid the problem of vanishing/exploding gradients. On the other hand, a similar approach, but with the use of Convolutional Neural Networks (CNN), namely Temporal Convolutional Neural Networks (TCN)6 which use causal and dilated convolutions, has been used mostly in classification (prediction) applications. These applications include text classification,7 state-of-charge battery estimation,8 time series forecasting,9–11 and protein-binding prediction.12 By themselves, TCNs have a generative power which comes from causal convolutions, and in this sense can be thought of as an alternative to RNNs. Despite this, there are virtually no applications of novel DL architectures such as TCN in generative applied chemistry.
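To illustrate the representation, a CGR can be composed from an atom-mapped reaction SMILES with the CGRtools package21 used later in this work. The sketch below is our own minimal example, assuming CGRtools' top-level `smiles` parser and `ReactionContainer.compose()`; the toy hydration reaction is purely illustrative.

```python
from CGRtools import smiles  # CGRtools parser for (reaction) SMILES

# A hypothetical atom-mapped reaction SMILES: ethylene hydration
reaction = smiles('[CH2:1]=[CH2:2].[OH2:3]>>[CH3:1][CH2:2][OH:3]')

# Compose reactants and products into one Condensed Graph of Reaction;
# dynamic bonds encode bond creation/breaking in a single graph
cgr = reaction.compose()

# The CGR is rendered as a SMILES-like string (CGRSmiles in this paper)
print(str(cgr))
```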
By utilizing the TCN's generative power, we show that a combination of RNN-LSTM with TCN results in a better generative model compared to the pure RNN models used as baselines. In addition to this, SMILES generation previously and usually implied that the model learns SMILES independently of context; context was only introduced by fine-tuning on a dataset of interest on which the model was subsequently refined. We show that in-context SMILES generation exhibits more diverse structural motifs, based on Tanimoto similarity scores, compared to the pure RNN-LSTM model without context. Another contribution of this paper is the utilization of a novel transfer learning protocol, of a kind widely used in image/video applications13 and suitable for low-shot learning. With a traditional transfer learning protocol, where all the weights are "re-learned" on a particular dataset, the model seems to "memorize" that dataset while applying the grammar rules learned in initial training. As a result, it will try to generate reactions with the particular reaction templates (reactants + products) seen in the fine-tuning dataset. We have observed this phenomenon in other3 research, as well as our own. One can alleviate this problem by introducing an exhaustive reaction dataset for a particular problem, but this solution would not work in low-shot learning. With a variant of weight freezing, we show that our fine-tuned model, trained on our own dataset, significantly outperforms models fine-tuned with an all-weights optimization transfer learning approach.

Computational details

The original work of Gupta et al.14 has been utilized and modified for the RNN in parts of the code. As for the TCN, we used the custom implementation of Remy.15 All of the code was written in the Keras16 framework with a TensorFlow17 backend. A custom vocabulary for CGRSmiles was incorporated with one-hot encodings. RNNs with 2 to 3 layers of stacked LSTM cells were used as outlined in the results section, with 512 hidden units for each RNN (LSTM) layer. For the TCN network, 1 residual block was used without normalization, with a dilation vector of d = [1, 2, 4, 8, 16, 32]. A kernel size of 2 was used in the 1D convolutional layers, and 256 convolutional filters were used in the TCN residual block. For regularization, we used a dropout of 0.5 for each of the LSTM and TCN layers. A softmax layer was used as the final layer for classification with categorical cross-entropy loss. No batch normalization was used in the TCN residual block. For Seq2Seq fingerprints (length of 768), an RNN of 3 stacked layers of 256 GRU cells with an attention mechanism was used.18 Additionally, the Seq2Seq fingerprints were processed with Principal Component Analysis (PCA) for dimensionality reduction; we kept 50 dimensions from PCA before feeding them into a t-distributed stochastic neighbor embedding (tSNE)19 analyzer. All SMILES manipulations, such as computing Tanimoto similarity scores and performing validity checks, were done using RDKit.20 For CGRSmiles generation, reaction center acquisition, and to/from reaction SMILES conversions, along with aromaticity and valence checks, the CGRtools package21 was used. For the BiLSTM model,22 the entire USPTO dataset was cast as a classification problem by dividing the entire corpus into strings of 80 characters with a sliding window of 3, with the 81st character used as the ground truth for the classifier.
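For concreteness, the proposed TCN + RNN(LSTM) model can be sketched in Keras with the keras-tcn package.15 This is a minimal sketch under stated assumptions: fusing the two latent representations by concatenation is our reading of the proposed architecture, and names such as `vocab_size` and `max_len` are placeholders rather than values from the released code.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tcn import TCN  # keras-tcn implementation of Remy

vocab_size, max_len = 66, 156  # hypothetical vocabulary size / sequence length

inp = layers.Input(shape=(max_len, vocab_size))  # one-hot encoded CGRSmiles

# RNN branch: stacked LSTM layers, 512 hidden units, dropout 0.5
h = layers.LSTM(512, return_sequences=True, dropout=0.5)(inp)
h_rnn = layers.LSTM(512, return_sequences=True, dropout=0.5)(h)

# TCN branch: 1 residual block, 256 filters, kernel size 2,
# dilations [1, 2, 4, 8, 16, 32], no batch normalization
h_tcn = TCN(nb_filters=256, kernel_size=2, nb_stacks=1,
            dilations=[1, 2, 4, 8, 16, 32], dropout_rate=0.5,
            return_sequences=True, use_batch_norm=False)(inp)

# Fuse both latent representations (concatenation assumed) and classify
merged = layers.Concatenate()([h_rnn, h_tcn])
out = layers.Dense(vocab_size, activation='softmax')(merged)

model = keras.Model(inp, out)
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss='categorical_crossentropy')
```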
We have tried longer character strings and shorter sliding windows, but this resulted in lengthy training times.

Architectural design

Language Model

When doing generative modeling in the context of Natural Language Processing (NLP), it is crucial to choose a proper language model. An RNN(LSTM) usually implies unidirectional context from past to future, whereas a bidirectional LSTM (BiLSTM) sees the entire context and therefore becomes inappropriate for this NLP task without adaptation. However, there are methods that use the BiLSTM22 language model with certain adaptations. In one case, the entire dataset is represented as a single corpus, with the model trained to predict the next character given an N-length sequence in a sliding window manner. The problem with this approach is that CGRSmiles have relatively long sequence lengths; datasets created from the corpus of the original CGRSmiles reaction strings in the form of $(X, y)$, where $X = x_1, \ldots, x_t$ and the label $y = x_{t+1}$, become prohibitively large. As a result, training time also scales proportionally. Another bidirectional adaptation involves interleaving bidirectional sampling on the left and right of the sequence, starting from the center character (BiMODAL).23 Additional augmentation of the adapted dataset, in this case, helps increase the accuracy of the model, again at the expense of increased computational time. In general, the problem with bidirectional language models, without proper adaptation, lies in the fact that the model has seen the whole context. If one, on the other hand, tries to use a vanilla BiLSTM with RNN-like training (where the target sequence is the original input sequence shifted 1 position to the right), the model will not learn to generate novel reactions, but will instead just shift the input sequence to the right by 1 position. These non-standard adaptations might also require dynamic rather than static graph computation, due to graph modifications during runtime.23 As a result, we adopt a standard language model suited for generating one token at a time given the left context, along with a refined fine-tuning protocol.

Model Training Protocol

We used the Adam optimization algorithm24 for training our models, with cross-entropy loss as the optimization objective. For Seq2Seq and RNN, the original dataset was split 80% for training and 20% for test. The learning rate was set to $10^{-3}$. Models were trained for 50 epochs on the original dataset and for 10 epochs in the case of fine tuning. The batch size was 64 in the case of the original dataset, and 1 in the case of fine tuning.

Fine Tuning Protocol

A variety of fine-tuning protocols were used. The original fine-tuning protocol allowed all weights to be adjusted under the transfer learning approach. Another protocol involved freezing all layers except the last softmax layer. Additionally, we tried a decaying-learning-rate protocol across layers,25 using different learning rates and different numbers of training epochs for different layers.

Sampling protocol

Usually sampling involves the softmax function
$$P(y_i) = \frac{\exp(y_i)}{\sum_j \exp(y_j)},$$
which gives more syntactically correct, but less diverse, structures/reactions compared to the temperature-controlled softmax function
$$P(y_i) = \frac{\exp(y_i/T)}{\sum_j \exp(y_j/T)},$$
which yields more diversity in the generated structures/reactions but a smaller number of valid CGRSmiles strings. This sampling protocol has been described elsewhere.14,23 CGRSmiles strings were sampled at a sampling temperature of T = 0.7.
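A minimal sketch of temperature-controlled sampling over the model's per-step softmax output; the small floor added before the logarithm is our own numerical-stability assumption, not part of the original protocol.

```python
import numpy as np

def sample_next_token(probs, temperature=0.7):
    """Sample a next-token index from the model's softmax output `probs`,
    rescaled by a sampling temperature T (T = 1 recovers plain softmax)."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12)
    scaled = logits / temperature
    p = np.exp(scaled - scaled.max())  # shift by max for numerical stability
    p /= p.sum()                       # renormalize to a distribution
    return np.random.choice(len(p), p=p)
```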
For each analysis task, 30,000 CGRSmiles were generated.

Datasets

For the larger corpus we used the training split of Jin's26 USPTO atom-mapped dataset, derived from Lowe's grant dataset.27 For the fine-tuning dataset, we used web-scraped reactions involving hydrogen peroxide (the H2O2 dataset) from PubChem,28 which were later preprocessed via Atom Mapper29 for proper atom mapping. Jin's dataset and the H2O2 dataset were processed via the CGRtools library21 to obtain CGRSmiles strings. A maximum string length of 156 characters was considered for both datasets. After the aromaticity and valence checks, and after applying the 156-character maximum length for CGRSmiles strings, the larger dataset was reduced to 216,308 reactions. For the smaller, fine-tuning dataset, we acquired 166 atom-mapped reactions. It should be noted that most reactions in the smaller dataset are oxidation reactions (80%), meaning they contain O=O as part of the reactants.

Reaction Center and In-Context SMILES Analysis

The most crucial part of any chemical reaction is the reaction center, i.e., the atoms directly involved in bond creation/breaking. To analyze how novel the model's output is, one needs a means of categorizing novelty, i.e., reaction centers. Fortunately, the CGRtools library encodes each substructural motif with a hash function whose value is a unique key. This value is used to categorize known reaction centers within a dataset; additionally, the hash value is used to detect novel reaction centers and to compare known and unknown reaction centers. We categorize reactions not by bare known reaction centers, but by reaction centers extended with their first closest neighbors, because the original dataset was not curated based on the presence of certain types of reaction centers, and all were considered. For in-context SMILES generation, each CGRSmiles string was converted back to the reaction SMILES representation, and the SMILES of the product and reactant parts were extracted for subsequent analysis using Tanimoto similarity scores.

Results and discussion

Figure 1 shows the Deep Learning architectures used. Baselines 1-3 are homogeneous Deep Learning architectures using either TCN or LSTM layers. The proposed architecture in Figure 1(d) is, on the other hand, a combination of both LSTM and TCN; this architecture has the ability to learn from two latent representations. Figure 1(e) shows a residual block, which is the core of the TCN. Basically, a TCN is a stack of residual blocks, each of which consists of dilated convolutional layers combined with weight normalization and dropout layers, as shown in Figure 2. In our case, no weight normalization was used and a dilation vector of d = [1, 2, 4, 8, 16, 32] was utilized. A 1x1 convolution is used in the case of a depth mismatch between the input and the output of the last dropout layer.

Figure 1: Baseline and proposed architectures: (a) Baseline 1, (b) Baseline 2, (c) Baseline 3, (d) proposed architecture, (e) internals of the TCN layer.

One could utilize causal convolutions alone. However, by introducing dilated convolutions, one greatly enhances the receptive field (i.e., the past history) that the TCN can look into. In other words, the last convolutional layer can see much further into the past when compared to plain causal convolutions, as seen in Figure 2.
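This gain can be checked numerically; the sketch below is our own illustration (not code from the paper), counting the past positions visible to the top layer of a causal stack.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal 1D convolutions: each layer
    adds (kernel_size - 1) * dilation positions of past context."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

plain = receptive_field(2, [1] * 6)                 # plain causal stack: 7
dilated = receptive_field(2, [1, 2, 4, 8, 16, 32])  # dilated TCN stack: 64
print(dilated / plain)  # ~9.1, the roughly 9-fold advantage cited below
```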
The receptive field, compared to a vanilla convolutional operation (d = 1), grows as
$$\frac{F_{\mathrm{TCN}}}{F_{\mathrm{conv}}} \propto \frac{2^n}{n+1},$$
where n is the number of equivalent convolutional and residual layers of the TCN with exponentially growing dilation (2, 4, 8, ...). Clearly, the TCN has the advantage. In the example of Figure 2 this advantage is a factor of 2; in our case, the actual advantage is roughly 9-fold.

Figure 2: TCN with residual blocks, and the receptive field of the original convolution (light violet shading) and the dilated convolution (light green shading).

Next, we considered how well the models perform when generating novel reactions based on the initial training. From Table 1, one can see that all of the models give a relatively high number of unique and valid CGRSmiles strings. However, both TCN and TCN+RNN(LSTM) gave significantly higher numbers of novel reaction centers. For comparison, we also compared our results with the BiLSTM language model,22 which has 2 BiLSTM layers with 128 hidden units each. The results are shown in Table 1. One can see that the number of valid CGRSmiles strings is significantly lower for the BiLSTM model, while the number of reaction centers is higher compared to the TCN and TCN+RNN models. One aspect to note is that the training time required for the BiLSTM is significantly higher than for both the TCN and TCN+RNN models. A good compromise is achieved via the combination TCN+RNN, where the number of valid CGRSmiles strings was the highest among all models and the number of novel reaction centers was relatively high, albeit not the highest.

Table 1: Generative properties of a variety of models.

Model            Valid(%)  Unique(%)  N(RC)
Baseline1        93.42     98.84        877
Baseline2        94.40     99.3         873
TCN              85.52     98.2        1943
TCN+RNN(LSTM)    94.71     98.66       1239
BiLSTM(80,3)     78.52     99.87       2606
Dataset          N/A       N/A        12308

The next step was to explore a variety of fine-tuning protocols along with in-context SMILES generation. Table 2 shows that if one allows the model to freely adapt to a smaller dataset with all weights being adjustable (all unfrozen, AU in Table 2), "memorization", or overfitting of the model on the novel dataset, will occur. On the other hand, keeping only the last layer unfrozen (LL in Table 2), the model is capable of transferring its knowledge from previous learning more efficiently. Other fine-tuning protocols were tried as well, one of which is shown in Table 2 (P1, or Protocol 1) and provides results lying between the two extremes of the other protocols. Interestingly enough, the model with the LL transfer learning approach was able to sample CGRSmiles with Na, Pt, and Se ions in them. These ions were not part of the smaller dataset, but the model had seen some examples of such reactions in the larger dataset and applied that knowledge during the fine-tuning phase. We also computed tSNE plots of the generated CGRSmiles, as shown in Figure 3. One can see that the results are in agreement with Table 2, as expected, with the Last Layer (LL) protocol, unfrozen upon transfer learning, giving the highest number of different reaction centers. This indicates the possibility of few-shot learning with only a few samples from the smaller dataset.

Table 2: Fine-tuning properties of a variety of fine-tuning protocols for TCN+RNN. AU - all unfrozen. P1 - Protocol 1, where layers [[1,2], 4, 5] were trained for [2, 5, 10] epochs with learning rates [$10^{-6}$, $10^{-5}$, $5 \times 10^{-4}$], in sequential order starting from the last layer. LL - only the last (softmax) layer was unfrozen, while all other layers were frozen.

Model          Valid(%)  Unique(%)  N(RC)
AU             98.65      9.98        63
P1             94.95     21.12        97
LL             91.92     60.40       288
H2O2 Dataset   N/A       N/A          64
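In Keras terms, the LL protocol reduces to a few lines of layer freezing. The sketch below is a minimal illustration assuming `model` is the trained TCN+RNN from above, with the softmax Dense layer last, and `x_small`/`y_small` as placeholders for the encoded fine-tuning data.

```python
from tensorflow import keras

# "LL" fine-tuning: freeze everything except the final softmax layer
for layer in model.layers:
    layer.trainable = False
model.layers[-1].trainable = True  # last Dense(softmax) layer stays trainable

# Recompile so the trainability changes take effect, then fine-tune
# (batch size 1 and 10 epochs, as in the training protocol above)
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss='categorical_crossentropy')
model.fit(x_small, y_small, batch_size=1, epochs=10)
```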
Figure 3: tSNE plots of generated CGRSmiles for the AU and LL fine-tuning protocols, compared to the fine-tuning dataset (axes tSNE1 vs. tSNE2).

To explore the SMILES of the reactants and products that participated in a given reaction, we converted CGRSmiles back to reaction SMIRKS consisting of the participating compounds. In this case, however, the information about atom mapping is lost during the conversion and only reactants and products are preserved. Figure 4(a) shows a typical example of such a conversion, in which each reactant/product is highlighted with a different color. By analyzing these participating chemical formulations, one can look into the context that created those SMILES strings. In other words, the SMILES extracted from the SMIRKS representation are not mere SMILES, but rather SMILES with context in them, since each SMILES string in this case is tied to a reaction. In a broader sense, it means that some reactants and products are more common than others in certain reactions. Figure 4(b) and Figure 4(c) show differences in in-context SMILES similarity scores for RNN and TCN+RNN. One can see that TCN+RNN gives more diverse SMILES scaffolds compared to RNN, as reflected by a smaller mean Tanimoto score. Please note that the two were compared at different fine-tuning protocols: one at AU and the other at LL. However, Figure 4(c) shows that the fine-tuning protocol has little effect on the in-context generated SMILES strings, and as a result the shift towards lower Tanimoto scores can be attributed to the architectural choice.

Figure 4: (a) Mapping between the CGRSmiles and SMIRKS representations; note that atom mapping is lost during the conversion. (b) In-context Tanimoto scores (normalized distributions with Gaussian fits) for RNN (AU) and TCN+RNN (LL). (c) Dependence of the Tanimoto score for TCN+RNN on the fine-tuning protocol (LL, P1, AU).

Finally, to explore the generative capabilities, we studied some of the reactions generated by the models. Out of all the fine-tuning protocols, LL gave the most reactions with novel reaction centers: on the order of 800 reactions with novel RC, whereas each of the other protocols gave only on the order of 100. Figure 5(a) shows an example of an unseen reaction, not present in the dataset, with a novel RC. This is glycidol hydrolysis, with no mistakes.30,31 Another reaction with a novel RC is shown in Figure 5(b); the closest reaction to this one is diketene hydrolysis.32,33 Interestingly enough, the model is able to learn how to open the ring, albeit with some errors such as the wrong placement of the CH2 group and O. Another reaction, but this time with a known reaction center, is the oxidation of the cyclohexanol derivative shown in Figure 5(c).
This is a feasible oxidation pathway for the cyclohexanol derivative, as the closest reaction is the oxidation of cyclohexanol34 via a similar pathway. In our case, the oxidation was done in the presence of water, whereas the cited work34 uses tert-butyl hydroperoxide (TBHP) as the oxidizing agent. This can be attributed to the fact that most reactions in the fine-tuning dataset are oxidation reactions (80%, as mentioned previously), meaning that they contain O=O in the reactants. For the generated CGRSmiles, the share of oxidation reactions is 92%, while the rest of the reactions contain mainly H2O and H2O2 as precursors. In addition, the collected dataset involving H2O2 did not contain reaction metadata such as catalysts, oxidizing agents, etc.; only plain SMIRKS reaction representations were collected from open sources. Furthermore, reaction 5(c) has invalid stoichiometry. Most errors of this type involve a wrong number of implicit hydrogens. We have observed unbalanced reactions in line with other work,3 mostly as an imbalance of implicit hydrogens; this has been attributed3 to the USPTO reaction database being imbalanced in the first place. Additionally, a small portion of errors (2.5% of all generated SMILES), such as copying the reactant part directly into the product part, was also observed. The reaction shown in Figure 5(d) is a reaction with a novel RC: an initial pathway for gold reduction by 2-pyrrolidinone,35 with the wrong placement of the O-OH group. However, one should keep in mind that the original work36 cited by Li et al.35 uses Nuclear Magnetic Resonance (NMR) shifts of 13C to determine the structure of the intermediate compound, and there the N atom carries a methyl group. This is in contrast to the work of Li et al.35 and our work, where there is no methyl group attached to N, and the resulting shifts do not necessarily correspond to the NH group. In other words, the placement of the -OOH group does not necessarily have to be on a carbon atom, as no rigorous NMR analysis was done in the case of Li et al.35

Figure 5: A variety of generated atom-mapped reactions with novel and known reaction centers.

Conclusion

This work presents a step forward towards unsupervised de-novo reaction generation. The contribution of this work is threefold. First, it explores an alternative TCN Deep Learning architecture in comparison with RNN by itself. Second, it shows that this approach allows for context-aware SMILES generation. Lastly, it shows that fine-tuning protocols contribute significantly to domain adaptation in chemical space, which in turn enables few-shot learning on smaller datasets upon transfer learning. The model with the best fine-tuning protocol was able to discover reactions which it had not seen but which were present in previously published work. This points to the possibility of gaining reaction insight before the synthesis stage.

Data and Software Availability

The CGRSmiles dataset and the collected hydrogen peroxide dataset, along with the generated CGRSmiles and Python scripts, are available at: https://github.com/phquanta/CGRSmiles.git

Acknowledgement

References

(1) Schreck, J. S.; Coley, C. W.; Bishop, K. J. M. Learning Retrosynthetic Planning through Simulated Experience. ACS Central Science 2019, 5, 970–981.

(2) Zheng, S.; Rao, J.; Zhang, Z.; Xu, J.; Yang, Y. Predicting retrosynthetic reactions using self-corrected transformer neural networks. Journal of Chemical Information and Modeling 2019, 60, 47–55.
(3) Bort, W.; Baskin, I. I.; Gimadiev, T.; Mukanov, A.; Nugmanov, R.; Sidorov, P.; Marcou, G.; Horvath, D.; Klimchuk, O.; Madzhidov, T., et al. Discovery of novel chemical reactions by deep generative recurrent neural network. Scientific Reports 2021, 11, 1–15.

(4) Hoonakker, F.; Lachiche, N.; Varnek, A.; Wagner, A. A representation to apply usual data mining techniques to chemical reactions—illustration on the rate constant of SN2 reactions in water. International Journal on Artificial Intelligence Tools 2011, 20, 253–270.

(5) Segler, M. H.; Kogej, T.; Tyrchan, C.; Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Central Science 2018, 4, 120–131.

(6) Bai, S.; Kolter, J. Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 2018.

(7) Conneau, A.; Schwenk, H.; Barrault, L.; Lecun, Y. Very deep convolutional networks for natural language processing. arXiv preprint arXiv:1606.01781 2016, 2, 1.

(8) Song, X.; Yang, F.; Wang, D.; Tsui, K.-L. Combined CNN-LSTM network for state-of-charge estimation of lithium-ion batteries. IEEE Access 2019, 7, 88894–88902.

(9) Wan, R.; Mei, S.; Wang, J.; Liu, M.; Yang, F. Multivariate temporal convolutional network: A deep neural networks approach for multivariate time series forecasting. Electronics 2019, 8, 876.

(10) Zhao, W.; Gao, Y.; Ji, T.; Wan, X.; Ye, F.; Bai, G. Deep temporal convolutional networks for short-term traffic flow forecasting. IEEE Access 2019, 7, 114496–114507.

(11) Liu, Y.; Dong, H.; Wang, X.; Han, S. Time series prediction based on temporal convolutional network. 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS). 2019; pp 300–305.

(12) Cui, Y.; Dong, Q.; Hong, D.; Wang, X. Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinformatics 2019, 20, 1–12.

(13) Qi, H.; Brown, M.; Lowe, D. G. Low-shot learning with imprinted weights. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018; pp 5822–5830.

(14) Gupta, A.; Müller, A. T.; Huisman, B. J.; Fuchs, J. A.; Schneider, P.; Schneider, G. Generative recurrent networks for de novo drug design. Molecular Informatics 2018, 37, 1700111.

(15) Remy, P. Temporal Convolutional Networks for Keras. https://github.com/philipperemy/keras-tcn, 2020.

(16) Chollet, F., et al. Keras. https://github.com/fchollet/keras, 2015.

(17) Abadi, M. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015; https://www.tensorflow.org/, software available from tensorflow.org.

(18) Xu, Z.; Wang, S.; Zhu, F.; Huang, J. Seq2seq fingerprint: An unsupervised deep molecular embedding for drug discovery. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. 2017; pp 285–294.

(19) Maaten, L. v. d.; Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research 2008, 9, 2579–2605.

(20) Landrum, G. RDKit: Open-source cheminformatics. http://www.rdkit.org.

(21) Nugmanov, R. I.; Mukhametgaleev, R. N.; Akhmetshin, T.; Gimadiev, T. R.; Afonina, V. A.; Madzhidov, T. I.; Varnek, A. CGRtools: Python Library for Molecule, Reaction, and Condensed Graph of Reaction Processing. Journal of Chemical Information and Modeling 2019, 59, 2516–2521.
(22) van Deursen, R.; Ertl, P.; Tetko, I. V.; Godin, G. GEN: highly efficient SMILES explorer using autodidactic generative examination networks. Journal of Cheminformatics 2020, 12, 1–14.

(23) Grisoni, F.; Moret, M.; Lingwood, R.; Schneider, G. Bidirectional molecule generation with recurrent neural networks. Journal of Chemical Information and Modeling 2020, 60, 1175–1183.

(24) Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.

(25) Howard, J.; Ruder, S. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146 2018.

(26) Jin, W.; Coley, C.; Barzilay, R.; Jaakkola, T. Predicting organic reaction outcomes with Weisfeiler-Lehman network. Advances in Neural Information Processing Systems. 2017; pp 2607–2616.

(27) Lowe, D. Chemical reactions from US patents (1976-Sep2016). 2018.

(28) Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B. A.; Thiessen, P. A.; Yu, B., et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Research 2019, 47, D1102–D1109.

(29) Jaworski, W.; Szymkuć, S.; Mikulak-Klucznik, B.; Piecuch, K.; Klucznik, T.; Kaźmierowski, M.; Rydzewski, J.; Gambin, A.; Grzybowski, B. A. Automatic mapping of atoms across both simple and complex chemical reactions. Nature Communications 2019, 10, 1–11.

(30) Saito, A.; Shirasawa, T.; Tanahashi, S.; Uno, M.; Tatsumi, N.; Kitsuki, T. An efficient synthesis of glyceryl ethers: catalyst-free hydrolysis of glycidyl ethers in water media. Green Chemistry 2009, 11, 753–755.

(31) Wang, Z.; Cui, Y.-T.; Xu, Z.-B.; Qu, J. Hot water-promoted ring-opening of epoxides and aziridines by water and other nucleophiles. The Journal of Organic Chemistry 2008, 73, 2270–2274.

(32) Clemens, R. J. Diketene. Chemical Reviews 1986, 86, 241–318.

(33) Gómez-Bombarelli, R.; González-Pérez, M.; Pérez-Prior, M. T.; Manso, J. A.; Calle, E.; Casado, J. Kinetic study of the neutral and base hydrolysis of diketene. Journal of Physical Organic Chemistry 2009, 22, 438–442.

(34) Bhaumik, C.; Stein, D.; Vincendeau, S.; Poli, R.; Manoury, É. Oxidation of alcohols by TBHP in the presence of sub-stoichiometric amounts of MnO2. Comptes Rendus Chimie 2016, 19, 566–570.

(35) Li, C. C.; Chen, L. B.; Li, Q. H.; Wang, T. H. Seed-free, aqueous synthesis of gold nanowires. CrystEngComm 2012, 14, 7549–7551.

(36) Drago, R. S.; Riley, R. Oxidation of N-alkyl amides to novel hydroperoxides by dioxygen. Journal of the American Chemical Society 1990, 112, 215–218.