30. Al-Lazikani, B. et al. in Bioinformatics - From
Genomes to Therapies Ch. 36 (Wiley-VCH, 2008).
31. Nayal, M. & Honig, B. On the nature of cavities on
protein surfaces: application to the identification of
drug-binding sites. Proteins 63, 892–906 (2006).
This article describes a classifier to identify
drug-binding cavities on the basis of physicochemical,
structural and geometric attributes of proteins.
32. Li, Q. & Lai, L. Prediction of potential drug
targets based on simple sequence properties.
BMC Bioinformatics 8, 353 (2007).
33. Bakheet, T. M. & Doig, A. J. Properties and
identification of human protein drug targets.
Bioinformatics 25, 451–457 (2009).
34. Wang, Q., Feng, Y., Huang, J., Wang, T. & Cheng, G.
A novel framework for the identification of drug target
proteins: combining stacked auto-encoders with a
biased support vector machine. PLOS ONE 12,
e0176486 (2017).
35. Kandoi, G., Acencio, M. L. & Lemke, N. Prediction
of druggable proteins using machine learning and
systems biology: a mini-review. Front. Physiol. 6,
366 (2015).
36. Nelson, M. R. et al. The support of human genetic
evidence for approved drug indications. Nat. Genet.
47, 856–860 (2015).
37. Morgan, P. et al. Impact of a five-dimensional
framework on R&D productivity at AstraZeneca.
Nat. Rev. Drug Discov. 17, 167–181 (2018).
38. Rouillard, A. D., Hurle, M. R. & Agarwal, P.
Systematic interrogation of diverse Omic data reveals
interpretable, robust, and generalizable transcriptomic
features of clinically successful therapeutic targets.
PLOS Comput. Biol. 14, e1006142 (2018).
39. Kumar, V., Sanseau, P., Simola, D. F., Hurle, M. R. &
Agarwal, P. Systematic analysis of drug targets confirms
expression in disease-relevant tissues. Sci. Rep. 6,
36205 (2016).
40. Ramsundar, B. et al. Is multitask deep learning
practical for pharma? J. Chem. Inf. Model. 57,
2068–2076 (2017).
41. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V.
Deep neural nets as a method for quantitative
structure–activity relationships. J. Chem. Inf. Model.
55, 263–274 (2015).
42. Barati Farimani, A., Feinberg, E. & Pande, V. Binding
pathway of opiates to μ-opioid receptors revealed by
machine learning. Biophys. J. 114, 62a–63a (2018).
43. Wu, Z. et al. MoleculeNet: a benchmark for molecular
machine learning. Chem. Sci. 9, 513–530 (2018).
44. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning
chemical syntheses with deep neural networks and
symbolic AI. Nature 555, 604 (2018).
This seminal paper describes a very thorough
approach to retrosynthetic analysis. The authors
show that their method can compete with
retrosynthesis done by experienced chemists who
are experts in this field.
45. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H.
Molecular de-novo design through deep reinforcement
learning. J. Cheminform. 9, 48 (2017).
46. Kadurin, A., Nikolenko, S., Khrabrov, K., Aliper, A.
& Zhavoronkov, A. druGAN: an advanced generative
adversarial autoencoder model for de novo generation
of new molecules with desired molecular properties in
silico. Mol. Pharm. 14, 3098–3104 (2017).
47. Smith, J. S., Roitberg, A. E. & Isayev, O. Transforming
computational drug discovery with machine learning
and AI. ACS Med. Chem. Lett. 9, 1065–1069 (2018).
48. Lenselink, E. B. et al. Beyond the hype: deep neural
networks outperform established methods using a
ChEMBL bioactivity benchmark set. J. Cheminform. 9,
45 (2017).
49. Gaulton, A. et al. The ChEMBL database in 2017.
Nucleic Acids Res. 45, D945–D954 (2017).
50. Ramsundar, B. et al. Massively multitask networks
for drug discovery. Preprint at arXiv
https://arxiv.org/abs/1502.02072 (2015).
51. Gutlein, M. & Kramer, S. Filtered circular fingerprints
improve either prediction or runtime performance
while retaining interpretability. J. Cheminform. 8, 60
(2016).
52. Mayr, A. et al. Large-scale comparison of machine
learning methods for drug target prediction on
ChEMBL. Chem. Sci. 9, 5441–5451 (2018).
This research paper describes the methodology
used by the winners of almost all categories
of the Tox21 Challenge.
53. Keiser, M. J. et al. Relating protein pharmacology by
ligand chemistry. Nat. Biotechnol. 25, 197 (2007).
54. Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S. &
Klambauer, G. Fréchet ChemNet Distance: a metric for
generative models for molecules in drug discovery.
J. Chem. Inf. Model. 58, 1736–1741 (2018).
55. Unterthiner, T., Mayr, A., Klambauer, G. & Hochreiter, S.
Toxicity prediction using deep learning. Preprint at
arXiv https://arxiv.org/abs/1503.01445 (2015).
56. Li, B. et al. Development of a drug-response modeling
framework to identify cell line derived translational
biomarkers that can predict treatment outcome to
erlotinib or sorafenib. PLOS ONE 10, e0130700
(2015).
In this paper, a translational predictive biomarker
is used to demonstrate that predictive models can
be generated from preclinical training data sets
and then be applied to clinical patient samples to
stratify patients, infer the mechanism of action of a
drug and select appropriate disease indications.
57. van Gool, A. J. et al. Bridging the translational
innovation gap through good biomarker practice.
Nat. Rev. Drug Discov. 16, 587–588 (2017).
58. Kraus, V. B. Biomarkers as drug development tools:
discovery, validation, qualification and use. Nat. Rev.
Rheumatol. 14, 354–362 (2018).
59. Shi, L. et al. The MicroArray Quality Control (MAQC)-II
study of common practices for the development and
validation of microarray-based predictive models.
Nat. Biotechnol. 28, 827–838 (2010).
60. Zhan, F. et al. The molecular classification of multiple
myeloma. Blood 108, 2020–2028 (2006).
61. Shaughnessy, J. D. Jr et al. A validated gene
expression model of high-risk multiple myeloma is
defined by deregulated expression of genes mapping
to chromosome 1. Blood 109, 2276–2284 (2007).
62. Zhan, F., Barlogie, B., Mulligan, G., Shaughnessy, J. D.
Jr & Bryant, B. High-risk myeloma: a gene expression
based risk-stratification model for newly diagnosed
multiple myeloma treated with high-dose therapy is
predictive of outcome in relapsed disease treated with
single-agent bortezomib or high-dose dexamethasone.
Blood 111, 968–969 (2008).
63. Decaux, O. et al. Prediction of survival in multiple
myeloma based on gene expression profiles reveals
cell cycle and chromosomal instability signatures in
high-risk patients and hyperdiploid signatures in
low-risk patients: a study of the Intergroupe Francophone
du Myelome. J. Clin. Oncol. 26, 4798–4805 (2008).
64. Mulligan, G. et al. Gene expression profiling and
correlation with outcome in clinical trials of the
proteasome inhibitor bortezomib. Blood 109,
3177–3188 (2007).
65. Costello, J. C. et al. A community effort to assess
and improve drug sensitivity prediction algorithms.
Nat. Biotechnol. 32, 1202–1212 (2014).
This paper is an effort to collect and objectively
evaluate various ML approaches by teams around
the world on multi-omics data sets and various
compounds. The data sets and results are
continuously used as benchmarks for new method
developments and validation.
66. Rahman, R., Otridge, J. & Pal, R. IntegratedMRF:
random forest-based framework for integrating
prediction from different data types. Bioinformatics
33, 1407–1410 (2017).
67. Bunte, K., Leppäaho, E., Saarinen, I. & Kaski, S.
Sparse group factor analysis for biclustering of
multiple data sources. Bioinformatics 32, 2457–2463
(2016).
68. Huang, C., Mezencev, R., McDonald, J. F. & Vannberg, F.
Open source machine-learning algorithms for the
prediction of optimal cancer drug therapies. PLOS ONE
12, e0186906 (2017).
69. Hejase, H. A. & Chan, C. Improving drug sensitivity
prediction using different types of data. CPT
Pharmacometrics Syst. Pharmacol. 4, e2 (2015).
70. Kim, E. S. et al. The BATTLE trial: personalizing
therapy for lung cancer. Cancer Discov. 1, 44–53
(2011).
71. Boyiadzis, M. M. et al. Significance and implications of
FDA approval of pembrolizumab for biomarker-defined
disease. J. Immunother. Cancer 6, 35 (2018).
72. Tasaki, S. et al. Multi-omics monitoring of drug
response in rheumatoid arthritis in pursuit of
molecular remission. Nat. Commun. 9, 2755 (2018).
This work identifies molecular signatures that are
resistant to drug treatments and illustrates a
multi-omics approach to understanding drug response.
73. Paré, G., Mao, S. & Deng, W. Q. A machine-learning
heuristic to improve gene score prediction of polygenic
traits. Sci. Rep. 7, 12665 (2017).
74. Khera, A. V. et al. Genome-wide polygenic scores
for common diseases identify individuals with risk
equivalent to monogenic mutations. Nat. Genet. 50,
1219–1224 (2018).
75. Ding, J., Condon, A. & Shah, S. P. Interpretable
dimensionality reduction of single cell transcriptome
data with deep generative models. Nat. Commun. 9,
2002 (2018).
76. Rashid, S., Shah, S., Bar-Joseph, Z. & Pandya, R.
Project Dhaka: variational autoencoder for unmasking
tumor heterogeneity from single cell genomic data.
Preprint at bioRxiv
https://www.biorxiv.org/content/10.1101/183863v4 (2018).
77. Wang, D. & Gu, J. VASC: dimension reduction and
visualization of single-cell RNA-seq data by deep
variational autoencoder. Genomics Proteomics
Bioinformatics 16, 320–331 (2017).
78. Pierson, E. & Yau, C. ZIFA: dimensionality reduction
for zero-inflated single-cell gene expression analysis.
Genome Biol. 16, 241 (2015).
79. Wang, B., Zhu, J., Pierson, E., Ramazzotti, D. &
Batzoglou, S. Visualization and analysis of single-cell
RNA-seq data by kernel-based similarity learning.
Nat. Methods 14, 414 (2017).
80. Tan, J., Hammond, J. H., Hogan, D. A. & Greene, C. S.
ADAGE-based integration of publicly available
Pseudomonas aeruginosa gene expression data with
denoising autoencoders illuminates microbe-host
interactions. mSystems 1, e00025–15 (2016).
81. Way, G. P. & Greene, C. S. Extracting a biologically
relevant latent space from cancer transcriptomes with
variational autoencoders. Pac. Symp. Biocomput. 23,
80–91 (2018).
82. Casanova, R. et al. Morphoproteomic characterization
of lung squamous cell carcinoma fragmentation, a
histological marker of increased tumor invasiveness.
Cancer Res. 77, 2585–2593 (2017).
83. Nirschl, J. J. et al. A deep-learning classifier identifies
patients with clinical heart failure using whole-slide
images of H&E tissue. PLOS ONE 13, e0192726
(2018).
84. Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O.
Deep learning for computational biology. Mol. Syst. Biol.
12, 878 (2016).
85. Finnegan, A. & Song, J. S. Maximum entropy
methods for extracting the learned features of deep
neural networks. PLOS Comput. Biol. 13, e1005836
(2017).
86. Hutson, M. Artificial intelligence faces reproducibility
crisis. Science 359, 725–726 (2018).
87. Veltri, R. W., Partin, A. W. & Miller, M. C. Quantitative
nuclear grade (QNG): a new image analysis-based
biomarker of clinically relevant nuclear structure
alterations. J. Cell. Biochem. Suppl. 35, S151–S157
(2000).
88. Beck, A. H. et al. Systematic analysis of breast cancer
morphology uncovers stromal features associated with
survival. Sci. Transl Med. 3, 108ra113 (2011).
89. Lee, G. et al. Nuclear shape and architecture in benign
fields predict biochemical recurrence in prostate
cancer patients following radical prostatectomy:
preliminary findings. Eur. Urol. Focus 3, 457–466
(2017).
90. Lu, C. et al. An oral cavity squamous cell carcinoma
quantitative histomorphometric-based image classifier
of nuclear morphology can risk stratify patients
for disease-specific survival. Mod. Pathol. 30,
1655–1665 (2017).
91. Lu, C. et al. Nuclear shape and orientation features
from H&E images predict survival in early-stage
estrogen receptor-positive breast cancers. Lab. Invest.
98, 1438–1448 (2018).
92. Mani, N. L. et al. Quantitative assessment of the
spatial heterogeneity of tumor-infiltrating lymphocytes
in breast cancer. Breast Cancer Res. 18, 78 (2016).
93. Giraldo, N. A. et al. The differential association of
PD-1, PD-L1, and CD8+ cells with response to
pembrolizumab and presence of Merkel cell
polyomavirus (MCPyV) in patients with Merkel cell
carcinoma (MCC). Cancer Res. 77, 662 (2017).
94. Janowczyk, A. & Madabhushi, A. Deep learning for
digital pathology image analysis: a comprehensive
tutorial with selected use cases. J. Pathol. Informat. 7,
29 (2016).
This article is the first comprehensive review of DL
in the context of digital pathology images. The
paper also systematically explains and presents
approaches for training and validating DL
classifiers for a number of image-based problems
in digital pathology, including cell detection,
segmentation and tissue classification.
95. Sharma, H., Zerbe, N., Klempert, I., Hellwich, O. &
Hufnagl, P. Deep convolutional neural networks for
automatic classification of gastric carcinoma using
whole slide images in digital histopathology. Comput.
Med. Imaging Graph. 61, 2–13 (2017).
Reviews
476 | June 2019 | volume 18 | www.nature.com/nrd