Wójcikowski, Maciej and Ballester, Pedro J. and Siedlecki, Paweł (2017) Performance of machine-learning scoring functions in structure-based virtual screening. Scientific Reports, 7, p. 46710. ISSN 2045-2322
Official URL: http://doi.org/10.1038/srep46710
Abstract
Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: the RF-Score-VS top 1% provides a 55.6% hit rate, whereas that of Vina is only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate, compared with 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and -0.18, respectively). Lastly, we tested RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
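The two metrics quoted in the abstract, the hit rate among the top-ranked fraction of a screened library and the Pearson correlation between predicted and measured affinity, can be illustrated with a minimal sketch. The function names `hit_rate_at_top` and `pearson` and the toy data are hypothetical and are not taken from the paper or from the RF-Score-VS code; the sketch also assumes that a higher score means a better-ranked molecule, which may not match the convention of a given docking tool.

```python
import math


def hit_rate_at_top(scores, is_active, fraction):
    """Share of actives among the top `fraction` of molecules.

    Assumption: a higher score means a better-ranked molecule.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Number of molecules in the top slice (at least one).
    n_top = max(1, int(round(len(scores) * fraction)))
    # Rank molecules from best to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, active in ranked[:n_top] if active)
    return hits / n_top


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data (hypothetical, not from the paper): ten molecules, three actives.
toy_scores = [9.1, 8.7, 8.2, 7.5, 6.9, 6.0, 5.5, 5.1, 4.2, 3.0]
toy_active = [True, True, False, True, False, False, False, False, False, False]

top_half = hit_rate_at_top(toy_scores, toy_active, 0.5)   # 3 actives in top 5
top_fifth = hit_rate_at_top(toy_scores, toy_active, 0.2)  # 2 actives in top 2
print(top_half, top_fifth)
```

In this toy ranking the hit rate rises as the selected fraction shrinks, which is the same pattern the abstract reports when moving from the top 1% to the top 0.1%.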
Item Type: | Article |
---|---|
Subjects: | Q Science > QC Physics; Q Science > QD Chemistry |
Divisions: | Department of Bioinformatics |
ID Code: | 1410 |
Deposited By: | Maciej Wójcikowski |
Deposited On: | 07 Nov 2017 12:59 |
Last Modified: | 01 Mar 2018 11:41 |