Pawłowski, Piotr H. and Zielenkiewicz, Piotr (2025) Predicting the S. cerevisiae Gene Expression Score by a Machine Learning Classifier. Life, 15 (5). p. 723. ISSN 2075-1729
Official URL: http://doi.org/10.3390/life15050723
Abstract
This work examines gene expression and how its score depends on various factors, analyzed globally using machine learning techniques. The expression score (ES) of a gene characterizes its activity and, thus, its importance for cellular processes. It may depend on many different factors (attributes). To identify the most important of them, a machine learning classifier (random forest) was selected, trained, and optimized on the Waikato Environment for Knowledge Analysis (WEKA) platform, yielding the most accurate attribute-dependent prediction of the ES of Saccharomyces cerevisiae genes. For this purpose, data from the Saccharomyces Genome Database (SGD), presenting ES values corresponding to a wide spectrum of attributes, were revised, classified, and balanced, and the significance of the considered attributes was evaluated. The resulting novel random forest model indicates the most important attributes determining the classes of low, moderate, and high ES. These attributes cover both the experimental conditions and the genetic, physical, statistical, and logistic features. During validation, the model classified the instances of a previously unseen test set with an accuracy of 84.1%.
Item Type: | Article |
---|---|
Subjects: | Q Science > QH Natural history; Q Science > QH Natural history > QH301 Biology; Q Science > QH Natural history > QH426 Genetics; Q Science > QR Microbiology |
Divisions: | Department of Bioinformatics |
ID Code: | 2545 |
Deposited By: | Mr. Piotr H. Pawlowski |
Deposited On: | 06 May 2025 10:26 |
Last Modified: | 06 May 2025 10:26 |