Objective: To screen for Alzheimer’s disease (AD) biomarkers by combining the GEO (Gene Expression
Omnibus) database with machine learning.
Methods: A total of 339 samples were included: 168 AD samples and 171 samples from healthy controls.
Datasets were retrieved from the GEO database and screened to identify differentially expressed genes (DEGs).
Candidate predictive genes were selected with two algorithms, least absolute shrinkage and selection operator
(LASSO) logistic regression and random forest (RF), and the resulting models were evaluated with receiver
operating characteristic (ROC) curves. Clinical datasets (comprising multiple groups of AD patients and healthy
control samples) were used to validate the predicted genes, and RT-qPCR was used to quantify the expression of
the predicted genes
in the normal control and model groups of the AD cell model.
Results: LASSO identified 84 key markers and RF identified 7 genes. A Venn diagram of the two algorithms'
outputs yielded four overlapping genes: PLSCR4, GLIS3, PHYHD1, and HVCN1. In the test set, the area under
the ROC curve (AUC) of all four candidate genes exceeded 0.7; in the validation set, three of the four candidates
had an AUC above 0.7, among which GLIS3 (AUC = 0.891) and HVCN1 (AUC = 0.953) showed excellent diagnostic
performance (AUC > 0.89). RT-qPCR showed that the relative expression of PLSCR4, GLIS3, PHYHD1, and HVCN1
was higher in the AD cell model than in the normal control group (all P < 0.01), consistent with the
bioinformatic predictions.
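The abstract does not say how relative expression was calculated from the RT-qPCR data. A common choice is the Livak 2^(-ΔΔCt) method, sketched minimally below under these assumptions: a single housekeeping reference gene (not named in the abstract), the mean ΔCt of the normal control group as calibrator, and roughly 100% amplification efficiency. The function names and Ct values are hypothetical and illustrative, not data from the study.

```python
from statistics import mean


def delta_ct(target_ct: float, reference_ct: float) -> float:
    """Normalise a target-gene Ct to a (hypothetical) housekeeping-gene Ct."""
    return target_ct - reference_ct


def fold_changes(model_pairs, control_pairs):
    """Relative expression by the Livak 2^(-ddCt) method (assumed, not stated in the source).

    Each pair is (target_ct, reference_ct) for one replicate. The mean
    delta-Ct of the control group is used as the calibrator, so a control
    sample with the average delta-Ct has a fold change of 1.0.
    """
    calibrator = mean(delta_ct(t, r) for t, r in control_pairs)
    # ddCt = dCt(model replicate) - dCt(calibrator); fold change = 2 ** (-ddCt)
    return [2 ** -(delta_ct(t, r) - calibrator) for t, r in model_pairs]


# Illustrative Ct values only (not data from the study).
control = [(26.0, 18.0), (26.5, 18.0), (25.5, 18.0)]
model = [(24.0, 18.0), (24.5, 18.5), (23.5, 17.5)]
print(fold_changes(model, control))  # [4.0, 4.0, 4.0]
```

A lower Ct means more starting template, so a ΔΔCt of -2 corresponds to about fourfold higher expression in the model group; the group comparison behind a P value would then be run on these per-replicate values.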
Conclusion: PLSCR4, GLIS3, PHYHD1, and HVCN1 may serve as molecular markers for the clinical diagnosis
of AD.
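The overlap-and-evaluate steps described in Methods and Results can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes each algorithm has already produced its gene list (LASSO and random forest themselves are not reimplemented here), takes the set intersection as the shared region of the Venn diagram, and scores a single candidate gene's ROC AUC from its expression alone using the rank-based Mann-Whitney formulation. The function names are hypothetical, GENE_A to GENE_D are placeholders rather than genes from the study, the LASSO list is shortened, and the expression values are made up.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def venn_overlap(lasso_genes: Iterable[str], rf_genes: Iterable[str]) -> set[str]:
    """Genes selected by both algorithms (the shared region of a two-set Venn diagram)."""
    return set(lasso_genes) & set(rf_genes)


def single_gene_auc(expression: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of one gene's expression used directly as a diagnostic score.

    Mann-Whitney formulation: the fraction of (AD, control) pairs in which
    the AD sample (label 1) has the higher value, counting ties as 1/2.
    For a gene expected to be lower in AD, 1 - AUC would be reported instead.
    """
    ad = [x for x, y in zip(expression, labels) if y == 1]
    ctrl = [x for x, y in zip(expression, labels) if y == 0]
    if not ad or not ctrl:
        raise ValueError("need at least one AD and one control sample")
    wins = 0.0
    for a in ad:
        for c in ctrl:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(ad) * len(ctrl))


# Hypothetical inputs: GENE_A..GENE_D are placeholders, not genes from the study.
lasso_selected = ["PLSCR4", "GLIS3", "PHYHD1", "HVCN1", "GENE_A"]
rf_selected = ["PLSCR4", "GLIS3", "PHYHD1", "HVCN1", "GENE_B", "GENE_C", "GENE_D"]
print(sorted(venn_overlap(lasso_selected, rf_selected)))
# ['GLIS3', 'HVCN1', 'PHYHD1', 'PLSCR4']

toy_expression = [0.9, 0.8, 0.4, 0.3, 0.6]  # made-up values
toy_labels = [1, 1, 1, 0, 0]                # 1 = AD, 0 = control
print(single_gene_auc(toy_expression, toy_labels))  # 0.8333333333333334
```

The pairwise count is quadratic in sample size but trivial for a few hundred samples; for binary labels it gives the same value as scikit-learn's `roc_auc_score(labels, expression)`.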