Paper Title
Subtype Recognition of E2F Transcription Factor in Head and Neck Squamous Cell Carcinoma and Construction of Prognosis Model

Abstract
This study aimed to cluster patients with head and neck squamous cell carcinoma by their expression of E2F transcription factors, to construct a prognosis model from the genes differentially expressed between the resulting subtypes, and to verify its efficacy, so as to inform prognosis prediction and treatment selection for patients with head and neck squamous cell carcinoma.

Clinical information and gene expression data of head and neck squamous cell carcinoma patients were obtained from the TCGA database using the R package TCGAbiolinks. Consensus clustering was performed with the ConsensusClusterPlus package. The limma package was used for differential expression analysis between subtypes, and the clusterProfiler package for KEGG and GO enrichment analysis of the differentially expressed genes. Univariate Cox regression was performed on all differentially expressed genes with the coxph function of the survival package. The prognosis-related genes identified by the Cox analysis were then screened by LASSO regression to construct the prognosis model. Patients were divided into a high-score group and a low-score group at the median score of the prognosis model. The survminer package was used for survival analysis, and the timeROC package for time-dependent ROC curve analysis. The constructed prognostic model was validated on the GEO data set GSE41613.

The optimal number of clusters calculated by the PAC algorithm was 2, so patients with head and neck squamous cell carcinoma were divided into two subtypes based on the level of E2F expression. Prognosis and survival curves differed significantly between the subtypes (p<0.05). There were 211 differentially expressed genes between the two subtypes, of which 102 were up-regulated and 109 were down-regulated.
KEGG and GO enrichment analysis showed that the 211 differentially expressed genes were mainly involved in the development and differentiation of epidermal cells and were enriched in the IL-7 signaling pathway, DNA replication and other pathways. Dimensionality was reduced by LASSO regression and the prognosis model was established: score = exp(AREG) * 0.0898 + exp(CXCL14) * (-0.0034) + exp(FAM83E) * (-0.0214) + exp(FDCSP) * (-0.0313) + exp(ARHGAP4) * (-0.0634) + exp(EPHX3) * (-0.0777) + exp(SPINK6) * (-0.1348). The area under the ROC curve in the TCGA training set was 0.692 at 1 year, 0.673 at 3 years and 0.679 at 5 years. In the GEO validation set it was 0.719 at 1 year, 0.666 at 3 years and 0.691 at 5 years.

The prognostic model of head and neck squamous cell carcinoma based on E2F expression in this study shows good predictive performance in both the training set and the validation set, which provides a basis for evaluating the prognosis of head and neck squamous cell carcinoma patients and their treatment options.

Keywords – Head and Neck Squamous Cell Carcinoma, E2F Transcription Factor, Prognosis Model
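As an illustration of how the seven-gene score and the median split described above might be applied, the following is a minimal Python sketch, not the authors' R implementation. It assumes that exp(GENE) in the formula denotes the expression level of that gene rather than an exponential, and that a score equal to the median falls in the low-score group; neither point is stated in the abstract. The patient identifiers and expression values are hypothetical.

```python
from statistics import median

# Coefficients exactly as reported in the abstract.
COEFFICIENTS = {
    "AREG": 0.0898,
    "CXCL14": -0.0034,
    "FAM83E": -0.0214,
    "FDCSP": -0.0313,
    "ARHGAP4": -0.0634,
    "EPHX3": -0.0777,
    "SPINK6": -0.1348,
}


def risk_score(expression):
    """Weighted sum of expression levels (assumption: exp(GENE) = expression)."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median.

    Assumption: a score equal to the median is assigned to the low group.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values, for illustration only.
baseline = {gene: 1.0 for gene in COEFFICIENTS}
patients = {
    "P1": dict(baseline),
    "P2": {**baseline, "AREG": 5.0},    # high AREG raises the score
    "P3": {**baseline, "SPINK6": 5.0},  # high SPINK6 lowers the score
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)

for pid in patients:
    print(pid, round(scores[pid], 4), groups[pid])
```

AREG is the only gene with a positive coefficient, so under this reading higher AREG expression pushes a patient toward the high-score group, while higher expression of the other six genes pushes the score down.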